NVCC Options Compatibility (v1.4)
Option | Description | Status |
"-o, --output-file <file>" | Specify name and location of the output file. | ✅ |
"-objtemp, --objdir-as-tempdir" | Create all intermediate files in the same directory as the object file. | ✅ |
"-include, --pre-include <file,…>" | Specify header files that must be pre-included during preprocessing. | ✅ |
"-l, --library <library,…>" | Specify libraries to be used in the linking stage without the library file extension. | ✅ |
"-D, --define-macro <def,…>" | Define macros to be used during preprocessing. | ✅ |
"-U --undefine-macro <def,…>" | Undefine an existing macro during preprocessing or compilation. | ✅ |
"-I, --include-path <path,…>" | Specify include search paths. | ✅ |
"-isystem, --system-include <path,…>" | Specify system include search paths. | ✅ |
"-L, --library-path <path,…>" | Specify library search paths. | ✅ |
"-odir, --output-directory <directory>" | Specify the directory of the output file. | ✅ |
"-MF, --dependency-output <file>" | Specify the dependency output file. | ✅ |
"-MP, --generate-dependency-targets" | Add an empty target for each dependency. | ✅ |
"-ccbin, --compiler-bindir <directory>" | Specify the directory in which the default host compiler executable resides. | ✅ |
"-allow-unsupported-compiler, --allow-unsupported-compiler" | Disable nvcc check for supported host compiler versions. | ✅ |
"-arbin, --archiver-binary <executable>" | Specify the path of the archiver tool used create static librarie with --lib | ✅ |
"-cudart, --cudart <none|shared|static>" | Specify the type of CUDA runtime library to be used: no CUDA runtime library, shared/dynamic CUDA runtime library, or static CUDA runtime library. | ✅ |
"-cudadevrt, --cudadevrt <none|static>" | Specify the type of CUDA device runtime library to be used: no CUDA device runtime library, or static CUDA device runtime library. | ❌ |
"-ldir, --libdevice-directory <directory>" | Specify the directory that contains the libdevice library files. | ✅ |
"-target-dir, --target-directory <dir>" | Specify the subfolder name in the targets directory where the default include and library paths are located. | ✅ |
"-link, --link" | Specify the default behavior: compile and link all input files. | ✅ |
"-lib, --lib" | Compile all input files into object files, if necessary, and add the results to the specified library output file. | ✅ |
"-dlink, --device-link" | Link object files with relocatable device code and .ptx, .cubin, and .fatbin files into an object file with executable device code, which can be passed to the host linker. | ✅ |
"-dc, --device-c" | Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile. | ✅ |
"-dw, --device-w" | Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file that contains executable device code. It is equivalent to --relocatable-device-code=false --compile. | ✅ |
"-cuda, --cuda" | Compile each .cu input file to a .cu.cpp.ii file. The option is not supported when --relocatable-device-code=true. | ✅ |
"-c, --compile" | Compile each .c, .cc, .cpp, .cxx, and .cu input file into an object file. | ✅ |
"-fatbin, --fatbin" | Compile all .cu, .ptx, and .cubin input files to device-only .fatbin files. nvcc discards the host code for each .cu input file with this option. | ✅ |
"-cubin, --cubin" | Compile all .cu and .ptx input files to device-only .cubin files. nvcc discards the host code for each .cu input file with this option. | ✅ |
"-ptx, --ptx" | Compile all .cu input files to device-only .ptx files. nvcc discards the host code for each .cu input file with this option. | ✅ |
"-E, --preprocess" | Preprocess all .c, .cc, .cpp, .cxx, and .cu input files. | ✅ |
"-M, --generate-dependencies" | Generate a dependency file that can be included in a Makefile for the .c, .cc, .cpp, .cxx, and .cu input file. | ✅ |
"-MM, --generate-nonsystem-dependencies" | Same as --generate-dependencies but skip header files found in system directories (Linux only). | ✅ |
"-MD, --generate-dependencies-with-compile" | This option cannot be specified together with -E. The dependency file name is computed as follows: If -MF is specified, then the specified file is used as the dependency file name. If -o is specified, the dependency file name is computed from the specified file name by replacing the suffix with '.d'. Otherwise, the dependency file name is computed by replacing the input file names's suffix with '.d'. | ✅ |
"-MMD, --generate-nonsystem-dependencies-with-compile" | Same as --generate-dependcies-with-compile, but skip header files found in system directories(Linux only). | ✅ |
"-run, --run" | Compile and link all input files into an executable, and executes it. | ✅ |
"-pg, --profile" | Instrument generated code/executable for use by gprof. | ❌ |
"-g, --debug" | Generate debug information for host code. | ✅ |
"-G, --device-debug" | Generate debug information for device code. | ✅ |
"-ewp, --extensible-whole-program" | Generate extensible whole program device code, which allows some calls to not be resolved until linking with libcudadevrt. | ❌ |
"-no-compress, --no-compress" | Do not compress device code in fatbinary. | ❌ |
"-lineinfo, --generate-line-info" | Generate line-number information for device code. | ✅ |
"-opt-info, --optimization-info <kind,…>" | Provide optimization reports for the specified kind of optimization. The following tags are supported: inline Emit remarks related to function inlining. Inlining pass may be invoked multiple times by the compiler and a function not inlined in an earlier pass may be inlined in a subsequent pass. | ❌ |
"-O, --optimize <level>" | Specify optimization level for host code. | ✅ |
"-dlto, --dlink-time-opt" | Perform link-time optimization of device code. The option '-lto' is also an alias to '-dlto'. | ✅ |
"-ftemplate-backtrace-limit, --ftemplate-backtrace-limit <limit>" | Set the maximum number of template instantiation notes for a single warning or error to limit. | ✅ |
"-ftemplate-depth, --ftemplate-depth <limit>" | Set the maximum instantiation depth for template classes to limit. | ✅ |
"-noeh, --no-exceptions" | Disable exception handling for host code. | ✅ |
"-shared, --shared" | Generate a shared library during linking. | ✅ |
"-x, --x <c|c++|cu>" | Explicitly specify the language for the input files, rather than letting the compiler choose a default based on the file name suffix. | ✅ |
"-std, --std <c++03|c++11|c++14|c++17>" | Select a particular C++ dialect. | ✅ |
"-nohdinitlist, --no-host-device-initializer-list" | Do not consider member functions of std::initializer_list as __host__ __device__ functions implicitly. | ✅ |
"-expt-relaxed-constexpr, --expt-relaxed-constexpr" | Experimental flag: Allow host code to invoke __device__constexpr functions, and device code to invoke __host__constexpr functions. | ✅ |
"-extended-lambda, --extended-lambda" "-expt-extended-lambda, --expt-extended-lambda" | Allow __host__, __device__ annotations in lambda declarations. | ✅ |
"-m, --machine <32|64>" "-m32, --m32" "-m64, --m64" | Specify 32-bit vs. 64-bit architecture. | ✅ |
"-hls, --host-linker-script <use-lcs|gen-lcs>" | Use the host linker script (GNU/Linux only) to enable support for certain CUDA specific requirements, while building executable files or shared libraries. | ✅ |
"-aug-hls, --augment-host-linker-scipt" | Enables generation of host linker script that augments an existing host linker script (GNU/Linux only). | ✅ |
"-dopt, --dopt <=on>" | Enable device code optimization. When specified along with -G, enables limited debug information generation for optimized device code (currently, only line number information). When -G is not specified,-dopt=on is implicit. | ✅ |
"-r, --host-relocatable-link" | When used in combination with -hls=gen-lcs, controls the behaviour of -hls=gen-lcs and setsit to generate host linker script that can be used in host relocatable link (ld -r linkage). | ✅ |
"-gen-opt-lto, --gen-opt-lto" | Run the optimizer passes before generating the LTO IR. | ✅ |
"-Xcompiler, --compiler-options <options,…>" | Specify options directly to the compiler/preprocessor. | ✅ |
"-Xlinker, --linker-options <options,…>" | Specify options directly to the host linker. | ✅ |
"-Xarchive, --archive-options <options,…>" | Specify options directly to the library manager. | ✅ |
"-Xptxas, --ptxas-options <options,…>" | Specify options directly to ptxas, the PTX optimizing assembler. | ✅ |
"-Xnvlink,--nvlink-options <options,…>" | Specify options directly to nvlink, the device linker. | ✅ |
"-forward-unknown-to-host-compiler, --forward-unknown-to-host-compiler" | Forward unknown options to the host compiler. | ✅ |
"-forward-unknown-to-host-linker, --forward-unknown-to-host-linker" | Forward unknown options to the host linker. | ✅ |
"-noprof, --dont-use-profile" | Do not use configurations from the nvcc.profile file for compilation. | ❌ |
"-t, --threads number" | Specify the maximum number of threads to be used to execute the compilation steps in parallel. This option can be used to improve the compilation speed when compiling for multiple architectures. | ✅ |
"-dryrun, --dryrun" | List the compilation sub-commands without executing them. | ✅ |
"-v, --verbose" | List the compilation sub-commands while executing them. | ✅ |
"-keep, --keep" "-save-temps, --save-temps" | Keep all intermediate files that are generated during internal compilation steps. | ✅ |
"-keep-dir, --keep-dir <directory>" | Keep all intermediate files that are generated during internal compilation steps in this directory. | ✅ |
"-clean, --clean-targets" | Delete all the non-temporary files that the same nvcc command would generate without this option. | ✅ |
"-run-args, --run-args <arguments,…>" | Specify command line arguments for the executable when used in conjunction with --run. | ✅ |
"-idp, --input-drive-prefix <prefix>" | Specify the input drive prefix. | ❌ |
"-ddp, --dependency-drive-prefix <prefix>" | Specify the dependency drive prefix. | ❌ |
"-dp, --drive-prefix <prefix>" | Specify the drive prefix. | ❌ |
"-MT, --dependency-target-name <target>" | Specify the target name of the generated rule when generating a dependency file. | ✅ |
"--no-align-double" | Specify that -malign-double should not be passed as a compiler argument on 32-bit platforms. | ❌ |
"-nodlink, --no-device-link" | Skip the device link step when linking object files. | ❌ |
"-allow-unsupported-compiler, --allow-unsupported-compiler" | Disable nvcc check for supported host compiler versions. | ✅ |
"-default-stream, --default-stream <legacy|null|per-thread>" | Specify the stream that CUDA commands from the compiled program will be sent to by default. | ✅ |
"-arch, --gpu-architecture <arch|all|all-major>" | Specify the name of the class of NVIDIA virtual GPU architecture for which the CUDA input files must be compiled. | ✅ |
"-code, --gpu-code <code,…>" | Specify the name of the NVIDIA GPU to assemble and optimize PTX for. | ✅ |
"-gencode, --generate-code <specification>" | Provides a generalization for the above two. | ✅ |
"-rdc, --relocatable-device-code <true|false>" | Enable or disable the generation of relocatable device code. If disabled, executable device code is generated. Relocatable device code must be linked before it can be executed. | ✅ |
"-e, --entries <entry,…>" | Specify the global entry functions for which code must be generated. PTX generated for all entry functions, but only the selected entry functions are assembled. Entry function names for this option must be specified in the mangled name. | ✅ |
"-maxrregcount, --maxrregcount <amount>" | Specify the maximum amount of registers that GPU functions can use. | ✅ |
"-use_fast_math, --use_fast_math" | Make use of fast math library. --use_fast_math implies --ftz=true --prec-div=false --prec-sqrt=false --fmad=true. | ✅ |
"-ftz, --ftz <true|false>" | Control single-precision denormals support. --ftz=true flushes denormal values to zero and --ftz=false preserves denormal values. | ✅ |
"-prec-div, --prec-div <true|false>" | This option controls single-precision floating-point division and reciprocals. -prec-div=true enables the IEEE round-to-nearest mode and --prec-div=false enables the fast approximation mode. | ✅ |
"-prec-sqrt, --prec-sqrt <true|false>" | This option controls single-precision floating-point square root. --prec-sqrt=true enables the IEEE round-to-nearest mode and --prec-sqrt=false enables the fast approximation mode. | ✅ |
"-fmad, --fmad <true|false>" | This option enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). | ✅ |
"-extra-device-vectorization, --extra-device-vectorization" | This option enables more aggressive device code vectorization. | ✅ |
"-astoolspatch, --compile-as-tools-patch" | Compile patch code for CUDA tools. Implies --keep-device-functions. May only be used in conjunction with --ptx or --cubin or --fatbin. Shall not be used in conjunction with -rdc=true or -ewp. | ✅ |
"-keep-device-functions, --keep-device-functions" | In whole program compilation mode, preserve user defined external linkage __device__ function definitions in generated PTX. | ✅ |
"-w, --disable-warnings" | Inhibit all warning messages. | ✅ |
"-src-in-ptx, --source-in-ptx" | Interleave source in PTX. | ✅ |
"-restrict, --restrict" | Assert that all kernel pointer parameters are restrict pointers. | ✅ |
"-Wno-deprecated-gpu-targets, --Wno-deprecated-gpu-targets" | Suppress warnings about deprecated GPU target architectures. | ✅ |
"-Wno-deprecated-declarations, --Wno-deprecated-declarations" | Suppress warning on use of a deprecated entity. | ✅ |
"-Wreorder, --Wreorder" | Generate warnings when member initializers are reordered. | ✅ |
"-Wdefault-stream-launch, --Wdefault-stream-launch" | Generate warning when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax. | ✅ |
"-Wext-lambda-captures-this, --Wext-lambda-captures-this" | Generate warning when an extended lambda implicitly captures this. | ❌ |
"-Werror, --Werror <kind,…>" | Make warnings of the specified kinds into errors. The following is the list of warning kinds accepted by this option: all-warnings Treat all warnings as errors. cross-execution-space-call Be more strict about unsupported cross execution space calls. The compiler will generate an error instead of a warning for a call from a __host____device__ to a __host__ function. reorder Generate errors when member initializers are reordered. default-stream-launch Generate error when an explicit stream argument is not provided in the <<<...>>> kernel launch syntax. ext-lambda-captures-this Generate error when an extended lambda implicitly captures this. deprecated-declarations Generate error on use of a deprecated entity. | ✅ |
"-err-no, --display-error-number" | This option displays a diagnostic number for any message generated by the CUDA frontend compiler (note: not the host compiler). | ❌ |
"-no-err-no, --no-display-error-number" | This option disables the display of a diagnostic number for any message generated by the CUDA frontend compiler (note: not the host compiler). | ❌ |
"-diag-error, --diag-error <errNum,…>" | Emit error for specified diagnostic message(s) generated by the CUDA frontend compiler (note: does not affect diagnostics generated by the host compiler/preprocessor). | ❌ |
"-diag-suppress, --diag-suppress <errNum,…>" | Suppress specified diagnostic message(s) generated by the CUDA frontend compiler (note: does not affect diagnostics generated by the host compiler/preprocessor). | ❌ |
"-diag-warn, --diag-warn <errNum,…>" | Emit warning for specified diagnostic message(s) generated by the CUDA frontend compiler (note: does not affect diagnostics generated by the host compiler/preprocessor). | ❌ |
"-res-usage, --resource-usage" | Show resource usage such as registers and memory of the GPU code. | ✅ |
"-h, --help" | Print help information on this tool. | ✅ |
"-V, --version" | Print version information on this tool. | ✅ |
"-optf, --options-file <file,…>" | Include command line options from specified file. | ✅ |
"-time, --time <filename>" | Generate a comma separated value table with the time taken by each compilation phase, and append it at the end of the file given as the option argument. If the file is empty, the column headings are generated in the first row of the table. If the file name is '-', the timing data is generated in stdout. | ✅ |
"-qpp-config, --qpp-config <config>" | Specify the configuration ([[compiler/]version,][target]) when using q++ host compiler. The argument will be forwarded to the q++ compiler with its -V flag. | ❌ |
"-code-ls, --list-gpu-code" | List the gpu architectures (sm_XX) supported by the tool and exit. | ✅ |
"-arch-ls, --list-gpu-arch" | List the virtual device architectures (compute_XX) supported by the tool and exit. | ✅ |
"-Wmissing-launch-bounds, --Wmissing-launch-bounds" | Generate warning when a __global__ function does not have an explicit __launch_bounds__ annotation. | ✅ |
"-allow-expensive-optimizations, --allow-expensive-optimizations" | Enable (or disable) expensive compiler optimizations that use the maximum available resources (memory and compile time). If unspecified, the default behavior is to enable this feature for optimization level >= O2. | ❌ |
"-c, --compile-only" | Generate relocatable object. | ✅ |
"-dlcm, --def-load-cache" | Default cache modifier on global/generic load. Default value: ca. | ✅ |
"-dscm, --def-store-cache" | Default cache modifier on global/generic store. | ✅ |
"-g, --device-debug" | Generate debug information for device code. | ✅ |
"-disable-optimizer-consts, --disable-optimizer-consts" | Disable use of optimizer constant bank. | ✅ |
"-e, --entry <entry,…>" | Specify the global entry functions for which code must be generated. | ✅ |
"-fmad, --fmad" | This option enables (disables) the contraction of floating-point multiplies and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA). | ✅ |
"-flcm, --force-load-cache" | Force specified cache modifier on global/generic load. | ✅ |
"-fscm, --force-store-cache" | Force specified cache modifier on global/generic store. | ✅ |
"-lineinfo, --generate-line-info" | Generate line-number information for device code. | ✅ |
"-arch, --gpu-name <gpuname>" | Specify name of NVIDIA GPU to generate code for. This option also takes virtual compute architectures, in which case code generation is suppressed. This can be used for parsing only. Allowed values for this option: compute_35, compute_37, compute_50, compute_52, compute_53, compute_60, compute_61, compute_62, compute_70, compute_72, compute_75, compute_80, compute_86, compute_87, lto_35, lto_37, lto_50, lto_52, lto_53, lto_60, lto_61, lto_62, lto_70, lto_72, lto_75, lto_80, lto_86, lto_87, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86, sm_87 | ✅ |
"-h, --help" | help | ✅ |
"-m, --machine" | Specify 32-bit vs. 64-bit architecture. | ✅ |
"-maxrregcount, --maxrregcount <amount>" | Specify the maximum amount of registers that GPU functions can use. | ✅ |
"-O, --opt-level <N>" | Specify optimization level. Default value: 3. | ✅ |
"-optf, --options-file <file,…>" | Include command line options from specified file. | ✅ |
"-preserve-relocs, --preserve-relocs" | This option will make ptxas to generate relocatable references for variables and preserve relocations generated for them in linked executable. | ❌ |
"-sp-bound-check, --sp-bound-check" | Generate stack-pointer bounds-checking code sequence. This option is turned on automatically when --device-debug or --opt-level=0 is specified. | ❌ |
"-v, --verbose" | Enable verbose mode which prints code generation statistics. | ✅ |
"-V, --version" | version. | ✅ |
"-Werror, --warning-as-error" | Make all warnings into errors. | ✅ |
"-warn-double-usage, --warn-on-double-precision-use" | Warning if double(s) are used in an instruction. | ✅ |
"-warn-lmem-usage, --warn-on-local-memory-usage" | Warning if local memory is used. | ✅ |
"-warn-spills, --warn-on-spills" | Warning if registers are spilled to local memory. | ✅ |
"-astoolspatch, --compile-as-tools-patch" | Compile patch code for CUDA tools. Shall not be used in conjunction with -Xptxas -c or -ewp. | ✅ |
"-pic, --position-independent-code" | Generate position-independent code. | ✅ |
"-maxntid, --maxntid" | Specify the maximum number of threads that a thread block can have. | ✅ |
"-minnctapersm, --minnctapersm" | Specify the minimum number of CTAs to be mapped to an SM. | ✅ |
"-w, --disable-warnings" | Inhibit all warning messages. | ✅ |
"-preserve-relocs, --preserve-relocs" | Preserve resolved relocations in linked executable. | ❌ |
"-v, --verbose" | Enable verbose mode which prints code generation statistics. | ✅ |
"-Werror, --warning-as-error" | Make all warnings into errors. | ✅ |
"-suppress-arch-warning, --suppress-arch-warning" | Suppress the warning that otherwise is printed when object does not contain code for target arch. | ❌ |
"-suppress-stack-size-warning, --suppress-stack-size-warning" | Suppress the warning that otherwise is printed when stack size cannot be determined. | ✅ |
"-dump-callgraph, --dump-callgraph" | Dump information about the callgraph and register usage. | ❌ |