Glossary
What are CUDA Cores?
CUDA Cores are the general-purpose parallel processing units within NVIDIA GPUs that execute scalar floating-point and integer operations across thousands of parallel threads.
CUDA (Compute Unified Device Architecture) cores are the fundamental parallel processing units in every NVIDIA GPU. Each CUDA core can execute one fused multiply-add (FMA), counted as two floating-point operations, or one integer operation per clock cycle, and modern data-centre GPUs contain thousands of them. The A100 has 6,912 CUDA cores and the H100 (SXM) has 16,896. These cores handle general-purpose parallel computation including element-wise operations, activation functions, data preprocessing, and custom CUDA kernels that do not map to tensor core operations.
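The core counts above translate directly into peak FP32 throughput. A minimal sketch of that arithmetic, assuming approximate published boost clocks (~1.41 GHz for the A100, ~1.98 GHz for the H100 SXM):

```python
# Rough peak FP32 throughput estimate from CUDA core count.
# Each CUDA core retires one fused multiply-add (FMA) per clock,
# which counts as 2 floating-point operations.
# Clock speeds are approximate published boost clocks, not measured values.

def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Peak FP32 TFLOPS = cores x 2 ops/clock (FMA) x clock (GHz) / 1000."""
    return cuda_cores * 2 * boost_clock_ghz / 1000

a100 = peak_fp32_tflops(6912, 1.41)   # ~19.5 TFLOPS
h100 = peak_fp32_tflops(16896, 1.98)  # ~66.9 TFLOPS (SXM variant)
print(f"A100: {a100:.1f} TFLOPS, H100: {h100:.1f} TFLOPS")
```

The results line up with NVIDIA's quoted FP32 figures for both parts, which is a useful sanity check when reading spec sheets.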
While CUDA cores are important for overall GPU versatility, they are not the primary performance driver for deep learning. The heavy lifting in training, namely large matrix multiplications, is handled by tensor cores. CUDA cores complement tensor cores by executing the non-matrix portions of the computation graph. In workloads like scientific simulation (CFD, molecular dynamics) or rendering, CUDA cores often play a larger role, since these workloads involve more general-purpose computation.
Why it matters when buying hardware
Do not select GPUs based solely on CUDA core counts. A consumer GPU with more CUDA cores may perform worse than a data-centre GPU with fewer CUDA cores but superior tensor core performance, HBM bandwidth, and a thermal and power design built for sustained load. For AI workloads, prioritise tensor core throughput and memory bandwidth. For HPC or simulation workloads, CUDA core count and clock speed become more relevant. Rawcompute.in can advise on the right GPU based on your specific workload profile.
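The consumer-versus-data-centre point can be made concrete with a rough spec comparison. The figures below are approximate published numbers (dense FP16 tensor throughput with FP32 accumulate), included only to show the ordering, not as exact benchmarks:

```python
# Illustrative spec comparison using approximate published figures:
# a consumer RTX 4090 has roughly 2.4x the CUDA cores of an A100, yet
# the A100 leads on memory bandwidth and dense tensor-core throughput,
# which matter more for AI training than raw CUDA core count.

specs = {
    "RTX 4090":  dict(cuda_cores=16384, mem_bw_gb_s=1008, fp16_tensor_tflops=165),
    "A100 80GB": dict(cuda_cores=6912,  mem_bw_gb_s=2039, fp16_tensor_tflops=312),
}

for name, s in specs.items():
    print(f"{name}: {s['cuda_cores']} CUDA cores, "
          f"{s['mem_bw_gb_s']} GB/s memory bandwidth, "
          f"~{s['fp16_tensor_tflops']} TFLOPS dense FP16 tensor")
```

On a CUDA-core count alone the consumer card looks far stronger; on the metrics that actually bound AI training throughput, the data-centre part wins.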