
What is TFLOPS?

TFLOPS (teraFLOPS) measures floating-point throughput in units of one trillion operations per second and serves as the headline metric for GPU compute performance.

TFLOPS stands for tera floating-point operations per second: one trillion (10^12) floating-point calculations per second. It is the standard unit for comparing the raw computational throughput of GPUs and AI accelerators. The NVIDIA A100 delivers 312 TFLOPS at FP16 on dense tensor-core operations, while the H100 SXM5 delivers 989 TFLOPS at FP16 and 1,979 TFLOPS at FP8 (1,979 and 3,958 respectively with structured sparsity). These figures represent peak theoretical throughput on tensor core operations.
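A peak TFLOPS figure converts directly into a lower bound on runtime: divide a workload's total floating-point operations by the peak rate. The sketch below illustrates this arithmetic with a single dense matrix multiply; the matrix sizes are hypothetical examples, and the peak values are the dense FP16 figures quoted above.

```python
# Rough lower-bound runtime from peak TFLOPS.
# Workload sizes here are hypothetical examples, not vendor data.

def min_runtime_seconds(total_flops: float, peak_tflops: float) -> float:
    """Time a workload would take at 100% of peak throughput.

    Real workloads achieve well below peak (often 30-60% on dense
    matmuls), so treat this strictly as a lower bound.
    """
    return total_flops / (peak_tflops * 1e12)

# A dense matmul C = A @ B with A (M x K) and B (K x N) costs
# roughly 2 * M * N * K floating-point operations.
M = N = K = 8192
matmul_flops = 2 * M * N * K  # ~1.1e12 FLOPs

# Dense FP16 tensor-core peaks from the spec figures above.
for name, peak in [("A100 FP16", 312), ("H100 SXM5 FP16", 989)]:
    t = min_runtime_seconds(matmul_flops, peak)
    print(f"{name}: >= {t * 1e3:.2f} ms for an 8192^3 matmul")
```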

It is important to distinguish between tensor core TFLOPS and CUDA core TFLOPS. Tensor core TFLOPS measures matrix operation throughput (the dominant operation in deep learning), while CUDA core TFLOPS measures general-purpose floating-point throughput. Additionally, TFLOPS figures vary by precision. The same GPU will report different numbers at FP64, FP32, FP16, and FP8. Always compare at the same precision level.
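To see how far a real workload sits below the spec-sheet number, you can time a tensor-core matmul and compute achieved TFLOPS at the same precision. Below is a minimal sketch assuming PyTorch and a CUDA GPU are available; the `PEAK_FP16_TFLOPS` constant is an example value you should replace with your own GPU's dense figure at the precision you are testing.

```python
# Sketch: measure achieved FP16 tensor-core TFLOPS with PyTorch
# and compare against a spec-sheet peak at the same precision.
# Assumes torch is installed and a CUDA GPU is present.
import time
import torch

PEAK_FP16_TFLOPS = 312.0  # example: A100 dense FP16 tensor-core peak

def achieved_tflops(n: int = 8192, iters: int = 20) -> float:
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    torch.matmul(a, b)                 # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()           # wait for all kernels to finish
    elapsed = time.perf_counter() - start
    flops = 2 * n**3 * iters           # 2*M*N*K FLOPs per matmul
    return flops / elapsed / 1e12

if __name__ == "__main__":
    t = achieved_tflops()
    print(f"Achieved: {t:.0f} TFLOPS "
          f"({100 * t / PEAK_FP16_TFLOPS:.0f}% of peak)")
```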

Why it matters when buying hardware

TFLOPS is useful for initial GPU comparison but should not be the sole decision criterion. Real-world training throughput depends on memory bandwidth, inter-GPU communication bandwidth, software optimisation, and workload characteristics. A GPU with higher TFLOPS but lower memory bandwidth may underperform a GPU with balanced compute and bandwidth. When evaluating quotes from rawcompute.in, ask for TFLOPS at your target precision (FP16, BF16, or FP8) and also consider memory bandwidth and VRAM capacity to get a complete performance picture.
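The interaction between compute and memory bandwidth can be made concrete with a simple roofline model: attainable throughput is the lesser of the compute peak and bandwidth multiplied by arithmetic intensity (FLOPs performed per byte moved). The sketch below uses two hypothetical GPUs with illustrative spec values, not real products; substitute the figures from the quote you are evaluating.

```python
# Sketch: a roofline check showing when memory bandwidth, not
# TFLOPS, limits performance. Spec values below are illustrative
# examples for two hypothetical GPUs.

def attainable_tflops(peak_tflops: float,
                      bandwidth_gbs: float,
                      arithmetic_intensity: float) -> float:
    """Roofline model: min(compute roof, bandwidth * intensity).

    arithmetic_intensity = FLOPs per byte moved from memory;
    low-intensity kernels are memory-bound.
    """
    # GB/s * FLOPs/byte = GFLOP/s; divide by 1000 for TFLOPS.
    bandwidth_tflops = bandwidth_gbs * arithmetic_intensity / 1000.0
    return min(peak_tflops, bandwidth_tflops)

# GPU A: more compute, less bandwidth. GPU B: the reverse.
gpu_a = {"peak_tflops": 1000, "bandwidth_gbs": 2000}
gpu_b = {"peak_tflops": 600, "bandwidth_gbs": 3350}

for intensity in (10, 100, 1000):  # FLOPs per byte
    a = attainable_tflops(gpu_a["peak_tflops"], gpu_a["bandwidth_gbs"], intensity)
    b = attainable_tflops(gpu_b["peak_tflops"], gpu_b["bandwidth_gbs"], intensity)
    print(f"intensity {intensity:>4}: GPU A {a:>6.1f} TFLOPS, "
          f"GPU B {b:>6.1f} TFLOPS")
```

At low arithmetic intensity the higher-bandwidth GPU B wins despite its lower TFLOPS figure, which is exactly why a spec-sheet TFLOPS comparison alone can mislead.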

Need hardware advice?

Tell us your requirements and we'll recommend the right setup.

WhatsApp Us

Get a Quote

We respond within 4 business hours

Same-day response · No spam, ever · GST invoice