Glossary

What is FP16?

FP16 is a 16-bit floating-point format (half precision) that trades reduced numerical range and precision for higher throughput and lower memory use, making it well suited to deep learning training and inference.

FP16, or half-precision floating point, uses 16 bits to represent a number: 1 sign bit, 5 exponent bits, and 10 mantissa bits. In deep learning, FP16 has been the workhorse precision format since the introduction of mixed-precision training with NVIDIA Volta GPUs. Tensor cores perform matrix operations in FP16 (or BF16, which trades mantissa bits for exponent range) while accumulating results in FP32, preserving training stability. The NVIDIA A100 delivers 312 TFLOPS of dense FP16 tensor-core throughput (624 TFLOPS with sparsity), while the H100 SXM5 delivers 989 TFLOPS dense (1,979 TFLOPS with sparsity).
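As a rough illustration of how mixed precision looks in practice, here is a minimal PyTorch training-loop sketch (assuming a CUDA GPU; the layer size, batch size, and learning rate are arbitrary placeholders, not recommendations). Matrix multiplies inside the autocast region run in FP16 on the tensor cores, the loss is scaled to keep FP16 gradients from underflowing, and the master weights stay in FP32.

```python
import torch

# Model parameters stay in FP32; autocast casts matmul inputs to FP16.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # loss scaling, needed for FP16 (not BF16)

for step in range(10):
    x = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).square().mean()  # FP16 matmuls, FP32 accumulation
    scaler.scale(loss).backward()  # scale loss so small gradients survive FP16
    scaler.step(optimizer)         # unscale gradients, update FP32 weights
    scaler.update()
```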

BF16 (bfloat16) is a closely related format that uses 8 exponent bits and 7 mantissa bits, matching FP32's dynamic range. Most modern training frameworks default to BF16 mixed precision because it avoids the loss scaling that IEEE FP16's narrow 5-bit exponent requires. Both A100 and H100 GPUs support BF16 natively on their tensor cores.
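To see the range difference directly, a short sketch using PyTorch's torch.finfo (exact printed values may vary slightly by build) compares the three formats:

```python
import torch

# Compare representable range (max) and relative precision (eps) per format.
for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    print(f"{str(dtype):16s} max={info.max:.3e}  eps={info.eps:.3e}")

# FP16 tops out near 6.6e4, so large activations or gradients can overflow
# without loss scaling; BF16 shares FP32's ~3.4e38 range at coarser precision.
```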

Why it matters when buying hardware

FP16/BF16 throughput is the most universally comparable performance metric across GPU generations. Unlike FP8 (which requires Hopper- or Ada-generation tensor cores), FP16 tensor core performance is available on Ampere, Hopper, and Ada GPUs. When comparing an A100 to an H100 for your workload, start with FP16 TFLOPS as the baseline; if your framework and model support FP8, the H100's advantage widens further. Also make sure the VRAM of your chosen GPU can hold the model at the target precision: at 2 bytes per parameter, a 70B-parameter model in FP16 requires roughly 140 GB just for weights.
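As a back-of-the-envelope check, weight memory is just parameter count times bytes per parameter. The helper below is a hypothetical illustration, not a sizing tool; it ignores activations, optimizer state, and KV cache.

```python
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM for model weights only (excludes activations,
    optimizer state, and KV cache)."""
    # billions of params x bytes per param = gigabytes
    return params_billions * bytes_per_param

for precision, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("FP8", 1)]:
    print(f"70B weights in {precision}: ~{weight_vram_gb(70, nbytes):.0f} GB")
```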

Need hardware advice?

Tell us your requirements and we'll recommend the right setup.

WhatsApp Us

Get a Quote

We respond within 4 business hours

Same-day response · No spam, ever · GST invoice