What is HBM3?
HBM3 is the third generation of High Bandwidth Memory, a vertically stacked DRAM technology that delivers memory bandwidth exceeding 2 TB/s for data-centre GPUs and AI accelerators.
HBM3 (High Bandwidth Memory 3) is a type of memory designed for workloads that demand massive data throughput. Unlike the GDDR memory used in consumer GPUs, HBM3 stacks multiple DRAM dies vertically on top of a base logic die, connected by through-silicon vias (TSVs). This stacking dramatically increases bandwidth while reducing power consumed per bit transferred. At the standard's peak pin rate of 6.4 Gbps, a single HBM3 stack delivers over 800 GB/s; the NVIDIA H100 SXM5 combines five active stacks, clocked below that peak, to reach approximately 3.35 TB/s of aggregate bandwidth.
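As a back-of-envelope check, here is a minimal sketch of the bandwidth arithmetic, assuming the standard 1024-bit interface per HBM stack (the function name is ours, for illustration only):

```python
def hbm_stack_bandwidth_gb_s(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM stack in GB/s.

    Each stack exposes a 1024-bit interface, so peak bytes/s is
    pin rate (Gbit/s per pin) * number of pins / 8 bits per byte.
    """
    return pin_rate_gbps * bus_width_bits / 8

# HBM3 at its 6.4 Gbps peak pin rate: 819.2 GB/s per stack.
print(hbm_stack_bandwidth_gb_s(6.4))

# Working backwards from the H100 SXM5's 3.35 TB/s across 5 active
# stacks: 670 GB/s per stack, i.e. roughly 5.2 Gbps per pin --
# comfortably below the 6.4 Gbps ceiling.
per_stack_gb_s = 3350 / 5
print(per_stack_gb_s * 8 / 1024)
```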
The JEDEC HBM3 standard supports up to 64 GB per stack and runs at a lower core voltage (1.1 V) than GDDR6X (1.35 V). That power efficiency matters in dense GPU server configurations where thermal and power budgets are tight. The successor, HBM3e, raises per-pin speeds to 9.6 Gbps; the NVIDIA H200 uses HBM3e to deliver 4.8 TB/s of bandwidth and 141 GB of capacity.
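The same per-stack arithmetic applies to HBM3e. A quick sketch, assuming the usual 1024-bit interface and, for illustration, a six-stack configuration on the H200 (the stack count is our assumption, not a vendor figure):

```python
# HBM3e peak: 9.6 Gbit/s per pin * 1024 pins / 8 bits per byte.
hbm3e_peak_per_stack = 9.6 * 1024 / 8
print(hbm3e_peak_per_stack)        # 1228.8 GB/s per stack at the standard's ceiling

# H200 at 4.8 TB/s: assuming six HBM3e stacks, each delivers
# 800 GB/s, i.e. about 6.25 Gbit/s per pin -- again well below
# the 9.6 Gbit/s maximum the standard permits.
print(4800 / 6)                    # 800.0 GB/s per stack
print(4800 / 6 * 8 / 1024)         # ~6.25 Gbit/s per pin
```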
Why it matters when buying hardware
When evaluating GPU servers for LLM training or large-batch inference, HBM3 bandwidth directly determines how fast model weights and activations move between memory and compute units. A GPU bottlenecked on memory bandwidth will leave its tensor cores idle regardless of raw TFLOPS. For Indian enterprises building AI training clusters, choosing HBM3-equipped GPUs (H100 SXM5 or later) over PCIe variants with lower bandwidth ensures you are not leaving performance on the table. Always compare memory bandwidth alongside TFLOPS when sizing your GPU fleet.
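To see why bandwidth, not TFLOPS, often sets the ceiling, here is a minimal sketch estimating memory-bound decode throughput for single-batch LLM inference. It assumes every weight is read from HBM once per generated token and ignores KV-cache traffic, so it is an illustrative upper bound rather than a benchmark:

```python
def max_tokens_per_sec(param_count: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Upper bound on decode throughput when inference is memory-bound.

    Each generated token must stream all model weights from memory,
    so throughput is capped at bandwidth / model size in bytes,
    no matter how many TFLOPS the tensor cores offer.
    """
    model_bytes = param_count * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# A 70B-parameter model in FP16 (2 bytes/param) = 140 GB of weights.
print(max_tokens_per_sec(70e9, 2, 3350))  # H100 SXM5 (HBM3):  ~24 tokens/s
print(max_tokens_per_sec(70e9, 2, 4800))  # H200 (HBM3e):      ~34 tokens/s
print(max_tokens_per_sec(70e9, 2, 2000))  # H100 PCIe (HBM2e): ~14 tokens/s
```

The comparison between the SXM5 and PCIe rows is the practical takeaway: the same GPU architecture with lower-bandwidth memory caps out at noticeably fewer tokens per second under this model.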