What is InfiniBand?

InfiniBand is a high-throughput, low-latency network fabric designed for data-centre computing, widely used to interconnect GPU nodes in AI training clusters at 200-400 Gb/s per port.

InfiniBand is a switched-fabric network architecture originally developed for high-performance computing (HPC) and now the dominant interconnect for multi-node AI training clusters. The current generation, NDR (Next Data Rate), operates at 400 Gb/s per port with sub-microsecond latency. InfiniBand natively supports Remote Direct Memory Access (RDMA), allowing GPUs in different servers to read and write each other’s memory without involving the CPU, dramatically reducing communication overhead during distributed training.
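As a quick way to see RDMA-capable hardware from software, the sketch below enumerates HCAs and queries port state using pyverbs, the Python bindings that ship with rdma-core. This is a minimal sketch, assuming pyverbs is installed and that port 1 of each device is the one of interest; the attribute names follow the libibverbs port attributes.

```python
# Minimal sketch: list InfiniBand HCAs on a host and report link state.
# Assumes rdma-core's pyverbs bindings are installed.
from pyverbs.device import Context, get_device_list

for dev in get_device_list():
    name = dev.name.decode()
    with Context(name=name) as ctx:
        attr = ctx.query_port(1)  # HCA port numbering starts at 1
        print(f"{name}: state={attr.state}, active_speed={attr.active_speed}")
```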

NVIDIA (via its Mellanox acquisition) is the primary supplier of InfiniBand hardware, including ConnectX-7 host channel adapters (HCAs) and Quantum-2 switches. A typical AI training cluster uses a fat-tree or rail-optimised topology with InfiniBand switches to connect tens or hundreds of GPU nodes. NCCL, NVIDIA’s communication library, is optimised for InfiniBand and uses GPUDirect RDMA to move gradient data directly between GPUs across nodes.
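To make the NCCL path concrete, here is a minimal sketch of a cross-node all-reduce with PyTorch's NCCL backend, as it would be launched with torchrun. NCCL_IB_HCA and NCCL_DEBUG are real NCCL environment variables; the HCA name mlx5_0 in the comment is illustrative, not a requirement.

```python
# Minimal sketch of an NCCL all-reduce across nodes, launched via torchrun.
import os
import torch
import torch.distributed as dist

# Optional NCCL hints for an InfiniBand fabric:
#   NCCL_IB_HCA=mlx5_0   restrict NCCL to a specific HCA (name is illustrative)
#   NCCL_DEBUG=INFO      look for "NET/IB" in the logs to confirm the transport

dist.init_process_group(backend="nccl")      # reads RANK/WORLD_SIZE from torchrun
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

grad = torch.ones(1024, device="cuda")       # stand-in for a gradient shard
dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # rides NCCL over InfiniBand/RDMA

dist.destroy_process_group()
```

Launched with, for example, torchrun --nnodes=2 --nproc-per-node=8 train.py on each node, NCCL discovers the InfiniBand HCAs and uses GPUDirect RDMA where the driver stack supports it.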

Why it matters when buying hardware

If you are building a multi-node GPU training cluster, InfiniBand is the recommended network fabric. Ethernet-based alternatives such as RoCE (RDMA over Converged Ethernet) can work for smaller clusters but typically add latency and require more tuning. The cost of an InfiniBand fabric is significant: budget for spine/leaf switches, optical transceivers, and cabling alongside your GPU servers (a rough sizing sketch follows below). Rawcompute.in supplies complete InfiniBand fabric solutions including NVIDIA Quantum-2 switches and ConnectX-7 adapters, and can design the topology for your cluster size.
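As a back-of-the-envelope illustration, the sketch below estimates switch counts for a non-blocking two-level fat-tree built from 64-port NDR switches (the port count of NVIDIA's Quantum-2 QM9700). It is a rough sizing aid only; real designs also weigh rail optimisation, oversubscription ratios, and cable reach.

```python
import math

# Rough sizing sketch for a non-blocking two-level fat-tree,
# assuming 64-port NDR switches (e.g. NVIDIA Quantum-2 QM9700).
PORTS = 64

def fat_tree(hca_ports: int) -> tuple[int, int]:
    """Return (leaf_switches, spine_switches) for `hca_ports` node ports."""
    down = PORTS // 2                          # half the ports face the nodes
    leaves = math.ceil(hca_ports / down)       # each leaf serves `down` nodes
    spines = math.ceil(leaves * down / PORTS)  # uplinks spread across spines
    return leaves, spines

# Example: 32 servers x 8 HCAs = 256 node ports -> 8 leaves, 4 spines
print(fat_tree(256))
```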

Need hardware advice?

Tell us your requirements and we'll recommend the right setup.

WhatsApp Us

Get a Quote

We respond within 4 business hours

Same-day response · No spam, ever · GST invoice