NVIDIA L40S
High-throughput inference and rendering with 48 GB GDDR6
VRAM: 48 GB
Bandwidth: 864 GB/s
FP16: 733 TFLOPS (with sparsity)
TDP: 350W
Technical Specifications
| Specification | Value |
| --- | --- |
| VRAM | 48 GB GDDR6 (ECC) |
| Memory Bandwidth | 864 GB/s |
| FP16 Performance | 733 TFLOPS (with sparsity) |
| BF16 Performance | 733 TFLOPS (with sparsity) |
| FP32 Performance | 91.6 TFLOPS |
| INT8 Performance | 1,466 TOPS (with sparsity) |
| TDP | 350W |
| Form Factor | PCIe Gen4 dual-slot |
| PCIe Interface | PCIe Gen4 x16 |
| Max GPUs per Server | Up to 8 |
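The bandwidth figure above gives a quick way to bound single-stream LLM decode speed. As a rough sketch (assuming decode is memory-bound and every weight is read once per generated token, which ignores KV-cache reads and batching):

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound LLM.
# Assumption (not from the spec sheet): single-stream autoregressive decode
# reads each weight once per token, so tokens/s <= bandwidth / weight bytes.

BANDWIDTH_GBPS = 864  # L40S memory bandwidth, GB/s

def decode_tokens_per_sec(params_billions: float, bytes_per_param: float) -> float:
    """Upper-bound tokens/s when decode is limited by weight reads."""
    weight_gb = params_billions * bytes_per_param
    return BANDWIDTH_GBPS / weight_gb

# A 70B model quantized to 4-bit (0.5 bytes/param) has ~35 GB of weights.
print(round(decode_tokens_per_sec(70, 0.5), 1))
```

Real throughput depends on the runtime, batch size, and kernel efficiency; this only shows why bandwidth, not raw TFLOPS, dominates token generation.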
Prices vary with supply and import costs. Contact for current India pricing.
Best For
- LLM inference in the 7B-70B parameter range
- Generative AI application serving
- Rendering, VFX, and professional visualization
- Mixed inference and rendering pipelines on a single GPU pool
Not Ideal For
- Large-scale training (GDDR6 bandwidth is lower than HBM)
- Multi-node training clusters requiring NVLink (the L40S has no NVLink)
Overview
The NVIDIA L40S is a versatile Ada Lovelace GPU that bridges inference, rendering, and generative AI workloads. With 48 GB of GDDR6 memory, it can handle large model inference (including LLaMA 70B with 4-bit quantization) while also providing hardware ray tracing via RT cores.
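The 70B-at-4-bit claim can be sanity-checked with a rough memory budget. The layer and head counts below are hypothetical LLaMA-70B-class values (80 layers, 8 grouped-query KV heads of 128 dims); exact figures vary by model and runtime overhead:

```python
# Rough VRAM check: do quantized weights plus KV cache fit in 48 GB?

VRAM_GB = 48

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weight memory in GB; 4-bit quantization is 0.5 bytes/param."""
    return params_billions * bytes_per_param

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: 2 tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

w = weights_gb(70, 0.5)                 # ~35 GB of 4-bit weights
kv = kv_cache_gb(80, 8, 128, 4096, 1)   # one 4k-token sequence
print(f"weights={w:.1f} GB, kv={kv:.2f} GB, fits={w + kv < VRAM_GB}")
```

Even with activation and runtime overhead on top, a 4-bit 70B model leaves headroom in 48 GB, which is what makes single-card 70B inference practical here.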
For inference, the L40S delivers strong price-to-performance, particularly for models in the 7B-70B parameter range. Its 48 GB VRAM exceeds the L4 (24 GB) and costs significantly less than an H100. For organizations deploying generative AI applications, the L40S is often the sweet spot.
The L40S also excels in professional visualization. VFX studios, architectural visualization firms, and animation pipelines benefit from its third-generation RT cores and support for NVIDIA Omniverse. If your workload mixes inference with rendering, the L40S eliminates the need for separate GPU pools.
Get NVIDIA L40S pricing for your setup
Tell us your workload and cluster size. We'll quote the complete solution including servers, networking, and colocation.