What is an LLM? (rawcompute.in Glossary)
An LLM (Large Language Model) is a deep learning model with billions of parameters, trained on massive text corpora to understand and generate human language. LLMs require substantial GPU compute for both training and inference.
A Large Language Model (LLM) is a neural network, typically based on the transformer architecture, trained on vast amounts of text data to perform natural-language understanding and generation tasks. Models such as GPT-4, LLaMA 3, and Mistral range from roughly 7 billion to (reportedly) over 1 trillion parameters. Training an LLM from scratch requires enormous compute: training a 70B-parameter model can consume on the order of 10^24 floating-point operations (FLOPs), which translates to hundreds or thousands of GPUs running for weeks.
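The scale of training compute can be sketched with the common C ≈ 6·N·D approximation (roughly 6 FLOPs per parameter per training token). The token count and per-GPU throughput below are illustrative assumptions, not figures from any specific model's training run:

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

N = 70e9   # 70B-parameter model
D = 2e12   # assume 2 trillion training tokens (illustrative)

flops = training_flops(N, D)
print(f"~{flops:.2e} FLOPs")  # ~8.40e+23 FLOPs

# GPU-weeks at an assumed sustained 400 TFLOP/s per GPU (illustrative):
gpu_weeks = flops / 400e12 / 86400 / 7
print(f"~{gpu_weeks:.0f} GPU-weeks")  # ~3472 GPU-weeks
```

Under these assumptions, ~3,500 GPU-weeks means roughly 3.5 weeks on a 1,000-GPU cluster, consistent with the order-of-magnitude figures above.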
LLM inference (serving the model for predictions) also demands significant GPU resources, though less than training. A 70B-parameter model in FP16 requires at least 140 GB of VRAM just for weights, meaning it needs at least two 80 GB GPUs to serve. Quantisation (reducing to INT8 or INT4) can halve or quarter the memory requirement but may affect output quality. Batched inference throughput depends heavily on GPU memory bandwidth and KV-cache capacity.
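The weight-memory figures above follow from a simple bytes-per-parameter calculation. A minimal sketch (weights only; the KV cache, activations, and framework overhead all add to the real requirement):

```python
# Bytes per parameter at each precision; INT4 packs two params per byte.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def weight_vram_gb(params: float, dtype: str) -> float:
    """VRAM needed for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("fp16", "int8", "int4"):
    print(f"70B @ {dtype}: {weight_vram_gb(70e9, dtype):.0f} GB")
# 70B @ fp16: 140 GB  -> at least two 80 GB GPUs
# 70B @ int8: 70 GB
# 70B @ int4: 35 GB
```

This is why quantisation roughly halves (INT8) or quarters (INT4) the footprint relative to FP16, at some possible cost in output quality.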
Why it matters when buying hardware
LLMs are the primary driver of GPU server demand in India. Whether you are training a foundation model, fine-tuning an open-source model for your domain, or deploying inference endpoints, these hardware decisions directly impact cost and performance. Training calls for multi-GPU nodes with NVLink within the node and InfiniBand connectivity between nodes. For inference, the optimal setup depends on model size, target latency, and throughput requirements. Rawcompute.in helps Indian AI companies spec the right GPU infrastructure for their LLM workloads, from single-node fine-tuning setups to multi-rack training clusters.