The Modern AI Stack:
Hardware:
NVIDIA H100/H200/B200 GPUs
Google TPU v5p/v6
Custom chips: Groq (LPU), Cerebras (WSE)
Apple Neural Engine, Qualcomm NPU
Training:
PyTorch (dominant), JAX (Google)
DeepSpeed, FSDP, Megatron-LM
Weights & Biases, MLflow (tracking)
Inference:
vLLM, TensorRT-LLM, llama.cpp
Ollama (local), Together AI (cloud)
PagedAttention, speculative decoding
Orchestration:
LangChain/LangGraph, LlamaIndex
MCP (Model Context Protocol; tool integration)
Vector DBs: Pinecone, Weaviate, Chroma
Evaluation:
lm-evaluation-harness, HELM, Chatbot Arena (LMSYS)
Human preference (Elo ratings)
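Elo ratings, as used by pairwise-preference leaderboards like Chatbot Arena, come from the standard chess rating update. A minimal sketch (the K-factor and starting ratings here are illustrative assumptions, not LMSYS's exact settings):

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a, r_b, score_a, k=32):
    """Return new ratings after one matchup; score_a is 1 (A wins), 0, or 0.5."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - e_a))
    return r_a_new, r_b_new

# Two equally rated models; A wins the human-preference vote.
print(update_elo(1200, 1200, 1))  # (1216.0, 1184.0)
```

Aggregated over many pairwise battles, these updates converge to a ranking of models by human preference.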
RAG: Retrieval-Augmented Generation
LLMs have knowledge cutoffs and hallucinate. RAG mitigates both: embed your documents into a vector database, retrieve the most relevant chunks at query time, and include them in the prompt so the model generates answers grounded in your data. RAG is the most common production pattern for enterprise AI.
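The embed-retrieve-prompt loop can be sketched in a few lines. Real systems use a learned embedding model and a vector DB (Pinecone, Weaviate, Chroma); here a toy bag-of-words vector and in-memory list stand in so the flow is self-contained, and the document texts are made up for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word-count vector (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 5 to 7 business days.",
]
index = [(d, embed(d)) for d in docs]  # stand-in for the vector DB

def retrieve(query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return [d for d, v in sorted(index, key=lambda p: -cosine(q, p[1]))[:k]]

query = "What is the refund policy"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The model then answers from `context` rather than its (possibly stale) parametric knowledge, which is what grounds the response in your data.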
Context engineering > prompt engineering. The new skill is not writing better prompts — it’s building systems that provide the right context at the right time. Dynamic context discovery, semantic search, tool results, and memory management are what make AI agents effective.
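One way to make "the right context at the right time" concrete is a context assembler that merges memory, retrieved chunks, and tool results under a budget, highest priority first. This is a hypothetical sketch; the function, labels, and character budget are illustrative assumptions, not any library's API:

```python
def build_context(sources, budget_chars=2000):
    """sources: list of (priority, label, text), lower priority number = more important.
    Returns a prompt section that fits within budget_chars."""
    parts, used = [], 0
    for _, label, text in sorted(sources, key=lambda s: s[0]):
        entry = f"[{label}]\n{text}"
        if used + len(entry) > budget_chars:
            continue  # skip anything that would blow the budget
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)

# Illustrative sources an agent might gather before calling the model.
sources = [
    (0, "memory", "User prefers concise answers."),
    (1, "retrieved", "Refunds are processed within 30 days."),
    (2, "tool result", "order_status('A123') -> 'shipped'"),
]
print(build_context(sources))
```

A production version would count tokens rather than characters and score sources dynamically per query, but the shape is the same: selection and budgeting of context, not wording of a single prompt.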