Closed-source (API access):
GPT-4, Claude, Gemini
Typically strongest performance, but no weight access
Pay per token, vendor lock-in
Open-weight:
Llama 3, Mistral, Qwen, DeepSeek
Download and run locally
Fine-tune for your domain
Full control, privacy
Fine-tuning approaches:
Full fine-tune: update all weights (expensive)
LoRA: train small low-rank adapter matrices (~1% of params)
QLoRA: LoRA + 4-bit quantization (fits on 1 GPU)
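The LoRA idea can be sketched in a few lines: freeze the pretrained weight, add a low-rank update B @ A, and train only A and B. A minimal NumPy sketch (layer sizes, rank, and scaling are illustrative assumptions, not any particular model's config):

```python
import numpy as np

# Hypothetical layer sizes for illustration.
d_in, d_out, rank = 1024, 1024, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight

# LoRA trains only two small matrices; the effective weight
# is W + (alpha / rank) * B @ A.
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable
B = np.zeros((d_out, rank))                   # trainable, zero-initialized
alpha = 16                                    # assumed scaling hyperparameter

def lora_forward(x):
    # Base path plus low-rank update; since B starts at zero,
    # the adapted layer initially matches the pretrained one exactly.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

full_params = W.size                # 1,048,576
lora_params = A.size + B.size       # 16,384 -> ~1.6% of the full layer
```

The trainable fraction scales with the rank: here rank 8 gives rank * (d_in + d_out) = 16,384 adapter parameters against over a million in the frozen weight. QLoRA keeps the same adapter math but stores W in 4-bit precision.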
Extending LLMs
RAG (Retrieval-Augmented Generation):
Query → search knowledge base → inject relevant docs into prompt → generate
Reduces hallucination, adds fresh knowledge
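The pipeline above can be sketched end to end. This toy version scores documents by word overlap instead of embeddings, and the knowledge base and prompt template are made-up stand-ins; a real system would use a vector store and embedding model:

```python
# Toy in-memory knowledge base; real RAG uses embeddings + a vector store.
DOCS = [
    "LoRA trains low-rank adapter matrices instead of all weights.",
    "QLoRA combines LoRA with 4-bit quantization of the base model.",
    "RAG injects retrieved documents into the prompt before generation.",
]

def retrieve(query, docs, k=2):
    # Word-overlap scoring as a stand-in for vector similarity search.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query):
    # Inject the retrieved docs ahead of the question, then hand
    # the whole prompt to the LLM to generate from.
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("What does QLoRA combine?")
```

Because the model is told to answer from the injected context, it is less likely to hallucinate, and updating DOCS refreshes its knowledge without retraining.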
Tool Use / Function Calling:
LLM decides when to call external tools
Calculator, search, database, APIs
Grounds the model in real data
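A function-calling loop reduces to a dispatcher: the model emits a structured tool request, the application executes it, and the result goes back to the model. A minimal sketch (the JSON shape and `calculator` tool are illustrative assumptions, not any vendor's API):

```python
import json

def calculator(expression: str) -> str:
    # Restrict to simple arithmetic before evaluating.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression))

# Hypothetical tool registry the LLM can choose from.
TOOLS = {"calculator": calculator}

def dispatch(model_output: str) -> str:
    # The model emits JSON naming a tool and its arguments;
    # the application runs the tool and returns the observation.
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

# Simulated model response requesting a tool call.
result = dispatch('{"tool": "calculator", "args": {"expression": "17 * 3"}}')
```

In a real system the observation (`result`) is appended to the conversation so the model can produce a grounded final answer.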
Agents:
LLM + tools + planning + memory
Multi-step task execution
ReAct, AutoGPT, coding agents
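The ReAct pattern boils down to a loop: the model proposes an action, the environment returns an observation, and the transcript grows until the model emits a final answer. A sketch with a scripted stand-in for the model (the fact table, action format, and `lookup` tool are all hypothetical):

```python
# Minimal ReAct-style loop; a real agent would call an LLM each step.
def scripted_model(history):
    # Decide the next step from what has been observed so far.
    if "Observation:" not in history:
        return "Action: lookup[capital of France]"
    return "Final Answer: Paris"

def lookup(query):
    # Toy tool standing in for search or a database.
    facts = {"capital of France": "Paris"}
    return facts.get(query, "unknown")

def run_agent(task, max_steps=5):
    history = f"Task: {task}"
    for _ in range(max_steps):
        step = scripted_model(history)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        query = step[len("Action: lookup["):-1]   # parse the tool call
        history += f"\n{step}\nObservation: {lookup(query)}"
    return None

answer = run_agent("What is the capital of France?")
```

The `history` string is the agent's working memory; planning lives in the model's choice of next action, and `max_steps` bounds the autonomy.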
The trend: LLMs are becoming platforms, not just models. RAG adds knowledge, tools add capabilities, agents add autonomy. The model is the reasoning engine; everything else is infrastructure around it.