
LLM Fine-Tuning & Alignment — Deep Dive

From base models to production-ready, aligned LLMs. Each chapter pairs a high-level visual overview with an under-the-hood deep dive.
Co-Created by Kiran Shirol and Claude
Core Stack: Fine-Tuning · LoRA / PEFT · RLHF / DPO · HuggingFace · PyTorch
10 chapters · Each with High Level + Under the Hood
Foundations

Architecture, Data & When to Fine-Tune

The decision framework, transformer internals, and dataset preparation.
1. What Is Fine-Tuning & When to Use It
Fine-tuning vs prompting vs RAG — the decision framework and the PEFT spectrum.
2. Transformer Architecture for Fine-Tuning
Attention heads, Q/K/V projections, model formats, precision, and memory footprint.
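The memory-footprint point above can be sketched with back-of-the-envelope arithmetic: weight memory is just parameter count times bytes per parameter at a given precision. The 7B example model is an illustrative assumption.

```python
# Bytes per parameter at the precisions common in fine-tuning.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """GPU memory to hold the weights alone (no grads or optimizer state)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

n = 7e9  # a 7B-parameter model (illustrative)
for p in ("fp32", "bf16", "int4"):
    print(f"{p}: {weight_memory_gb(n, p):.1f} GB")
# fp32: 28.0 GB, bf16: 14.0 GB, int4: 3.5 GB
```

This is why bf16 is the default loading precision and why 4-bit quantization makes a 7B model fit on a consumer GPU.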
3. Dataset Preparation & Curation
Alpaca, ShareGPT, ChatML formats, quality over quantity, synthetic data, and tokenization.
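The format conversion this chapter covers can be sketched in a few lines: an Alpaca-style record (`instruction`/`input`/`output` fields, the common convention) mapped to ChatML-style role-tagged turns. The exact template details vary by tokenizer; this is an illustrative sketch, not a specific library's implementation.

```python
def alpaca_to_chatml(record: dict) -> str:
    """Map one Alpaca-style record to a ChatML-style training string."""
    prompt = record["instruction"]
    if record.get("input"):                 # optional context field
        prompt += "\n\n" + record["input"]
    messages = [("user", prompt), ("assistant", record["output"])]
    return "".join(
        f"<|im_start|>{role}\n{text}<|im_end|>\n" for role, text in messages
    )

sample = {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}
print(alpaca_to_chatml(sample))
```

In practice you would let the tokenizer's own chat template do this, but seeing the mapping spelled out makes the three formats easy to compare.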
Techniques

LoRA, PEFT & Distributed Training

Parameter-efficient methods and scaling to multi-GPU setups.
4. LoRA & Parameter-Efficient Fine-Tuning
LoRA, QLoRA, rank/alpha tuning, DoRA, IA3, and prefix tuning.
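The "parameter-efficient" claim is easy to verify with arithmetic: for a d_out × d_in weight W, LoRA freezes W and trains two low-rank factors B (d_out × r) and A (r × d_in), scaled by alpha/r. The 4096-wide projection and r = 8 below are illustrative assumptions.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in

d = 4096                      # one attention projection, hidden size 4096
full = d * d                  # full fine-tuning trains all of W
lora = lora_params(d, d, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# full: 16,777,216  lora: 65,536  ratio: 256x
```

At rank 8 the adapter is 256× smaller per layer than the frozen weight it modifies, which is why LoRA checkpoints are megabytes rather than gigabytes.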
5. Full Fine-Tuning & Distributed Training
DeepSpeed ZeRO, FSDP, multi-GPU, gradient checkpointing, and mixed precision.
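Why sharding is necessary can be seen from the standard rule of thumb for mixed-precision AdamW: roughly 16 bytes per parameter (fp16 weights and grads, plus fp32 master weights and two fp32 Adam moments), before activations. The 7B model and 8-GPU node below are illustrative assumptions.

```python
def train_state_gb(n_params: float, n_gpus: int = 1) -> float:
    """Per-GPU memory for weights + grads + AdamW state (~16 bytes/param),
    assuming ZeRO-3-style sharding divides it evenly across GPUs."""
    return n_params * 16 / n_gpus / 1e9

print(f"7B on 1 GPU : {train_state_gb(7e9):.0f} GB")    # 112 GB: exceeds one 80 GB card
print(f"7B on 8 GPUs: {train_state_gb(7e9, 8):.0f} GB") # 14 GB per GPU after sharding
```

Activations come on top of this, which is where gradient checkpointing earns its keep.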
Alignment

RLHF, DPO & Modern Alignment

Making models helpful, harmless, and honest through human preferences.
6. Alignment: RLHF & Reward Models
SFT → reward model → PPO pipeline, InstructGPT, and the TRL library.
7. DPO, ORPO & Modern Alignment
Direct Preference Optimization, ORPO, SimPO, KTO — skip the reward model.
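The "skip the reward model" idea reduces to one loss: DPO scores a preference pair directly from sequence log-probs under the policy and a frozen reference model. A minimal sketch of that loss for a single pair; beta = 0.1 and the log-prob values are illustrative assumptions.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)]) for one pair."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Policy prefers the chosen answer more than the reference does -> low loss.
print(dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
               ref_chosen=-12.0, ref_rejected=-12.0))
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; the gradient then pushes probability mass toward the chosen completion, with no separate reward model or PPO loop.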
Production

Tools, Evaluation & Deployment

Training infrastructure, benchmarks, model merging, and serving.
8. Training Infrastructure & Tools
TRL, Unsloth, Axolotl, LLaMA-Factory, cloud GPUs, and experiment tracking.
9. Evaluation & Benchmarks
MMLU, HumanEval, MT-Bench, AlpacaEval, Chatbot Arena, and custom evaluation.
10. Production Deployment & Serving
Model merging, quantization, vLLM/TGI/Ollama, LoRA hot-swapping, and cost analysis.