
LLM Fine-Tuning & Alignment — Deep Dive

From base models to production-ready, aligned LLMs. Each chapter pairs a high-level visual overview with an under-the-hood deep dive.
Co-Created by Kiran Shirol and Claude
Core Stack: Fine-Tuning · LoRA / PEFT · RLHF / DPO · HuggingFace · PyTorch
10 chapters · Each with High Level + Under the Hood
Foundations

Architecture, Data & When to Fine-Tune

The decision framework, transformer internals, and dataset preparation.
1. What Is Fine-Tuning & When to Use It
Fine-tuning vs prompting vs RAG — the decision framework and the PEFT spectrum.
2. Transformer Architecture for Fine-Tuning
Attention heads, Q/K/V projections, model formats, precision, and memory footprint.
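The memory-footprint point above can be sketched with back-of-the-envelope arithmetic: weight memory is just parameter count times bytes per parameter at a given precision. The 7B example model is an illustrative assumption.

```python
# Bytes per parameter at the precisions common in fine-tuning.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """GPU memory to hold the weights alone (no grads or optimizer state)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9

n = 7e9  # a 7B-parameter model (illustrative)
for p in ("fp32", "bf16", "int4"):
    print(f"{p}: {weight_memory_gb(n, p):.1f} GB")
# fp32: 28.0 GB, bf16: 14.0 GB, int4: 3.5 GB
```

This is why bf16 is the default loading precision and why 4-bit quantization makes a 7B model fit on a consumer GPU.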
3. Dataset Preparation & Curation
Alpaca, ShareGPT, ChatML formats, quality over quantity, synthetic data, and tokenization.
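The format conversion this chapter covers can be sketched in a few lines: an Alpaca-style record (`instruction`/`input`/`output` fields, the common convention) mapped to ChatML-style role-tagged turns. The exact template details vary by tokenizer; this is an illustrative sketch, not a specific library's implementation.

```python
def alpaca_to_chatml(record: dict) -> str:
    """Map one Alpaca-style record to a ChatML-style training string."""
    prompt = record["instruction"]
    if record.get("input"):                 # optional context field
        prompt += "\n\n" + record["input"]
    messages = [("user", prompt), ("assistant", record["output"])]
    return "".join(
        f"<|im_start|>{role}\n{text}<|im_end|>\n" for role, text in messages
    )

sample = {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}
print(alpaca_to_chatml(sample))
```

In practice you would let the tokenizer's own chat template do this, but seeing the mapping spelled out makes the three formats easy to compare.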
Techniques

LoRA, PEFT & Distributed Training

Parameter-efficient methods and scaling to multi-GPU setups.
4. LoRA & Parameter-Efficient Fine-Tuning
LoRA, QLoRA, rank/alpha tuning, DoRA, IA3, and prefix tuning.
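The "parameter-efficient" claim is easy to verify with arithmetic: for a d_out × d_in weight W, LoRA freezes W and trains two low-rank factors B (d_out × r) and A (r × d_in), scaled by alpha/r. The 4096-wide projection and r = 8 below are illustrative assumptions.

```python
def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in a LoRA adapter: B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in

d = 4096                      # one attention projection, hidden size 4096
full = d * d                  # full fine-tuning trains all of W
lora = lora_params(d, d, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
# full: 16,777,216  lora: 65,536  ratio: 256x
```

At rank 8 the adapter is 256× smaller per layer than the frozen weight it modifies, which is why LoRA checkpoints are megabytes rather than gigabytes.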
5. Full Fine-Tuning & Distributed Training
DeepSpeed ZeRO, FSDP, multi-GPU, gradient checkpointing, and mixed precision.
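Why sharding is necessary can be seen from the standard rule of thumb for mixed-precision AdamW: roughly 16 bytes per parameter (fp16 weights and grads, plus fp32 master weights and two fp32 Adam moments), before activations. The 7B model and 8-GPU node below are illustrative assumptions.

```python
def train_state_gb(n_params: float, n_gpus: int = 1) -> float:
    """Per-GPU memory for weights + grads + AdamW state (~16 bytes/param),
    assuming ZeRO-3-style sharding divides it evenly across GPUs."""
    return n_params * 16 / n_gpus / 1e9

print(f"7B on 1 GPU : {train_state_gb(7e9):.0f} GB")    # 112 GB: exceeds one 80 GB card
print(f"7B on 8 GPUs: {train_state_gb(7e9, 8):.0f} GB") # 14 GB per GPU after sharding
```

Activations come on top of this, which is where gradient checkpointing earns its keep.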
Alignment

RLHF, DPO & Modern Alignment

Making models helpful, harmless, and honest through human preferences.
6. Alignment: RLHF & Reward Models
SFT → reward model → PPO pipeline, InstructGPT, and the TRL library.
7. DPO, ORPO & Modern Alignment
Direct Preference Optimization, ORPO, SimPO, KTO — skip the reward model.
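The "skip the reward model" idea reduces to one loss: DPO scores a preference pair directly from sequence log-probs under the policy and a frozen reference model. A minimal sketch of that loss for a single pair; beta = 0.1 and the log-prob values are illustrative assumptions.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)]) for one pair."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1 / (1 + math.exp(-beta * margin)))

# Policy prefers the chosen answer more than the reference does -> low loss.
print(dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
               ref_chosen=-12.0, ref_rejected=-12.0))
```

When the policy matches the reference exactly the margin is zero and the loss is log 2; the gradient then pushes probability mass toward the chosen completion, with no separate reward model or PPO loop.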
Production

Tools, Evaluation & Deployment

Training infrastructure, benchmarks, model merging, and serving.
8. Training Infrastructure & Tools
TRL, Unsloth, Axolotl, LLaMA-Factory, cloud GPUs, and experiment tracking.
9. Evaluation & Benchmarks
MMLU, HumanEval, MT-Bench, AlpacaEval, Chatbot Arena, and custom evaluation.
10. Production Deployment & Serving
Model merging, quantization, vLLM/TGI/Ollama, LoRA hot-swapping, and cost analysis.