Hands-on: implement PAL-style Python execution for GSM8K-style problems; add self-consistency; try a tiny ToT over a puzzle domain; log tool calls in an agent demo.
Theory: revisit How LLMs Work for the limits of next-token prediction, Prompt Engineering for prompting patterns, and LLM Evaluation for holistic metrics.
Frontier track: continue with Multi-Agent Systems when it lands on the portal; multi-agent coordination is the natural sequel to single-model reasoning.
You now have a map from the reasoning gap to the full stack: CoT, search, test-time training, verification, tools, benchmarks, and the trends reshaping the field.
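This is a minimal sketch of the first hands-on exercise: PAL-style execution combined with self-consistency. The approach has three steps. Several candidate programs are sampled for one GSM8K-style word problem. Each program is executed. A majority vote is then taken over the numeric answers that the programs actually return.

No model is called here. The programs in `SAMPLED_PROGRAMS` are hypothetical stand-ins for LLM output, and the word problem itself is invented for illustration. The names `run_program` and `self_consistent_answer` are also hypothetical, as is the convention that each program stores its result in a variable named `answer`.

```python
from collections import Counter
from typing import Optional

# Hypothetical stand-ins for k programs sampled from a model at temperature > 0.
# A PAL-style prompt would ask the model to write Python that stores the final
# result in a variable named `answer` (an assumed convention, not a standard).
# Invented problem: 16 eggs a day, 3 eaten, 4 baked, the rest sold at $2 each.
SAMPLED_PROGRAMS = [
    # Correct reasoning, verbose variable names.
    "eggs_per_day = 16\neaten = 3\nbaked = 4\nprice = 2\n"
    "answer = (eggs_per_day - eaten - baked) * price",
    # Correct reasoning, different decomposition.
    "remaining = 16 - 3 - 4\nanswer = remaining * 2",
    # Arithmetic slip: forgets the baked eggs.
    "answer = (16 - 3) * 2",
    # Broken program: references an undefined name, so it is discarded.
    "answer = total_eggs * 2",
]


def run_program(code: str) -> Optional[float]:
    """Execute one generated program and return its numeric `answer`, or None.

    WARNING: exec() on model output is unsafe, and emptying __builtins__ is
    NOT a sandbox. A real system should run code in an isolated process with
    timeouts and no filesystem or network access.
    """
    namespace: dict = {}
    try:
        exec(code, {"__builtins__": {}}, namespace)
    except Exception:
        return None  # crashed programs contribute no vote
    answer = namespace.get("answer")
    if isinstance(answer, (int, float)):
        return float(answer)
    return None  # missing or non-numeric answer contributes no vote


def self_consistent_answer(programs: list[str]) -> Optional[float]:
    """Self-consistency: execute every sample, then majority-vote the answers."""
    answers = [a for a in (run_program(p) for p in programs) if a is not None]
    if not answers:
        return None
    value, _votes = Counter(answers).most_common(1)[0]
    return value


if __name__ == "__main__":
    print(self_consistent_answer(SAMPLED_PROGRAMS))  # prints 18.0
```

Here two samples agree on 18.0, one slips to 26.0, and one crashes, so the vote returns 18.0. Executing the code before voting has a useful side effect: programs that fail to run are filtered out and never reach the vote. Comparing answers as floats is a simplification. Real GSM8K outputs would usually be normalized first, for example by rounding or canonicalizing numbers parsed from strings.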