Part I: Foundations
Mathematical prerequisites, neural network fundamentals, and the path to Transformers
1. Mathematical Foundations: Linear algebra, calculus, probability, and information theory for deep learning (58 KB)
2. Neural Networks Deep Dive: Perceptrons to deep networks, backpropagation, activation functions, optimization (73 KB)
3. Sequence Modeling: RNNs, LSTMs, GRUs, seq2seq, and the attention revolution (32 KB)
4. The Transformer Architecture: Self-attention, multi-head attention, positional encoding, encoder-decoder (80 KB)
Part II: Language Model Training
From raw text to pretrained language models
5. Tokenization: BPE, WordPiece, Unigram, SentencePiece, multilingual tokenization (60 KB)
6. Language Modeling: Autoregressive models, masked LMs, training objectives, perplexity (70 KB)
7. Training Infrastructure: Distributed training, data/tensor/pipeline parallelism, ZeRO, mixed precision (67 KB)
8. Scaling Laws: Kaplan/Chinchilla laws, compute-optimal training, data scaling, inference scaling (60 KB)
Part III: Reinforcement Learning & Alignment
From RL fundamentals to RLHF, DPO, and modern alignment techniques
9. Reinforcement Learning Foundations: MDPs, value functions, Bellman equations, policy gradient, REINFORCE (39 KB)
10. Policy Optimization: TRPO, PPO, clipping, GAE, GRPO, reward shaping (58 KB)
11. RLHF: Reward modeling, the RLHF pipeline, KL penalties, InstructGPT, process rewards (47 KB)
12. DPO and Alignment Alternatives: DPO derivation, GRPO, Constitutional AI, RLAIF, KTO, IPO (75 KB)
Part IV: Model Families
The evolution of frontier LLMs from GPT to DeepSeek
13. The GPT Series: GPT-1 through GPT-4, architectural evolution, scaling milestones (82 KB)
14. LLaMA and Open-Source LLMs: LLaMA 1-3, Mistral, Qwen, DeepSeek, the open-source ecosystem (45 KB)
15. Frontier Models: GPT-4, Claude, Gemini, o-series reasoning models, DeepSeek V3/R1 (67 KB)
Part V: Efficiency & Optimization
Making large models practical: attention, quantization, sparsity, and adaptation
16. Attention Optimization: Flash Attention, GQA, MQA, KV cache, PagedAttention, linear attention (77 KB)
17. Quantization and Efficiency: INT8/INT4, GPTQ, AWQ, GGUF, pruning, distillation, speculative decoding (46 KB)
18. Mixture of Experts: Sparse MoE, routing algorithms, Mixtral, Switch Transformer, load balancing (40 KB)
19. Fine-Tuning and LoRA: Full fine-tuning, LoRA, QLoRA, adapters, prompt tuning, PEFT methods (68 KB)
Part VI: Advanced Capabilities
Multimodal understanding, reasoning, agents, and tool use
20. Multimodal Models: Vision-language models, CLIP, LLaVA, audio, video, Sora (64 KB)
21. Reasoning and Chain-of-Thought: CoT prompting, reasoning models (o1, R1), test-time compute, verification (65 KB)
22. Agents and Tool Use: ReAct, RAG, multi-agent systems, MCP, computer use, memory systems (144 KB)
Part VII: Safety, Interpretability & the Future
Understanding, aligning, and evolving language models
23. Interpretability and Safety: Mechanistic interpretability, probing, alignment, red-teaming, governance (79 KB)
24. Continual Learning Foundations: Catastrophic forgetting, EWC, progressive networks, replay methods (34 KB)
25. Continual Learning for LLMs: Knowledge editing, model merging, online learning, temporal adaptation (43 KB)
Part VIII: Practice & Production
Data pipelines, inference systems, evaluation, prompt engineering, and synthetic data
26. Pre-training Data Pipelines: Web crawling, deduplication, PII scrubbing, quality filtering, decontamination (50 KB)
27. Inference Systems and Serving: KV cache, PagedAttention, speculative decoding, vLLM, production deployment (50 KB)
28. Evaluation and Benchmarks: MMLU, Chatbot Arena, LLM-as-judge, safety evaluation, leaderboards (50 KB)
29. Prompt Engineering: Few-shot, chain-of-thought, structured prompting, system prompts, best practices (46 KB)
30. Synthetic Data and Self-Improvement: Self-Instruct, distillation, RLAIF, self-play, model collapse, data augmentation (50 KB)