GPT OSS Style Architectures
This document is my personal deep-dive into how modern open-source GPT-style language models are actually built. It’s not meant to be a full survey of the literature; instead, it captures the pieces that helped me build an intuition for these models: how text becomes vectors, how positional information is injected, how attention is optimized, and how FFN variants like MoE increase capacity without blowing up compute.

1. Tokenization & Embeddings

The journey from human-readable text to a format the model understands begins here. ...
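To make that first step concrete, here is a minimal sketch of the text → token IDs → embedding-vectors pipeline. The byte-level tokenizer and the tiny sizes (`vocab_size`, `d_model`) are illustrative assumptions of mine, not any particular model's real setup; production GPT-style models use learned BPE vocabularies and much larger dimensions.

```python
# Minimal sketch: text -> token IDs -> embedding vectors.
# The byte-level "tokenizer" and tiny dimensions below are illustrative
# assumptions, not any specific model's vocabulary or sizes.
import torch
import torch.nn as nn

vocab_size, d_model = 256, 16  # toy sizes; real models use ~50k-200k tokens, 768+ dims

# Byte-level tokenization: every UTF-8 byte is its own token, so the
# vocabulary is just the 256 possible byte values.
text = "Hello, GPT!"
token_ids = torch.tensor(list(text.encode("utf-8")))  # shape: (seq_len,)

# The embedding table maps each discrete token ID to a learned dense vector.
embed = nn.Embedding(vocab_size, d_model)
vectors = embed(token_ids)  # shape: (seq_len, d_model)

print(token_ids.shape, vectors.shape)  # torch.Size([11]) torch.Size([11, 16])
```

The design point this illustrates: tokenization turns text into a sequence of integer IDs, and the embedding layer is just a lookup table of learned vectors indexed by those IDs; everything downstream in the model operates on those dense vectors.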