Deep dives into ML research, mathematical intuition, systems design, and software engineering — written for practitioners who enjoy the details.
Re-deriving scaled dot-product attention from first principles, with annotated PyTorch code, Mermaid architecture diagrams, and a discussion of why positional encoding works the way it does. A minimal sketch of the core operation appears below.
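
For orientation, here is a minimal sketch of the scaled dot-product attention operation the post re-derives: softmax(QKᵀ / √d_k)·V. The function name, tensor shapes, and optional mask argument are illustrative assumptions, not code taken from the post itself.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of attention; q, k, v have shape (batch, heads, seq_len, d_k)."""
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) so the softmax stays well-conditioned
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        # Positions where mask == 0 receive ~zero attention weight
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v, weights

# Example usage with random tensors (shapes are illustrative)
q = k = v = torch.randn(1, 8, 16, 64)
out, attn = scaled_dot_product_attention(q, k, v)
```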