Lior Sinai

DeepSeek's Multi-Head Latent Attention

A deep dive into DeepSeek’s Multi-Head Latent Attention, including the mathematics and implementation details. The layer is recreated in Julia using Flux.jl.

22 February, 2025 · 32 mins read · 4845 words

Notes on the Martinez-Rueda Polygon Clipping algorithm

The Martinez-Rueda algorithm computes boolean operations between polygons. It can be used for polygon intersections (polygon clipping), unions, differences and XORs. I recently implemented it by following a comprehensive guide at https://sean.fun/a/polygon-clipping-pt2/. However, it was slightly lacking in some complex...

11 January, 2025 · 6 mins read · 1396 words

MicroGrad.jl: Part 5 MLP

A series on automatic differentiation in Julia. Part 5 shows how the MicroGrad.jl code can be used for a machine learning framework like Flux.jl. The working example is a multi-layer perceptron trained on the moons dataset.

19 August, 2024 · 16 mins read · 1885 words

MicroGrad.jl: Part 4 Extensions

A series on automatic differentiation in Julia. Part 4 extends Part 3 to handle maps, getfield and anonymous functions. It creates a generic gradient descent routine and uses it to fit a polynomial.

17 August, 2024 · 15 mins read · 1746 words

MicroGrad.jl: Part 3 Automation with IRTools

A series on automatic differentiation in Julia. Part 3 uses metaprogramming based on IRTools.jl to generate a modified (primal) forward pass and to reverse differentiate it into a backward pass. This is a more robust approach than the expression-based...

10 August, 2024 · 25 mins read · 3291 words

MicroGrad.jl: Part 2 Automation with expressions

A series on automatic differentiation in Julia. Part 2 uses metaprogramming to generate a modified (primal) forward pass and to reverse differentiate it into a backward pass. This post uses an expression-based approach, which can be brittle. Part 3...

03 August, 2024 · 29 mins read · 3705 words

MicroGrad.jl: Part 1 ChainRules

A series on automatic differentiation in Julia. Part 1 provides an overview and defines explicit chain rules.

27 July, 2024 · 23 mins read · 3560 words

Covering all birthdays

Quantifying how likely it is that every birthday is present (covered) in a large group of people.

09 July, 2024 · 12 mins read · 2504 words

Generative transformer from first principles in Julia

A transformer for generating text in Julia, trained on Shakespeare’s plays. This model can be used as a Generative Pre-trained Transformer (GPT) with further work. This post was inspired by Andrej Karpathy’s Zero to Hero course.

23 March, 2024 · 47 mins read · 7325 words

Radix Tree in Julia

A radix tree in Julia, built following Test-Driven Development (TDD).

21 March, 2024 · 21 mins read · 2632 words