Skip to main content

Tag: programming

Numerical Stability in Flash Attention

Flash attention, a recent implementation of attention which makes less calls to high-bandwidth memory, uses a version of the softmax function which is numerically stable. In this post, I’ll briefly showcase how this is done and an example of an unstable softmax. The softmax function is used in machine learning to convert a vector of real numbers to a vector of probabilities which sum to 1, and is defined as:

Introducing Kittyplot

Kittyplot is a program designed to plot experiment data in the kitty terminal using the kitty graphics protocol, primarily for use on HPC clusters. Plots are rendered using matplotlib, and users can zoom into different regions of the plots by setting x and y limits using their editor. I use prompt_toolkit to accept regexp input and I override the tab-completion to instead display a list of all metrics that are matched by the current regexp.

Unexpected Benefits of Testing Code

Matthew Carlson’s blog post “Fighting Distraction With Unit Tests” inspired me to share some extra benefits of writing test code I’ve discovered during my PhD program. I’m working on a weird project that’s constantly changing as I try new things, and naturally, debugging and ensuring correctness was a nightmare. So I started writing tests, cursing myself for needing to write so much code I’ll likely throw away soon. But as it turns out, testing can be pretty helpful in a few other ways: