Tag: Programming

Emergent Trade and Tolerated Theft Using Multi-Agent Reinforcement Learning

I’ve been an author on a few papers before, but I recently published the first research project where I was responsible for most of the work and direction. It’s in the first 2024 issue of the journal Artificial Life, which you can find here. You can find a non-paywalled version here. Below, I recount the chronology of the project and summarize our findings.

We explore the conditions under which trade can emerge between four deep reinforcement learning agents that pick up and put down resources in a 2D foraging environment. Agents are rewarded for having both resources at once, but the resources are distributed far apart from each other. To maximize reward, agents need to split up the work: agent 1 goes to resource A, agent 2 goes to resource B, and so on. They then meet to exchange resources, since meeting halfway gets them the most of each resource in the shortest amount of time.

Take the Road Most Documented

How great would it be if the solution to most errors you face were in the first place you looked? That’s what the Arch Wiki has been for me: a massive wealth of information and troubleshooting resources to help me navigate the various configuration and installation issues I’ve encountered. Some people claim Arch Linux is too difficult for new users, but for me it’s been the only distribution I’ve been able to get consistently working, and it’s all thanks to the detailed documentation and known workarounds.

Numerical Stability in Flash Attention

Flash attention, a recent implementation of attention that makes fewer calls to high-bandwidth memory, uses a numerically stable version of the softmax function. In this post, I’ll briefly show how this is done and give an example of an unstable softmax.

The softmax function is used in machine learning to convert a vector of real numbers to a vector of probabilities which sum to 1, and is defined as:
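
For a vector x, the standard definition is softmax(x)_i = exp(x_i) / sum_j exp(x_j). The sketch below, in plain Python, shows how a direct translation of that formula overflows on large inputs and how the common max-subtraction trick avoids it. The function names are illustrative and not taken from the post or from any flash attention code, and it is an assumption that the full post uses exactly this variant.

```python
import math

def naive_softmax(xs):
    # Direct translation of the definition; math.exp overflows for large inputs.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Subtracting the maximum does not change the result mathematically,
    # but every exponent becomes <= 0, so math.exp cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def naive_overflows(xs):
    # Helper (hypothetical) that reports whether the naive version fails.
    try:
        naive_softmax(xs)
        return False
    except OverflowError:
        return True

large = [1000.0, 1001.0, 1002.0]
print(naive_overflows(large))      # the naive version raises OverflowError here
print(sum(stable_softmax(large)))  # the stable version still sums to ~1
```

Because exp(x_i - m) / sum_j exp(x_j - m) cancels the common factor exp(-m), both functions agree wherever the naive one does not overflow.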

Introducing Kittyplot

Kittyplot is a program designed to plot experiment data in the kitty terminal using the kitty graphics protocol, primarily for use on HPC clusters.

Plots are rendered using matplotlib, and users can zoom into different regions of the plots by setting x and y limits using their editor. I use prompt_toolkit to accept regexp input and I override the tab-completion to instead display a list of all metrics that are matched by the current regexp.
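
As a rough sketch of the matching step only, the snippet below filters a list of metric names by a regexp that may still be half-typed. The function name and the metric names are hypothetical, and Kittyplot's actual code may differ; in prompt_toolkit, logic like this would typically sit inside a custom completer.

```python
import re

def matching_metrics(pattern, metric_names):
    """Return the metric names matched by a (possibly incomplete) regexp."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # A half-typed pattern such as "loss(" is not valid yet; match nothing.
        return []
    return [name for name in metric_names if compiled.search(name)]

# Hypothetical metric names, for illustration only.
metrics = ["train/loss", "train/accuracy", "eval/loss", "eval/accuracy"]
print(matching_metrics(r"loss$", metrics))  # ['train/loss', 'eval/loss']
```

Returning an empty list for an invalid pattern keeps the display from erroring while the user is still typing.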

Unexpected Benefits of Testing Code

Matthew Carlson’s blog post “Fighting Distraction With Unit Tests” inspired me to share some extra benefits of writing test code I’ve discovered during my PhD program.

I’m working on a weird project that’s constantly changing as I try new things, and naturally, debugging and ensuring correctness had become a nightmare. So I started writing tests, cursing myself for needing to write so much code that I’d likely throw away soon. But as it turns out, testing can be pretty helpful in a few other ways: