Newsletter October 2025: Computer Enhance!

2025-11-02 / #newsletter / #rlmeta / #rlselect / #rlworkbench

This month I discovered Casey Muratori. That opened up a whole new world of programming to me. I have spent many hours watching YouTube videos from this new world, and I have started new programming projects to explore its ideas. It feels like I have unlocked the next level of programming. So who is Casey? And what have I been doing? Let's see.

Casey cares about performance. He has a bunch of educational resources where he talks about how to become aware of performance and how to improve it. A few of the first things I came across from him are the following:

After becoming a little more aware of performance, I got an idea for how to improve the performance of rlselect. I started implementing rlselect2 with a focus on new ideas for better performance. Not that rlselect desperately needs better performance, but I wanted to learn. That way, the next time I write something, maybe I won't default to bad performance.

Casey also has a subscription-based course called Computer, Enhance!, which I enrolled in. You learn how computers (CPUs) work and how to get the most out of them. Casey explains things very well, so I look forward to working my way through his course.

The first homework in the course is to write an instruction decoder for 8086 instructions. That is, you should parse machine code and disassemble it into assembly instructions.
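
To make the task concrete, here is a minimal C sketch of decoding a single instruction by hand. It assumes only the register-to-register form of mov (opcode bits 100010, followed by the d and w flags, then a mod/reg/r/m byte with mod = 11). The function name decode_mov_reg_reg and the table REG_NAMES are made up for this illustration; this is not the course's reference solution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Register names indexed by [w][3-bit register field]. */
const char *const REG_NAMES[2][8] = {
    {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"},
    {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"},
};

/*
 * Decode one register-to-register mov (opcode 100010dw, mod = 11).
 * Writes the assembly text to out and returns the number of bytes
 * consumed (2), or 0 if the bytes are not of this form.
 */
size_t decode_mov_reg_reg(const uint8_t *code, size_t len,
                          char *out, size_t out_size)
{
    assert(code != NULL && out != NULL);
    if (len < 2) return 0;

    uint8_t first = code[0];
    uint8_t second = code[1];
    if ((first >> 2) != 0x22) return 0;  /* top six bits must be 100010 */
    if ((second >> 6) != 0x3) return 0;  /* mod must be 11 (register mode) */

    unsigned d = (first >> 1) & 1;       /* 1: reg field is the destination */
    unsigned w = first & 1;              /* 1: 16-bit registers */
    unsigned reg = (second >> 3) & 7;
    unsigned rm = second & 7;

    const char *dst = d ? REG_NAMES[w][reg] : REG_NAMES[w][rm];
    const char *src = d ? REG_NAMES[w][rm] : REG_NAMES[w][reg];
    snprintf(out, out_size, "mov %s, %s", dst, src);
    return 2;
}
```

For the bytes 89 d9 this writes mov cx, bx into the buffer: w = 1 selects the 16-bit names, and d = 0 makes the reg field (bx) the source.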

Then I got sidetracked. Sidetracked with rlmeta2, which is a new implementation of rlmeta. What is rlmeta?

RLMeta is a programming language in which you write grammars. Grammars have rules that specify how to match objects from an input stream and specify what should happen when objects are matched. It can be used to write lexers, parsers, tree transformers, code generators, and similar tools.

Why did I start a second implementation? Maybe because I had new ideas for how to get better performance. Maybe because I like the architecture of rlmeta and wanted to try and bootstrap it again.

The first step was to implement rlmeta2 in Python. I was able to do that in a weekend. Then, because of my new ideas about performance, I wanted to implement it in C instead. I spent a lot of time thinking about how to implement some of Python's concepts in C for the runtime support that rlmeta2 needs.

Then I watched the Wookash podcast Casey Muratori on Legendary Handmade Hero! where Casey mentioned the Better Software Conference. The conference features speakers who seem to share Casey's philosophy about performance and the handmade spirit.

I watched the talk Vjekoslav Krajačić – File Pilot: Inside the Engine – BSC 2025. In it, he talks about arenas. Arenas are a memory allocation strategy that can be used in C and that feels a bit like garbage collection.

That was a clue for me in my work on implementing rlmeta2 in C. Python has automatic memory management and garbage collection. So I've never really thought about memory. But rlmeta2 needs to allocate memory dynamically. How do you do that in C in a good way?

I found a blog post on the subject by Ryan Fleury (also a speaker at BSC): Untangling Lifetimes: The Arena Allocator. He also covers the topic in the video Enter The Arena: Simplifying Memory Management (2023). And in Everyone is doing memory management wrong. feat. Ryan Fleury | S2 E02 (at around 36:02) Ryan talks about how he learned about arenas from Casey's Handmade Hero.
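
The core idea can be sketched in a few lines of C. This is a minimal fixed-size bump allocator, assuming a single malloc'ed block and no growth; the names Arena, arena_init, arena_alloc, arena_reset, and arena_destroy are chosen for this illustration and are not taken from the talks or the blog post mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A fixed-capacity bump allocator: one big block and one offset. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Reserve the backing block once. Returns 0 on failure. */
int arena_init(Arena *arena, size_t capacity)
{
    arena->base = malloc(capacity);
    arena->capacity = capacity;
    arena->used = 0;
    return arena->base != NULL;
}

/*
 * Hand out size bytes whose offset is aligned to align (a power of
 * two), or NULL if the arena is full. The block itself comes from
 * malloc, so it is already suitably aligned for any basic type.
 */
void *arena_alloc(Arena *arena, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (arena->used + align - 1) & ~(align - 1);
    if (start > arena->capacity || size > arena->capacity - start) {
        return NULL;
    }
    arena->used = start + size;
    return arena->base + start;
}

/* Release everything allocated so far in one step; the block is reused. */
void arena_reset(Arena *arena)
{
    arena->used = 0;
}

/* Return the backing block to the system. */
void arena_destroy(Arena *arena)
{
    free(arena->base);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

Individual allocations are never freed. Everything that shares a lifetime goes into the same arena and is released together with one arena_reset call, which is where the garbage-collection feel comes from.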

Another topic that came up in this (to me) new world was data-oriented design and writing programs that make effective use of the CPU's caches. One talk on the subject that I watched was CppCon 2014: Mike Acton "Data-Oriented Design and C++". By making assumptions about your data, you can write more efficient programs.
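
One common illustration of the idea is the difference between an array of structs and a struct of arrays. The sketch below is in C rather than C++, and the particle types and function names are hypothetical, not taken from the talk. It assumes the hot loop only reads the x field and that cache lines are 64 bytes, which is typical but not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Array of structs: each record carries fields the loop never reads. */
typedef struct {
    float x, y, z;
    char name[52];  /* cold data that shares cache lines with x */
} ParticleAoS;

/* Struct of arrays: each field is stored contiguously on its own. */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t count;
} ParticlesSoA;

/* Sum x over records that are sizeof(ParticleAoS) bytes apart. */
float sum_x_aos(const ParticleAoS *particles, size_t count)
{
    assert(particles != NULL || count == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < count; i++) {
        sum += particles[i].x;
    }
    return sum;
}

/* Sum x over one densely packed array of floats. */
float sum_x_soa(const ParticlesSoA *particles)
{
    assert(particles != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < particles->count; i++) {
        sum += particles->x[i];
    }
    return sum;
}
```

Both functions compute the same sum. In the first layout the x values are a whole record apart, so the loop drags the unused fields through the cache; in the second, consecutive x values are adjacent in memory.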

So my head has been spinning at high frequency with these ideas. Learning more low-level programming. Learning how high-level concepts map to low-level concepts. Sometimes working at a high level of abstraction is nice. But sometimes all the layers of abstraction get in the way. Certainly for performance. But sometimes a problem might actually be harder to solve with more abstractions. So I am excited to learn more about low-level concepts to be able to write better software.

By the end of the month I had made some really good progress on my new projects. I managed to write the instruction decoder for the first homework assignment in rlmeta2. Here is what it looks like:

main = decodeInstruction*:xs !. -> {
    "bits 16\n"
    "\n"
    xs
};

decodeInstruction =
    | opFromByte:opcode peekRegByte:source rmByte:destination -> {
        opcode " " destination ", " source "\n"
    }
    | opFromWide:opcode peekRegWide:source rmWide:destination -> {
        opcode " " destination ", " source "\n"
    }
    ;

peekRegByte =
    | &0b_xx_000_xxx -> { "al" }
    | &0b_xx_001_xxx -> { "cl" }
    | &0b_xx_010_xxx -> { "dl" }
    | &0b_xx_011_xxx -> { "bl" }
    | &0b_xx_100_xxx -> { "ah" }
    | &0b_xx_101_xxx -> { "ch" }
    | &0b_xx_110_xxx -> { "dh" }
    | &0b_xx_111_xxx -> { "bh" }
    ;

peekRegWide =
    | &0b_xx_000_xxx -> { "ax" }
    | &0b_xx_001_xxx -> { "cx" }
    | &0b_xx_010_xxx -> { "dx" }
    | &0b_xx_011_xxx -> { "bx" }
    | &0b_xx_100_xxx -> { "sp" }
    | &0b_xx_101_xxx -> { "bp" }
    | &0b_xx_110_xxx -> { "si" }
    | &0b_xx_111_xxx -> { "di" }
    ;

rmByte =
    | 0b_xx_xxx_000 -> { "al" }
    | 0b_xx_xxx_001 -> { "cl" }
    | 0b_xx_xxx_010 -> { "dl" }
    | 0b_xx_xxx_011 -> { "bl" }
    | 0b_xx_xxx_100 -> { "ah" }
    | 0b_xx_xxx_101 -> { "ch" }
    | 0b_xx_xxx_110 -> { "dh" }
    | 0b_xx_xxx_111 -> { "bh" }
    ;

rmWide =
    | 0b_xx_xxx_000 -> { "ax" }
    | 0b_xx_xxx_001 -> { "cx" }
    | 0b_xx_xxx_010 -> { "dx" }
    | 0b_xx_xxx_011 -> { "bx" }
    | 0b_xx_xxx_100 -> { "sp" }
    | 0b_xx_xxx_101 -> { "bp" }
    | 0b_xx_xxx_110 -> { "si" }
    | 0b_xx_xxx_111 -> { "di" }
    ;

opFromByte = &0b_xxxxxx_0_0 opcode:x -> { x };
opFromWide = &0b_xxxxxx_0_1 opcode:x -> { x };

opcode =
    | 0b_100010_d_w -> { "mov" }
    ;

The meta compiler (a version of rlmeta2 implemented in C) compiles this grammar to a C program, which can then be compiled with a C compiler:

$ ./meta <decoder.meta >decoder.c
$ gcc -o decoder decoder.c

The resulting program is a decoder for 8086 machine instructions, and when run on the machine code example

$ xxd machine_code_example
00000000: 89d9 88e5 89da 89de 89fb 88c8 88ed 89c3  ................
00000010: 89f3 89fc 89c5                           ......

it spits out this:

$ ./decoder <machine_code_example
bits 16

mov cx, bx
mov ch, ah
mov dx, bx
mov si, bx
mov bx, di
mov al, cl
mov ch, ch
mov bx, ax
mov bx, si
mov sp, di
mov bp, ax
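
The patterns in the grammar, such as 0b_xx_001_xxx, read as bit templates: 0 and 1 must match exactly, and the letters appear to act as don't-care positions. One way to check such a template against a byte is a mask-and-compare, sketched below in C. How rlmeta2 actually compiles these patterns is not shown in this post, so byte_matches is a hypothetical helper, and the pattern string is written without the 0b_ prefix.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: test a byte against an 8-bit template such as
 * "xx_001_xxx". '0' and '1' must match the byte exactly, any other
 * character is a wildcard, and '_' is only a visual separator.
 */
int byte_matches(const char *pattern, uint8_t byte)
{
    uint8_t mask = 0;   /* 1 where the template fixes a bit */
    uint8_t value = 0;  /* the required bit at each fixed position */
    int bits = 0;

    for (const char *c = pattern; *c != '\0'; c++) {
        if (*c == '_') continue;
        mask = (uint8_t)(mask << 1);
        value = (uint8_t)(value << 1);
        if (*c == '0' || *c == '1') {
            mask |= 1;
            if (*c == '1') value |= 1;
        }
        bits++;
    }
    assert(bits == 8);  /* the template must describe exactly one byte */

    return (byte & mask) == value;
}
```

With this helper, the second byte of the first instruction in the example (d9, or 11 011 001 in binary) matches "xx_011_xxx" for the reg field and "xx_xxx_001" for the r/m field, which lines up with the bx and cx in the first line of output.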

TODO

This is an experiment to make public a TODO list with my current programming interests. In every newsletter, I will report what I did and what next steps I'm most interested in pursuing. So here is the list going into next month.

References: