Matt Glassman

sentences to ponder about tokenmaxxing from Zac Hill and Nate Meyvis

Here's Zac Hill's excellent new substack, Manual Transmission, and here's Nate Meyvis' Bear blog. These are two of my favorite writers on the internet right now. Highly recommended.¹

Both of them mostly write in an area I'm largely ignorant of: AI in some broad sense, with Nate closer to the programming level and Zac closer to big-picture deployment. But they usually present ideas with much wider application. Zac is longform and Nate is shortform, but I find insights about politics, for instance, in many of their paragraphs.

By chance, both of them wrote about tokenmaxxing this week. Here's Zac, in Goodhart's Law-adjacent mode:

Andy’s approach passes muster because it isn’t coming from a place of fear.

He isn't afraid of being left behind. He isn't afraid of being replaced. He isn't afraid of being the guy who didn't see what was coming. He is a guy who runs fourteen pizza shops and got tired of jamming his business into software designed for online t-shirt vendors. So he sat down with Claude Code on the Max plan, broke a bunch of stuff, rebuilt it from the ground up, and can now get out ahead of the Friday crunch. He knew what the work was, because he had a job to do.

Tokenmaxxing is the opposite. It's what you do when you don't have a problem to solve but you've been told you'd better look busy anyway.

It's the same thing we see in the social/civic sector all the time: outcomes are not the same as activities. The actual outcome you want is murky and multi-factor and complicated to parse, so the temptation is to create vanity metrics around what you can measure and can prove. It’s a form of ‘legibility cope’: artificial rigor papering over an intrinsically difficult-to-model question.

This whole phenomenon embodies what I like to call Cosplay Epistemology: the belief that the performance of a thing is a version of the thing itself. And it fails for the reason the prototypal cargo-cult example fails: a conflation of cause and effect.

In Cosplay Epistemology, the mask fits. The vocabulary is right. The token counts get tracked and visualized. The badges say Token Legend and Cache Wizard and Session Immortal, and the “AI strategy” gets socialized successfully at the offsite. None of it touches the mechanics of any actual problem, because the problem has been defined as the performance.

And here's Nate, thinking nuts-and-bolts but also philosophically:

The connection between token usage and getting-work-done is a lot looser than many people seem to be assuming. This is for a lot of reasons, including:

  1. Token usage is very sensitive to how often you /clear, because overall token usage is approximately quadratic in session length.

  2. If you initialize MCP servers or load long AGENTS.md files with every session, you'll use a lot more tokens in ways that on average get little (or no, or negative) work done.

People discussing tokenmaxxing generally understand that the connection between tokens and output is loose, but they seem not to understand quite how loose it is. Some of my most intensive and productive AI-assisted coding sessions come when I'm making many smaller plans, addressing many small issues, and /clearing a lot. These often consume fewer tokens than leisurely (and useful, but less productive) planning sessions over lunch.
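To make Nate's first point concrete, here is a back-of-the-envelope sketch. It rests on a deliberately simple model that is an assumption, not Nate's accounting or any tool's actual metering: every turn adds about the same number of tokens, and every turn reprocesses the entire conversation so far. It ignores prompt caching and the input/output distinction, and the function names and numbers are hypothetical.

```python
# Toy model (an assumption, not how any particular tool meters usage):
# each turn adds `tokens_per_turn` tokens to the context, and every turn
# reprocesses the whole context, plus a fixed `overhead` preamble
# (think of an always-loaded instructions file or tool definitions).

def session_tokens(turns: int, tokens_per_turn: int, overhead: int = 0) -> int:
    """Total tokens processed over one uncleared session.

    On turn i (1-indexed) the context holds the preamble plus i turns'
    worth of messages, so the total is
    turns * overhead + tokens_per_turn * turns * (turns + 1) / 2,
    which grows quadratically with session length.
    """
    return sum(overhead + i * tokens_per_turn for i in range(1, turns + 1))


def total_with_clears(turns: int, clear_every: int, tokens_per_turn: int,
                      overhead: int = 0) -> int:
    """Same number of turns, but the context is cleared every `clear_every` turns."""
    total = 0
    remaining = turns
    while remaining > 0:
        chunk = min(clear_every, remaining)  # length of this mini-session
        total += session_tokens(chunk, tokens_per_turn, overhead)
        remaining -= chunk
    return total


if __name__ == "__main__":
    # Hypothetical workload: 40 turns of roughly 2,000 tokens each.
    for clear_every in (40, 10, 5):
        tokens = total_with_clears(40, clear_every, 2_000)
        print(f"clear every {clear_every} turns: {tokens:,} tokens")
    # clear every 40 turns: 1,640,000 tokens
    # clear every 10 turns: 440,000 tokens
    # clear every 5 turns: 240,000 tokens
```

Under these assumptions, the same 40 turns of work cost almost seven times as many tokens without clearing as with a clear every five turns, and doubling an uncleared session from 20 to 40 turns roughly quadruples its usage (420,000 to 1,640,000). The `overhead` term captures Nate's second point in the same spirit: a preamble that sits in the context is paid for on every single turn.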

The implications of these insights for public policy are pretty obvious, but that’s a different post.


  1. For some entry-point Zac posts, maybe start here and here. For Nate, you might start here and here.