Beyond Tokenmaxxing

May 28, 2026

tl;dr Token usage data was made available for billing, not for measuring developer output. We should instead build a simple heuristic that weighs frequent developer activities (merged code changes, skill and MCP invocations, etc.) against cost, so that it reflects purposeful spend rather than raw consumption.

Leverage Score = Σ(activity × weight) / cost ($)

Over the last year, AI adoption has accelerated. Engineers are using it for all sorts of things – from writing code to managing their digital lives. It feels like an entire generation of dormant builders suddenly came back to life.

And like every technological shift before it, enthusiasm quickly found a metric. Managers and leaders began building dashboards and internal tools to measure how thoroughly their teams had adopted AI. Token usage and dollar spend became visible, and visibility became competition. Before long, people weren't just building; they were measuring how much AI they consumed while doing it.

“Meta also introduced internal dashboards to track employees’ consumption of “tokens,” a unit of A.I. use that is roughly equivalent to four characters of text, four people said. Some said the dashboards were a pressure tactic to encourage competition with colleagues. That led some employees to make so many A.I. agents that others had to introduce agents to find agents, and agents to rate agents, two people said¹”

“At Anthropic, a single user of the company’s A.I. coding system, Claude Code, racked up a bill of more than $150,000 in a month²”

Consumption became measurable; impact did not. A developer stuck in an agent loop and a developer debugging a production incident can look identical on the spend dashboard. They may spend the same tokens, but the value they get out is completely different.

Token (or $ spend) data was made available to answer the question “how much did we use?” It was built for billing, not for management or leaderboards, and some companies³ are already starting to realize this disconnect. According to Aishwarya Sankar of Entillegence AI⁴, 82% of tokens spent never make it into the product, and over 44% are spent on bug fixes, potentially for bugs created by AI agents on the loose.

The challenge is no longer getting engineers to use AI; it is understanding whether that usage creates meaningful leverage.

Measuring Leverage

Instead of tracking consumption, we should track how much leverage it creates for developer work. One approach may be to weigh developer activities based on two variables: how frequently they occur and how much time or friction they remove.

The most valuable workflows are often not the most expensive ones, but the ones that repeatedly remove friction from high-frequency tasks.

By weighing activities that represent real developer work (code changes, context gathered, investigations) and dividing by cost, we get a signal that reflects purposeful spend rather than raw consumption. It doesn’t penalize high usage. A developer doing meaningful work at scale should score well. What it penalizes is waste: agent loops, redundant scans, and tokenmaxxing that appears productive on a dashboard but quietly is not.

Activity	Frequency	Time Saved	Weight	Data source
Lines added	Extremely high	Extremely high	1	GitHub
Lines removed	High	Extremely high	0.875	GitHub
Refactoring (updates, deletes)	Medium	Extremely high	0.75	GitHub
Skill invocations	Extremely high	Extremely high	1	Vendor
MCP invocations	High	Extremely high	0.875	Vendor

Where:
Weight = (Frequency + Time Saved) / 2
using Low = 0.25, Medium = 0.5, High = 0.75, Extremely high = 1
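The Weight column can be derived mechanically from the two ordinal ratings. The Python sketch below simply reproduces the table under that rule; the names `LEVELS`, `RATINGS`, `weight`, and `WEIGHTS` are hypothetical, chosen for illustration.

```python
# Ordinal levels mapped to numbers, as stated in the weighting rule.
LEVELS = {"Low": 0.25, "Medium": 0.5, "High": 0.75, "Extremely high": 1.0}

# (frequency, time saved) ratings copied from the activity table.
RATINGS = {
    "Lines added": ("Extremely high", "Extremely high"),
    "Lines removed": ("High", "Extremely high"),
    "Refactoring (updates, deletes)": ("Medium", "Extremely high"),
    "Skill invocations": ("Extremely high", "Extremely high"),
    "MCP invocations": ("High", "Extremely high"),
}


def weight(frequency: str, time_saved: str) -> float:
    """Weight = (Frequency + Time Saved) / 2, using the ordinal mapping above."""
    return (LEVELS[frequency] + LEVELS[time_saved]) / 2


# Derived weight for every activity row in the table.
WEIGHTS = {activity: weight(*rating) for activity, rating in RATINGS.items()}

if __name__ == "__main__":
    for activity, w in WEIGHTS.items():
        print(f"{activity}: {w}")
```

Keeping the ratings as labels rather than numbers makes it easy to revisit a judgment (say, bumping refactoring frequency to High) without hand-editing the weights.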

Merged code changes

This is one of the most common use cases, something developers do almost every day. In my opinion, lines removed are almost as valuable as lines added, but most metrics ignore them entirely. Deletions and renames feel cheap when delegated to Claude because a single instruction can touch dozens of files. But the decision to delete still requires judgment, and the value of that judgment is high.
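Line counts for merged changes can be approximated from a local clone with `git log --numstat`, which prints one `<added>\t<removed>\t<path>` line per file (binary files show `-` for both counts). The sketch below assumes that commits on the default branch stand in for "merged" work; the helper names and the `main` / `30 days ago` defaults are hypothetical, and the GitHub API would be an alternative data source.

```python
import subprocess


def parse_numstat(text: str) -> tuple[int, int]:
    """Sum (lines added, lines removed) from `git log --numstat` output.

    Each file line looks like "<added>\t<removed>\t<path>". Binary files
    report "-" for both counts and are skipped.
    """
    added = removed = 0
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # blank separator lines between commits, or binary files
        added += int(parts[0])
        removed += int(parts[1])
    return added, removed


def merged_line_counts(repo: str, branch: str = "main",
                       since: str = "30 days ago") -> tuple[int, int]:
    """Hypothetical helper: treats commits reachable from `branch` as merged work."""
    out = subprocess.run(
        ["git", "-C", repo, "log", branch, f"--since={since}",
         "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

Adding `--author=<pattern>` to the `git log` call would narrow the counts to a single developer.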

Skills and MCP invocations

Skills are scoped, reusable, context-aware workflows that make repeated developer tasks faster, more consistent, and less manual. MCPs (or connectors) are especially valuable because they gather and reconstruct context across systems. In many engineering workflows, context retrieval may be harder and more time-consuming than content generation itself. Some vendors⁵ already expose parts of this data through analytics APIs, which can serve as a useful starting point for measuring leverage in developer workflows.
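Vendor analytics exports differ in shape, so the sketch below does not model any real vendor API. It assumes a hypothetical, already-normalized event record (`ToolEvent`, with a `kind` of `"skill"` or `"mcp"`) and shows how per-developer counts for the two vendor-sourced table rows could be tallied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEvent:
    """Hypothetical normalized record; real analytics exports will differ."""
    developer: str
    kind: str  # assumed values: "skill" or "mcp"; anything else is ignored
    name: str


def invocation_counts(events: list[ToolEvent]) -> dict[str, Counter]:
    """Count skill and MCP invocations per developer."""
    counts: dict[str, Counter] = {}
    for event in events:
        if event.kind in ("skill", "mcp"):
            counts.setdefault(event.developer, Counter())[event.kind] += 1
    return counts


if __name__ == "__main__":
    events = [
        ToolEvent("dev-b", "skill", "code-review"),
        ToolEvent("dev-b", "mcp", "issue-search"),
        ToolEvent("dev-a", "other", "chat"),
    ]
    print(invocation_counts(events))
```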

A simple heuristic

Using the above signals, we can construct a simple heuristic for estimating leverage relative to cost.

Leverage Score = Σ(activity × weight) / cost ($)
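The heuristic translates directly into a few lines of code. This is a minimal Python sketch, assuming activity counts arrive as a dictionary keyed by hypothetical snake_case names for the table rows; the weights are the ones from the activity table.

```python
# Weights from the activity table, keyed by hypothetical row identifiers.
ACTIVITY_WEIGHTS = {
    "lines_added": 1.0,
    "lines_removed": 0.875,
    "refactoring": 0.75,
    "skill_invocations": 1.0,
    "mcp_invocations": 0.875,
}


def leverage_score(activity: dict[str, float], cost_usd: float) -> float:
    """Leverage Score = Σ(activity × weight) / cost ($).

    Raises KeyError for an activity without a weight, so untracked
    activities are never silently counted.
    """
    if cost_usd <= 0:
        raise ValueError("cost must be positive")
    weighted = sum(count * ACTIVITY_WEIGHTS[name] for name, count in activity.items())
    return weighted / cost_usd


if __name__ == "__main__":
    developer_a = {"lines_added": 2000, "lines_removed": 50, "refactoring": 20,
                   "skill_invocations": 2, "mcp_invocations": 4}
    print(leverage_score(developer_a, cost_usd=500))
```

Rejecting a zero cost is a deliberate choice: a developer with no recorded spend has no meaningful ratio, and dividing by zero would hide that.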

Let’s look at the example:

Activity	Developer A	Developer B
Lines added	2000	1500
Lines removed	50	75
Refactoring (updates, deletes)	20	10
Skill invocations	2	10
MCP invocations	4	8
Cost	$500	$200
Score	4.13	7.95
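Plugging the table's counts into the formula, with the weights from the activity table, reproduces both scores:

```latex
\begin{aligned}
\text{Score}_A &= \frac{2000(1) + 50(0.875) + 20(0.75) + 2(1) + 4(0.875)}{500}
               = \frac{2064.25}{500} \approx 4.13 \\
\text{Score}_B &= \frac{1500(1) + 75(0.875) + 10(0.75) + 10(1) + 8(0.875)}{200}
               = \frac{1590.125}{200} \approx 7.95
\end{aligned}
```

At these weights, lines added account for 2,000 of Developer A's 2,064.25 weighted points, so code volume dominates the numerator.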

Both developers produced similar output, but Developer B achieved nearly twice the leverage relative to spend. Most of the gap comes from cost (B spent $200 against A's $500), while B's heavier use of skills and MCP invocations suggests AI was used to reduce friction, reuse workflows, and gather context rather than to generate raw volume.

A step further

Tools like Traces could become an observability layer for human-agent collaboration. Sessions, attachments, tool invocations, and execution traces help reconstruct how work actually happened rather than simply measuring how many tokens were consumed.