Tokens are time.
Pre-context (skippable): I have not written here for around 8 years. A lot of that had to do with switching to a management career, where topics are much more sensitive to discuss, and with the proliferation of LinkedIn posts, which has both driven traffic away from “blog posts” and made one not want to be grouped with the self-promotion crowd. Lately, though, one topic keeps coming back to me as I return to the startup world, and it is one I can still freely discuss from a technical perspective: building great products fast in the LLM era. This is Part 1, where we discuss the importance of progressive disclosure and parallel capacity; a later post will cover how we built an app comparable in complexity to Sora for Android.
Thesis: Tokens take time, so fewer tokens mean faster delivery — provided your parallel capacity stays in the same range.
OpenAI’s “Sora for Android” sprint is a great case study in both speed and scale: from October 8 to November 5, 2025, a small team shipped the app while “consuming roughly 5 billion tokens,” often running multiple agent sessions in parallel. The payoff was velocity; the cost was a giant token bill and a new bottleneck: human review and integration.
This post is about getting the same speed benefits without paying for a giant, repeated context tax.
The key idea is progressive disclosure: agents start with a thin “index” of rules and routes, and only load deeper context when it’s actually needed. Done right, this cuts token usage and preserves parallel throughput, so wall‑clock delivery goes down.
Why “tokens = time” is the constraint you should design around
When you use coding agents seriously, you’re operating a compute pipeline:
- Tokens processed (inputs + tool definitions + retrieved context + outputs)
- Throughput (tokens/second for your model/runtime)
- Parallel capacity (how many independent tasks you can run at once)
- Coordination overhead (human review, merge conflicts, design decisions)
A useful mental model:
Wall‑clock time ≈ tokens on the critical path ÷ (tokens/sec × parallel sessions) + coordination overhead
So yes: fewer tokens tend to mean faster — if you don’t reduce success rate or force more retries, and if you don’t collapse parallelism by creating integration chaos.
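To make the model concrete, here is a back-of-envelope run of the formula (all numbers are illustrative):

```
tokens on critical path : 2,000,000
throughput              : 100 tokens/sec per session
parallel sessions       : 4
coordination overhead   : 2 h of review and merging

compute time ≈ 2,000,000 ÷ (100 × 4) = 5,000 s ≈ 1.4 h
wall-clock   ≈ 1.4 h + 2 h ≈ 3.4 h
```

Note the asymmetry: halving critical-path tokens saves about 0.7 h here, while halving parallel sessions costs about 1.4 h. Cutting tokens only wins if parallel capacity survives the cut.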
Progressive disclosure is the engineering pattern that makes “fewer tokens” compatible with “many parallel workers.”
Progressive disclosure in 3 layers
Think of context as a hierarchy:
- Always‑on, tiny: repo rules, commands, invariants → instruction files
- Loaded only when relevant: repeatable workflows → Skills
- Fetched on demand: docs, tickets, logs, dashboards → MCP tools
You’re trying to avoid this anti-pattern:
“Paste the entire world into every prompt so the agent won’t get lost.”
Instead, build a discoverable system so agents can find what they need when they need it.
Layer 1: Instruction files (fast startup alignment)
Instruction files are your “contract” with the agent. They replace long onboarding prompts.
Codex: AGENTS.md as an instruction chain
Codex supports an instruction chain that walks from repo root down to the current directory, layering guidance. Practically, that means:
- Put stable, repo‑wide guidance in the root `AGENTS.md`
- Put area‑specific overrides in nested directories (e.g., `mobile/AGENTS.md`)
- Keep each file small — Codex enforces a default size cap (32 KiB) across discovered files
Codex teams often use lots of these small files so parallel sessions stay consistent across the repo.
Claude Code: CLAUDE.md for “memory” and startup context
Anthropic’s guidance is similar in spirit: Claude Code auto-pulls CLAUDE.md into context at the start of a session, and they explicitly note that “context gathering consumes time and tokens.”
Claude’s memory system is hierarchical (user‑level, project‑level, etc.) and supports imports, so you can keep base guidance small and route to deeper docs only when needed.
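As a minimal sketch of that routing (file paths are illustrative), a small root CLAUDE.md can look like this:

```markdown
# CLAUDE.md
- Run `./gradlew test` and `./gradlew detekt` before proposing any PR.
- Architecture overview: @docs/architecture.md
- Anything else: start from the repo map in docs/ownership.md instead of guessing.
```

The `@path` line is Claude’s import syntax; plain paths (like the last line) cost almost nothing until the agent actually opens them.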
What belongs in instruction files (and what doesn’t)
✅ Put in AGENTS.md / CLAUDE.md (token‑efficient, stable):
- Repo map (top folders and what they mean)
- “How to run checks” (lint, tests, typecheck)
- Coding conventions that cause CI failures if ignored
- PR/branch etiquette
- Definition of Done (what “complete” means here)
- “Where to find deeper docs” (links or file paths)
❌ Keep out of instruction files (token-expensive, volatile):
- Full architecture docs
- API references
- Incident runbooks
- Every edge case and historical decision
Those go into Skills and retrievable docs.
A minimal, shareable template
Create both files (they can be similar). Keep them short.
```markdown
# AGENTS.md / CLAUDE.md

## Repo map (fast orientation)
- /apps/mobile: Android app
- /services/api: backend services
- /packages/ui: shared UI components
- /docs: architecture, ADRs, runbooks

## How to run checks (always)
- Unit: `./gradlew test` (or `npm test` / `pytest`)
- Lint/format: `./gradlew detektFix && ./gradlew detekt`
- Typecheck: `npm run typecheck` (if applicable)

## Working agreements
- Keep diffs small and scoped.
- Update tests for bug fixes.
- Don’t change public APIs without updating docs and adding tests.

## Progressive disclosure pointers
- For large work: write/update a plan file in `.agent/PLANS.md`.
- For CI failures: use skill `$ci_autofix`.
- For external docs: use MCP tools (don’t paste long docs into chat).
```
Layer 2: Skills (workflow modules with built‑in progressive disclosure)
Skills are the biggest multiplier for token efficiency because they turn repeated prompting into reusable modules.
Codex Agent Skills
Codex Skills are explicitly designed for progressive disclosure:
- At startup, Codex loads only each skill’s name + description
- When invoked (explicitly or implicitly), Codex loads full instructions and any referenced files
That means you can have lots of skills available without paying their full token cost on every run.
Anthropic Agent Skills (Claude)
Anthropic’s Agent Skills follow the same principle:
- Load only metadata (name/description) at startup
- Load `SKILL.md` only when relevant
- Bundle additional resources (docs/scripts) that can be read/run on demand
Their guidance is blunt and correct: every paragraph has to justify its token cost once it’s loaded.
What a “high ROI” Skill looks like
A good skill:
- Encodes the workflow (not a generic explanation)
- Calls out repo-specific commands, locations, constraints
- Includes an explicit verification loop
- Produces a PR-ready summary
Example: a CI auto-fix skill ($ci_autofix)
```
.codex/skills/ci_autofix/
  SKILL.md
  scripts/
    run_ci_locally.sh
  references/
    ci-matrix.md
```
A simple SKILL.md (illustrative):
```markdown
---
name: ci_autofix
description: Diagnose and fix CI failures with minimal diffs. Use when tests/lint/typecheck fail in CI or locally.
---

# CI Auto-fix workflow

## Goal
Make CI green with the smallest safe change.

## Steps
1) Reproduce locally using the canonical commands from AGENTS.md.
2) Identify the *first failing* test/lint error.
3) Patch minimally (no refactors unless required).
4) Re-run the same checks.
5) Summarize: root cause, fix, files changed, how verified.

## Guardrails
- Don’t change public APIs unless CI failure forces it.
- Prefer adding/adjusting tests over weakening assertions.
```
Once this exists, you stop spending tokens re-explaining CI behavior in every conversation.
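The bundled script is what keeps step 1 deterministic: each session runs the same commands instead of improvising them. A sketch of run_ci_locally.sh, assuming the Gradle commands from the template above:

```bash
#!/usr/bin/env bash
# run_ci_locally.sh: reproduce CI checks locally, in the same order CI runs them.
set -euo pipefail

./gradlew detekt   # static analysis / lint, fail fast
./gradlew test     # unit tests
echo "All local CI checks passed."
```

Because the skill references the script instead of inlining commands into every prompt, the workflow never needs to be re-explained in chat.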
Layer 3: MCP (tool-backed context instead of prompt dumps)
MCP (Model Context Protocol) is the standard way to connect an agent to external tools and context—documentation, dashboards, issue trackers, internal services—without pasting everything into the prompt.
Both OpenAI and Anthropic support MCP in their coding workflows.
Team sharing: check in .mcp.json
Claude Code supports a project-scoped .mcp.json at your repo root, designed to be checked into version control so everyone shares the same tool connections.
The point is not “more tools.” The point is less prompt bloat because tools can fetch the right info at the right time.
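A project-scoped config can stay tiny. A sketch (server names, packages, and env vars are placeholders for whatever MCP servers your team actually runs):

```json
{
  "mcpServers": {
    "docs-search": {
      "command": "npx",
      "args": ["-y", "@yourorg/docs-mcp-server"]
    },
    "issue-tracker": {
      "command": "npx",
      "args": ["-y", "@yourorg/tracker-mcp-server"],
      "env": { "TRACKER_URL": "https://tracker.internal.example" }
    }
  }
}
```

Check it in, and every session (human-driven or automated) gets the same connections; the next subsection covers keeping those tool definitions from eating your context.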
The hidden token killer: tool definition bloat
Anthropic’s advanced tool use writeup shows how bad this can get:
- Loading dozens of MCP tool definitions up front can burn tens of thousands of tokens before any work begins
- Their “Tool Search Tool” approach defers tool loading and can reduce that overhead dramatically while keeping the full library available
Design rule: treat tools like Skills:
- Load a small “router/search” tool up front
- Discover and load specific tools only when needed
- Keep tool results concise and structured
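Conceptually, the deferred flow looks like this (a sketch of the pattern only; the tool and field names are hypothetical, not Anthropic’s actual API):

```json
{
  "call":   { "tool": "search_tools", "input": { "query": "create issue in tracker" } },
  "result": { "matching_tools": ["tracker_create_issue"] }
}
```

Only the small router definition is loaded on every run; the matched tool’s full schema enters context only after it is selected.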
MCP patterns that reduce tokens and retries
- Search → fetch → decide: don’t fetch full docs; fetch top hits, then open only what matters.
- Return structured outputs: prefer JSON payloads over long prose, and let the agent summarize.
- Keep results out of context when possible: if a tool can write to a file (artifact), do that and return a pointer plus a small summary (see the sketch after this list).
- Cache at the tool layer: if a build matrix or service map changes weekly, cache it server-side.
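For the artifact pattern in particular, a tool response might look like this (the shape is illustrative, not a spec):

```json
{
  "status": "ok",
  "artifact": ".agent/artifacts/build-matrix-2025-11.json",
  "summary": "14 CI jobs; 2 flaky (android-api-29, lint-strict); full matrix in artifact."
}
```

The agent reads a ~40-token summary instead of the whole matrix, and opens the artifact only if the summary isn’t enough to decide.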
Parallel capacity: speed only happens if your codebase can absorb concurrency
The Sora Android team described running multiple Codex sessions in parallel—playback, search, error handling, tests/refactors—like managing a team. Their key lesson: the bottleneck shifted from writing code to decisions, feedback, and integrating changes.
That’s the core constraint for “less tokens → faster delivery”:
- you need parallel workers
- but you also need low-friction integration
Make your repo “parallel-task friendly”
If you want many agents working at once, design for it:
1) Reduce hot files
- Global registries and mega “index” files cause constant merge conflicts
- Prefer local registration and convention-based wiring
2) Strengthen module boundaries
- Organize around “things that change together,” not just technical layers
- Publish stable interfaces; isolate feature work behind them
3) Make Definition of Done executable
- Deterministic format/lint
- Fast unit tests per module
- Clear smoke tests
4) Provide a “golden path”
- Agents are far better with examples than with abstract rules
- Maintain 1–2 representative, end-to-end implementations per major feature type
5) Isolate work with branches/worktrees
- Claude Code docs recommend git worktrees for running parallel sessions with full code isolation
- The same principle applies for any multi-agent workflow: isolate, then merge
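Worktrees make that isolation mechanical. A typical flow (branch names are illustrative):

```bash
# One worktree per parallel agent task, all sharing a single clone.
git worktree add ../repo-playback -b agent/playback
git worktree add ../repo-search   -b agent/search

# Run one agent session per directory; merge each branch back via a normal PR.
git worktree list
```

Each session gets a private working directory, so parallel edits can’t trample each other before review.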
Automation: make the agents do the loops
If you want to actually realize parallel capacity, stop using humans as the retry engine.
Codex: non-interactive codex exec for CI and scripts
Codex supports non-interactive mode (codex exec) specifically for scripts and CI:
- Run in pipelines
- Emit JSONL events for automation
- Use minimal permissions
- Implement common patterns like CI auto-fix workflows that open PRs
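In a pipeline, that can be a single step. A sketch (the prompt is illustrative; check your Codex CLI version for flags beyond the basic form):

```bash
# Run Codex non-interactively against the current checkout after a red CI run.
codex exec "CI failed on this branch. Use the ci_autofix skill: reproduce, \
patch minimally, re-run the checks, and summarize root cause and fix."
```

Pair it with your CI system’s open-a-PR-from-branch step so humans only see the final diff.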
This is the practical bridge from “agent assistant” to “agent automation.”
Claude: use CLI workflows and headless usage patterns
Claude Code docs show patterns like:
- running parallel sessions (worktrees)
- using Claude as a unix-style utility (pipe in/out)
- adding Claude to verification (lint-like checks)
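The unix-utility pattern, for instance, turns a review rule into a pipeline step (the prompt is illustrative; `-p` is Claude Code’s non-interactive print mode):

```bash
# Use Claude as a lint-like gate on a diff.
git diff main | claude -p "Check this diff against the working agreements \
in CLAUDE.md. Reply PASS, or list the violations."
```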
The important part isn’t which agent you use. It’s this principle:
Automation should make the agent responsible for re-running checks until green; humans should only review the final diff.
Put it together: a repo layout that scales
Here’s a structure that implements progressive disclosure end-to-end:
```
repo/
  AGENTS.md
  CLAUDE.md
  docs/
    architecture.md
    ownership.md
    runbooks/
  .agent/
    PLANS.md
  .codex/
    skills/
      ci_autofix/
        SKILL.md
        scripts/
      new_endpoint/
        SKILL.md
        references/
  .claude/
    commands/
      fix-ci.md
      review-pr.md
  .mcp.json
```
Each layer is discoverable, but nothing forces the agent to ingest it all at once.
A practical “Week 1” checklist
If you only do one week of work, do this:
- Create root `AGENTS.md` and `CLAUDE.md` with:
  - repo map
  - commands
  - Definition of Done
  - “where to look next” pointers
- Add 2–3 directory-level instruction files in the messiest areas
- Create 3 high-ROI skills:
  - CI auto-fix
  - “add tests for bug fix”
  - “new feature scaffold”
- Add `.mcp.json` with the 1–2 MCP servers you actually need (docs/search/logs)
- Standardize local commands so agents can self-verify:
  - one-liner scripts or `make` targets
How to know it’s working (metrics)
Track these per week:
- Tokens per merged PR (or per story point)
- Median time from “start task” → “PR ready”
- Retry rate (how often agents have to ask for missing context)
- Merge conflict rate
- Human review minutes per PR
You want token counts down without retry rate spiking and without merge conflicts rising.
Bottom line
If you believe “tokens take time,” then your job isn’t to beg models to be cheaper. Your job is to stop paying tokens for the same context over and over.
Progressive disclosure does that:
- Instruction files align sessions fast
- Skills turn repeated prompting into reusable workflow modules
- MCP replaces context dumping with on-demand retrieval
- Parallel-friendly repo design preserves throughput
- Automation ensures humans aren’t the retry engine
That’s how you get faster delivery with fewer tokens while keeping parallel capacity in the same range.
Sources
- OpenAI: How we used Codex to build Sora for Android in 28 days
  https://openai.com/index/shipping-sora-for-android-with-codex/
- OpenAI Codex docs: `AGENTS.md` guide
  https://developers.openai.com/codex/guides/agents-md/
- OpenAI Codex docs: Agent Skills (progressive disclosure)
  https://developers.openai.com/codex/skills/
- OpenAI Codex docs: MCP (Model Context Protocol)
  https://developers.openai.com/codex/mcp/
- OpenAI Codex docs: Non-interactive mode (`codex exec`)
  https://developers.openai.com/codex/noninteractive
- Anthropic: Claude Code: Best practices for agentic coding
  https://www.anthropic.com/engineering/claude-code-best-practices
- Anthropic docs: Manage Claude’s memory (`CLAUDE.md` hierarchy and imports)
  https://code.claude.com/docs/en/memory
- Anthropic docs: MCP in Claude Code (`.mcp.json`)
  https://code.claude.com/docs/en/mcp
- Anthropic: Introducing advanced tool use (Tool Search Tool, tool definition bloat)
  https://www.anthropic.com/engineering/advanced-tool-use
- Anthropic: Equipping agents for the real world with Agent Skills
  https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Anthropic docs: Prompt caching
  https://platform.claude.com/docs/en/build-with-claude/prompt-caching
