Tokens are time.
Pre-context (skippable): I have not written here for around 8 years. A lot of that had to do with switching to a management career, where topics are much more sensitive to discuss, and with the proliferation of LinkedIn posts, which has both driven traffic away from “blog posts” and made one not want to be grouped with the self-promotion crowd. Lately, though, one topic keeps coming back to me as I return to the startup world, and it is one I can still freely discuss from a technical perspective: building great products fast in the LLM era. This is Part 1, where we discuss the importance of progressive disclosure and parallel capacity; a later post will cover how we built an app comparable in complexity to Sora for Android.
Thesis: Tokens take time, so fewer tokens mean faster delivery — provided your parallel capacity stays in the same range.
OpenAI’s “Sora for Android” sprint is a great case study in both speed and scale: from October 8 to November 5, 2025, a small team shipped the app while “consuming roughly 5 billion tokens,” often running multiple agent sessions in parallel. The payoff was velocity; the cost was a giant token bill and a new bottleneck: human review and integration.
This post is about getting the same speed benefits without paying for a giant, repeated context tax.
The key idea is progressive disclosure: agents start with a thin “index” of rules and routes, and only load deeper context when it’s actually needed. Done right, this cuts token usage and preserves parallel throughput, so wall‑clock delivery goes down.
Why “tokens = time” is the constraint you should design around
When you use coding agents seriously, you’re operating a compute pipeline:
- Tokens processed (inputs + tool definitions + retrieved context + outputs)
- Throughput (tokens/second for your model/runtime)
- Parallel capacity (how many independent tasks you can run at once)
- Coordination overhead (human review, merge conflicts, design decisions)
A useful mental model:
Wall‑clock time ≈ tokens on the critical path ÷ (tokens/sec × parallel sessions) + coordination overhead
So yes: fewer tokens tend to mean faster — if you don’t reduce success rate or force more retries, and if you don’t collapse parallelism by creating integration chaos.
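To make the model concrete, here is a back-of-envelope run of the formula (all numbers are illustrative):

```
tokens on critical path : 2,000,000
throughput              : 100 tokens/sec per session
parallel sessions       : 4
coordination overhead   : 2 h of review and merging

compute time ≈ 2,000,000 ÷ (100 × 4) = 5,000 s ≈ 1.4 h
wall-clock   ≈ 1.4 h + 2 h ≈ 3.4 h
```

Note the asymmetry: halving critical-path tokens saves about 0.7 h here, while halving parallel sessions costs about 1.4 h. Cutting tokens only wins if parallel capacity survives the cut.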
Progressive disclosure is the engineering pattern that makes “fewer tokens” compatible with “many parallel workers.”
Progressive disclosure in 3 layers
Think of context as a hierarchy:
- Always‑on, tiny: repo rules, commands, invariants → instruction files
- Loaded only when relevant: repeatable workflows → Skills
- Fetched on demand: docs, tickets, logs, dashboards → MCP tools
You’re trying to avoid this anti-pattern:
“Paste the entire world into every prompt so the agent won’t get lost.”
Instead, build a discoverable system so agents can find what they need when they need it.
Layer 1: Instruction files (fast startup alignment)
Instruction files are your “contract” with the agent. They replace long onboarding prompts.
Codex: AGENTS.md as an instruction chain
Codex supports an instruction chain that walks from repo root down to the current directory, layering guidance. Practically, that means:
- Put stable, repo‑wide guidance in the root `AGENTS.md`
- Put area‑specific overrides in nested directories (e.g., `mobile/AGENTS.md`)
- Keep each file small — Codex enforces a default size cap (32 KiB) across discovered files
Codex teams often use lots of these small files so parallel sessions stay consistent across the repo.
Claude Code: CLAUDE.md for “memory” and startup context
Anthropic’s guidance is similar in spirit: Claude Code auto-pulls CLAUDE.md into context at the start of a session, and they explicitly note that “context gathering consumes time and tokens.”
Claude’s memory system is hierarchical (user‑level, project‑level, etc.) and supports imports, so you can keep base guidance small and route to deeper docs only when needed.
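As a minimal sketch of that routing (file paths are illustrative), a small root CLAUDE.md can look like this:

```markdown
# CLAUDE.md
- Run `./gradlew test` and `./gradlew detekt` before proposing any PR.
- Architecture overview: @docs/architecture.md
- Anything else: start from the repo map in docs/ownership.md instead of guessing.
```

The `@path` line is Claude’s import syntax; plain paths (like the last line) cost almost nothing until the agent actually opens them.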
What belongs in instruction files (and what doesn’t)
✅ Put in AGENTS.md / CLAUDE.md (token‑efficient, stable):
- Repo map (top folders and what they mean)
- “How to run checks” (lint, tests, typecheck)
- Coding conventions that cause CI failures if ignored
- PR/branch etiquette
- Definition of Done (what “complete” means here)
- “Where to find deeper docs” (links or file paths)
❌ Keep out of instruction files (token-expensive, volatile):
- Full architecture docs
- API references
- Incident runbooks
- Every edge case and historical decision
Those go into Skills and retrievable docs.
A minimal, shareable template
Create both files (they can be similar). Keep them short.
```markdown
# AGENTS.md / CLAUDE.md

## Repo map (fast orientation)
- /apps/mobile: Android app
- /services/api: backend services
- /packages/ui: shared UI components
- /docs: architecture, ADRs, runbooks

## How to run checks (always)
- Unit: `./gradlew test` (or `npm test` / `pytest`)
- Lint/format: `./gradlew detektFix && ./gradlew detekt`
- Typecheck: `npm run typecheck` (if applicable)

## Working agreements
- Keep diffs small and scoped.
- Update tests for bug fixes.
- Don’t change public APIs without updating docs and adding tests.

## Progressive disclosure pointers
- For large work: write/update a plan file in `.agent/PLANS.md`.
- For CI failures: use skill `$ci_autofix`.
- For external docs: use MCP tools (don’t paste long docs into chat).
```
Layer 2: Skills (workflow modules with built‑in progressive disclosure)
Skills are the biggest multiplier for token efficiency because they turn repeated prompting into reusable modules.
Codex Agent Skills
Codex Skills are explicitly designed for progressive disclosure:
- At startup, Codex loads only each skill’s name + description
- When invoked (explicitly or implicitly), Codex loads full instructions and any referenced files
That means you can have lots of skills available without paying their full token cost on every run.
Anthropic Agent Skills (Claude)
Anthropic’s Agent Skills follow the same principle:
- Load only metadata (name/description) at startup
- Load `SKILL.md` only when relevant
- Bundle additional resources (docs/scripts) that can be read/run on demand
Their guidance is blunt and correct: every paragraph has to justify its token cost once it’s loaded.
What a “high ROI” Skill looks like
A good skill:
- Encodes the workflow (not a generic explanation)
- Calls out repo-specific commands, locations, constraints
- Includes an explicit verification loop
- Produces a PR-ready summary
Example: a CI auto-fix skill ($ci_autofix)
```
.codex/skills/ci_autofix/
  SKILL.md
  scripts/
    run_ci_locally.sh
  references/
    ci-matrix.md
```
A simple SKILL.md (illustrative):
```markdown
---
name: ci_autofix
description: Diagnose and fix CI failures with minimal diffs. Use when tests/lint/typecheck fail in CI or locally.
---

# CI Auto-fix workflow

## Goal
Make CI green with the smallest safe change.

## Steps
1) Reproduce locally using the canonical commands from AGENTS.md.
2) Identify the *first failing* test/lint error.
3) Patch minimally (no refactors unless required).
4) Re-run the same checks.
5) Summarize: root cause, fix, files changed, how verified.

## Guardrails
- Don’t change public APIs unless CI failure forces it.
- Prefer adding/adjusting tests over weakening assertions.
```
Once this exists, you stop spending tokens re-explaining CI behavior in every conversation.
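The bundled script is what keeps step 1 deterministic: each session runs the same commands instead of improvising them. A sketch of run_ci_locally.sh, assuming the Gradle commands from the template above:

```bash
#!/usr/bin/env bash
# run_ci_locally.sh: reproduce CI checks locally, in the same order CI runs them.
set -euo pipefail

./gradlew detekt   # static analysis / lint, fail fast
./gradlew test     # unit tests
echo "All local CI checks passed."
```

Because the skill references the script instead of inlining commands into every prompt, the workflow never needs to be re-explained in chat.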
Layer 3: MCP (tool-backed context instead of prompt dumps)
MCP (Model Context Protocol) is the standard way to connect an agent to external tools and context—documentation, dashboards, issue trackers, internal services—without pasting everything into the prompt.
Both OpenAI and Anthropic support MCP in their coding workflows.
Team sharing: check in .mcp.json
Claude Code supports a project-scoped .mcp.json at your repo root, designed to be checked into version control so everyone shares the same tool connections.
The point is not “more tools.” The point is less prompt bloat because tools can fetch the right info at the right time.
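A project-scoped config can stay tiny. A sketch (server names, packages, and env vars are placeholders for whatever MCP servers your team actually runs):

```json
{
  "mcpServers": {
    "docs-search": {
      "command": "npx",
      "args": ["-y", "@yourorg/docs-mcp-server"]
    },
    "issue-tracker": {
      "command": "npx",
      "args": ["-y", "@yourorg/tracker-mcp-server"],
      "env": { "TRACKER_URL": "https://tracker.internal.example" }
    }
  }
}
```

Check it in, and every session (human-driven or automated) gets the same connections; the next subsection covers keeping those tool definitions from eating your context.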
The hidden token killer: tool definition bloat
Anthropic’s advanced tool use writeup shows how bad this can get:
- Loading dozens of MCP tool definitions up front can burn tens of thousands of tokens before any work begins
- Their “Tool Search Tool” approach defers tool loading and can reduce that overhead dramatically while keeping the full library available
Design rule: treat tools like Skills:
- Load a small “router/search” tool up front
- Discover and load specific tools only when needed
- Keep tool results concise and structured
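Conceptually, the deferred flow looks like this (a sketch of the pattern only; the tool and field names are hypothetical, not Anthropic’s actual API):

```json
{
  "call":   { "tool": "search_tools", "input": { "query": "create issue in tracker" } },
  "result": { "matching_tools": ["tracker_create_issue"] }
}
```

Only the small router definition is loaded on every run; the matched tool’s full schema enters context only after it is selected.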
MCP patterns that reduce tokens and retries
- Search → fetch → decide: don’t fetch full docs; fetch top hits, then open only what matters.
- Return structured outputs: prefer JSON payloads over long prose, and let the agent summarize.
- Keep results out of context when possible: if a tool can write to a file (artifact), do that and return a pointer plus a small summary (see the sketch after this list).
- Cache at the tool layer: if a build matrix or service map changes weekly, cache it server-side.
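For the artifact pattern in particular, a tool response might look like this (the shape is illustrative, not a spec):

```json
{
  "status": "ok",
  "artifact": ".agent/artifacts/build-matrix-2025-11.json",
  "summary": "14 CI jobs; 2 flaky (android-api-29, lint-strict); full matrix in artifact."
}
```

The agent reads a ~40-token summary instead of the whole matrix, and opens the artifact only if the summary isn’t enough to decide.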
Parallel capacity: speed only happens if your codebase can absorb concurrency
The Sora Android team described running multiple Codex sessions in parallel—playback, search, error handling, tests/refactors—like managing a team. Their key lesson: the bottleneck shifted from writing code to decisions, feedback, and integrating changes.
That’s the core constraint for “less tokens → faster delivery”:
- you need parallel workers
- but you also need low-friction integration
Make your repo “parallel-task friendly”
If you want many agents working at once, design for it:
1) Reduce hot files
- Global registries and mega “index” files cause constant merge conflicts
- Prefer local registration and convention-based wiring
2) Strengthen module boundaries
- Organize around “things that change together,” not just technical layers
- Publish stable interfaces; isolate feature work behind them
3) Make Definition of Done executable
- Deterministic format/lint
- Fast unit tests per module
- Clear smoke tests
4) Provide a “golden path”
- Agents are far better with examples than with abstract rules
- Maintain 1–2 representative, end-to-end implementations per major feature type
5) Isolate work with branches/worktrees
- Claude Code docs recommend git worktrees for running parallel sessions with full code isolation
- The same principle applies for any multi-agent workflow: isolate, then merge
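Worktrees make that isolation mechanical. A typical flow (branch names are illustrative):

```bash
# One worktree per parallel agent task, all sharing a single clone.
git worktree add ../repo-playback -b agent/playback
git worktree add ../repo-search   -b agent/search

# Run one agent session per directory; merge each branch back via a normal PR.
git worktree list
```

Each session gets a private working directory, so parallel edits can’t trample each other before review.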
Automation: make the agents do the loops
If you want to actually realize parallel capacity, stop using humans as the retry engine.
Codex: non-interactive codex exec for CI and scripts
Codex supports non-interactive mode (codex exec) specifically for scripts and CI:
- Run in pipelines
- Emit JSONL events for automation
- Use minimal permissions
- Implement common patterns like CI auto-fix workflows that open PRs
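In a pipeline, that can be a single step. A sketch (the prompt is illustrative; check your Codex CLI version for flags beyond the basic form):

```bash
# Run Codex non-interactively against the current checkout after a red CI run.
codex exec "CI failed on this branch. Use the ci_autofix skill: reproduce, \
patch minimally, re-run the checks, and summarize root cause and fix."
```

Pair it with your CI system’s open-a-PR-from-branch step so humans only see the final diff.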
This is the practical bridge from “agent assistant” to “agent automation.”
Claude: use CLI workflows and headless usage patterns
Claude Code docs show patterns like:
- running parallel sessions (worktrees)
- using Claude as a unix-style utility (pipe in/out)
- adding Claude to verification (lint-like checks)
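The unix-utility pattern, for instance, turns a review rule into a pipeline step (the prompt is illustrative; `-p` is Claude Code’s non-interactive print mode):

```bash
# Use Claude as a lint-like gate on a diff.
git diff main | claude -p "Check this diff against the working agreements \
in CLAUDE.md. Reply PASS, or list the violations."
```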
The important part isn’t which agent you use. It’s this principle:
Automation should make the agent responsible for re-running checks until green; humans should only review the final diff.
Put it together: a repo layout that scales
Here’s a structure that implements progressive disclosure end-to-end:
```
repo/
  AGENTS.md
  CLAUDE.md
  docs/
    architecture.md
    ownership.md
    runbooks/
  .agent/
    PLANS.md
  .codex/
    skills/
      ci_autofix/
        SKILL.md
        scripts/
      new_endpoint/
        SKILL.md
        references/
  .claude/
    commands/
      fix-ci.md
      review-pr.md
  .mcp.json
```
Each layer is discoverable, but nothing forces the agent to ingest it all at once.
A practical “Week 1” checklist
If you only do one week of work, do this:
- Create root `AGENTS.md` and `CLAUDE.md` with:
  - repo map
  - commands
  - Definition of Done
  - “where to look next” pointers
- Add 2–3 directory-level instruction files in the messiest areas
- Create 3 high-ROI skills:
  - CI auto-fix
  - “add tests for bug fix”
  - “new feature scaffold”
- Add `.mcp.json` with the 1–2 MCP servers you actually need (docs/search/logs)
- Standardize local commands so agents can self-verify:
  - one-liner scripts or `make` targets
How to know it’s working (metrics)
Track these per week:
- Tokens per merged PR (or per story point)
- Median time from “start task” → “PR ready”
- Retry rate (how often agents have to ask for missing context)
- Merge conflict rate
- Human review minutes per PR
You want token counts down without retry rate spiking and without merge conflicts rising.
Bottom line
If you believe “tokens take time,” then your job isn’t to beg models to be cheaper. Your job is to stop paying tokens for the same context over and over.
Progressive disclosure does that:
- Instruction files align sessions fast
- Skills turn repeated prompting into reusable workflow modules
- MCP replaces context dumping with on-demand retrieval
- Parallel-friendly repo design preserves throughput
- Automation ensures humans aren’t the retry engine
That’s how you get faster delivery with fewer tokens while keeping parallel capacity in the same range.
Sources
- OpenAI: How we used Codex to build Sora for Android in 28 days
  https://openai.com/index/shipping-sora-for-android-with-codex/
- OpenAI Codex docs: `AGENTS.md` guide
  https://developers.openai.com/codex/guides/agents-md/
- OpenAI Codex docs: Agent Skills (progressive disclosure)
  https://developers.openai.com/codex/skills/
- OpenAI Codex docs: MCP (Model Context Protocol)
  https://developers.openai.com/codex/mcp/
- OpenAI Codex docs: Non-interactive mode (`codex exec`)
  https://developers.openai.com/codex/noninteractive
- Anthropic: Claude Code: Best practices for agentic coding
  https://www.anthropic.com/engineering/claude-code-best-practices
- Anthropic docs: Manage Claude’s memory (`CLAUDE.md` hierarchy and imports)
  https://code.claude.com/docs/en/memory
- Anthropic docs: MCP in Claude Code (`.mcp.json`)
  https://code.claude.com/docs/en/mcp
- Anthropic: Introducing advanced tool use (Tool Search Tool, tool definition bloat)
  https://www.anthropic.com/engineering/advanced-tool-use
- Anthropic: Equipping agents for the real world with Agent Skills
  https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Anthropic docs: Prompt caching
  https://platform.claude.com/docs/en/build-with-claude/prompt-caching
