Official References: Gemini CLI Overview · Project context with GEMINI.md · Token caching
The 1M-token headline is real — but strategy still matters
The official Gemini CLI docs and repo position 1M tokens as a major advantage. That is enough for broad, repo-level reasoning that smaller-context tools often have to approximate.
But the practical lesson is not "always dump the whole repo into one prompt." The real win is this:
- do a broad architectural pass once
- then do targeted follow-up turns on the risky files
- keep `GEMINI.md` carrying stable instructions so you do not resend them every time
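The third point is the one people skip. As a sketch, a minimal `GEMINI.md` might pin the stable facts so they never need to be re-sent; the project layout and conventions below are illustrative, not taken from the Gemini CLI docs:

```markdown
# Project notes for Gemini CLI

## Architecture (stable facts — do not re-explain each turn)
- Next.js app; routes in src/app, shared UI in src/components
- Content lives in /content as MDX; locales in /messages

## Conventions
- TypeScript strict mode; no default exports
- When proposing edits, show unified diffs only
```

Because this file is loaded as project context automatically, every turn starts from these facts instead of re-deriving them.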
Context Efficiency: Why you still need to save tokens
Even with 1 million tokens, reading entire files unnecessarily (full `read_file` calls) increases latency and cost. To use Gemini CLI like a pro, enforce surgical targeting:
- Parallel searching: use `grep_search` with conservative limits (`total_max_matches`) and narrow scopes (`include_pattern`), running several searches in parallel.
- Read with context: when searching, request surrounding lines (`context`, `before`, `after`) so the agent does not waste an extra turn opening the file.
- Read ranges: use `start_line` and `end_line` to extract only what is needed from massive files.
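The parameters above compose like this. The exact tool-call shape is internal to the agent, so treat this JSON as an illustrative sketch of how the knobs fit together, not a documented schema:

```json
{
  "tool": "grep_search",
  "query": "getLocale\\(",
  "include_pattern": "src/**/*.ts",
  "total_max_matches": 20,
  "context": 3
}
```

A follow-up read then pulls only the relevant range instead of the whole file, e.g. `read_file` with `"start_line": 120, "end_line": 180` on the one file the search flagged.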
Patterns that are genuinely good
1) Architecture mapping before editing
```shell
gemini "@src @content @package.json map this application's architecture, content pipeline, and likely maintenance risks"
```

2) Delegating to sub-agents: Codebase Investigator
When the repository is too vast, delegate the mapping task to the codebase_investigator sub-agent.
Instead of polluting your main chat session, you can ask, "Find the root cause of the ambiguous auth bug" or "Map the entire routing architecture." The sub-agent navigates the code autonomously and returns a synthesized architectural report, saving your precious context window.
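A delegation prompt can be as plain as the examples above; the exact phrasing below is a sketch, and the `codebase_investigator` sub-agent is chosen by the agent based on the task rather than by a CLI flag:

```shell
gemini "Use the codebase_investigator sub-agent to map the entire routing architecture and return only a synthesized report of modules, entry points, and risks"
```

The key design choice is that the sub-agent's exploratory reads happen in its own context; only the final report lands in yours.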
3) Cross-cutting migrations
```shell
gemini "@src @package.json We plan to change our content model. Show every place likely affected: loaders, routes, UI, and localization. Group risk by area."
```

How to avoid wasting the context window
Prefer structured inclusion over vague prompts
Bad:

```
Read everything and tell me what matters.
```

Better:

```
@src @content @messages
Explain the content architecture, locale strategy, and where slug parity could drift.
```

Token caching changes the economics
Official docs note that token caching is available when you authenticate with a Gemini API key or Vertex AI. It is not currently available for OAuth-style Google account sign-in.
That means large-context workflows are often cheaper and smoother when you use:
- `GEMINI_API_KEY`, or
- Vertex AI project auth
Use `/stats` to inspect usage and cached-token savings during longer sessions.
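A minimal setup that makes caching apply, assuming you already have an API key from AI Studio (the key value is a placeholder):

```shell
# Authenticate with an API key so token caching is available
export GEMINI_API_KEY="your-key-here"

# Start an interactive session
gemini

# Inside the session, check usage and cached-token savings:
# > /stats
```

On long sessions with a large, stable prefix (your `GEMINI.md` plus the files you keep referencing), the cached-token line is where the savings show up.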
Recommended next reads
- Deep dive into delegation: Sub-agents and Skills
- Gemini Extensions & MCP
- Headless Automation