Official References: Gemini CLI Overview · Project context with GEMINI.md · Token caching
The 1M-token headline is real — but strategy still matters
The official Gemini CLI docs and repo position 1M tokens as a major advantage. That is enough for broad, repo-level reasoning that smaller-context tools often have to approximate.
But the practical lesson is not "always dump the whole repo into one prompt." The real win is this:
- do a broad architectural pass once
- then do targeted follow-up turns on the risky files
- keep `GEMINI.md` carrying stable instructions so you do not resend them every time
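The third point is the one people skip. As a sketch, a minimal `GEMINI.md` might pin the stable facts so they never need to be re-sent; the project layout and conventions below are illustrative, not taken from the Gemini CLI docs:

```markdown
# Project notes for Gemini CLI

## Architecture (stable facts — do not re-explain each turn)
- Next.js app; routes in src/app, shared UI in src/components
- Content lives in /content as MDX; locales in /messages

## Conventions
- TypeScript strict mode; no default exports
- When proposing edits, show unified diffs only
```

Because this file is loaded as project context automatically, every turn starts from these facts instead of re-deriving them.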
Context Efficiency: Why you still need to save tokens
Even with 1 million tokens, reading entire files unnecessarily (full `read_file` calls) increases latency and cost. To use Gemini CLI like a pro, enforce surgical targeting:
- Parallel searching: use `grep_search` with conservative limits (`total_max_matches`) and narrow scopes (`include_pattern`), running several searches in parallel.
- Read with context: when searching, request surrounding lines (`context`, `before`, `after`) so the agent does not waste an extra turn opening the file.
- Read ranges: use `start_line` and `end_line` to extract only what is needed from massive files.
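The parameters above compose like this. The exact tool-call shape is internal to the agent, so treat this JSON as an illustrative sketch of how the knobs fit together, not a documented schema:

```json
{
  "tool": "grep_search",
  "query": "getLocale\\(",
  "include_pattern": "src/**/*.ts",
  "total_max_matches": 20,
  "context": 3
}
```

A follow-up read then pulls only the relevant range instead of the whole file, e.g. `read_file` with `"start_line": 120, "end_line": 180` on the one file the search flagged.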
Patterns that are genuinely good
1) Architecture mapping before editing
```shell
gemini "@src @content @package.json map this application's architecture, content pipeline, and likely maintenance risks"
```

2) Delegating to sub-agents: Codebase Investigator
When the repository is too vast, delegate the mapping task to the codebase_investigator sub-agent.
Instead of polluting your main chat session, you can ask, "Find the root cause of the ambiguous auth bug" or "Map the entire routing architecture." The sub-agent navigates the code autonomously and returns a synthesized architectural report, saving your precious context window.
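A delegation prompt can be as plain as the examples above; the exact phrasing below is a sketch, and the `codebase_investigator` sub-agent is chosen by the agent based on the task rather than by a CLI flag:

```shell
gemini "Use the codebase_investigator sub-agent to map the entire routing architecture and return only a synthesized report of modules, entry points, and risks"
```

The key design choice is that the sub-agent's exploratory reads happen in its own context; only the final report lands in yours.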
3) Cross-cutting migrations
```shell
gemini "@src @package.json We plan to change our content model. Show every place likely affected: loaders, routes, UI, and localization. Group risk by area."
```

How to avoid wasting the context window
Prefer structured inclusion over vague prompts
Bad:

```
Read everything and tell me what matters.
```

Better:

```
@src @content @messages
Explain the content architecture, locale strategy, and where slug parity could drift.
```

Token caching changes the economics
Official docs note that token caching is available when you authenticate with a Gemini API key or Vertex AI. It is not currently available for OAuth-style Google account sign-in.
That means large-context workflows are often cheaper and smoother when you use:
- `GEMINI_API_KEY`, or
- Vertex AI project auth
Use `/stats` to inspect usage and cached-token savings during longer sessions.
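A minimal setup that makes caching apply, assuming you already have an API key from AI Studio (the key value is a placeholder):

```shell
# Authenticate with an API key so token caching is available
export GEMINI_API_KEY="your-key-here"

# Start an interactive session
gemini

# Inside the session, check usage and cached-token savings:
# > /stats
```

On long sessions with a large, stable prefix (your `GEMINI.md` plus the files you keep referencing), the cached-token line is where the savings show up.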
Recommended next reads
- Deep dive into delegation: Sub-agents and Skills
- Gemini Extensions & MCP
- Headless Automation