pi-ghost-autocomplete
v0.3.2
Inline ghost-text autocomplete extension for the Pi coding agent (pi.dev). Cloud and local LLM providers, zero-flicker rendering, no popup interference.
Inline ghost-text autocomplete for the Pi coding agent (@earendil-works/pi-coding-agent).
While you type at the Pi prompt, an LLM predicts the most likely continuation of the current line and renders it as dim grey "ghost text" to the right of the cursor. Press Right Arrow to accept. The cursor never moves until you do, the existing slash-command popup keeps Tab, and a slow or unreachable backend silently disables itself rather than spamming the chat.
Four provider modes ship in the same package:
| Mode | Backend | Default debounce | Default model |
| --- | --- | --- | --- |
| cloud | Groq / Cerebras / OpenRouter via @earendil-works/pi-ai | 150 ms | groq/llama-3.1-8b-instant |
| local | Ollama / vLLM (OpenAI-compatible) | 400 ms | qwen2.5-coder:1.5b on http://localhost:11434 |
| openai-compat | Any OpenAI-compatible REST API (e.g. Mercury Edit) — also used as the default provider | 150 ms | mercury-edit-2 on https://api.inceptionlabs.ai/v1 |
| race | Multiple providers in parallel; first valid response wins | 400 ms | per-member defaults |
Getting started
1. Get a Mercury Edit 2 API key
The default provider is Mercury Edit 2 by Inception Labs — a diffusion-based model optimised for low-latency code completions. Sign up at platform.inceptionlabs.ai to get an API key.
Set the INCEPTION_API_KEY environment variable so the extension can reach the API. Add it to your shell profile (e.g. ~/.bashrc, ~/.zshrc, or PowerShell $PROFILE) for persistence:
# ~/.bashrc or ~/.zshrc
export INCEPTION_API_KEY="your-key"
# PowerShell $PROFILE
$env:INCEPTION_API_KEY = "your-key"
Or pass it inline for a single session:
# bash / zsh
INCEPTION_API_KEY="your-key" pi
# PowerShell
$env:INCEPTION_API_KEY = "your-key"; pi
To use a different provider instead (Groq, Cerebras, a local model, …), see Configuration.
2. Install the extension
pi install npm:pi-ghost-autocomplete
This registers the extension globally. Pi picks it up automatically on the next launch. To install it only for a single project, add -l:
pi install npm:pi-ghost-autocomplete -l # writes to .pi/ in the current directory
3. Launch Pi and try it
pi
Start typing at the prompt. Dim ghost text should appear to the right of your cursor within ~150 ms. Press Right Arrow to accept it, or keep typing to dismiss it.
Run /ghost to confirm the extension loaded and see the active configuration:
Pi Ghost: enabled
minChars=3
maxLineLength=240
mode=openai-compat
provider=mercury-edit
baseUrl=https://api.inceptionlabs.ai/v1
model=mercury-edit-2
apiKey=4e24***
debounceMs=150
maxTokens=64
maxRecentMessages=4
4. (Optional) Load the extension without installing
Pass -e to load the package for a single session without a permanent install:
pi -e npm:pi-ghost-autocomplete
Install
npm install pi-ghost-autocomplete
Configuration
All knobs are environment variables — no extra config file needed:
| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_DISABLED | unset | Set to 1 to disable on launch |
| PI_GHOST_PROVIDER | unset (Mercury Edit 2) | cloud or local — set explicitly to switch away from Mercury Edit 2 (ignored when PI_GHOST_RACE_PROVIDERS is set) |
| PI_GHOST_PROVIDER_NAME | groq | Cloud only (PI_GHOST_PROVIDER=cloud): any KnownProvider from pi-ai (groq, cerebras, openrouter, …) |
| PI_GHOST_MODEL | provider-specific | Model id, e.g. llama-3.1-8b-instant, qwen2.5-coder:1.5b |
| PI_GHOST_API_KEY | env var lookup | Cloud only: explicit API key (else pi-ai env lookup is used) |
| PI_GHOST_BASE_URL | http://localhost:11434 | Local only: Ollama / vLLM base URL |
| PI_GHOST_DEBOUNCE_MS | 150 (cloud) / 400 (local) | Inputs are debounced this long before a request is sent |
| PI_GHOST_MAX_TOKENS | 64 (cloud) / 48 (local) | Cap on completion tokens |
| PI_GHOST_RECENT | 4 (cloud) / 2 (local) | Number of recent user/assistant messages bundled as context |
| PI_GHOST_MIN_CHARS | 3 | Minimum draft length before a prediction fires |
| PI_GHOST_MAX_LINE | 240 | Predictions skipped if the current line exceeds this length |
| PI_GHOST_TEMPERATURE | 0 | Sampling temperature [0, 2]. 0 is best-effort deterministic; see note below |
| PI_GHOST_MERCURY_PARALLEL_N | 1 | Mercury FIM only: fire N parallel candidate completions for cycling. Requires PI_GHOST_TEMPERATURE > 0 (downgrades silently to 1 at temp=0). Costs scale linearly with N |
| PI_GHOST_CONTEXT_MAX_SCAN | 200 | M6b: cap on the candidate pool scanned for relevance ranking |
| PI_GHOST_CONTEXT_MAX_SELECTED | 8 | M6b: max number of selected context entries |
| PI_GHOST_CONTEXT_MAX_CHARS | 4000 | M6b: total char budget for the selected context window |
| PI_GHOST_CONTEXT_BM25_WEIGHT | 0.3 | M6b: weight of BM25 content score vs recency (0=pure recency, 1=pure relevance) |
| PI_GHOST_CONTEXT_TOOL_MAX | 3 | M6c: max number of tool entries to include in selected context (0 disables tools entirely) |
| PI_GHOST_ALLOW_INSECURE | unset | Set to 1 to allow http:// provider base URLs on non-localhost hosts. Off by default — API keys + prompts would be sent in clear |
| PI_GHOST_PROFILE | unset | Set to off to disable persistent profile capture + reranking. Default: enabled. Profile is local-only — see Profile below |
| PI_GHOST_PROFILE_DECAY_DAYS | 90 | Days before an unused profile record's weight halves during compaction |
| PI_GHOST_PROVISIONAL | unset | Set to off to disable the 150 ms provisional fallback. Default: enabled |
| PI_GHOST_PROVISIONAL_PREFER | unset | Set to 1/true/on/yes to bias swap decisions toward the already-displayed provisional. Raises the LLM-vs-provisional swap margin from 0.1 to 0.3, reducing perceived flicker at the cost of recall. Bench reports a flicker=X% rate so this knob can be evaluated |
| PI_GHOST_MIN_CONFIDENCE | 0.55 | M3c.1: minimum reranker score for the top candidate to be shown. Range [0, 1]. Lower = more lenient, more ghosts shown |
| PI_GHOST_RERANK_WEIGHTS | 0.45,0.25,0.15,0.15 | M3c: reranker weights as logprob,profileBias,lengthPrior,agreement. Negative values clamped to 0 |
| PI_GHOST_EVAL_CAPTURE | unset | Set to 1 to capture committed prompts to .pi/ghost-autocomplete/eval.jsonl for offline replay (pi-ghost-replay). Sensitive: contains raw prompt text. Off by default |
| PI_GHOST_REGRET_WINDOW_MS | 3000 | After accept, window in ms during which a >50% deletion of inserted text flags the record acceptRegret: true |
| PI_GHOST_SHORT_HOVER_MS | 800 | Enhancement #3: a ghost dismissed within this many ms is flagged shortHoverDismiss: true and triggers a soft trie penalty (magnitude 0.25). Tunes the bench short-hover rate |
| PI_GHOST_METRICS | unset | Set to 1 to write metrics to .pi/ghost-autocomplete/metrics.jsonl |
| PI_GHOST_DEBUG_LOG | unset | Set to 1 to write debug records to .pi/ghost-autocomplete/debug.jsonl |
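The two reranker knobs in the table (PI_GHOST_RERANK_WEIGHTS and PI_GHOST_MIN_CONFIDENCE) can be pictured with a minimal sketch. It assumes the reranker score is a plain weighted sum of four features already normalised to [0, 1], and that malformed input falls back to the defaults; neither is confirmed here, and parseWeights, score, shouldShow, and the Features type are hypothetical names, not the extension's API.

```typescript
// Hypothetical sketch: names, the weighted-sum formula, and the fallback
// behaviour are assumptions, not the extension's actual code.
type Weights = [number, number, number, number];
type Features = {
  logprob: number; // every feature is assumed to be normalised to [0, 1]
  profileBias: number;
  lengthPrior: number;
  agreement: number;
};

const DEFAULT_WEIGHTS: Weights = [0.45, 0.25, 0.15, 0.15];

// Parse "logprob,profileBias,lengthPrior,agreement"; negatives clamp to 0.
function parseWeights(raw: string | undefined): Weights {
  if (!raw) return DEFAULT_WEIGHTS;
  const parts = raw.split(",").map((p) => (p.trim() === "" ? NaN : Number(p)));
  if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n))) {
    return DEFAULT_WEIGHTS; // assumed fallback for malformed input
  }
  return parts.map((n) => Math.max(0, n)) as Weights;
}

function score(f: Features, w: Weights): number {
  return w[0] * f.logprob + w[1] * f.profileBias + w[2] * f.lengthPrior + w[3] * f.agreement;
}

// The top candidate is shown only if it reaches the confidence floor.
function shouldShow(f: Features, w: Weights, minConfidence = 0.55): boolean {
  return score(f, w) >= minConfidence;
}
```

Under these assumptions, a candidate that scores well on logprob alone tops out at 0.45 with the default weights and stays hidden, which is one way to read "lower = more lenient, more ghosts shown".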
Note on temperature=0
temperature=0 is the default and asks the provider for greedy decoding so
the same prefix produces the same completion. This is best-effort, not
guaranteed: vendors batch requests across users and kernel scheduling
differences can produce different greedy paths from one call to the next.
The session's in-memory completion cache is the real determinism guarantee
— retyping the same prefix in the same context will reuse the previous
ghost rather than re-querying. Raise temperature only if you want
cache-miss diversity (e.g. PI_GHOST_TEMPERATURE=0.5).
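As a rough illustration of why the cache, rather than temperature=0, is what makes repeated prefixes stable, here is a minimal sketch of a per-session completion cache. The key layout (a context id plus the typed prefix), the eviction policy, and the CompletionCache name are assumptions made for this example only.

```typescript
// Hypothetical sketch of an in-memory completion cache; the key layout and
// the size limit are assumptions, not the extension's actual design.
class CompletionCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries = 256) {
    this.maxEntries = maxEntries;
  }

  private key(contextId: string, prefix: string): string {
    return `${contextId}\u0000${prefix}`;
  }

  get(contextId: string, prefix: string): string | undefined {
    return this.entries.get(this.key(contextId, prefix));
  }

  set(contextId: string, prefix: string, ghost: string): void {
    const k = this.key(contextId, prefix);
    if (!this.entries.has(k) && this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(k, ghost);
  }
}
```

A lookup that hits returns the earlier ghost without a provider call, so the result is identical regardless of how the backend would have decoded a second request.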
Run /ghost inside Pi to see the effective config; /ghost on and
/ghost off toggle the feature for the current session.
Race mode
Race mode fires requests to multiple providers simultaneously and uses the first valid response. This reduces perceived latency when one provider is slow or rate-limited.
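The "first valid response wins" idea can be sketched as follows. This is a simplified model, not the extension's implementation: it treats any non-null string as valid, starts all attempts at once (the staggered start controlled by PI_GHOST_RACE_PAUSE_MS is not modelled), and raceFirstValid is a hypothetical name.

```typescript
// Hypothetical sketch: resolves with the first non-null result, or with null
// if every attempt fails or returns null, or if the cutoff elapses first.
function raceFirstValid(
  attempts: Array<Promise<string | null>>,
  cutoffMs: number,
): Promise<string | null> {
  return new Promise((resolve) => {
    let pending = attempts.length;
    if (pending === 0) {
      resolve(null);
      return;
    }
    const timer = setTimeout(() => resolve(null), cutoffMs);
    const settle = (value: string | null): void => {
      pending -= 1;
      if (value !== null || pending === 0) {
        clearTimeout(timer);
        resolve(value); // later calls are no-ops: a promise settles only once
      }
    };
    for (const attempt of attempts) {
      attempt.then(settle, () => settle(null)); // an error counts as "no result"
    }
  });
}
```

A slow or failing member never blocks the others; it only matters if every member fails, in which case the caller gets null and shows no ghost.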
# Race Groq against Cerebras
PI_GHOST_RACE_PROVIDERS=groq,cerebras pi ...
# Race Groq, Cerebras, and Mercury Edit (Inception Labs)
PI_GHOST_RACE_PROVIDERS=groq,cerebras,mercury-edit \
INCEPTION_API_KEY=your-key pi ...
Race-specific environment variables:
| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_RACE_PROVIDERS | unset | Comma-separated list of providers to race (groq, cerebras, local, mercury-edit, mercury-edit-2, or any KnownProvider) |
| PI_GHOST_RACE_PAUSE_MS | 400 | Milliseconds before the second and any later providers are unblocked |
| PI_GHOST_RACE_COOLDOWN_MS | 1500 | Quiet period after a winner before the next race starts |
| PI_GHOST_RACE_CUTOFF_MS | 700 | Maximum time to wait for any provider before giving up |
| PI_GHOST_RACE_<NAME>_MODEL | provider default | Override the model for a specific race member (e.g. PI_GHOST_RACE_GROQ_MODEL) |
| PI_GHOST_RACE_<NAME>_API_KEY | env var lookup | Override the API key for a specific race member |
| PI_GHOST_RACE_<NAME>_BASE_URL | provider default | Override the base URL for a specific race member |
| INCEPTION_API_KEY | unset | API key for Mercury Edit; used when mercury-edit or mercury-edit-2 appears in the race list |
| INCEPTION_BASE_URL | https://api.inceptionlabs.ai/v1 | Base URL override for Mercury Edit |
<NAME> in the per-member variables is the provider name uppercased with
non-alphanumeric characters replaced by _ (e.g. mercury-edit →
MERCURY_EDIT).
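The naming rule is easy to get wrong by hand, so here is a small sketch of it. raceEnvName is a hypothetical helper written for this example; it assumes "alphanumeric" means ASCII letters and digits.

```typescript
// Hypothetical helper: builds the per-member variable name from a provider
// name by uppercasing it and replacing non-alphanumeric characters with "_".
function raceEnvName(
  provider: string,
  suffix: "MODEL" | "API_KEY" | "BASE_URL",
): string {
  const name = provider.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  return `PI_GHOST_RACE_${name}_${suffix}`;
}
```

For example, raceEnvName("mercury-edit", "MODEL") yields PI_GHOST_RACE_MERCURY_EDIT_MODEL.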
Metrics and debug logging
Enable structured JSONL logs for acceptance tracking and performance analysis:
PI_GHOST_METRICS=1 pi ... # writes .pi/ghost-autocomplete/metrics.jsonl
PI_GHOST_DEBUG_LOG=1 pi ... # writes .pi/ghost-autocomplete/debug.jsonl
Metrics records include: timestamp, request id, provider latencies, completion
mode (deterministic or llm), result (produced, empty, rejected,
aborted, error), rejection reason, completion length, whether the ghost was
shown, and acceptance outcome (accepted, dismissed, stale, expired).
Raw prompt text and LLM output are never written to the metrics file. Both
files rotate at 10 MB.
User profile
The extension keeps a long-lived JSONL store at
.pi/ghost-autocomplete/profile.jsonl to learn slash-command, path, and
trigram frequencies across sessions. The profile feeds three local-only
signals:
- Slash fast-path. When the active line starts with /, the editor serves the most-typed extension as a ghost before the LLM is queried (mode trie, provider profile-slash). Press Right to accept.
- Profile-bias rerank (M3c hook). A profileBias score plugs into the multi-candidate reranker. It is wired in but only takes effect when the provider returns >1 candidate.
- Bounded path boost. Persistent path frequency adds a small boost to deterministic path/fuzzy ranking, capped so it cannot beat a fresh exact path-prefix match.
What's stored: paths (src/foo.ts), slash commands (/review),
trigrams from committed prompts, and acceptance records keyed by the
SHA-256 hash of the prefix. Raw prefixes and raw completions are
never written. Token tuples are tokenized and capped at 4 for
prefix tails / 8 for completions.
Sanitization runs before persistence: messages with API keys (sk-…,
ghp_…, JWT, etc.) or high-entropy tokens (Shannon > 4.0 bits/char,
length ≥ 20) are dropped entirely. The check is shared with the session
trie (src/trie-sanitize.ts).
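The high-entropy half of that check can be sketched as below. This is an illustration, not the code in src/trie-sanitize.ts: it assumes tokens are split on whitespace and entropy is computed per character, and the function names are hypothetical. The pattern-based half (known key prefixes) is not modelled.

```typescript
// Shannon entropy of a token in bits per character.
function shannonBitsPerChar(token: string): number {
  const chars = Array.from(token);
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// Thresholds from the text above: length >= 20 and entropy > 4.0 bits/char.
function looksLikeSecret(token: string): boolean {
  return token.length >= 20 && shannonBitsPerChar(token) > 4.0;
}

// Assumption: a message is dropped entirely if any whitespace-separated
// token trips the check.
function shouldDropMessage(message: string): boolean {
  return message.split(/\s+/).some(looksLikeSecret);
}
```

Ordinary words and paths stay well below the threshold because they are short or repeat characters; long random-looking strings do not.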
Rotation: when the JSONL exceeds 2 MB, the store compacts in place: it
sums weights per (kind, key), halves the weight of records older than
90 days (configurable via PI_GHOST_PROFILE_DECAY_DAYS), and drops records
whose weight falls below 0.1. The original file is preserved as profile.jsonl.1.
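A minimal sketch of that compaction pass, under stated assumptions: the record shape (kind, key, weight, lastUsedMs) is hypothetical, age is measured from the most recent use, and the halving is applied once per compaction. The real on-disk schema may differ.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
type ProfileRecord = { kind: string; key: string; weight: number; lastUsedMs: number };

function compact(records: ProfileRecord[], nowMs: number, decayDays = 90): ProfileRecord[] {
  // 1. Sum weights per (kind, key), keeping the most recent use.
  const merged = new Map<string, ProfileRecord>();
  for (const r of records) {
    const id = `${r.kind}\u0000${r.key}`;
    const prev = merged.get(id);
    if (prev) {
      prev.weight += r.weight;
      prev.lastUsedMs = Math.max(prev.lastUsedMs, r.lastUsedMs);
    } else {
      merged.set(id, { ...r });
    }
  }
  // 2. Halve stale records, 3. drop anything that ends up below 0.1.
  const cutoff = nowMs - decayDays * 24 * 60 * 60 * 1000;
  const out: ProfileRecord[] = [];
  merged.forEach((r) => {
    const weight = r.lastUsedMs < cutoff ? r.weight / 2 : r.weight;
    if (weight >= 0.1) out.push({ ...r, weight });
  });
  return out;
}
```

With these rules, a rarely used entry fades out over successive compactions, while anything typed recently keeps its full summed weight.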
Set PI_GHOST_PROFILE=off to disable capture + use entirely.
Benchmark CLI
The pi-ghost-bench binary reads a metrics JSONL file and reports p50/p95
latency, acceptance rates, and whether PRD thresholds pass:
pi-ghost-bench # reads .pi/ghost-autocomplete/metrics.jsonl
pi-ghost-bench path/to/metrics.jsonl # explicit path
pi-ghost-bench --json # machine-readable output
Exit code is 0 when all thresholds pass, 1 when any fail, 2 on I/O error or when the file contains no records.
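For readers post-processing the metrics file themselves, the sketch below shows one common way to compute p50/p95 and how the documented exit codes map onto a result. The nearest-rank percentile method is an assumption (the tool's own method is not stated here), and percentile, exitCode, and BenchResult are hypothetical names.

```typescript
// Nearest-rank percentile; an assumption, the bench tool may interpolate.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical summary type; null stands for an I/O failure.
type BenchResult = { records: number; thresholdsPassed: boolean };

// Mirrors the documented codes: 0 pass, 1 threshold failure, 2 I/O error or no records.
function exitCode(result: BenchResult | null): 0 | 1 | 2 {
  if (result === null || result.records === 0) return 2;
  return result.thresholdsPassed ? 0 : 1;
}
```

In CI, the distinct exit code 2 lets a script tell "the thresholds failed" apart from "there was nothing to measure".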
How it works
- The extension registers a GhostEditor via ctx.ui.setEditorComponent. GhostEditor extends Pi's CustomEditor so all built-in app keybindings (escape, ctrl+d, model switching, slash-command popup) keep working.
- After every text change, a debounced request is sent through pi-ai's complete() with an AbortSignal that is canceled on the next keystroke.
- Before calling the LLM, the editor checks the path index for a deterministic match. If the current token is a file path prefix or a fuzzy filename with a strong enough score, the deterministic result is shown immediately and the LLM call is suppressed.
- The path index is built asynchronously at session start via git ls-files from the repository root, capped at 20 000 files. High-confidence basename-prefix matches (score ≥85) preempt the LLM unconditionally; lower-confidence fuzzy matches only activate when an action word (open, edit, read, show, view, find, check, look, load, import, include, see, run) precedes the token.
- Recent agent messages come from ctx.sessionManager.getBranch(): only user/assistant text, no tool calls. Both providers truncate to a small, fixed payload so cloud requests don't ship the whole file.
- Path mentions are extracted from recent conversation messages (slashed paths, backticked path-like tokens, and bare filenames with known code extensions). These give a ranking boost (+10 exact, +7 basename) to deterministic path completions, so recently discussed files surface higher.
- For fuzzy filename matches where the typed token is not a prefix of the match, the editor enters replace mode: accepting the ghost replaces the typed token with the full matched path. A → prefix is rendered before the ghost text to signal this behaviour. Prefix-suffix matches still append as usual.
- render() searches for CURSOR_MARKER in the lines returned by super.render() and injects ANSI dim text directly after it. Pi's differential renderer redraws only the changed cells: no flicker, no full repaint.
- Right Arrow is intercepted only if a ghost is shown, the popup autocomplete is not open, and the cursor sits at the end of the text. We use matchesKey(data, "right") from @earendil-works/pi-tui, which handles legacy CSI/SS3 sequences and the Kitty keyboard protocol uniformly, so kitty / WezTerm / iTerm2 (Kitty-mode) all work. Otherwise the input passes through to super.handleInput.
- Ctrl+Right accepts the next word of the ghost (PRD §M3f). A "word" is a run of \w+ characters, or a single non-word non-whitespace char with optional leading whitespace (VS Code's Cursor Word Right semantics). Repeated presses peel the ghost off one token at a time; the final press finalises the metric as accepted-partial.
- Alt+] / Alt+[ cycle through ranked candidates when the provider returns more than one (e.g. openai-compat with PI_GHOST_MERCURY_PARALLEL_N>1 and PI_GHOST_TEMPERATURE>0). Single-candidate sources (cache, trie, speculative warm, deterministic path, default LLM fire) treat both keys as no-ops. The selected candidate's index is logged as candidateRank on accept along with cycledCount.
- Visible-column math uses string-width + strip-ansi. Ghost text is clipped grapheme-aware to the remaining columns on the current visual line so wide CJK characters and emoji never overflow the row.
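The Ctrl+Right word rule from the list above can be sketched with a single regular expression. This is one reading of the rule, not the extension's code: it assumes the optional leading whitespace applies to both kinds of token, and acceptNextWord is a hypothetical name.

```typescript
// Optional leading whitespace, then either a \w+ run or exactly one
// non-word, non-whitespace character.
const NEXT_WORD = /^\s*(?:\w+|[^\w\s])/;

// Splits the ghost into the part accepted by one press and the remainder.
function acceptNextWord(ghost: string): { accepted: string; remaining: string } {
  const match = NEXT_WORD.exec(ghost);
  // No match means only whitespace (or nothing) is left: accept all of it.
  const accepted = match ? match[0] : ghost;
  return { accepted, remaining: ghost.slice(accepted.length) };
}
```

Repeated calls on the remainder peel the ghost off one token at a time: for the ghost " foo.bar(x)", successive presses accept " foo", then ".", then "bar", then "(", and so on.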
Development
npm install
npm run typecheck # tsc --noEmit against pi-tui / pi-coding-agent v0.74
npm test # vitest, ~170 unit tests
npm run build # emits dist/
CI runs all three on Node 20 and 22 (.github/workflows/ci.yml).
Limitations
- No native Ollama provider in pi-ai; we configure an openai-compat model pointed at Ollama's /v1 endpoint. Any OpenAI-compatible local server works the same way.
- Pin to ^0.74 of pi-tui / pi-coding-agent / pi-ai / pi-agent-core (all under @earendil-works). The relevant interfaces are pre-1.0; expect occasional follow-up bumps.
- Predictions are shown only at the end of the buffer to avoid splitting lines mid-stream.
- The path index covers only tracked Git files. Untracked files are not indexed in v1.
- Three-provider racing may create elevated cost or rate-limit pressure at high typing speeds.
License
MIT
