pi-ghost-autocomplete

v0.3.2, published 7 days ago

Inline ghost-text autocomplete extension for the Pi coding agent (pi.dev). Cloud and local LLM providers, zero-flicker rendering, no popup interference.


Maintainer: ngsoftware

Keywords: pi pi.dev pi-package pi-extension autocomplete ghost-text tui llm ollama groq

pi-ghost-autocomplete

Inline ghost-text autocomplete for the Pi coding agent (@earendil-works/pi-coding-agent).

While you type at the Pi prompt, an LLM predicts the most likely continuation of the current line and renders it as dim grey "ghost text" to the right of the cursor. Press Right Arrow to accept it. The cursor never moves until you do, the existing slash-command popup keeps Tab, and a slow or unreachable backend is silently disabled rather than spamming the chat.

Four provider modes ship in the same package:

| Mode | Backend | Default debounce | Default model |
| --- | --- | --- | --- |
| cloud | Groq / Cerebras / OpenRouter via @earendil-works/pi-ai | 150 ms | groq/llama-3.1-8b-instant |
| local | Ollama / vLLM (OpenAI-compatible) | 400 ms | qwen2.5-coder:1.5b on http://localhost:11434 |
| openai-compat | Any OpenAI-compatible REST API (e.g. Mercury Edit); also used as the default provider | 150 ms | mercury-edit-2 on https://api.inceptionlabs.ai/v1 |
| race | Multiple providers in parallel; first valid response wins | 400 ms | per-member defaults |

Getting started

1. Get a Mercury Edit 2 API key

The default provider is Mercury Edit 2 by Inception Labs — a diffusion-based model optimised for low-latency code completions. Sign up at platform.inceptionlabs.ai to get an API key.

Set the INCEPTION_API_KEY environment variable so the extension can reach the API. Add it to your shell profile (e.g. ~/.bashrc, ~/.zshrc, or PowerShell $PROFILE) for persistence:

# ~/.bashrc or ~/.zshrc
export INCEPTION_API_KEY="your-key"

# PowerShell $PROFILE
$env:INCEPTION_API_KEY = "your-key"

Or pass it inline for a single session:

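# bash / zsh: applies to this one invocation only
INCEPTION_API_KEY="your-key" pi

# PowerShell: persists for the rest of the current shell session
$env:INCEPTION_API_KEY = "your-key"; pi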

To use a different provider instead (Groq, Cerebras, a local model, …), see Configuration.

2. Install the extension

pi install npm:pi-ghost-autocomplete

This registers the extension globally. Pi picks it up automatically on the next launch. To install it only for a single project, add -l:

pi install npm:pi-ghost-autocomplete -l   # writes to .pi/ in the current directory

3. Launch Pi and try it

pi

Start typing at the prompt. Dim ghost text should appear to the right of your cursor shortly after you pause: the default debounce is 150 ms, plus the provider's response time. Press Right Arrow to accept it, or keep typing to dismiss it.

Run /ghost to confirm the extension loaded and see the active configuration:

Pi Ghost: enabled
  minChars=3
  maxLineLength=240
  mode=openai-compat
  provider=mercury-edit
  baseUrl=https://api.inceptionlabs.ai/v1
  model=mercury-edit-2
  apiKey=4e24***
  debounceMs=150
  maxTokens=64
  maxRecentMessages=4

4. (Optional) Load the extension without installing

Pass -e to load the package for a single session without a permanent install:

pi -e npm:pi-ghost-autocomplete

Install

npm install pi-ghost-autocomplete

Configuration

All knobs are environment variables — no extra config file needed:

| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_DISABLED | unset | Set to 1 to disable on launch |
| PI_GHOST_PROVIDER | unset (Mercury Edit 2) | cloud or local; set explicitly to switch away from Mercury Edit 2 (ignored when PI_GHOST_RACE_PROVIDERS is set) |
| PI_GHOST_PROVIDER_NAME | groq | Cloud only (PI_GHOST_PROVIDER=cloud): any KnownProvider from pi-ai (groq, cerebras, openrouter, …) |
| PI_GHOST_MODEL | provider-specific | Model id, e.g. llama-3.1-8b-instant, qwen2.5-coder:1.5b |
| PI_GHOST_API_KEY | env var lookup | Cloud only: explicit API key (otherwise pi-ai's environment-variable lookup is used) |
| PI_GHOST_BASE_URL | http://localhost:11434 | Local only: Ollama / vLLM base URL |
| PI_GHOST_DEBOUNCE_MS | 150 (cloud) / 400 (local) | How long input is debounced before a request is sent |
| PI_GHOST_MAX_TOKENS | 64 (cloud) / 48 (local) | Cap on completion tokens |
| PI_GHOST_RECENT | 4 (cloud) / 2 (local) | Number of recent user/assistant messages bundled as context |
| PI_GHOST_MIN_CHARS | 3 | Minimum draft length before a prediction fires |
| PI_GHOST_MAX_LINE | 240 | Predictions are skipped if the current line exceeds this length |
| PI_GHOST_TEMPERATURE | 0 | Sampling temperature in [0, 2]. 0 is best-effort deterministic; see the note below |
| PI_GHOST_MERCURY_PARALLEL_N | 1 | Mercury FIM only: fire N parallel candidate completions for cycling. Requires PI_GHOST_TEMPERATURE > 0 (silently downgraded to 1 at temperature 0). Costs scale linearly with N |
| PI_GHOST_CONTEXT_MAX_SCAN | 200 | M6b: cap on the candidate pool scanned for relevance ranking |
| PI_GHOST_CONTEXT_MAX_SELECTED | 8 | M6b: maximum number of selected context entries |
| PI_GHOST_CONTEXT_MAX_CHARS | 4000 | M6b: total character budget for the selected context window |
| PI_GHOST_CONTEXT_BM25_WEIGHT | 0.3 | M6b: weight of the BM25 content score vs. recency (0 = pure recency, 1 = pure relevance) |
| PI_GHOST_CONTEXT_TOOL_MAX | 3 | M6c: maximum number of tool entries included in the selected context (0 disables tool entries entirely) |
| PI_GHOST_ALLOW_INSECURE | unset | Set to 1 to allow http:// provider base URLs on non-localhost hosts. Off by default, because API keys and prompts would be sent in cleartext |
| PI_GHOST_PROFILE | unset | Set to off to disable persistent profile capture and reranking. Default: enabled. The profile is local-only; see User profile below |
| PI_GHOST_PROFILE_DECAY_DAYS | 90 | Days before an unused profile record's weight is halved during compaction |
| PI_GHOST_PROVISIONAL | unset | Set to off to disable the 150 ms provisional fallback. Default: enabled |
| PI_GHOST_PROVISIONAL_PREFER | unset | Set to 1/true/on/yes to bias swap decisions toward the already-displayed provisional ghost. Raises the LLM-vs-provisional swap margin from 0.1 to 0.3, reducing perceived flicker at the cost of recall. The bench reports a flicker=X% rate so this knob can be evaluated |
| PI_GHOST_MIN_CONFIDENCE | 0.55 | M3c.1: minimum reranker score the top candidate needs to be shown. Range [0, 1]; lower is more lenient and shows more ghosts |
| PI_GHOST_RERANK_WEIGHTS | 0.45,0.25,0.15,0.15 | M3c: reranker weights as logprob,profileBias,lengthPrior,agreement. Negative values are clamped to 0 |
| PI_GHOST_EVAL_CAPTURE | unset | Set to 1 to capture committed prompts to .pi/ghost-autocomplete/eval.jsonl for offline replay (pi-ghost-replay). Sensitive: contains raw prompt text. Off by default |
| PI_GHOST_REGRET_WINDOW_MS | 3000 | Window (ms) after an accept during which deleting more than 50% of the inserted text flags the record acceptRegret: true |
| PI_GHOST_SHORT_HOVER_MS | 800 | Enhancement #3: a ghost dismissed within this many ms is flagged shortHoverDismiss: true and triggers a soft trie penalty (magnitude 0.25). Tunes the bench short-hover rate |
| PI_GHOST_METRICS | unset | Set to 1 to write metrics to .pi/ghost-autocomplete/metrics.jsonl |
| PI_GHOST_DEBUG_LOG | unset | Set to 1 to write debug records to .pi/ghost-autocomplete/debug.jsonl |
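As a rough illustration of how PI_GHOST_RERANK_WEIGHTS and PI_GHOST_MIN_CONFIDENCE could interact, here is a minimal sketch. It assumes the reranker is a plain weighted sum of four signals already normalised to [0, 1]; the Candidate shape, parseWeights, pickGhost, and the fall-back-to-defaults behaviour on malformed input are hypothetical, not the extension's actual code.

```typescript
// Hypothetical per-candidate signals, each assumed to be normalised to [0, 1].
interface Candidate {
  text: string;
  logprob: number;
  profileBias: number;
  lengthPrior: number;
  agreement: number;
}

// Weight order follows the table: logprob, profileBias, lengthPrior, agreement.
type Weights = [number, number, number, number];

const DEFAULT_WEIGHTS: Weights = [0.45, 0.25, 0.15, 0.15];

// Parse "a,b,c,d" and clamp negatives to 0. Falling back to the defaults on
// malformed input is an assumption of this sketch.
function parseWeights(raw: string | undefined): Weights {
  if (!raw) return DEFAULT_WEIGHTS;
  const parts = raw.split(",").map((s) => Number(s.trim()));
  if (parts.length !== 4 || parts.some((n) => !Number.isFinite(n))) {
    return DEFAULT_WEIGHTS;
  }
  return parts.map((n) => Math.max(0, n)) as Weights;
}

// Linear combination of the four signals.
function score(c: Candidate, w: Weights): number {
  return w[0] * c.logprob + w[1] * c.profileBias + w[2] * c.lengthPrior + w[3] * c.agreement;
}

// Highest-scoring candidate, or undefined if it falls below minConfidence.
function pickGhost(candidates: Candidate[], w: Weights, minConfidence = 0.55): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const s = score(c, w);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  return best !== undefined && bestScore >= minConfidence ? best : undefined;
}
```

In practice the raw string would come from process.env.PI_GHOST_RERANK_WEIGHTS. Because the default weights sum to 1, a candidate whose signals are all 0.5 scores about 0.5 and stays hidden at the default 0.55 threshold.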

Note on `temperature=0`

temperature=0 is the default and asks the provider for greedy decoding so the same prefix produces the same completion. This is best-effort, not guaranteed: vendors batch requests across users and kernel scheduling differences can produce different greedy paths from one call to the next. The session's in-memory completion cache is the real determinism guarantee — retyping the same prefix in the same context will reuse the previous ghost rather than re-querying. Raise temperature only if you want cache-miss diversity (e.g. PI_GHOST_TEMPERATURE=0.5).
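To make the cache argument concrete, here is a sketch of a bounded, session-scoped cache keyed on a context fingerprint plus the exact prefix. The class name, the 256-entry capacity, and the LRU eviction policy are assumptions for illustration; the extension's real cache may key and evict differently.

```typescript
// Hypothetical session-scoped cache. The key pairs a context fingerprint
// (however the caller derives it) with the exact prefix typed so far.
class CompletionCache {
  private readonly entries = new Map<string, string>();
  private readonly capacity: number;

  constructor(capacity = 256) {
    this.capacity = capacity;
  }

  private key(contextId: string, prefix: string): string {
    return JSON.stringify([contextId, prefix]);
  }

  get(contextId: string, prefix: string): string | undefined {
    const k = this.key(contextId, prefix);
    const hit = this.entries.get(k);
    if (hit !== undefined) {
      // Map keeps insertion order, so re-inserting marks the entry most recently used.
      this.entries.delete(k);
      this.entries.set(k, hit);
    }
    return hit;
  }

  set(contextId: string, prefix: string, completion: string): void {
    const k = this.key(contextId, prefix);
    this.entries.delete(k);
    this.entries.set(k, completion);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry: the first key in iteration order.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A hit is returned as-is without contacting the provider, which is why repeat prefixes stay stable even when the vendor's greedy decoding does not.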

Run /ghost inside Pi to see the effective config; /ghost on and /ghost off toggle the feature for the current session.

Race mode

Race mode fires requests to multiple providers simultaneously and uses the first valid response. This reduces perceived latency when one provider is slow or rate-limited.

# Race Groq against Cerebras
PI_GHOST_RACE_PROVIDERS=groq,cerebras pi ...

# Race Groq, Cerebras, and Mercury Edit (Inception Labs)
PI_GHOST_RACE_PROVIDERS=groq,cerebras,mercury-edit \
  INCEPTION_API_KEY=your-key pi ...
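A minimal sketch of the race itself, assuming each member is an async function that honours an AbortSignal and returns null for an empty answer. RaceMember and raceProviders are hypothetical names. The sketch fires all members at once, resolves with the first non-empty completion, and gives up at a cutoff (700 ms here, mirroring the PI_GHOST_RACE_CUTOFF_MS default listed below); the PI_GHOST_RACE_PAUSE_MS unblocking delay and the cooldown are deliberately left out.

```typescript
// Hypothetical member shape: resolves to a completion, or null for an empty or
// invalid answer. The shared signal lets the race cancel losing requests.
type RaceMember = (prefix: string, signal: AbortSignal) => Promise<string | null>;

// Fire all members at once and resolve with the first non-empty completion.
// Resolves null if every member fails or answers empty, or if cutoffMs elapses first.
function raceProviders(members: RaceMember[], prefix: string, cutoffMs = 700): Promise<string | null> {
  const controller = new AbortController();
  return new Promise<string | null>((resolve) => {
    let pending = members.length;
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (value: string | null): void => {
      if (settled) return;
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      controller.abort(); // cancel whatever is still in flight
      resolve(value);
    };

    timer = setTimeout(() => finish(null), cutoffMs);
    if (pending === 0) finish(null);

    for (const member of members) {
      member(prefix, controller.signal)
        .then((text) => {
          if (text) finish(text); // first valid response wins
        })
        .catch(() => {
          // A failing member simply drops out of the race.
        })
        .finally(() => {
          pending -= 1;
          if (pending === 0) finish(null); // all answered, none valid
        });
    }
  });
}
```

Sharing one AbortController across members keeps cancellation cheap: the winner's arrival aborts every loser with a single call.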

Race-specific environment variables:

| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_RACE_PROVIDERS | unset | Comma-separated list of providers to race (groq, cerebras, local, mercury-edit, mercury-edit-2, or any KnownProvider) |
| PI_GHOST_RACE_PAUSE_MS | 400 | Milliseconds before the second and later providers are unblocked |
| PI_GHOST_RACE_COOLDOWN_MS | 1500 | Quiet period after a winner before the next race starts |
| PI_GHOST_RACE_CUTOFF_MS | 700 | Maximum time to wait for any provider before giving up |
| PI_GHOST_RACE_<NAME>_MODEL | provider default | Override the model for a specific race member (e.g. PI_GHOST_RACE_GROQ_MODEL) |
| PI_GHOST_RACE_<NAME>_API_KEY | env var lookup | Override the API key for a specific race member |
| PI_GHOST_RACE_<NAME>_BASE_URL | provider default | Override the base URL for a specific race member |
| INCEPTION_API_KEY | unset | API key for Mercury Edit; used when mercury-edit or mercury-edit-2 appears in the race list |
| INCEPTION_BASE_URL | https://api.inceptionlabs.ai/v1 | Base URL override for Mercury Edit |

<NAME> in the per-member variables is the provider name uppercased with non-alphanumeric characters replaced by _ (e.g. mercury-edit → MERCURY_EDIT).
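The naming rule fits in a few lines. raceEnvName and memberOverride are hypothetical helpers written for illustration, not exports of the package.

```typescript
type MemberSetting = "MODEL" | "API_KEY" | "BASE_URL";

// Uppercase the provider name and replace every non-alphanumeric character with "_".
function raceEnvName(provider: string, setting: MemberSetting): string {
  const name = provider.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  return `PI_GHOST_RACE_${name}_${setting}`;
}

// Look up a per-member override in an environment-like object (e.g. process.env).
function memberOverride(
  env: Record<string, string | undefined>,
  provider: string,
  setting: MemberSetting,
): string | undefined {
  return env[raceEnvName(provider, setting)];
}
```

For example, raceEnvName("mercury-edit-2", "API_KEY") yields PI_GHOST_RACE_MERCURY_EDIT_2_API_KEY.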

Metrics and debug logging

Enable structured JSONL logs for acceptance tracking and performance analysis:

PI_GHOST_METRICS=1 pi ...        # writes .pi/ghost-autocomplete/metrics.jsonl
PI_GHOST_DEBUG_LOG=1 pi ...      # writes .pi/ghost-autocomplete/debug.jsonl

Metrics records include: timestamp, request id, provider latencies, completion mode (deterministic or llm), result (produced, empty, rejected, aborted, error), rejection reason, completion length, whether the ghost was shown, and acceptance outcome (accepted, dismissed, stale, expired). Raw prompt text and LLM output are never written to the metrics file. Both files rotate at 10 MB.

User profile

The extension keeps a long-lived JSONL store at .pi/ghost-autocomplete/profile.jsonl to learn slash-command, path, and trigram frequencies across sessions. The profile feeds three local-only signals:

Slash fast-path. When the active line starts with /, the editor serves the most frequently typed matching slash command as a ghost before the LLM is queried (mode trie, provider profile-slash). Press Right to accept.
Profile-bias rerank (M3c hook). A profileBias score plugs into the multi-candidate reranker. Wired but only takes effect when the provider returns >1 candidate.
Bounded path boost. Persistent path frequency adds a small capped boost to deterministic path/fuzzy ranking, capped so it cannot beat a fresh exact path-prefix match.

What's stored: paths (src/foo.ts), slash commands (/review), trigrams from committed prompts, and acceptance records keyed by the SHA-256 hash of the prefix. Raw prefixes and raw completions are never written. Prefix tails and completions are stored only as token tuples, capped at 4 tokens and 8 tokens respectively.

Sanitization runs before persistence: messages with API keys (sk-…, ghp_…, JWT, etc.) or high-entropy tokens (Shannon > 4.0 bits/char, length ≥ 20) are dropped entirely. The check is shared with the session trie (src/trie-sanitize.ts).
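The entropy half of that check can be sketched as follows. The 20-character and 4.0 bits/char thresholds come from the text above; splitting on whitespace, the function names, and the two secret regexes (guesses at the sk-… and ghp_… shapes) are assumptions, and the real shared check in src/trie-sanitize.ts also covers JWTs and other formats.

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2(p)) over distinct characters.
function shannonEntropy(token: string): number {
  const chars = [...token]; // iterate by code point, not UTF-16 unit
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Illustrative secret shapes only (assumed, not the extension's pattern list).
const SECRET_PATTERNS: RegExp[] = [/\bsk-[A-Za-z0-9_-]{10,}/, /\bghp_[A-Za-z0-9]{20,}/];

// True if the whole message should be dropped before persistence.
function shouldDrop(message: string): boolean {
  if (SECRET_PATTERNS.some((re) => re.test(message))) return true;
  return message
    .split(/\s+/)
    .some((token) => [...token].length >= 20 && shannonEntropy(token) > 4.0);
}
```

A 20-character token can reach at most log2(20) ≈ 4.32 bits/char, so at that length only tokens with almost no repeated characters cross the 4.0 threshold; an ordinary word such as "internationalization" scores about 2.9.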

Rotation: when the JSONL exceeds 2 MB, the store compacts in place — sums weights per (kind, key), halves records older than 90 days (configurable via PI_GHOST_PROFILE_DECAY_DAYS), drops weights below 0.1. The original file is preserved as profile.jsonl.1.
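A sketch of the compaction step on in-memory records. The ProfileRecord fields (kind, key, weight, ts) and the choice to halve based on the most recent sighting after merging are assumptions; reading the file, the 2 MB trigger, and the rename to profile.jsonl.1 are omitted.

```typescript
// Hypothetical record shape; field names are illustrative.
interface ProfileRecord {
  kind: string; // e.g. "path", "slash", "trigram"
  key: string;
  weight: number;
  ts: number; // last-seen time, epoch ms
}

const DAY_MS = 24 * 60 * 60 * 1000;

function compact(records: ProfileRecord[], now: number, decayDays = 90): ProfileRecord[] {
  // 1. Sum weights per (kind, key), remembering the most recent sighting.
  const merged = new Map<string, ProfileRecord>();
  for (const r of records) {
    const id = JSON.stringify([r.kind, r.key]);
    const prev = merged.get(id);
    if (prev) {
      prev.weight += r.weight;
      prev.ts = Math.max(prev.ts, r.ts);
    } else {
      merged.set(id, { ...r }); // copy so the input array is not mutated
    }
  }
  // 2. Halve entries unused for longer than decayDays, 3. drop weights below 0.1.
  const out: ProfileRecord[] = [];
  for (const r of merged.values()) {
    const weight = now - r.ts > decayDays * DAY_MS ? r.weight / 2 : r.weight;
    if (weight >= 0.1) out.push({ ...r, weight });
  }
  return out;
}
```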

Set PI_GHOST_PROFILE=off to disable both capture and use of the profile entirely.

Benchmark CLI

The pi-ghost-bench binary reads a metrics JSONL file and reports p50/p95 latency, acceptance rates, and whether PRD thresholds pass:

pi-ghost-bench                                    # reads .pi/ghost-autocomplete/metrics.jsonl
pi-ghost-bench path/to/metrics.jsonl              # explicit path
pi-ghost-bench --json                             # machine-readable output

Exit code is 0 when all thresholds pass, 1 when any fail, 2 on I/O error or when the file contains no records.
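How the numbers could be derived, as a sketch: nearest-rank percentiles over per-request latencies and an acceptance rate over records that carry an outcome. The MetricRecord fields (latencyMs, outcome) and the threshold parameters are placeholders; the real record schema and the PRD threshold values are not given here.

```typescript
// Hypothetical subset of a metrics record; real field names may differ.
interface MetricRecord {
  latencyMs?: number;
  outcome?: "accepted" | "dismissed" | "stale" | "expired";
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)] ?? NaN;
}

// One JSON object per non-empty line.
function parseJsonl(text: string): MetricRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as MetricRecord);
}

// Exit-code convention from the text: 0 = all pass, 1 = a threshold failed,
// 2 = no records (I/O errors, also 2, are outside this sketch).
function benchExitCode(records: MetricRecord[], maxP95Ms: number, minAcceptRate: number): number {
  if (records.length === 0) return 2;
  const latencies = records
    .map((r) => r.latencyMs)
    .filter((x): x is number => typeof x === "number");
  const decided = records.filter((r) => r.outcome !== undefined);
  const accepted = decided.filter((r) => r.outcome === "accepted").length;
  const acceptRate = decided.length > 0 ? accepted / decided.length : 0;
  const p95 = latencies.length > 0 ? percentile(latencies, 95) : Infinity;
  return p95 <= maxP95Ms && acceptRate >= minAcceptRate ? 0 : 1;
}
```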

How it works

The extension registers a GhostEditor via ctx.ui.setEditorComponent. GhostEditor extends Pi's CustomEditor so all built-in app keybindings (escape, ctrl+d, model switching, slash-command popup) keep working.
After every text change, a debounced request is sent through pi-ai's complete() with an AbortSignal that is canceled on the next keystroke.
Before calling the LLM, the editor checks the path index for a deterministic match. If the current token is a file path prefix or a fuzzy filename with a strong enough score, the deterministic result is shown immediately and the LLM call is suppressed.
The path index is built asynchronously at session start via git ls-files from the repository root, capped at 20 000 files. High-confidence basename-prefix matches (score ≥ 85) preempt the LLM unconditionally; lower-confidence fuzzy matches only activate when an action word (open, edit, read, show, view, find, check, look, load, import, include, see, run) precedes the token.
Recent agent messages come from ctx.sessionManager.getBranch() — only user/assistant text, no tool calls. Both providers truncate to a small, fixed payload so cloud requests don't ship the whole file.
Path mentions are extracted from recent conversation messages (slashed paths, backticked path-like tokens, and bare filenames with known code extensions). These give a ranking boost (+10 exact, +7 basename) to deterministic path completions, so recently discussed files surface higher.
For fuzzy filename matches where the typed token is not a prefix of the match, the editor enters replace mode: accepting the ghost replaces the typed token with the full matched path. A → prefix is rendered before the ghost text to signal this behaviour. Prefix-suffix matches still append as usual.
render() searches for CURSOR_MARKER in the lines returned by super.render() and injects ANSI dim text directly after it. Pi's differential renderer redraws only the changed cells — no flicker, no full repaint.
Right Arrow is intercepted only if a ghost is shown, the popup autocomplete is not open, and the cursor sits at the end of the text. We use matchesKey(data, "right") from @earendil-works/pi-tui, which handles legacy CSI/SS3 sequences and the Kitty keyboard protocol uniformly, so kitty / WezTerm / iTerm2 (Kitty-mode) all work. Otherwise the input passes through to super.handleInput.
Ctrl+Right accepts the next word of the ghost (PRD §M3f). A "word" is a run of \w+ characters, or a single non-word non-whitespace char with optional leading whitespace — VS Code's Cursor Word Right semantics. Repeated presses peel the ghost off one token at a time; the final press finalises the metric as accepted-partial.
Alt+] / Alt+[ cycle through ranked candidates when the provider returns more than one (e.g. openai-compat with PI_GHOST_MERCURY_PARALLEL_N>1 and PI_GHOST_TEMPERATURE>0). Single-candidate sources (cache, trie, speculative warm, deterministic path, default LLM fire) treat both keys as no-ops. The selected candidate's index is logged as candidateRank on accept along with cycledCount.
Visible-column math uses string-width + strip-ansi. Ghost text is clipped grapheme-aware to the remaining columns on the current visual line so wide CJK characters and emoji never overflow the row.
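The debounce-and-cancel step described above can be sketched with a timer plus one AbortController per request. Fetcher and DebouncedPredictor are hypothetical stand-ins; the real code routes the request through pi-ai's complete(), whose signature is not reproduced here.

```typescript
// Hypothetical request function standing in for the real LLM call.
type Fetcher = (text: string, signal: AbortSignal) => Promise<string>;

class DebouncedPredictor {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private controller: AbortController | undefined;
  private readonly fetcher: Fetcher;
  private readonly debounceMs: number;
  private readonly onGhost: (ghost: string) => void;

  constructor(fetcher: Fetcher, debounceMs: number, onGhost: (ghost: string) => void) {
    this.fetcher = fetcher;
    this.debounceMs = debounceMs;
    this.onGhost = onGhost;
  }

  // Call on every text change: restart the debounce timer and cancel any in-flight request.
  onTextChange(text: string): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.controller?.abort();
    const controller = new AbortController();
    this.controller = controller;
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.fetcher(text, controller.signal)
        .then((ghost) => {
          // A later keystroke may have aborted this request after it resolved.
          if (!controller.signal.aborted) this.onGhost(ghost);
        })
        .catch(() => {
          // Aborted or failed: show nothing.
        });
    }, this.debounceMs);
  }
}
```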
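A sketch of the path-preemption gate and the mention boost described above. The score ≥ 85 cut-off, the action-word list, and the +10/+7 boosts come from the text; PathMatch, the minFuzzyScore default of 50, and the reading of "precedes" as "is the immediately preceding word" are assumptions.

```typescript
// Action words listed in the text that unlock lower-confidence fuzzy matches.
const ACTION_WORDS = new Set([
  "open", "edit", "read", "show", "view", "find", "check",
  "look", "load", "import", "include", "see", "run",
]);

// Hypothetical match shape produced by the path index.
interface PathMatch {
  path: string;
  score: number;
  basenamePrefix: boolean;
}

// Should a deterministic path match suppress the LLM call for this line?
function shouldPreempt(line: string, match: PathMatch, minFuzzyScore = 50): boolean {
  if (match.basenamePrefix && match.score >= 85) return true;
  const words = line.trimEnd().split(/\s+/);
  const previous = (words[words.length - 2] ?? "").toLowerCase();
  return ACTION_WORDS.has(previous) && match.score >= minFuzzyScore;
}

// Boost for files mentioned in recent conversation: +10 exact path, +7 same basename.
function mentionBoost(path: string, mentions: Set<string>): number {
  if (mentions.has(path)) return 10;
  const base = path.slice(path.lastIndexOf("/") + 1);
  for (const m of mentions) {
    if (m.slice(m.lastIndexOf("/") + 1) === base) return 7;
  }
  return 0;
}
```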
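The render-time injection described above reduces to string surgery on the rendered lines. CURSOR_MARKER below is a placeholder (the real value comes from pi-tui), and the width-aware clipping is left out. SGR 2 turns on dim (faint) text and SGR 22 restores normal intensity without resetting colours.

```typescript
// Placeholder value: the real CURSOR_MARKER comes from pi-tui and is not reproduced here.
const CURSOR_MARKER = "<CURSOR>";
const DIM = "\x1b[2m"; // SGR 2: faint/dim on
const NORMAL_INTENSITY = "\x1b[22m"; // SGR 22: clears bold/dim only

// Return new lines with the ghost drawn dim immediately after the cursor marker.
function injectGhost(lines: string[], ghost: string): string[] {
  if (ghost === "") return lines;
  return lines.map((line) => {
    const i = line.indexOf(CURSOR_MARKER);
    if (i === -1) return line;
    const at = i + CURSOR_MARKER.length;
    return line.slice(0, at) + DIM + ghost + NORMAL_INTENSITY + line.slice(at);
  });
}
```

Only the cells after the marker change between frames, which is what lets a differential renderer update the ghost without a full repaint.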
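The Ctrl+Right word rule maps onto a single anchored regular expression. takeNextWord is a hypothetical helper, and accepting a whitespace-only remainder in one press is an assumption. JavaScript's \w is ASCII-only, so in this sketch accented letters peel off one character at a time.

```typescript
// One "word": optional leading whitespace, then either a run of \w characters
// or a single non-word, non-whitespace character (u flag: whole code points).
const NEXT_WORD = /^\s*(?:\w+|[^\w\s])/u;

// Split a ghost into [accepted chunk, remaining ghost] for one Ctrl+Right press.
function takeNextWord(ghost: string): [string, string] {
  const m = NEXT_WORD.exec(ghost);
  if (!m) return [ghost, ""]; // only whitespace (or nothing) left
  return [m[0], ghost.slice(m[0].length)];
}
```

Repeatedly applying takeNextWord to "Name(foo, bar)" peels off "Name", "(", "foo", ",", " bar", and ")".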
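Candidate cycling can be modelled as an index over the ranked list. CandidateCycler is hypothetical, as are the wrap-around behaviour and the 0-based candidateRank; the text only states that single-candidate sources treat both keys as no-ops and that candidateRank and cycledCount are logged on accept.

```typescript
class CandidateCycler {
  private index = 0;
  private cycled = 0;
  private readonly candidates: string[];

  constructor(candidates: string[]) {
    this.candidates = candidates;
  }

  current(): string | undefined {
    return this.candidates[this.index];
  }

  next(): void {
    this.step(1); // Alt+]
  }

  prev(): void {
    this.step(-1); // Alt+[
  }

  private step(direction: 1 | -1): void {
    const n = this.candidates.length;
    if (n <= 1) return; // single-candidate sources: both keys are no-ops
    this.index = (this.index + direction + n) % n; // wrap around
    this.cycled += 1;
  }

  // Values logged on accept.
  acceptInfo(): { candidateRank: number; cycledCount: number } {
    return { candidateRank: this.index, cycledCount: this.cycled };
  }
}
```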

Development

npm install
npm run typecheck   # tsc --noEmit against pi-tui / pi-coding-agent v0.74
npm test            # vitest, ~170 unit tests
npm run build       # emits dist/

CI runs all three on Node 20 and 22 (.github/workflows/ci.yml).

Limitations

No native Ollama provider in pi-ai; we configure an openai-compat model pointed at Ollama's /v1 endpoint. Any OpenAI-compatible local server works the same way.
The package pins ^0.74 of pi-tui / pi-coding-agent / pi-ai / pi-agent-core (all under @earendil-works). The relevant interfaces are pre-1.0, so expect occasional follow-up bumps.
Predictions are shown only at the end of the buffer to avoid splitting lines mid-stream.
The path index covers only tracked Git files. Untracked files are not indexed in v1.
Racing three providers sends three requests per prediction, which can raise cost and rate-limit pressure at high typing speeds.

License

MIT
