pi-ghost-autocomplete

v0.3.2

Inline ghost-text autocomplete extension for the Pi coding agent (pi.dev). Cloud and local LLM providers, zero-flicker rendering, no popup interference.

Inline ghost-text autocomplete for the Pi coding agent (@earendil-works/pi-coding-agent).

While you type at the Pi prompt, an LLM predicts the most likely continuation of the current line and renders it as dim grey "ghost text" right of the cursor. Press Right Arrow to accept. The cursor never moves until you do, the existing slash-command popup keeps Tab, and a slow or unreachable backend silently disables itself rather than spamming the chat.

Four provider modes ship in the same package:

| Mode | Backend | Default debounce | Default model |
| --- | --- | --- | --- |
| cloud | Groq / Cerebras / OpenRouter via @earendil-works/pi-ai | 150 ms | groq/llama-3.1-8b-instant |
| local | Ollama / vLLM (OpenAI-compatible) | 400 ms | qwen2.5-coder:1.5b on http://localhost:11434 |
| openai-compat | Any OpenAI-compatible REST API (e.g. Mercury Edit); also used as the default provider | 150 ms | mercury-edit-2 on https://api.inceptionlabs.ai/v1 |
| race | Multiple providers in parallel; first valid response wins | 400 ms | per-member defaults |
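To make the debounce column concrete, here is a minimal sketch of a debounced request that is canceled by the next keystroke. It is not the extension's source: `DebouncedPredictor`, `Complete`, and `onGhost` are hypothetical names, and the real provider call is replaced by a caller-supplied function.

```typescript
// Illustrative sketch only; class and callback names are hypothetical.
type Complete = (prefix: string, signal: AbortSignal) => Promise<string>;

class DebouncedPredictor {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private controller: AbortController | undefined;
  private readonly complete: Complete;
  private readonly debounceMs: number;
  private readonly onGhost: (text: string) => void;

  constructor(complete: Complete, debounceMs: number, onGhost: (text: string) => void) {
    this.complete = complete;
    this.debounceMs = debounceMs;
    this.onGhost = onGhost;
  }

  /** Call on every text change; returns the signal guarding this request. */
  onTextChange(prefix: string): AbortSignal {
    this.cancel(); // a new keystroke invalidates any pending or in-flight request
    const controller = new AbortController();
    this.controller = controller;
    this.timer = setTimeout(() => {
      this.complete(prefix, controller.signal)
        .then((text) => {
          // Ignore answers that arrive after the user has typed again.
          if (!controller.signal.aborted) this.onGhost(text);
        })
        .catch(() => {
          // Aborted or failed requests are dropped silently.
        });
    }, this.debounceMs);
    return controller.signal;
  }

  /** Clears the pending timer and aborts any in-flight request. */
  cancel(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.controller !== undefined) this.controller.abort();
    this.controller = undefined;
  }
}
```

With a 150 ms debounce, fast typing produces no requests at all; only a pause of at least 150 ms lets one through.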

Getting started

1. Get a Mercury Edit 2 API key

The default provider is Mercury Edit 2 by Inception Labs — a diffusion-based model optimised for low-latency code completions. Sign up at platform.inceptionlabs.ai to get an API key.

Set the INCEPTION_API_KEY environment variable so the extension can reach the API. Add it to your shell profile (e.g. ~/.bashrc, ~/.zshrc, or PowerShell $PROFILE) for persistence:

# ~/.bashrc or ~/.zshrc
export INCEPTION_API_KEY="your-key"
# PowerShell $PROFILE
$env:INCEPTION_API_KEY = "your-key"

Or pass it inline for a single session:

# bash / zsh
INCEPTION_API_KEY="your-key" pi
# PowerShell
$env:INCEPTION_API_KEY = "your-key"; pi

To use a different provider instead (Groq, Cerebras, a local model, …), see Configuration.

2. Install the extension

pi install npm:pi-ghost-autocomplete

This registers the extension globally. Pi picks it up automatically on the next launch. To install it only for a single project, add -l:

pi install npm:pi-ghost-autocomplete -l   # writes to .pi/ in the current directory

3. Launch Pi and try it

pi

Start typing at the prompt. A dim ghost text should appear to the right of your cursor within ~150 ms. Press Right Arrow to accept it, or keep typing to dismiss it.

Run /ghost to confirm the extension loaded and see the active configuration:

Pi Ghost: enabled
  minChars=3
  maxLineLength=240
  mode=openai-compat
  provider=mercury-edit
  baseUrl=https://api.inceptionlabs.ai/v1
  model=mercury-edit-2
  apiKey=4e24***
  debounceMs=150
  maxTokens=64
  maxRecentMessages=4

4. (Optional) Load the extension without installing

Pass -e to load the package for a single session without a permanent install:

pi -e npm:pi-ghost-autocomplete

Install

npm install pi-ghost-autocomplete

Configuration

All knobs are environment variables — no extra config file needed:

| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_DISABLED | unset | Set to 1 to disable on launch |
| PI_GHOST_PROVIDER | unset (Mercury Edit 2) | cloud or local; set explicitly to switch away from Mercury Edit 2 (ignored when PI_GHOST_RACE_PROVIDERS is set) |
| PI_GHOST_PROVIDER_NAME | groq | Cloud only (PI_GHOST_PROVIDER=cloud): any KnownProvider from pi-ai (groq, cerebras, openrouter, …) |
| PI_GHOST_MODEL | provider-specific | Model id, e.g. llama-3.1-8b-instant, qwen2.5-coder:1.5b |
| PI_GHOST_API_KEY | env var lookup | Cloud only: explicit API key (else pi-ai env lookup is used) |
| PI_GHOST_BASE_URL | http://localhost:11434 | Local only: Ollama / vLLM base URL |
| PI_GHOST_DEBOUNCE_MS | 150 (cloud) / 400 (local) | Inputs are debounced this long before a request is sent |
| PI_GHOST_MAX_TOKENS | 64 (cloud) / 48 (local) | Cap on completion tokens |
| PI_GHOST_RECENT | 4 (cloud) / 2 (local) | Number of recent user/assistant messages bundled as context |
| PI_GHOST_MIN_CHARS | 3 | Minimum draft length before a prediction fires |
| PI_GHOST_MAX_LINE | 240 | Predictions skipped if the current line exceeds this length |
| PI_GHOST_TEMPERATURE | 0 | Sampling temperature [0, 2]. 0 is best-effort deterministic; see note below |
| PI_GHOST_MERCURY_PARALLEL_N | 1 | Mercury FIM only: fire N parallel candidate completions for cycling. Requires PI_GHOST_TEMPERATURE > 0 (downgrades silently to 1 at temp=0). Costs scale linearly with N |
| PI_GHOST_CONTEXT_MAX_SCAN | 200 | M6b: cap on the candidate pool scanned for relevance ranking |
| PI_GHOST_CONTEXT_MAX_SELECTED | 8 | M6b: max number of selected context entries |
| PI_GHOST_CONTEXT_MAX_CHARS | 4000 | M6b: total char budget for the selected context window |
| PI_GHOST_CONTEXT_BM25_WEIGHT | 0.3 | M6b: weight of BM25 content score vs recency (0=pure recency, 1=pure relevance) |
| PI_GHOST_CONTEXT_TOOL_MAX | 3 | M6c: max number of tool entries to include in selected context (0 disables tools entirely) |
| PI_GHOST_ALLOW_INSECURE | unset | Set to 1 to allow http:// provider base URLs on non-localhost hosts. Off by default: API keys + prompts would be sent in clear |
| PI_GHOST_PROFILE | unset | Set to off to disable persistent profile capture + reranking. Default: enabled. Profile is local-only; see Profile below |
| PI_GHOST_PROFILE_DECAY_DAYS | 90 | Days before an unused profile record's weight halves during compaction |
| PI_GHOST_PROVISIONAL | unset | Set to off to disable the 150 ms provisional fallback. Default: enabled |
| PI_GHOST_PROVISIONAL_PREFER | unset | Set to 1/true/on/yes to bias swap decisions toward the already-displayed provisional. Raises the LLM-vs-provisional swap margin from 0.1 to 0.3, reducing perceived flicker at the cost of recall. Bench reports a flicker=X% rate so this knob can be evaluated |
| PI_GHOST_MIN_CONFIDENCE | 0.55 | M3c.1: minimum reranker score for the top candidate to be shown. Range [0, 1]. Lower = more lenient, more ghosts shown |
| PI_GHOST_RERANK_WEIGHTS | 0.45,0.25,0.15,0.15 | M3c: reranker weights as logprob,profileBias,lengthPrior,agreement. Negative values clamped to 0 |
| PI_GHOST_EVAL_CAPTURE | unset | Set to 1 to capture committed prompts to .pi/ghost-autocomplete/eval.jsonl for offline replay (pi-ghost-replay). Sensitive: contains raw prompt text. Off by default |
| PI_GHOST_REGRET_WINDOW_MS | 3000 | After accept, window in ms during which a >50% deletion of inserted text flags the record acceptRegret: true |
| PI_GHOST_SHORT_HOVER_MS | 800 | Enhancement #3: a ghost dismissed within this many ms is flagged shortHoverDismiss: true and triggers a soft trie penalty (magnitude 0.25). Tunes the bench short-hover rate |
| PI_GHOST_METRICS | unset | Set to 1 to write metrics to .pi/ghost-autocomplete/metrics.jsonl |
| PI_GHOST_DEBUG_LOG | unset | Set to 1 to write debug records to .pi/ghost-autocomplete/debug.jsonl |

Note on temperature=0

temperature=0 is the default and asks the provider for greedy decoding so the same prefix produces the same completion. This is best-effort, not guaranteed: vendors batch requests across users and kernel scheduling differences can produce different greedy paths from one call to the next. The session's in-memory completion cache is the real determinism guarantee — retyping the same prefix in the same context will reuse the previous ghost rather than re-querying. Raise temperature only if you want cache-miss diversity (e.g. PI_GHOST_TEMPERATURE=0.5).

Run /ghost inside Pi to see the effective config; /ghost on and /ghost off toggle the feature for the current session.
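As a rough illustration of how the mode-dependent defaults in the table could be resolved, the sketch below reads a handful of the variables from an environment map. `resolveSettings`, `GhostSettings`, and `numberFrom` are hypothetical names, not the extension's API. The defaults for the unset case are taken from the /ghost output shown earlier; falling back on unparsable numbers and clamping the temperature to [0, 2] are assumptions.

```typescript
// Hypothetical helper: names and shape are illustrative, not the extension's API.
type Mode = "openai-compat" | "cloud" | "local";

interface GhostSettings {
  mode: Mode;
  debounceMs: number;
  maxTokens: number;
  recentMessages: number;
  minChars: number;
  maxLineLength: number;
  temperature: number;
}

interface ModeDefaults {
  debounceMs: number;
  maxTokens: number;
  recentMessages: number;
}

// Per-mode defaults as documented in the table and the /ghost output.
const MODE_DEFAULTS: Record<Mode, ModeDefaults> = {
  "openai-compat": { debounceMs: 150, maxTokens: 64, recentMessages: 4 },
  cloud: { debounceMs: 150, maxTokens: 64, recentMessages: 4 },
  local: { debounceMs: 400, maxTokens: 48, recentMessages: 2 },
};

// Assumption: unset, empty, or non-numeric values fall back to the default.
function numberFrom(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function resolveSettings(env: Record<string, string | undefined>): GhostSettings {
  const provider = env.PI_GHOST_PROVIDER;
  const mode: Mode =
    provider === "cloud" ? "cloud" : provider === "local" ? "local" : "openai-compat";
  const defaults = MODE_DEFAULTS[mode];
  // Assumption: out-of-range temperatures are clamped rather than rejected.
  const temperature = Math.min(2, Math.max(0, numberFrom(env.PI_GHOST_TEMPERATURE, 0)));
  return {
    mode,
    debounceMs: numberFrom(env.PI_GHOST_DEBOUNCE_MS, defaults.debounceMs),
    maxTokens: numberFrom(env.PI_GHOST_MAX_TOKENS, defaults.maxTokens),
    recentMessages: numberFrom(env.PI_GHOST_RECENT, defaults.recentMessages),
    minChars: numberFrom(env.PI_GHOST_MIN_CHARS, 3),
    maxLineLength: numberFrom(env.PI_GHOST_MAX_LINE, 240),
    temperature,
  };
}
```

In a real process the map would be `process.env`. Race mode, which overrides PI_GHOST_PROVIDER, is left out of this sketch.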

Race mode

Race mode fires requests to multiple providers simultaneously and uses the first valid response. This reduces perceived latency when one provider is slow or rate-limited.

# Race Groq against Cerebras
PI_GHOST_RACE_PROVIDERS=groq,cerebras pi ...

# Race Groq, Cerebras, and Mercury Edit (Inception Labs)
PI_GHOST_RACE_PROVIDERS=groq,cerebras,mercury-edit \
  INCEPTION_API_KEY=your-key pi ...
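The "first valid response wins" idea can be sketched as follows. This is a simplified illustration, not the extension's implementation: `Provider`, `raceProviders`, and the "non-empty text counts as valid" rule are assumptions, and the staggered start and cooldown described in the table below are omitted. The `cutoffMs` parameter plays the role of PI_GHOST_RACE_CUTOFF_MS.

```typescript
// Simplified sketch; type and function names are hypothetical.
interface Provider {
  name: string;
  complete: (prefix: string, signal: AbortSignal) => Promise<string>;
}

interface RaceResult {
  provider: string;
  text: string;
}

function raceProviders(
  providers: Provider[],
  prefix: string,
  cutoffMs: number,
): Promise<RaceResult | null> {
  return new Promise((resolve) => {
    const controller = new AbortController();
    let pending = providers.length;
    let settled = false;

    // Give up once the cutoff elapses without a valid answer.
    const timer = setTimeout(() => finish(null), cutoffMs);

    function finish(result: RaceResult | null): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      controller.abort(); // tell the losing providers to stop
      resolve(result);
    }

    function onMiss(): void {
      pending -= 1;
      if (pending === 0) finish(null); // every provider failed or returned nothing
    }

    if (pending === 0) finish(null);
    for (const provider of providers) {
      Promise.resolve()
        .then(() => provider.complete(prefix, controller.signal))
        .then((text) => {
          // Assumption: a "valid" response is any non-blank completion.
          if (text.trim() !== "") finish({ provider: provider.name, text });
          else onMiss();
        }, onMiss);
    }
  });
}
```

The function resolves to null instead of rejecting, which mirrors the README's stated preference for failing silently over surfacing errors in the chat.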

Race-specific environment variables:

| Variable | Default | Effect |
| --- | --- | --- |
| PI_GHOST_RACE_PROVIDERS | unset | Comma-separated list of providers to race (groq, cerebras, local, mercury-edit, mercury-edit-2, or any KnownProvider) |
| PI_GHOST_RACE_PAUSE_MS | 400 | Milliseconds before the second+ provider is unblocked |
| PI_GHOST_RACE_COOLDOWN_MS | 1500 | Quiet period after a winner before the next race starts |
| PI_GHOST_RACE_CUTOFF_MS | 700 | Maximum time to wait for any provider before giving up |
| PI_GHOST_RACE_<NAME>_MODEL | provider default | Override the model for a specific race member (e.g. PI_GHOST_RACE_GROQ_MODEL) |
| PI_GHOST_RACE_<NAME>_API_KEY | env var lookup | Override the API key for a specific race member |
| PI_GHOST_RACE_<NAME>_BASE_URL | provider default | Override the base URL for a specific race member |
| INCEPTION_API_KEY | unset | API key for Mercury Edit; used when mercury-edit or mercury-edit-2 appears in the race list |
| INCEPTION_BASE_URL | https://api.inceptionlabs.ai/v1 | Base URL override for Mercury Edit |

<NAME> in the per-member variables is the provider name uppercased with non-alphanumeric characters replaced by _ (e.g. mercury-edit → MERCURY_EDIT).
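The naming rule is small enough to express directly. The helper names below (`raceEnvName`, `raceEnvVar`) are hypothetical; only the uppercase-and-replace rule comes from the text above.

```typescript
type RaceSuffix = "MODEL" | "API_KEY" | "BASE_URL";

// Uppercase the provider name, then replace anything that is not A-Z or 0-9 with "_".
function raceEnvName(providerName: string): string {
  return providerName.toUpperCase().replace(/[^A-Z0-9]/g, "_");
}

// Build the per-member override variable for a given race member.
function raceEnvVar(providerName: string, suffix: RaceSuffix): string {
  return `PI_GHOST_RACE_${raceEnvName(providerName)}_${suffix}`;
}
```

For example, `raceEnvVar("mercury-edit", "API_KEY")` yields `PI_GHOST_RACE_MERCURY_EDIT_API_KEY`.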

Metrics and debug logging

Enable structured JSONL logs for acceptance tracking and performance analysis:

PI_GHOST_METRICS=1 pi ...        # writes .pi/ghost-autocomplete/metrics.jsonl
PI_GHOST_DEBUG_LOG=1 pi ...      # writes .pi/ghost-autocomplete/debug.jsonl

Metrics records include: timestamp, request id, provider latencies, completion mode (deterministic or llm), result (produced, empty, rejected, aborted, error), rejection reason, completion length, whether the ghost was shown, and acceptance outcome (accepted, dismissed, stale, expired). Raw prompt text and LLM output are never written to the metrics file. Both files rotate at 10 MB.

User profile

The extension keeps a long-lived JSONL store at .pi/ghost-autocomplete/profile.jsonl to learn slash-command, path, and trigram frequencies across sessions. The profile feeds three local-only signals:

  1. Slash fast-path. When the active line starts with /, the editor serves the most-typed extension as a ghost before the LLM is queried (mode trie, provider profile-slash). Press Right to accept.
  2. Profile-bias rerank (M3c hook). A profileBias score plugs into the multi-candidate reranker. Wired but only takes effect when the provider returns >1 candidate.
  3. Bounded path boost. Persistent path frequency adds a small capped boost to deterministic path/fuzzy ranking, capped so it cannot beat a fresh exact path-prefix match.

What's stored: paths (src/foo.ts), slash commands (/review), trigrams from committed prompts, and acceptance records keyed by the SHA-256 hash of the prefix. Raw prefixes and raw completions are never written. Stored token tuples are capped at 4 tokens for prefix tails and 8 for completions.
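A minimal sketch of such an acceptance record, using Node's built-in crypto module. The record shape, the field names, and whitespace tokenization are assumptions made for illustration; only the SHA-256 keying and the 4/8 token caps come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real on-disk schema is not documented here.
interface AcceptanceRecord {
  prefixHash: string;    // SHA-256 of the raw prefix, hex-encoded
  prefixTail: string[];  // at most the last 4 tokens of the prefix
  completion: string[];  // at most the first 8 tokens of the completion
}

// Assumption: tokens are whitespace-separated.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((token) => token.length > 0);
}

function toAcceptanceRecord(prefix: string, completion: string): AcceptanceRecord {
  return {
    prefixHash: createHash("sha256").update(prefix, "utf8").digest("hex"),
    prefixTail: tokenize(prefix).slice(-4),
    completion: tokenize(completion).slice(0, 8),
  };
}
```

Keying by hash lets two sessions agree that they saw the same prefix without either of them storing it in full.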

Sanitization runs before persistence: messages with API keys (sk-…, ghp_…, JWT, etc.) or high-entropy tokens (Shannon > 4.0 bits/char, length ≥ 20) are dropped entirely. The check is shared with the session trie (src/trie-sanitize.ts).
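The entropy rule can be sketched like this. The thresholds (more than 4.0 bits per character, length of at least 20) are the ones stated above; the regular expressions are rough approximations chosen for illustration and are not copied from src/trie-sanitize.ts, and the function names are hypothetical.

```typescript
// Approximate patterns for illustration only; the real list may differ.
const KEY_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{16,}/, // "sk-" style API keys
  /\bghp_[A-Za-z0-9]{20,}/, // GitHub personal access tokens
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/, // JWT-shaped strings
];

// Shannon entropy in bits per character, computed over UTF-16 code units.
function shannonEntropy(token: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < token.length; i++) {
    const ch = token.charAt(i);
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let bits = 0;
  counts.forEach((count) => {
    const p = count / token.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

// True when the message should be dropped before it reaches the profile.
function looksSensitive(message: string): boolean {
  if (KEY_PATTERNS.some((pattern) => pattern.test(message))) return true;
  return message
    .split(/\s+/)
    .some((token) => token.length >= 20 && shannonEntropy(token) > 4.0);
}
```

A 20-character token needs more than 16 equally likely symbols to exceed 4.0 bits per character, so ordinary words and most file paths stay well under the threshold.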

Rotation: when the JSONL exceeds 2 MB, the store compacts in place — sums weights per (kind, key), halves records older than 90 days (configurable via PI_GHOST_PROFILE_DECAY_DAYS), drops weights below 0.1. The original file is preserved as profile.jsonl.1.
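The compaction pass might look roughly like the following. The record fields (`weight`, `lastUsedMs`) and the order of operations (merge first, then decay, then prune) are assumptions; the README only states the three steps and their constants.

```typescript
// Hypothetical record shape for illustration.
interface ProfileRecord {
  kind: string; // e.g. "path", "slash", "trigram"
  key: string;
  weight: number;
  lastUsedMs: number; // epoch milliseconds of the most recent use
}

const DAY_MS = 24 * 60 * 60 * 1000;

function compact(records: ProfileRecord[], nowMs: number, decayDays = 90): ProfileRecord[] {
  // 1. Sum weights per (kind, key), keeping the most recent timestamp.
  const merged = new Map<string, ProfileRecord>();
  for (const record of records) {
    const id = record.kind + "\u0000" + record.key;
    const existing = merged.get(id);
    if (existing === undefined) {
      merged.set(id, { ...record });
    } else {
      existing.weight += record.weight;
      existing.lastUsedMs = Math.max(existing.lastUsedMs, record.lastUsedMs);
    }
  }

  // 2. Halve entries unused for longer than decayDays; 3. drop weights below 0.1.
  const cutoffMs = nowMs - decayDays * DAY_MS;
  const kept: ProfileRecord[] = [];
  merged.forEach((record) => {
    const weight = record.lastUsedMs < cutoffMs ? record.weight / 2 : record.weight;
    if (weight >= 0.1) kept.push({ ...record, weight });
  });
  return kept;
}
```

Because the halving is applied once per compaction, an entry that is never used again shrinks geometrically across compactions until it falls below 0.1 and disappears.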

Set PI_GHOST_PROFILE=off to disable both profile capture and profile use entirely.

Benchmark CLI

The pi-ghost-bench binary reads a metrics JSONL file and reports p50/p95 latency, acceptance rates, and whether PRD thresholds pass:

pi-ghost-bench                                    # reads .pi/ghost-autocomplete/metrics.jsonl
pi-ghost-bench path/to/metrics.jsonl              # explicit path
pi-ghost-bench --json                             # machine-readable output

Exit code is 0 when all thresholds pass, 1 when any fail, 2 on I/O error or when the file contains no records.
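For readers unfamiliar with percentile reporting, here is a small sketch of a nearest-rank p50/p95 calculation combined with the exit-code convention above. It is not the pi-ghost-bench source: the function names are hypothetical, the nearest-rank method is an assumption, and the latency limit is a caller-supplied parameter because the PRD thresholds are not stated in this README.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Mirrors the documented convention: 0 = pass, 1 = threshold failed, 2 = no records.
function benchExitCode(latenciesMs: number[], p95LimitMs: number): 0 | 1 | 2 {
  if (latenciesMs.length === 0) return 2;
  return percentile(latenciesMs, 95) <= p95LimitMs ? 0 : 1;
}
```

In a CI script the exit code can gate a merge directly, and the `--json` output can be used when the individual numbers are needed.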

How it works

  • The extension registers a GhostEditor via ctx.ui.setEditorComponent. GhostEditor extends Pi's CustomEditor so all built-in app keybindings (escape, ctrl+d, model switching, slash-command popup) keep working.
  • After every text change, a debounced request is sent through pi-ai's complete() with an AbortSignal that is canceled on the next keystroke.
  • Before calling the LLM, the editor checks the path index for a deterministic match. If the current token is a file path prefix or a fuzzy filename with a strong enough score, the deterministic result is shown immediately and the LLM call is suppressed.
  • The path index is built asynchronously at session start via git ls-files from the repository root, capped at 20 000 files. High-confidence basename-prefix matches (score ≥85) preempt the LLM unconditionally; lower-confidence fuzzy matches only activate when an action word (open, edit, read, show, view, find, check, look, load, import, include, see, run) precedes the token.
  • Recent agent messages come from ctx.sessionManager.getBranch() — only user/assistant text, no tool calls. Both providers truncate to a small, fixed payload so cloud requests don't ship the whole file.
  • Path mentions are extracted from recent conversation messages (slashed paths, backticked path-like tokens, and bare filenames with known code extensions). These give a ranking boost (+10 exact, +7 basename) to deterministic path completions, so recently discussed files surface higher.
  • For fuzzy filename matches where the typed token is not a prefix of the match, the editor enters replace mode: accepting the ghost replaces the typed token with the full matched path. A prefix is rendered before the ghost text to signal this behaviour. Prefix-suffix matches still append as usual.
  • render() searches for CURSOR_MARKER in the lines returned by super.render() and injects ANSI dim text directly after it. Pi's differential renderer redraws only the changed cells — no flicker, no full repaint.
  • Right Arrow is intercepted only if a ghost is shown, the popup autocomplete is not open, and the cursor sits at the end of the text. We use matchesKey(data, "right") from @earendil-works/pi-tui, which handles legacy CSI/SS3 sequences and the Kitty keyboard protocol uniformly, so kitty / WezTerm / iTerm2 (Kitty-mode) all work. Otherwise the input passes through to super.handleInput.
  • Ctrl+Right accepts the next word of the ghost (PRD §M3f). A "word" is a run of \w+ characters, or a single non-word non-whitespace char with optional leading whitespace — VS Code's Cursor Word Right semantics. Repeated presses peel the ghost off one token at a time; the final press finalises the metric as accepted-partial.
  • Alt+] / Alt+[ cycle through ranked candidates when the provider returns more than one (e.g. openai-compat with PI_GHOST_MERCURY_PARALLEL_N>1 and PI_GHOST_TEMPERATURE>0). Single-candidate sources (cache, trie, speculative warm, deterministic path, default LLM fire) treat both keys as no-ops. The selected candidate's index is logged as candidateRank on accept along with cycledCount.
  • Visible-column math uses string-width + strip-ansi. Ghost text is clipped grapheme-aware to the remaining columns on the current visual line so wide CJK characters and emoji never overflow the row.
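The Ctrl+Right word rule described in the list above can be sketched with a single regular expression. This is one reading of the stated semantics, not the extension's code: `takeNextWord` and `peel` are hypothetical names, leading whitespace is assumed to attach to either kind of word, and a ghost that is only whitespace is assumed to be accepted whole.

```typescript
interface WordSplit {
  accepted: string; // the chunk inserted by one Ctrl+Right press
  rest: string; // the ghost text that remains afterwards
}

// Optional leading whitespace, then either a run of \w characters
// or a single character that is neither a word character nor whitespace.
const NEXT_WORD = /^\s*(?:\w+|[^\w\s])/;

function takeNextWord(ghost: string): WordSplit {
  const match = NEXT_WORD.exec(ghost);
  // Assumption: whitespace-only ghosts are accepted in one press.
  const accepted = match !== null ? match[0] : ghost;
  return { accepted, rest: ghost.slice(accepted.length) };
}

// Simulates repeated presses until the ghost is exhausted.
function peel(ghost: string): string[] {
  const chunks: string[] = [];
  let rest = ghost;
  while (rest.length > 0) {
    const step = takeNextWord(rest);
    chunks.push(step.accepted);
    rest = step.rest;
  }
  return chunks;
}
```

For the ghost ` foo.bar(x)`, successive presses accept ` foo`, `.`, `bar`, `(`, `x`, and `)`.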

Development

npm install
npm run typecheck   # tsc --noEmit against pi-tui / pi-coding-agent v0.74
npm test            # vitest, ~170 unit tests
npm run build       # emits dist/

CI runs all three on Node 20 and 22 (.github/workflows/ci.yml).

Limitations

  • No native Ollama provider in pi-ai; we configure an openai-compat model pointed at Ollama's /v1 endpoint. Any OpenAI-compatible local server works the same way.
  • Pin to ^0.74 of pi-tui / pi-coding-agent / pi-ai / pi-agent-core (all under @earendil-works). The relevant interfaces are pre-1.0; expect occasional follow-up bumps.
  • Predictions are shown only at the end of the buffer to avoid splitting lines mid-stream.
  • The path index covers only tracked Git files. Untracked files are not indexed in v1.
  • Three-provider racing may create elevated cost or rate-limit pressure at high typing speeds.

License

MIT