@troed/oc-ls-stats

v1.1.0

oc-ls-stats

A TUI plugin for OpenCode that displays live prefill rate (PP) and generation rate (TG) from a llama.cpp-based llama-server.

There are many plugins that show tokens per second, but I wrote this one because, when running a local model, I often did not know what the server was currently doing and had to keep switching over to a console where I could see its output. This was especially true during prefill (prompt processing), which can take more than a minute with no feedback in the UI from the other plugins I tried.

Another motivation was to display the data of interest, but in a non-intrusive way with no UI elements jumping around. This plugin thus only shows a single numeric value, tokens per second, with an indicator as to whether the model is currently doing prompt processing or inference (token generation).

I'm using the llama-server /slots endpoint to get the needed data, which means that if you connect OpenCode to another provider the plugin will just display "-", since it receives no data.

Note: As explained in more detail below, the displayed data has to be deduced from llama-server's output. Sometimes the plugin might display PP (prompt processing) while the model is in reality doing TG. If llama-server's output data is extended in the future, the plugin might be able to tell the two apart more reliably, but I think this is as good as it gets for now.

I made this for my own usage. If you find it useful as well I'm just happy.

/Troed

Thanks to Tarquinen for their oc-tps, which I used as a base, although I guess most of the code has now been replaced.

Installation

opencode plugin @troed/oc-ls-stats@latest --global

Requires opencode 1.3.14 or newer.

TUI plugins are loaded from ~/.config/opencode/tui.json, which after installation should look like this:

{
  "plugin": ["@troed/oc-ls-stats@latest"]
}

Display Format

The plugin renders a single line in the session prompt right slot:

1247 tps (PP)    -- during prefill
  25 tps (TG)    -- during generation
   - tps (TG)    -- idle
          n/a    -- unable to reach llama-server
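
The fixed-width layout is what keeps the display from jumping around. A minimal sketch of a formatter that produces these four lines is shown here; the function names and the column widths (4 for the number, 13 for the whole line) are assumptions read off the examples above, not the plugin's actual code.

```typescript
// Hypothetical formatter: names and widths are inferred from the examples above.
type Phase = "PP" | "TG";

// Left-pad with spaces so the text is right-aligned in a fixed number of columns.
function padLeft(text: string, width: number): string {
  let out = text;
  while (out.length < width) {
    out = " " + out;
  }
  return out;
}

function formatStatus(reachable: boolean, phase: Phase, tps: number | null): string {
  const lineWidth = "1247 tps (PP)".length; // 13 columns
  if (!reachable) {
    // Server unreachable: right-align "n/a" to the same width as a normal line.
    return padLeft("n/a", lineWidth);
  }
  // A missing rate (idle) is shown as "-" in the number column.
  const value = tps === null ? "-" : String(Math.round(tps));
  return `${padLeft(value, 4)} tps (${phase})`;
}
```

Because every variant has the same length, swapping one for another never moves neighbouring UI elements.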

Detection and Calculation

Server Discovery

The plugin discovers the llama-server URL by reading the OpenCode configuration via the TUI API and extracting the baseURL/base_url fields from the options of every provider whose name contains "llama" but not "ollama". If no matching provider is found, it falls back to http://localhost:8080.
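
The filtering rule can be sketched roughly as follows. The shape of the provider map, the function name and the case-insensitive match are assumptions made for illustration; only the "llama"/"ollama" rule, the two field names and the fallback URL come from the description above.

```typescript
// Assumed shape of one provider entry; the real OpenCode config may differ.
interface ProviderConfig {
  options?: { baseURL?: string; base_url?: string };
}

const FALLBACK_URL = "http://localhost:8080";

// Hypothetical helper: collect base URLs of llama-server style providers.
function discoverServers(providers: { [name: string]: ProviderConfig }): string[] {
  const urls: string[] = [];
  for (const name of Object.keys(providers)) {
    const lower = name.toLowerCase(); // case-insensitive matching is an assumption
    // "ollama" contains "llama", so it has to be excluded explicitly.
    if (lower.indexOf("llama") === -1 || lower.indexOf("ollama") !== -1) {
      continue;
    }
    const options = providers[name].options;
    const url = options ? options.baseURL || options.base_url : undefined;
    if (url && urls.indexOf(url) === -1) {
      urls.push(url);
    }
  }
  // No matching provider: fall back to the default llama-server address.
  return urls.length > 0 ? urls : [FALLBACK_URL];
}
```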

Slot Polling

Every 500ms, the plugin polls GET /slots?model=<model> on each discovered server. The model parameter is required by the /slots endpoint. If the model cannot be discovered from the current route's session, the plugin skips polling.
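
As an illustration, the request URL for one poll could be built like this. The helper name and the trailing-slash handling are assumptions; returning null stands in for "skip this poll" when no model is known.

```typescript
// Hypothetical helper, meant to be called once per 500 ms tick for each server.
function buildSlotsUrl(baseUrl: string, model: string | undefined): string | null {
  // Without a model there is nothing valid to request, so the caller skips the poll.
  if (!model) {
    return null;
  }
  // Avoid a double slash if the configured base URL ends with "/".
  const trimmed = baseUrl.replace(/\/+$/, "");
  return `${trimmed}/slots?model=${encodeURIComponent(model)}`;
}
```

For example, `buildSlotsUrl("http://localhost:8080", "my-model")` yields `http://localhost:8080/slots?model=my-model`.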

State Classification

Each slot is classified as prefill or generation based on the n_decoded counter in next_token[0]. The plugin tracks a per-slot baseline value:

  1. When a slot first appears as processing, the current n_decoded is recorded as the baseline with hasIncreased = false.
  2. If n_decoded <= baseline and hasIncreased is false, the slot is classified as prefilling.
  3. If n_decoded > baseline, hasIncreased is set to true and the slot is classified as generating.

This approach handles the case where n_decoded drops when a new request starts on a reused slot, and prevents generation stalls (where n_decoded plateaus) from being misclassified as prefill.

When no slots are processing, all tracked per-slot state is cleared.
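
The three rules can be sketched as a small state machine. The class and method names are hypothetical, and how the plugin re-baselines when n_decoded drops on a reused slot is not spelled out above, so this sketch leaves that part out and only implements the listed steps plus the clear-on-idle behaviour.

```typescript
type SlotPhase = "prefill" | "generation";

interface SlotBaseline {
  baseline: number;      // n_decoded when the slot was first seen processing
  hasIncreased: boolean; // true once n_decoded has risen above the baseline
}

// Hypothetical classifier; not the plugin's actual code.
class SlotClassifier {
  private slots: { [slotId: number]: SlotBaseline } = {};

  classify(slotId: number, nDecoded: number): SlotPhase {
    const entry = this.slots[slotId];
    if (entry === undefined) {
      // Step 1: first sighting, record the baseline.
      this.slots[slotId] = { baseline: nDecoded, hasIncreased: false };
      return "prefill";
    }
    if (nDecoded > entry.baseline) {
      // Step 3: the counter moved, so tokens are being generated.
      entry.hasIncreased = true;
    }
    // Step 2, plus the plateau case: once generation has been seen,
    // a stalled counter does not flip the slot back to prefill.
    return entry.hasIncreased ? "generation" : "prefill";
  }

  // Called when no slot is processing any more.
  clear(): void {
    this.slots = {};
  }
}
```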

Prefill Rate (PP)

During prefill, the plugin calculates the instantaneous prompt processing rate:

  1. On first detection of a prefill slot, the current n_prompt_tokens is captured as the baseline.
  2. On subsequent polls, the delta in n_prompt_tokens is divided by the elapsed time in seconds.
  3. The rate is updated only when both dt > 0 and delta > 0.

The per-slot n_prompt_tokens field is used instead of the global llamacpp:prompt_tokens_total from /metrics because the global counter includes tokens from all slots, producing inflated values when multiple slots are active simultaneously.
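
A minimal sketch of the rate calculation described in the steps above follows. The type and function names are hypothetical, and moving the baseline forward to the latest sample after each update is an assumption about what "instantaneous" means here.

```typescript
interface RateState {
  lastTokens: number | null; // counter value at the previous usable sample
  lastTimeMs: number;        // timestamp of that sample, in milliseconds
  rate: number | null;       // last computed tokens per second
}

// Hypothetical helper: feed it the per-slot n_prompt_tokens on every poll.
function updateRate(state: RateState, tokens: number, timeMs: number): number | null {
  if (state.lastTokens === null) {
    // Step 1: first detection, capture the baseline only.
    state.lastTokens = tokens;
    state.lastTimeMs = timeMs;
    return state.rate;
  }
  const dt = (timeMs - state.lastTimeMs) / 1000; // elapsed seconds
  const delta = tokens - state.lastTokens;
  // Step 3: keep the previous rate unless both time and the counter advanced.
  if (dt > 0 && delta > 0) {
    state.rate = delta / dt;
    state.lastTokens = tokens;
    state.lastTimeMs = timeMs;
  }
  return state.rate;
}
```

With a 500 ms poll interval, a jump of 600 prompt tokens between two polls gives 600 / 0.5 = 1200 tps.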

Generation Rate (TG)

During generation, the plugin calculates the instantaneous token generation rate:

  1. On first detection of a generation slot, the current n_decoded is captured as the baseline.
  2. On subsequent polls (same slot ID), the delta in n_decoded is divided by the elapsed time in seconds.
  3. The rate is updated only when both dt > 0 and delta > 0.

Slot reuse is tracked via generateSlotId to detect when a new generation starts on a different slot.
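
To illustrate why the slot ID matters, here is a sketch in which a change of slot resets the baseline instead of producing a delta between two unrelated n_decoded counters. Only the name generateSlotId comes from the text above; the rest of the structure, and the choice to keep showing the previous rate until a new one is available, are assumptions.

```typescript
interface GenerationTracker {
  generateSlotId: number | null; // slot the current baseline belongs to
  lastDecoded: number;
  lastTimeMs: number;
  tps: number | null;
}

// Hypothetical update step, called once per poll for the generating slot.
function updateGeneration(
  t: GenerationTracker,
  slotId: number,
  nDecoded: number,
  nowMs: number
): number | null {
  if (t.generateSlotId !== slotId) {
    // Generation moved to another slot: take a fresh baseline.
    t.generateSlotId = slotId;
    t.lastDecoded = nDecoded;
    t.lastTimeMs = nowMs;
    return t.tps;
  }
  const dt = (nowMs - t.lastTimeMs) / 1000;
  const delta = nDecoded - t.lastDecoded;
  if (dt > 0 && delta > 0) {
    t.tps = delta / dt;
    t.lastDecoded = nDecoded;
    t.lastTimeMs = nowMs;
  }
  return t.tps;
}
```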

Limitations

Progress Percentage

The plugin cannot display prefill progress percentage. The /slots endpoint returns n_prompt_tokens (current prompt size) and n_prompt_tokens_processed (tokens processed), but not the final prompt size (task->n_tokens() from llama.cpp). Progress requires the ratio n_prompt_tokens_processed / task->n_tokens().
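
If the final prompt size were available, the calculation itself would be trivial. The sketch below assumes a total value that llama-server does not expose today; the function name and the clamping to 100 are illustrative choices.

```typescript
// Hypothetical: "total" stands for task->n_tokens(), which /slots does not return today.
function prefillProgressPercent(processed: number, total: number): number | null {
  // No meaningful percentage without a positive total.
  if (!(total > 0)) {
    return null;
  }
  return Math.min(100, (processed / total) * 100);
}
```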

What Would Improve Compatibility

The following changes to the /slots endpoint would improve the plugin's functionality:

  1. Expose final prompt size: Add n_prompt_tokens_total (or n_tokens) to the /slots output, representing task->n_tokens() from llama.cpp. This would enable prefill progress percentage calculation as (n_prompt_tokens_processed / n_prompt_tokens_total) * 100.

  2. Per-slot metrics endpoints: Currently, the /metrics endpoint provides only global counters (llamacpp:prompt_tokens_total, llamacpp:prompt_tokens_seconds). Per-slot metrics would allow independent rate tracking without relying on slot state classification.

  3. Slot transition notifications: The plugin polls every 500ms to detect state transitions. A WebSocket or SSE-based notification system for slot state changes would reduce polling overhead and improve detection latency.

  4. Stall detection: When generation stalls (e.g., due to context window limits), n_decoded stays constant and n_remain stops decreasing. The plugin sees this only as a zero delta, which it cannot distinguish from normal generation that produced no new tokens between two polls. An explicit stalled flag in the slot output would help.

  5. Model-agnostic slot data: The /slots endpoint requires a model parameter. Returning all slots without model filtering, or supporting * as a wildcard, would simplify discovery when multiple models are loaded.

Source code repo

For known issues, posting new ones, forking or contributing:

https://codeberg.org/troed/oc-ls-stats

Debug Logging

Debug logging is controlled by the DEBUG_ENABLED constant in tui.tsx. When enabled, full slot state data is written to /tmp/oc-ls-stats-debug.log on every poll.

License

Creative Commons Zero (CC0 1.0 Universal)