nc-pilot-debug-server

v0.2.0

Published

18 days ago

Local debug sink + CLI for the NC-Pilot browser extension. Captures agent steps (model requests, responses, tool calls). Start it from the terminal; stop it from the terminal or the extension.

nilesh0608

nc-pilot debug browser-extension cli

NC-Pilot Debug Server

A tiny, zero-dependency observability server for AI browser agents. It captures everything the NC-Pilot browser extension's agent does — every model request, model response, reasoning trace, tool call, and tool result — prints it live to your terminal, and stores it grouped by conversation so you can replay, analyze, and tune agent behavior.

Think of it as a flight recorder for an autonomous browser agent.

Pure Node.js. No npm dependencies. The server is a single file (server.js). Runs anywhere Node 18+ runs.

Why this exists

Autonomous browser agents (read the page → decide → click/type/navigate) are hard to debug. When a run goes wrong you usually only see the final answer, not why the model chose a wrong element or got stuck in a loop. This server records the full decision trace so you can see exactly what the model saw and did at every step — and hand that trace to an LLM to get fixes.

Features

Live trace — color-coded, human-readable stream of every agent step in your terminal.
Full fidelity capture — records the complete messages array the model received each step (system prompt + history + tool results), so prompt/tool-description problems are visible.
Conversation grouping — every event is tagged with a UUID conversationId; events are stored per conversation under conversations/<id>.ndjson.
Chat restore — the extension can fetch a conversation back from this server on startup, so reopening the side panel restores the previous chat (history + transcript).
Reasoning capture — captures model "thinking" output when the model emits it.
Simple HTTP API — list/read/delete conversations as JSON.
NDJSON storage — one JSON object per line; trivially greppable, diffable, and feedable to any LLM or data pipeline.
Zero dependencies, zero config — node server.js and you're recording.

Use cases

Debugging agent failures — see the exact step where the agent picked the wrong element, reused a stale index, or stopped early.
Prompt & tool tuning — the model_request event holds the full context the model saw; diff good vs. bad runs to refine the system prompt or tool descriptions.
Crawling / scraping observability — when you drive the agent to crawl listings, fill forms, paginate, or extract data across pages, capture the whole multi-step run: which elements were scanned (get_dom), what was clicked, what each page returned (read_page). Replay it to confirm coverage or find where extraction broke.
Dataset capture — build a labeled corpus of (page state → chosen action) pairs from real runs for evaluation or fine-tuning.
Regression checks — keep traces of known-good tasks; re-run after changing the prompt or model and compare.
LLM-assisted analysis — hand a conversation's NDJSON to Claude/GPT and ask "why did the agent loop here, and how should I fix the prompt?"
Conversation persistence in dev — restore a chat after a service-worker restart without losing context.
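
For the "diff good vs. bad runs" and regression-check use cases above, one simple starting point is to compare the order in which tools were executed. The following is a minimal sketch, assuming each trace has been read from its conversations/<id>.ndjson file and that tool_result events carry the name field listed under Event types. The helper names parseNdjson, toolSequence, and firstDivergence are hypothetical and not part of this package.

```javascript
// Hypothetical helpers for comparing two captured runs (not shipped with the server).

// Parse NDJSON text (one JSON object per line) into an array of events.
function parseNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Ordered list of the tool names that were actually executed in a run.
function toolSequence(events) {
  return events.filter((e) => e.type === 'tool_result').map((e) => e.name);
}

// Index of the first position where two sequences differ, or -1 if they match.
function firstDivergence(a, b) {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return -1;
}
```

Load two files with fs.readFileSync(path, 'utf8'), pass each through parseNdjson and toolSequence, and firstDivergence points at the step worth inspecting first.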

Benefits

See the model's actual input, not just its output — the #1 thing missing from most agent debugging.
No infrastructure — no database, no cloud, no API keys. A single local file.
Privacy by design — runs entirely on your machine; nothing is sent anywhere.
Portable data — plain NDJSON works with jq, grep, pandas, or any LLM.

Requirements

Node.js 18+
The NC-Pilot browser extension (the source of the events)

Install & run

Quickest — npx (no clone)

npx nc-pilot-debug-server start      # start in the background
npx nc-pilot-debug-server status     # is it running?
npx nc-pilot-debug-server stop       # stop it

CLI commands

| command | what it does |
|---|---|
| start | start the server detached (background); records a PID so stop can find it |
| start -f | start in the foreground (this terminal) |
| stop | stop the running server (graceful POST /shutdown, falls back to killing the PID) |
| status | show whether it's running + reachable |
| restart | stop then start |
| --port, -p <N> | listen on a custom port (default 8787; or env NC_DEBUG_PORT) |

From a clone

git clone <this-repo-url> nc-pilot-debug-server
cd nc-pilot-debug-server
node cli.js start      # background  ·  node cli.js start -f for foreground
# or the classic: node server.js

You'll see:

nc-pilot-debug-server on http://localhost:8787
  POST   /log
  GET    /conversations
  GET    /conversations/:id
  DELETE /conversations/:id
  POST   /shutdown
  per-conversation logs in ./conversations

The server can also be stopped from the extension — Options → Debug → Stop server (it calls POST /shutdown). Starting must be done from the terminal: browser extensions are sandboxed and cannot launch local processes.
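
The same POST /shutdown call can be made from a script. This is a rough sketch using the fetch global available in Node 18+. The function name stopServer and the injectable fetchImpl parameter are illustrative assumptions, and the response body is deliberately not inspected because its shape is not documented here.

```javascript
// Ask a running debug server to shut down via POST /shutdown.
// Resolves true if the server answered with a 2xx status,
// false if it answered with an error or nothing was reachable.
async function stopServer(baseUrl = 'http://localhost:8787', fetchImpl = fetch) {
  try {
    const res = await fetchImpl(new URL('/shutdown', baseUrl), { method: 'POST' });
    return res.ok;
  } catch {
    // Typically a refused connection: the server is probably not running.
    return false;
  }
}
```

Call it as stopServer().then(console.log), or pass a different base URL if you changed the port.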

It listens on http://localhost:8787 and writes:

debug-events.ndjson — combined log of all events
conversations/<conversationId>.ndjson — per-conversation logs
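
To make the two-file layout concrete, here is a sketch of how an event could be appended to both the combined log and its conversation's file. It illustrates the storage scheme only and is not the code in server.js. The function name appendEvent and the UUID check are assumptions added for the example.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// conversationId is documented as a UUID; validating it also keeps the
// id from being used to escape the conversations/ directory.
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Append one event as a single NDJSON line to both log files under baseDir.
function appendEvent(baseDir, event) {
  if (!UUID.test(String(event.conversationId))) {
    throw new Error('conversationId must be a UUID');
  }
  const line = JSON.stringify(event) + '\n';
  const convDir = path.join(baseDir, 'conversations');
  fs.mkdirSync(convDir, { recursive: true });
  fs.appendFileSync(path.join(baseDir, 'debug-events.ndjson'), line);
  fs.appendFileSync(path.join(convDir, `${event.conversationId}.ndjson`), line);
}
```

Because every line is a complete JSON object, appends never require rewriting the file, which is what keeps the logs greppable and diffable.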

Connect the NC-Pilot extension

Debug logging is off by default (so it never phones home for end users). Turn it on from the extension's Options page → Debug server (dev):

Start this server (npx nc-pilot-debug-server start).
In NC-Pilot Options, check Enable debug server.
Set Debug server URL if you changed the port (default http://localhost:8787).
Click Test connection — it should report "Connected".
Save. (To stop the server later: Stop server here, or npx nc-pilot-debug-server stop.)

The extension's manifest already allows http://localhost:* in its CSP, so any local port works — no manifest editing needed.

Now open the side panel and run a task — events stream into your terminal and files, and the side panel gains a ＋ New chat button and a ☰ chat history list (past conversations are restored from this server).
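
Chat restore relies on the history_snapshot events listed under Event types. As a sketch of the idea, and assuming that the most recent snapshot in a conversation is the one worth restoring (the extension's actual logic is not shown in this README), a client could pick it out like this. The name latestHistory is hypothetical.

```javascript
// Return the `history` payload of the newest history_snapshot event,
// or null when the conversation has no snapshot yet.
function latestHistory(events) {
  for (let i = events.length - 1; i >= 0; i--) {
    if (events[i].type === 'history_snapshot') return events[i].history;
  }
  return null;
}
```

The events array would come from GET /conversations/:id or from the conversation's NDJSON file.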

Uncheck Enable debug server in Options for normal use / before publishing the extension.

What a captured run looks like

run_start [11111111] ollama · qwen2.5:7b
  user: find the top story on Hacker News and open it
model_response step 0
  → tool get_dom({})
tool_result get_dom 227 elements
model_response step 1
  → tool open_tab({"url":"https://news.ycombinator.com/item?id=..."})
tool_result open_tab {"openedTabId":42,...}
model_response step 2
  text: Opened the top story "…" in a new tab.
run_end 3 steps · final

Event types

Each line in the NDJSON files is one event:

| type | key fields | meaning |
|---|---|---|
| run_start | conversationId, backend, model, permissionMode, userMessage | a new user task started |
| model_request | step, messageCount, lastRole, messages | the full conversation sent to the model this step |
| model_response | step, text, thinking, toolCalls | the model's reply + reasoning + chosen tools |
| tool_result | name, args, ok, summary | outcome of running a tool |
| error | step, error | a step failed |
| run_end | steps, reason | task finished (final / step_cap / stopped) |
| history_snapshot | history | full normalized messages; used to restore the chat |

All events also include ts (epoch ms) and conversationId.
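
Since every event has a type and a ts, a quick health summary of a run needs only a few lines. The sketch below assumes events have already been parsed from NDJSON into objects. The function name summarize is hypothetical.

```javascript
// Count events per type and measure the wall-clock span of a run.
// `ts` is epoch milliseconds, as documented above.
function summarize(events) {
  const counts = {};
  let minTs = Infinity;
  let maxTs = -Infinity;
  for (const e of events) {
    counts[e.type] = (counts[e.type] || 0) + 1;
    if (Number.isFinite(e.ts)) {
      minTs = Math.min(minTs, e.ts);
      maxTs = Math.max(maxTs, e.ts);
    }
  }
  const durationMs = maxTs >= minTs ? maxTs - minTs : 0;
  return { counts, durationMs };
}
```

A run with many tool_result events but no run_end, or with an unusually high model_response count, is often a sign of a loop or a step cap being hit.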

HTTP API

| method & path | description |
|---|---|
| POST /log | ingest one event (used by the extension) |
| GET /conversations | list conversations: id, events count, model, firstUser, lastTs |
| GET /conversations/:id | full event list for one conversation |
| POST /conversations/:id/title | rename a conversation ({ "title": "..." }) |
| DELETE /conversations/:id | clear one conversation |
| POST /shutdown | stop the server (used by the CLI stop and the extension Stop button) |
| GET / | service info + conversation list |

Examples

# list conversations
curl http://localhost:8787/conversations

# fetch one conversation's full trace
curl -s http://localhost:8787/conversations/<uuid> | jq .

# pull just the model's chosen tools across a run
curl -s http://localhost:8787/conversations/<uuid> \
  | jq '.events[] | select(.type=="model_response") | .toolCalls'

# delete a conversation
curl -X DELETE http://localhost:8787/conversations/<uuid>
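
The jq example above can also be written in Node using the built-in fetch (Node 18+). This is a sketch that assumes, as the jq filter implies, that GET /conversations/:id returns an object with an events array. The names fetchConversation and toolCallsByStep are hypothetical, and the internal shape of each toolCalls entry is left untouched because it is not documented here.

```javascript
// Fetch one conversation's full trace from the debug server.
async function fetchConversation(id, baseUrl = 'http://localhost:8787', fetchImpl = fetch) {
  const url = new URL(`/conversations/${encodeURIComponent(id)}`, baseUrl);
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`GET ${url.pathname} failed with status ${res.status}`);
  return res.json();
}

// For each model_response, return the step number and the tools the model chose.
function toolCallsByStep(conversation) {
  return conversation.events
    .filter((e) => e.type === 'model_response')
    .map((e) => ({ step: e.step, toolCalls: e.toolCalls ?? [] }));
}
```

For example: fetchConversation('<uuid>').then((c) => console.log(toolCallsByStep(c))).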

Analyzing a run with an LLM

The fastest way to improve an agent: give the trace to a capable model.

Grab the conversation file: conversations/<id>.ndjson.
Paste it (or the relevant events) to Claude/GPT with a prompt like:
"This is an NC-Pilot agent trace. The agent kept re-clicking the same element instead of finding the 'Apply' button. Look at the model_request and tool_result events and tell me what to change in the system prompt or tool descriptions."

The model_request events are the most valuable — they show the model's exact input.
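
Full traces can be large, so it may help to send only the relevant events. Below is a sketch of a prompt builder that keeps the newest model_request and tool_result events that fit in a character budget. The function name buildAnalysisPrompt and the 20000-character default are arbitrary assumptions; adjust the budget to your model's context window.

```javascript
// Build an analysis prompt from parsed trace events.
// `problem` is your one-sentence description of what went wrong.
function buildAnalysisPrompt(events, problem, maxChars = 20000) {
  const lines = events
    .filter((e) => e.type === 'model_request' || e.type === 'tool_result')
    .map((e) => JSON.stringify(e));

  // Walk backwards so the newest events win; stop before exceeding the
  // budget so a JSON line is never cut in half.
  const kept = [];
  let used = 0;
  for (let i = lines.length - 1; i >= 0; i--) {
    const cost = lines[i].length + 1; // +1 for the newline
    if (used + cost > maxChars) break;
    kept.unshift(lines[i]);
    used += cost;
  }

  return [
    'This is an NC-Pilot agent trace.',
    problem,
    'Look at the model_request and tool_result events and tell me what to change',
    'in the system prompt or tool descriptions.',
    '',
    ...kept,
  ].join('\n');
}
```

Keeping the newest events is a design choice: loops and early stops usually show up near the end of a run. If the failure happens early, reverse the selection or raise the budget.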

Privacy & data

Everything runs locally. This server makes no outbound connections.
debug-events.ndjson and conversations/ can contain page content, URLs, and form text from whatever the agent read. Treat them as sensitive.
Both are git-ignored by default. Do not commit or publish captured traces — scrub or delete them before sharing.
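
If you do need to share a trace, a first scrubbing pass might look like the sketch below. It is a starting point under simple assumptions, not a complete sanitizer: it only reduces URLs to their origin and masks email-like strings, so page text, names, and form values still need a manual review. The names scrub and scrubString are hypothetical.

```javascript
const URL_RE = /https?:\/\/[^\s"'<>]+/g;
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Mask URLs (keeping only the origin) and email-like substrings in one string.
function scrubString(s) {
  return s
    .replace(URL_RE, (u) => {
      try {
        return new URL(u).origin + '/[redacted]';
      } catch {
        return '[redacted-url]';
      }
    })
    .replace(EMAIL_RE, '[redacted-email]');
}

// Recursively scrub every string inside an event; other values pass through.
function scrub(value) {
  if (typeof value === 'string') return scrubString(value);
  if (Array.isArray(value)) return value.map((v) => scrub(v));
  if (value && typeof value === 'object') {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, scrub(v)]));
  }
  return value;
}
```

Apply scrub to each parsed event and write the results to a new file, leaving the original trace untouched.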

Project layout

nc-pilot-debug-server/
  server.js            # the whole server (zero deps)
  cli.js               # start/stop/status/restart wrapper (the `bin`)
  package.json         # bin + npm scripts
  README.md
  conversations/       # generated: per-conversation NDJSON (git-ignored)
  debug-events.ndjson  # generated: combined log (git-ignored)
  .debug-server.pid    # generated: background PID (git-ignored)

License

MIT — do whatever you want. (Add a LICENSE file before publishing.)