pando-proxy
v0.1.31
Published
By pando (https://getpando.ai) under the MIT License. Run Codex through a local OpenAI Responses-compatible memory proxy with one npx command. The proxy enforces a one-tier exact-piece memory sieve across rounds and keeps a bounded archive-backed recall p
pando-proxy
pando-proxy is a local Codex wrapper that rewrites each Responses request through a strict
active-memory sieve.
The important invariant is simple:
- active memory is the exact kept piece set
- the next forwarded prompt contains that exact kept set
- anything not kept is dropped from active memory completely
- if older exact material is needed later, the agent can use recall({offset,limit}) against the per-session archive, up to 3 times in that round
There is no projection layer, no hidden omitted-memory tier, and no summary/embedding memory.
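The invariant above can be pictured as a pure filter. This is only an illustrative sketch, not the proxy's code: the `Piece` type and the `sieve` function are hypothetical names introduced here.

```typescript
// Hypothetical shape for an exact retained chunk; not the proxy's real type.
interface Piece {
  id: string;
  text: string;
}

// One-tier sieve: the kept set IS the active memory.
// Pieces that are not kept leave no summary, stub, or placeholder behind.
function sieve(pieces: Piece[], keptIds: Set<string>): Piece[] {
  return pieces.filter((p) => keptIds.has(p.id));
}

const round1: Piece[] = [
  { id: "a", text: "user: inspect this repo" },
  { id: "b", text: "tool: directory listing" },
  { id: "c", text: "assistant: found src/main.ts" },
];

// Only "a" and "c" survive; "b" is gone from active memory entirely.
const active = sieve(round1, new Set(["a", "c"]));
console.log(active.map((p) => p.id));
```

The surviving pieces are returned unmodified, which is the point of "exact": nothing is rewritten or compressed on the way to the next prompt.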
Validation Policy
For the active-memory redesign in this repository:
- ignore unit tests completely
- do not use unit tests as a correctness signal
- validate with live E2E runs against the real backend
- inspect logs and persisted state as the primary verification method
Current Design
The runtime is built around:
- groups: compact semantic buckets managed only by structured LLM calls
- pieces: exact retained user/assistant/tool chunks
- processedSourceIds: source ids already seen and archived
- archive: raw original sources kept only for bounded recovery, not for normal prompt memory
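As a rough picture of how these four parts might sit together, here is a sketch of a session state. The top-level field names come from the list above; every inner field, the type names, and the `archiveSource` helper are assumptions made for illustration. REFERENCE.md remains the place for the real runtime types.

```typescript
// Hypothetical inner shapes; only the four top-level field names are from the README.
interface Group { id: string; label: string }
interface RetainedPiece {
  id: string;
  groupId: string;
  role: "user" | "assistant" | "tool";
  text: string; // exact text, never summarized
}
interface ArchivedSource { id: string; raw: string }

interface SessionState {
  groups: Group[];
  pieces: RetainedPiece[];
  processedSourceIds: string[];
  archive: ArchivedSource[];
}

function emptyState(): SessionState {
  return { groups: [], pieces: [], processedSourceIds: [], archive: [] };
}

// Archive a raw source once and mark it as processed.
// A source id that was already seen is ignored, so re-sent history is not duplicated.
function archiveSource(state: SessionState, source: ArchivedSource): SessionState {
  if (state.processedSourceIds.includes(source.id)) return state;
  return {
    ...state,
    processedSourceIds: [...state.processedSourceIds, source.id],
    archive: [...state.archive, source],
  };
}
```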
Normal end-of-round flow:
- collect new round sources
- run source_chunk_batch and group_intent in parallel
- piece_retention_batch
- retained_piece_prune
- persist the surviving exact pieces
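The end-of-round flow above can be sketched as a small async pipeline. The stage names mirror the steps listed, but the `Stages` interface, its signatures, and the `Source`/`Chunk` types are assumptions for this sketch rather than the proxy's actual contracts.

```typescript
// Hypothetical data shapes for the sketch.
type Source = { id: string; text: string };
type Chunk = { id: string; sourceId: string; text: string };

// Hypothetical stage signatures; inputs and outputs are assumed.
interface Stages {
  sourceChunkBatch(sources: Source[]): Promise<Chunk[]>;
  groupIntent(sources: Source[]): Promise<string[]>;
  pieceRetentionBatch(chunks: Chunk[], groups: string[]): Promise<Chunk[]>;
  retainedPiecePrune(kept: Chunk[]): Promise<Chunk[]>;
  persist(pieces: Chunk[]): Promise<void>;
}

async function finalizeRound(sources: Source[], stages: Stages): Promise<Chunk[]> {
  // Chunking and group intent do not depend on each other, so start both at once.
  const [chunks, groups] = await Promise.all([
    stages.sourceChunkBatch(sources),
    stages.groupIntent(sources),
  ]);
  // Retention needs both results; pruning then trims the retained set.
  const retained = await stages.pieceRetentionBatch(chunks, groups);
  const pruned = await stages.retainedPiecePrune(retained);
  // Only the surviving exact pieces are written back.
  await stages.persist(pruned);
  return pruned;
}
```

Passing the stages in as an object keeps the ordering logic separate from the LLM calls, which makes the sequence easy to exercise with stubs.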
Normal request flow:
- load session state
- materialize active-piece payloads for rendering
- inject one synthetic developer memory block
- forward to upstream
- if the model explicitly calls recall, resolve archived sources locally
- finalize memory after the upstream round completes
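The injection step in the request flow above might look roughly like this. It is a sketch under assumptions: the request type is a minimal subset of a Responses-style body, and the `injectMemory` name, the `[active memory]` header, and the choice to place the block first are all hypothetical.

```typescript
// Minimal, assumed subset of a Responses-style request body.
type InputItem = {
  role: "developer" | "system" | "user" | "assistant";
  content: string;
};
interface ResponsesRequest {
  model: string;
  input: InputItem[];
}

// Render the active pieces into ONE synthetic developer item.
// Placement at the front and the header line are assumptions of this sketch.
function injectMemory(req: ResponsesRequest, activePieces: string[]): ResponsesRequest {
  if (activePieces.length === 0) return req;
  const memory: InputItem = {
    role: "developer",
    content: ["[active memory]", ...activePieces].join("\n"),
  };
  // Return a new request instead of mutating the caller's object.
  return { ...req, input: [memory, ...req.input] };
}
```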
Active Memory vs Archive
These two surfaces are intentionally separate.
Active memory:
- exact surviving pieces only
- always shown in the next rewritten prompt
- what survives is exactly what crosses the prompt boundary
Archive:
- raw original round sources on disk
- not part of normal prompt construction
- only reachable through explicit recall
- bounded to at most 3 recall calls per round
The archive is a recovery surface, not a second active-memory tier.
recall
The proxy may inject one local function tool:
- name: recall
- arguments: { offset, limit }
- max uses per round: 3
Guidance injected to the model:
- prefer answering from active memory first
- use recall only when the exact material needed is not visible in active memory
- when using it, err on the side of requesting more archive coverage rather than too little
The tool result explicitly marks returned content as archive content and includes:
- requestedOffset
- requestedLimit
- returnedCount
- remainingArchivedSourceCount
- exact archived source payloads
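To make the tool contract above concrete, here is a sketch of a function tool definition and a local resolver. The tool name, the two arguments, the limit of 3, and the four result counters come from this section. The description string, the JSON schema details, the `sources` field name, the reading of `offset` as an index from the start of the archive, and the meaning of `remainingArchivedSourceCount` (sources after the returned window) are assumptions.

```typescript
// Function tool definition in the general shape used for Responses-style function tools.
// Description and schema constraints are assumed, not taken from the proxy.
const recallTool = {
  type: "function",
  name: "recall",
  description: "Fetch exact archived sources for this session.",
  parameters: {
    type: "object",
    properties: {
      offset: { type: "integer", minimum: 0 },
      limit: { type: "integer", minimum: 1 },
    },
    required: ["offset", "limit"],
    additionalProperties: false,
  },
} as const;

const MAX_RECALLS_PER_ROUND = 3;

interface RecallResult {
  requestedOffset: number;
  requestedLimit: number;
  returnedCount: number;
  remainingArchivedSourceCount: number;
  sources: string[]; // exact archived payloads; field name is hypothetical
}

// Resolve a recall call locally. Returns null once the per-round budget is spent.
function resolveRecall(
  archive: string[],
  offset: number,
  limit: number,
  usedThisRound: number,
): RecallResult | null {
  if (usedThisRound >= MAX_RECALLS_PER_ROUND) return null;
  const start = Math.max(0, offset);
  const sources = archive.slice(start, start + Math.max(0, limit));
  return {
    requestedOffset: offset,
    requestedLimit: limit,
    returnedCount: sources.length,
    // Assumed meaning: how many archived sources lie beyond the returned window.
    remainingArchivedSourceCount: Math.max(0, archive.length - (start + sources.length)),
    sources,
  };
}
```

Echoing the requested offset and limit alongside the returned count lets the model tell a short archive apart from a request that was too narrow.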
Quickstart
Requires:
- Deno
- Codex on PATH
- Codex already logged in
Typical use:
deno run --allow-net --allow-env --allow-read --allow-write --allow-run \
src/main.ts \
exec \
--sandbox read-only \
"inspect this repo"
Resume with the exact thread id printed by the wrapper:
deno run --allow-net --allow-env --allow-read --allow-write --allow-run \
src/main.ts \
exec resume 019dc204-22fb-7c50-95ad-2f2508254945 \
--sandbox read-only \
"continue"
Prefer exact thread ids in almost all cases; treat --last as a fallback only.
Auth
Live calls resolve auth in this order:
- OPENAI_API_KEY
- ~/.codex/auth.json via tokens.access_token
If Codex is already logged in, that is usually enough.
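The resolution order above can be sketched as a small pure function. The function name and its parameters are hypothetical: the environment and the contents of ~/.codex/auth.json are passed in as plain values so the sketch does not depend on any runtime API, and the handling of empty or malformed input is an assumption.

```typescript
// Resolve a token in the documented order:
//   1. OPENAI_API_KEY from the environment
//   2. tokens.access_token from the text of ~/.codex/auth.json
// `authJsonText` is null when the file does not exist.
function resolveAuthToken(
  env: Record<string, string | undefined>,
  authJsonText: string | null,
): string | null {
  const apiKey = env["OPENAI_API_KEY"];
  if (apiKey !== undefined && apiKey.trim() !== "") return apiKey;

  if (authJsonText === null) return null;
  try {
    const parsed = JSON.parse(authJsonText);
    const token = parsed?.tokens?.access_token;
    return typeof token === "string" && token !== "" ? token : null;
  } catch {
    // Assumed behavior: a malformed auth file counts as "no credentials".
    return null;
  }
}
```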
Live E2E Workflow
For real validation, use one fixed state dir and one fixed log file per session:
deno run --allow-net --allow-env --allow-read --allow-write --allow-run \
src/main.ts \
--proxy-log-file /tmp/pando-test.jsonl \
--proxy-state-dir /tmp/pando-test-state \
exec \
--sandbox read-only \
-o /tmp/round1.txt \
"round 1 prompt"
Then resume the same exact thread id:
deno run --allow-net --allow-env --allow-read --allow-write --allow-run \
src/main.ts \
--proxy-log-file /tmp/pando-test.jsonl \
--proxy-state-dir /tmp/pando-test-state \
exec resume 019dc204-22fb-7c50-95ad-2f2508254945 \
--sandbox read-only \
-o /tmp/round2.txt \
"round 2 prompt"
Inspect after each run:
- memory_round_chunked
- memory_round_decision
- memory_round_updated
- memory_state_saved
- archive_recall
- structured_model_usage
- structured_model_skipped
- round_complete
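One way to get a quick overview of a run is to count log records per event name. The sketch below assumes that each line of the JSONL log is a JSON object carrying the event name in a string field called `event`; that field name is a guess, so check it against a real log line before relying on it. The `countEvents` helper is hypothetical.

```typescript
// Count records per event name in JSONL text.
// ASSUMPTION: the event name lives in a string field called `event`.
function countEvents(jsonl: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let record: unknown;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const name = (record as { event?: unknown } | null)?.event;
    if (typeof name === "string") {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  return counts;
}

// Example with inline text; in practice the input would be the contents
// of the file passed via --proxy-log-file.
const sample = [
  '{"event":"memory_round_chunked"}',
  '{"event":"archive_recall"}',
  '{"event":"round_complete"}',
  "",
  '{"event":"round_complete"}',
].join("\n");
const counts = countEvents(sample);
console.log(Object.fromEntries(counts));
```

A round that never emits archive_recall answered from active memory alone, which is a useful first thing to check when comparing runs.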
At exit, the wrapper also prints to stderr:
- estimated input tokens without the proxy
- billed all-in tokens with the proxy
- proxy overhead tokens from internal manager calls
Repo Map
- ACTIVE_MEMORY_REDESIGN_PLAN.md — implementation target
- CONTEXT_MEMORY_DESIGN.md — one-tier sieve design
- DESIGN_PRINCIPLES.md — architecture rules
- MEMORY_OPERATIONS.md — round-by-round operations
- REFERENCE.md — concrete runtime types and contracts
- LIVE_E2E.md — live validation loop
- MEMORY_DIAGRAMS.md — simplified diagrams
- npm-publishing.md — package checks and npm release flow
Key runtime files:
- src/memory_state.ts
- src/memory_pipeline.ts
- src/group_manager.ts
- src/chunking.ts
- src/prompt_view.ts
- src/upstream.ts
- src/store.ts
- src/server.ts
Benchmarks
Replay and benchmark material remains in this repository, but treat the docs above as the source of truth for the shipped active-memory runtime. Historical benchmark docs may discuss earlier designs.
