@daino/tokenwise

v0.2.0

Published 22 days ago

Spend less, agent more. Token optimization SDK for Agentic AI.

daino

llm token optimization openai agent agentic-ai cost-reduction proxy

The Problem

Agentic AI systems burn through tokens at alarming rates:

Stateless repetition — 30-40% of tokens are redundant context resent every turn
Tool bloat — 10-50 tool definitions sent per call, most irrelevant
Wrong model for the job — expensive models used for simple tasks
No visibility — you don't know where your tokens go until the bill arrives

How TokenWise Works

Your App ──→ TokenWise ──→ LLM API
                │
                ├── Context Differ     dedup messages, compress history
                ├── Skill Compressor   trim tool descriptions, filter irrelevant
                ├── Model Router       gpt-4o → gpt-4.1-nano for simple tasks
                └── Cost Tracker       log everything, report savings

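TokenWise sits between your app and the LLM API and optimizes each request before forwarding it. Your code keeps the same request and response formats; only the token count and the bill go down (with model routing enabled, a cheaper model may produce the answer).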

| Module | What it does | Savings |
|--------|-------------|---------|
| Context Differ | Deduplicates messages, compresses old turns, maximizes cache hits | 20–40% |
| Skill Compressor | Removes filler from tool descriptions, filters irrelevant tools | 30–50% |
| Model Router | Routes simple tasks to cheaper models automatically | 50–90% per request |
| Cost Tracker | Real-time cost monitoring with savings dashboard | visibility |
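The Context Differ row names two techniques: deduplicating messages and compressing old turns. Below is a minimal sketch of both, assuming OpenAI-style `{ role, content }` messages. `diffContext`, `keepRecent`, and the placeholder text are hypothetical names for illustration, not TokenWise's API. The real module may compress old turns differently (for example by summarizing them) rather than replacing them with a count. Deduplicating every repeated message is also a simplification: a user legitimately typing the same reply twice would lose one copy.

```typescript
// Hypothetical message shape, modeled on the OpenAI chat format.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Drops exact duplicates (same role and content), then, if more than
// `keepRecent` non-system turns remain, replaces the older ones with a
// single placeholder. System messages are always kept at the front.
function diffContext(messages: ChatMessage[], keepRecent: number): ChatMessage[] {
  const seen = new Set<string>();
  const deduped: ChatMessage[] = [];
  for (const m of messages) {
    const key = `${m.role}\u0000${m.content}`;
    if (seen.has(key)) continue; // already present earlier in this request
    seen.add(key);
    deduped.push(m);
  }

  const system = deduped.filter((m) => m.role === 'system');
  const turns = deduped.filter((m) => m.role !== 'system');
  if (turns.length <= keepRecent) return [...system, ...turns];

  const older = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  const placeholder: ChatMessage = {
    role: 'system',
    content: `[${older.length} earlier messages omitted]`,
  };
  return [...system, placeholder, ...recent];
}

// Example: the system prompt is resent mid-conversation, as in a stateless agent loop.
const history: ChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Summarize file A.' },
  { role: 'assistant', content: 'File A is a config file.' },
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Now file B.' },
  { role: 'assistant', content: 'File B is a README.' },
];

const compact = diffContext(history, 2);
console.log(compact);
```

Here six messages shrink to four: the duplicate system prompt is dropped and the first exchange collapses into one placeholder line.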

Quick Start

npm install @daino/tokenwise

Option 1: SDK — same API, automatic optimization

import { TokenWise } from '@daino/tokenwise';

const tw = new TokenWise({
  apiKey: process.env.OPENAI_API_KEY,
  verbose: true,
});

// Same API as OpenAI SDK — just cheaper
const response = await tw.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is 2+2?' }],
  tools: myTools,  // your existing tool definitions, compressed automatically
});

console.log(tw.printSavings());

Option 2: Proxy — zero code change

Start the proxy, then change one line:

npx tokenwise proxy --port 8787

import OpenAI from 'openai';

// Your existing code: only baseURL changes
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://localhost:8787/v1',  // ← this is it
});

// Everything else stays exactly the same
const response = await openai.chat.completions.create({ ... });

# Check savings anytime
curl http://localhost:8787/v1/tokenwise/report

Real Results

╔══════════════════════════════════════╗
║       TokenWise Cost Report          ║
╠══════════════════════════════════════╣
║  Requests:                      247 ║
║  Input tokens:            1,203,847 ║
║  Original:                2,891,203 ║
║  Saved tokens:            1,687,356 ║
╠══════════════════════════════════════╣
║  Actual cost:  $          2.4103    ║
║  Without TW:   $          8.6736    ║
║  You saved:    $          6.2633    ║
║  Savings:                   72.2%   ║
╚══════════════════════════════════════╝
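The rows in the sample report are internally consistent, and checking them shows how the figures relate. Saved tokens are Original minus Input tokens. The dollar saving is the cost without TokenWise minus the actual cost, and Savings is that dollar saving divided by the cost without TokenWise. The field meanings in the comments are inferred from the report labels. The cost saving (72.2%) is larger than the token saving (about 58.4%), which would be consistent with the Model Router also moving some requests to models with a lower per-token price.

```typescript
// Figures copied from the sample report above; meanings inferred from the labels.
const inputTokens = 1_203_847;    // tokens actually sent after optimization
const originalTokens = 2_891_203; // tokens the unoptimized requests would have used
const actualCost = 2.4103;        // USD, "Actual cost"
const costWithoutTW = 8.6736;     // USD, "Without TW"

const savedTokens = originalTokens - inputTokens;             // "Saved tokens"
const savedCost = costWithoutTW - actualCost;                 // "You saved"
const costSavingsPct = (savedCost / costWithoutTW) * 100;     // "Savings"
const tokenSavingsPct = (savedTokens / originalTokens) * 100; // not shown in the report

console.log(savedTokens, savedCost.toFixed(4), costSavingsPct.toFixed(1), tokenSavingsPct.toFixed(1));
```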

Configuration

All modules are enabled by default. Toggle what you need:

const tw = new TokenWise({
  apiKey: process.env.OPENAI_API_KEY,

  contextDiffer: true,       // dedup & compress messages
  skillCompressor: true,     // trim tool definitions
  modelRouter: true,         // route to cheaper models
  trackCosts: true,          // cost monitoring

  // Custom routing rules
  routingRules: [
    { condition: 'simple', model: 'gpt-4.1-nano' },
    { condition: 'moderate', model: 'gpt-4.1-mini' },
    { condition: 'complex', model: 'gpt-4o' },
  ],

  verbose: true,
});
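`routingRules` maps a complexity label to a model. The sketch below shows one way such a lookup could work. The `classify` heuristic (message length, a few reasoning keywords, whether tools are attached) and its thresholds are invented for illustration; they are not how TokenWise's Model Router actually scores requests. `routeModel` returns `{ model, complexity }`, the same shape `router.route(...)` returns in the Advanced section below.

```typescript
type Complexity = 'simple' | 'moderate' | 'complex';

interface RoutingRule {
  condition: Complexity;
  model: string;
}

interface Message {
  role: string;
  content: string;
}

// Hypothetical heuristic, NOT TokenWise's classifier: score the latest
// user message by length, reasoning keywords, and tool usage.
function classify(messages: Message[], toolCount: number): Complexity {
  const lastUser = [...messages].reverse().find((m) => m.role === 'user');
  const text = lastUser?.content ?? '';
  let score = 0;
  if (text.length > 500) score += 2;
  else if (text.length > 100) score += 1;
  if (/\b(analy[sz]e|refactor|prove|design|step by step)\b/i.test(text)) score += 2;
  if (toolCount > 0) score += 1;
  if (score >= 3) return 'complex';
  if (score >= 1) return 'moderate';
  return 'simple';
}

// Uses the first rule whose condition matches; otherwise keeps the requested model.
function routeModel(
  requested: string,
  messages: Message[],
  toolCount: number,
  rules: RoutingRule[],
): { model: string; complexity: Complexity } {
  const complexity = classify(messages, toolCount);
  const rule = rules.find((r) => r.condition === complexity);
  return { model: rule?.model ?? requested, complexity };
}

// Same rules as the configuration above.
const rules: RoutingRule[] = [
  { condition: 'simple', model: 'gpt-4.1-nano' },
  { condition: 'moderate', model: 'gpt-4.1-mini' },
  { condition: 'complex', model: 'gpt-4o' },
];

const easy = routeModel('gpt-4o', [{ role: 'user', content: 'What is 2+2?' }], 0, rules);
const hard = routeModel(
  'gpt-4o',
  [{ role: 'user', content: 'Refactor this module and analyze its failure modes step by step.' }],
  3,
  rules,
);
console.log(easy, hard);
```

Falling back to the requested model when no rule matches means an incomplete rule list never routes a request to an unexpected model.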

Anthropic SDK

import { TokenWiseAnthropic } from '@daino/tokenwise';

const tw = new TokenWiseAnthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  verbose: true,
});

const response = await tw.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

Streaming

const stream = await tw.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Web Dashboard

Start the proxy and visit http://localhost:8787/dashboard for a real-time cost analytics UI.

Advanced: Individual Modules

Use any module standalone:

import {
  SkillCompressor, ModelRouter, ContextDiffer,
  SharedStateStore, SmartWakeGate, OutputCompactor,
} from '@daino/tokenwise';

// Compress tools independently
const compressor = new SkillCompressor({ maxTools: 10 });
const { tools, stats } = compressor.optimize(myTools, userMessage);

// Route models independently
const router = new ModelRouter();
const { model, complexity } = router.route('gpt-4o', messages, tools);

// Share context across agents
const store = new SharedStateStore();
store.set('system-prompt', longSystemPrompt, 'agent-1');
const cached = store.get('system-prompt'); // reuse without resending

// Gate agent activation
const gate = new SmartWakeGate();
gate.register({ id: 'search', name: 'Search Agent', triggerKeywords: ['search', 'find'], toolNames: ['web_search'] });
const activeAgents = gate.evaluate('search for the latest news');

// Compact LLM output for downstream agents
const compactor = new OutputCompactor();
const { text } = compactor.compact(verboseLLMResponse);
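`SmartWakeGate.evaluate` returns the agents that should wake for a given input, so idle agents (and their tool definitions) stay out of the request. Below is a minimal sketch of keyword-based gating. It assumes single-word trigger keywords matched as whole words, case-insensitively. `KeywordWakeGate` is a hypothetical class, and TokenWise's gate may use signals this sketch ignores, such as `toolNames`.

```typescript
// Mirrors the fields passed to gate.register(...) above.
interface AgentSpec {
  id: string;
  name: string;
  triggerKeywords: string[];
  toolNames: string[];
}

// Hypothetical wake gate: an agent wakes only if one of its
// trigger keywords appears as a whole word in the input.
class KeywordWakeGate {
  private agents: AgentSpec[] = [];

  register(agent: AgentSpec): void {
    this.agents.push(agent);
  }

  evaluate(input: string): AgentSpec[] {
    // Split on non-word characters so "news," and "news" both match "news".
    const words = new Set(input.toLowerCase().split(/\W+/).filter(Boolean));
    return this.agents.filter((a) =>
      a.triggerKeywords.some((k) => words.has(k.toLowerCase())),
    );
  }
}

const gate = new KeywordWakeGate();
gate.register({ id: 'search', name: 'Search Agent', triggerKeywords: ['search', 'find'], toolNames: ['web_search'] });
gate.register({ id: 'code', name: 'Code Agent', triggerKeywords: ['code', 'bug'], toolNames: ['run_tests'] });

// Only the search agent wakes; the code agent and its tools are skipped.
const active = gate.evaluate('search for the latest news').map((a) => a.id);
console.log(active);
```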

Supported Models

| Provider | Models |
|----------|--------|
| OpenAI | gpt-4o, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-5, gpt-5-mini, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 |

Roadmap

[x] Context Differ — message dedup + compression
[x] Skill Compressor — tool description optimization
[x] Model Router — complexity-based routing
[x] Cost Tracker — real-time monitoring
[x] Proxy Server — zero code change mode
[x] Streaming support
[x] Anthropic API native support
[x] Shared State Store — cross-agent context sharing
[x] Smart Wake Gate — idle agent suppression
[x] Output Compactor — response format optimization
[x] Web dashboard for cost analytics
[x] Python SDK (pip install tokenwise)

Contributing

PRs welcome. This project aims to make Agentic AI affordable for everyone.

git clone https://github.com/DainoJung/tokenwise.git
cd tokenwise
npm install
npm run build

License

MIT