houtini-lite

v2.1.1

Published

8 months ago

Streamlined MCP server for LM Studio with dynamic token allocation

richardbaxterseo

mcp lm-studio llm local-ai model-context-protocol dynamic-tokens claude anthropic

Houtini-Lite 🎩

A streamlined MCP (Model Context Protocol) server for LM Studio with intelligent dynamic token allocation. Execute custom prompts on your local LLMs with automatic token optimisation for maximum output.

Features

🚀 Dynamic Token Allocation: Automatically maximises output tokens based on your model's context window
💡 Smart Context Management: Uses 80% of available context with safety margins to prevent overflow
🎯 Simple & Focused: Streamlined toolset for prompt execution without complexity
📊 Transparent Diagnostics: See exactly how tokens are allocated in every response
🔧 Flexible Override: Manual control when you need specific token limits

Why Houtini-Lite?

Unlike standard MCP servers that use fixed token limits, Houtini-Lite allocates output tokens based on your prompt size and your model's context window. A short prompt on a 128K-context model can be allocated over 100,000 output tokens, while a large context automatically scales the allocation down to fit.

Installation

Prerequisites

LM Studio (v0.3.0 or later)
- Download from: https://lmstudio.ai/
- Enable the local server (port 1234)
- Load a model (e.g., Qwen3 30B, LLaMA, DeepSeek)
Node.js (v18 or later)
- Download from: https://nodejs.org/
Claude Desktop
- Download from: https://claude.ai/download

Quick Install (via npm)

Install globally from npm
```
npm install -g houtini-lite
```
Configure Claude Desktop
Add to your claude_desktop_config.json:
```
{
  "mcpServers": {
    "houtini-lite": {
      "command": "npx",
      "args": ["houtini-lite"],
      "env": {
        "LM_STUDIO_URL": "ws://localhost:1234"
      }
    }
  }
}
```
Windows config location: %APPDATA%\Claude\claude_desktop_config.json
Mac config location: ~/Library/Application Support/Claude/claude_desktop_config.json
Restart Claude Desktop

Install from Source

Clone the repository

```
git clone https://github.com/houtini-ai/houtini-lite.git
cd houtini-lite
```

Install dependencies
```
npm install
```
Build the project
```
npm run build
```

Configure Claude Desktop

Add to your claude_desktop_config.json:

```
{
  "mcpServers": {
    "houtini-lite": {
      "command": "node",
      "args": ["C:\\path\\to\\houtini-lite\\dist\\index.js"],
      "env": {
        "LM_STUDIO_URL": "ws://localhost:1234"
      }
    }
  }
}
```

Restart Claude Desktop

Usage

Basic Commands

Health Check

Verify connection and see model capabilities:

Use houtini-lite:health_check

Simple Prompt

Let dynamic allocation maximise your output:

Use houtini-lite:custom_prompt with prompt: "Explain quantum computing"

With Context

Provide additional context for better responses:

Use houtini-lite:custom_prompt with:
- prompt: "Analyse this code for security issues"
- context: "[paste your code here]"

Manual Token Control

Override automatic allocation when needed:

Use houtini-lite:custom_prompt with:
- prompt: "Give a brief summary"
- maxTokens: 200

Batch Processing

Execute multiple prompts efficiently:

Use houtini-lite:batch_prompts with:
- prompts: [
    {"prompt": "First question"},
    {"prompt": "Second question", "maxTokens": 500}
  ]
- combineResults: true

Advanced Features

Temperature Control

Adjust creativity vs consistency:

Use houtini-lite:custom_prompt with:
- prompt: "Write a creative story"
- temperature: 0.9  (0.0 = deterministic, 1.0 = creative)

File-Based Prompts

Load prompts from files with variable substitution:

Use houtini-lite:execute_file_prompt with:
- filePath: "C:\\prompts\\analysis.txt"
- variables: {"project": "MyApp", "language": "Python"}
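
This README does not show the placeholder syntax that execute_file_prompt expects inside a prompt file. As a rough illustration of what variable substitution typically involves, the sketch below assumes placeholders are written as `{{name}}`. Both that syntax and the `substituteVariables` helper are hypothetical and may not match the actual implementation.

```typescript
// Hypothetical helper: replaces {{name}} placeholders with values from `variables`.
// The {{name}} syntax is an assumption; check src/index.ts for the real format.
function substituteVariables(
  template: string,
  variables: Record<string, string>,
): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (match: string, name: string): string => {
      // Only substitute keys the caller actually supplied.
      const supplied = Object.prototype.hasOwnProperty.call(variables, name);
      const value = variables[name];
      // Unknown placeholders are left untouched so they are easy to spot.
      return supplied && value !== undefined ? value : match;
    },
  );
}

const rendered = substituteVariables(
  "Review the {{project}} codebase, written in {{language}}.",
  { project: "MyApp", language: "Python" },
);
console.log(rendered);
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a missing variable visible in the prompt that reaches the model.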

Dynamic Token Allocation

How It Works

Context Detection: Identifies your model's context window (e.g., 128K for Qwen3)
Safety Margin: Uses 80% of total context to prevent overflow
Input Estimation: Calculates tokens needed for your prompt (~3 chars per token)
Output Maximisation: Allocates all remaining space for output
Smart Scaling: Automatically reduces output tokens for large inputs
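
The steps above can be sketched as a small function. This is a minimal illustration that uses the default settings listed under Configuration. The function name, the result shape, and the handling of `minOutputTokens` and chunking are assumptions, not the actual code in src/index.ts.

```typescript
// Settings mirror the defaults listed in this README.
interface AllocationSettings {
  contextUsageRatio: number;  // fraction of the context window treated as usable
  minOutputTokens: number;    // minimum reserved for output
  tokenEstimateRatio: number; // estimated characters per token
}

const defaultSettings: AllocationSettings = {
  contextUsageRatio: 0.8,
  minOutputTokens: 1000,
  tokenEstimateRatio: 3,
};

interface Allocation {
  usableContext: number;
  inputEstimate: number;
  outputTokens: number;
  needsChunking: boolean;
}

// Hypothetical sketch of the allocation steps; not the project's real function.
function allocateTokens(
  contextWindow: number,
  input: string,
  maxTokens?: number,
  settings: AllocationSettings = defaultSettings,
): Allocation {
  // Safety margin: only a fraction of the context window is used.
  const usableContext = Math.floor(contextWindow * settings.contextUsageRatio);
  // Input estimation: roughly three characters per token.
  const inputEstimate = Math.ceil(input.length / settings.tokenEstimateRatio);
  const remaining = usableContext - inputEstimate;
  // Assumption: input that leaves less than the minimum reserve needs chunking.
  const needsChunking = remaining < settings.minOutputTokens;
  // Assumption: a manual limit is used as given; otherwise all remaining space
  // goes to output, never dropping below the minimum reserve.
  const outputTokens =
    maxTokens !== undefined
      ? maxTokens
      : Math.max(remaining, settings.minOutputTokens);
  return { usableContext, inputEstimate, outputTokens, needsChunking };
}

// A 150-character prompt on a 128,000-token context window.
console.log(allocateTokens(128000, "x".repeat(150)));
```

Under these assumptions, a 150-character prompt is estimated at 50 tokens, the usable context is 102,400 tokens, and 102,350 tokens are left for output.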

Example Allocations

| Scenario | Model Context | Input Size | Output Allocated |
|----------|--------------|------------|------------------|
| Simple prompt | 128K | 50 tokens | ~102,000 tokens |
| Medium context | 128K | 10K tokens | ~92,000 tokens |
| Large context | 128K | 50K tokens | ~52,000 tokens |
| Manual override | 128K | Any | Your specified limit |

Token Info in Responses

Every response includes diagnostic information:

[Your LLM's response here...]

[Token Allocation Info]
Model: qwen.qwen3-coder-30b-a3b-instruct
Context Window: 128,000 tokens
Usable Context: 102,400 tokens
Allocated Output Tokens: 102,350
Input Estimate: 50 tokens
Execution Time: 3500ms
Temperature: 0.7
Needs Chunking: No
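
If you want to read these diagnostics programmatically, one option is to split the trailing block into key/value pairs. The sketch below assumes the block always begins with the `[Token Allocation Info]` marker and uses `Label: value` lines as in the example above. `parseTokenInfo` and `toNumber` are hypothetical helpers, not part of Houtini-Lite.

```typescript
// Hypothetical helper: extracts "Label: value" pairs that follow the
// "[Token Allocation Info]" marker at the end of a response.
function parseTokenInfo(response: string): Record<string, string> {
  const marker = "[Token Allocation Info]";
  const info: Record<string, string> = {};
  // Use the last occurrence in case the model's own text mentions the marker.
  const start = response.lastIndexOf(marker);
  if (start === -1) {
    return info; // no diagnostics block found
  }
  const lines = response.slice(start + marker.length).split("\n");
  for (const line of lines) {
    const sep = line.indexOf(":");
    if (sep === -1) {
      continue; // skip blank or malformed lines
    }
    info[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return info;
}

// Hypothetical helper: turns values such as "102,350" or "3500ms" into numbers.
function toNumber(value: string): number {
  return Number(value.replace(/[^\d.]/g, ""));
}

const exampleResponse = [
  "Some model output.",
  "",
  "[Token Allocation Info]",
  "Context Window: 128,000 tokens",
  "Allocated Output Tokens: 102,350",
  "Needs Chunking: No",
].join("\n");

const parsedInfo = parseTokenInfo(exampleResponse);
console.log(toNumber(parsedInfo["Allocated Output Tokens"] ?? "0"));
```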

Supported Models

Houtini-Lite automatically detects context windows for:

Qwen3 Series: 128K context
LLaMA Models: 32K context
CodeLlama: 16K context
DeepSeek: 32K context
Meta-LLaMA: 8K context
Others: Defaults to safe limits

Troubleshooting

"No models loaded in LM Studio"

Open LM Studio and load a model
Ensure the local server is running (bottom bar should show "Server Running")

"LM Studio connection failed"

Check LM Studio is running on port 1234
Try restarting LM Studio's server
Verify firewall isn't blocking local connections

"Tool not found" in Claude

Restart Claude Desktop completely
Check your claude_desktop_config.json syntax
If you installed from source, ensure the path to index.js is absolute and correct

Token allocation seems wrong

Different models have different context windows
Check the health_check output to verify detected context size
Some models may report incorrect context sizes; in that case, set maxTokens manually to override the automatic allocation

Configuration

Environment Variables

LM_STUDIO_URL: WebSocket URL for LM Studio (default: ws://localhost:1234)

Default Settings

Edit these in the source code if needed:

contextUsageRatio: 0.8 (use 80% of context)
minOutputTokens: 1000 (minimum reserved for output)
tokenEstimateRatio: 3 (characters per token estimate)
defaultTemperature: 0.7
timeout: 120000ms (2 minutes)

Development

Project Structure

```
houtini-lite/
├── src/
│   └── index.ts        # Main server implementation
├── dist/               # Compiled JavaScript (git ignored)
├── package.json        # Dependencies and scripts
├── tsconfig.json       # TypeScript configuration
└── README.md           # This file
```

Building from Source

```
# Install dependencies
npm install

# Build once
npm run build

# Watch mode for development
npm run watch
```

Adding New Models

To add context window detection for new models, edit the knownContextSizes object in src/index.ts:

```
const knownContextSizes: Record<string, number> = {
  'your-model': 32000,  // Add your model here
  // ... existing models
};
```
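
How the table is matched against a loaded model's identifier is not described here. The sketch below shows one plausible approach: a case-insensitive substring match that tries longer keys first, so that `meta-llama` and `codellama` are not shadowed by the shorter `llama`. The keys, the sizes (taken from the Supported Models list), the 4096-token fallback, and the `detectContextWindow` function are all assumptions for illustration. The real entries in src/index.ts may differ.

```typescript
// Hypothetical entries based on the "Supported Models" list in this README.
const exampleContextSizes: Record<string, number> = {
  "qwen3": 128000,
  "codellama": 16000,
  "deepseek": 32000,
  "meta-llama": 8000,
  "llama": 32000,
};

// Assumed fallback for unknown models; the real "safe limit" is not documented here.
const FALLBACK_CONTEXT = 4096;

// Hypothetical lookup: longest matching key wins.
function detectContextWindow(modelId: string): number {
  const id = modelId.toLowerCase();
  // Sort longest-first so specific names take precedence over generic ones.
  const keys = Object.keys(exampleContextSizes).sort(
    (a, b) => b.length - a.length,
  );
  for (const key of keys) {
    const size = exampleContextSizes[key];
    if (size !== undefined && id.includes(key)) {
      return size;
    }
  }
  return FALLBACK_CONTEXT;
}

console.log(detectContextWindow("qwen.qwen3-coder-30b-a3b-instruct"));
```

With substring matching, the order of checks matters: without the longest-first sort, a model named `codellama-13b` could match the generic `llama` entry and be given the wrong context size.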

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Areas for Contribution

Additional model context detection
Token estimation improvements
New prompt management features
Performance optimisations
Documentation improvements

License

MIT License - see LICENSE file for details

Acknowledgements

Inspired by the original Houtini LM project
Built with MCP SDK
Powered by LM Studio

Version History

v2.1.0 (Current)

Dynamic token allocation system
Automatic context window detection
Improved error handling
Token diagnostics in responses

v2.0.0

Initial standalone release
Core prompt execution features
Basic MCP integration

Note: This is a community project and is not officially affiliated with Anthropic, LM Studio, or the original Houtini project.