houtini-lite
v2.1.1
Streamlined MCP server for LM Studio with dynamic token allocation
Houtini-Lite 🎩
A streamlined MCP (Model Context Protocol) server for LM Studio with intelligent dynamic token allocation. Execute custom prompts on your local LLMs with automatic token optimisation for maximum output.
Features
- 🚀 Dynamic Token Allocation: Automatically maximises output tokens based on your model's context window
- 💡 Smart Context Management: Uses 80% of available context with safety margins to prevent overflow
- 🎯 Simple & Focused: Streamlined toolset for prompt execution without complexity
- 📊 Transparent Diagnostics: See exactly how tokens are allocated in every response
- 🔧 Flexible Override: Manual control when you need specific token limits
Why Houtini-Lite?
Unlike MCP servers that use fixed token limits, Houtini-Lite allocates output tokens based on your prompt size and your model's context window. With a 128K-context model, a short prompt leaves room for 100,000+ tokens of output, while a large context automatically scales the output allocation down to fit.
Installation
Prerequisites
LM Studio (v0.3.0 or later)
- Download from: https://lmstudio.ai/
- Enable the local server (port 1234)
- Load a model (e.g., Qwen3 30B, LLaMA, DeepSeek)
Node.js (v18 or later)
- Download from: https://nodejs.org/
Claude Desktop
- Download from: https://claude.ai/download
Quick Install (via npm)
Install globally from npm
npm install -g houtini-lite
Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lite": {
      "command": "npx",
      "args": ["houtini-lite"],
      "env": {
        "LM_STUDIO_URL": "ws://localhost:1234"
      }
    }
  }
}
Windows config location: %APPDATA%\Claude\claude_desktop_config.json
Mac config location: ~/Library/Application Support/Claude/claude_desktop_config.json
Restart Claude Desktop
Install from Source
Clone the repository
git clone https://github.com/houtini-ai/houtini-lite.git
cd houtini-lite
Install dependencies
npm install
Build the project
npm run build
Configure Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lite": {
      "command": "node",
      "args": ["C:\\path\\to\\houtini-lite\\dist\\index.js"],
      "env": {
        "LM_STUDIO_URL": "ws://localhost:1234"
      }
    }
  }
}
Restart Claude Desktop
Usage
Basic Commands
Health Check
Verify connection and see model capabilities:
Use houtini-lite:health_check
Simple Prompt
Let dynamic allocation maximise your output:
Use houtini-lite:custom_prompt with prompt: "Explain quantum computing"
With Context
Provide additional context for better responses:
Use houtini-lite:custom_prompt with:
- prompt: "Analyse this code for security issues"
- context: "[paste your code here]"
Manual Token Control
Override automatic allocation when needed:
Use houtini-lite:custom_prompt with:
- prompt: "Give a brief summary"
- maxTokens: 200
Batch Processing
Execute multiple prompts efficiently:
Use houtini-lite:batch_prompts with:
- prompts: [
{"prompt": "First question"},
{"prompt": "Second question", "maxTokens": 500}
]
- combineResults: true
Advanced Features
Temperature Control
Adjust creativity vs consistency:
Use houtini-lite:custom_prompt with:
- prompt: "Write a creative story"
- temperature: 0.9 (0.0 = deterministic, 1.0 = creative)
File-Based Prompts
Load prompts from files with variable substitution:
Use houtini-lite:execute_file_prompt with:
- filePath: "C:\\prompts\\analysis.txt"
- variables: {"project": "MyApp", "language": "Python"}
Dynamic Token Allocation
How It Works
- Context Detection: Identifies your model's context window (e.g., 128K for Qwen3)
- Safety Margin: Uses 80% of total context to prevent overflow
- Input Estimation: Estimates the tokens needed for your prompt (~3 characters per token)
- Output Maximisation: Allocates all remaining space for output
- Smart Scaling: Automatically reduces output tokens for large inputs
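The steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the code in src/index.ts: the function and interface names are hypothetical, the default values come from the Default Settings section of this README, and the rule used for the "Needs Chunking" flag is an assumption.

```typescript
// Hypothetical names; default values are taken from this README's Default Settings.
interface AllocationSettings {
  contextUsageRatio: number;  // fraction of the context window treated as usable
  minOutputTokens: number;    // floor reserved for the model's reply
  tokenEstimateRatio: number; // characters per token used for the input estimate
}

const defaultSettings: AllocationSettings = {
  contextUsageRatio: 0.8,
  minOutputTokens: 1000,
  tokenEstimateRatio: 3,
};

interface Allocation {
  usableContext: number;
  inputEstimate: number;
  outputTokens: number;
  needsChunking: boolean;
}

function allocateTokens(
  contextWindow: number,
  promptChars: number,
  maxTokens?: number,
  settings: AllocationSettings = defaultSettings,
): Allocation {
  // Safety margin: only a fraction of the context window is usable.
  const usableContext = Math.floor(contextWindow * settings.contextUsageRatio);
  // Input estimation: a rough character count divided by chars-per-token.
  const inputEstimate = Math.ceil(promptChars / settings.tokenEstimateRatio);
  const remaining = usableContext - inputEstimate;
  // Assumption: the input needs chunking once it crowds out the minimum output reserve.
  const needsChunking = remaining < settings.minOutputTokens;
  // A manual maxTokens wins; otherwise use all remaining space, never below the floor.
  const outputTokens = maxTokens ?? Math.max(settings.minOutputTokens, remaining);
  return { usableContext, inputEstimate, outputTokens, needsChunking };
}
```

Under these assumptions, a 150-character prompt on a 128,000-token model gives a usable context of 102,400, an input estimate of 50, and an output allocation of 102,350.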
Example Allocations
| Scenario | Model Context | Input Size | Output Allocated |
|----------|--------------|------------|------------------|
| Simple prompt | 128K | 50 tokens | ~102,000 tokens |
| Medium context | 128K | 10K tokens | ~92,000 tokens |
| Large context | 128K | 50K tokens | ~52,000 tokens |
| Manual override | 128K | Any | Your specified limit |
Token Info in Responses
Every response includes diagnostic information:
[Your LLM's response here...]
[Token Allocation Info]
Model: qwen.qwen3-coder-30b-a3b-instruct
Context Window: 128,000 tokens
Usable Context: 102,400 tokens
Allocated Output Tokens: 102,350
Input Estimate: 50 tokens
Execution Time: 3500ms
Temperature: 0.7
Needs Chunking: No
Supported Models
Houtini-Lite automatically detects context windows for:
- Qwen3 Series: 128K context
- LLaMA Models: 32K context
- CodeLlama: 16K context
- DeepSeek: 32K context
- Meta-LLaMA: 8K context
- Others: Defaults to safe limits
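One plausible way to implement this kind of detection is a substring lookup against the model identifier. The sketch below is an assumption about the approach rather than a copy of src/index.ts: the table keys, the rounded sizes, the `detectContextSize` name, and the 4096-token fallback are all illustrative. Because "llama" also appears inside "codellama" and "meta-llama", the sketch prefers the longest matching key.

```typescript
// Illustrative table built from the list above; the real one lives in src/index.ts.
const knownContextSizes: Record<string, number> = {
  "qwen3": 128000,
  "llama": 32000,
  "codellama": 16000,
  "deepseek": 32000,
  "meta-llama": 8000,
};

// Assumption: the README only says unknown models get "safe limits";
// 4096 is a placeholder value, not the project's actual default.
const FALLBACK_CONTEXT = 4096;

function detectContextSize(modelId: string): number {
  const id = modelId.toLowerCase();
  let bestKey = "";
  for (const key of Object.keys(knownContextSizes)) {
    // Prefer the most specific (longest) key contained in the model id.
    if (id.includes(key) && key.length > bestKey.length) {
      bestKey = key;
    }
  }
  return bestKey ? knownContextSizes[bestKey] : FALLBACK_CONTEXT;
}
```

With this table, an identifier such as `qwen.qwen3-coder-30b-a3b-instruct` resolves to the Qwen3 entry, and an identifier that matches nothing falls back to the placeholder limit.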
Troubleshooting
"No models loaded in LM Studio"
- Open LM Studio and load a model
- Ensure the local server is running (bottom bar should show "Server Running")
"LM Studio connection failed"
- Check LM Studio is running on port 1234
- Try restarting LM Studio's server
- Verify firewall isn't blocking local connections
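To check reachability outside Claude Desktop, a small script can probe LM Studio's OpenAI-compatible HTTP API on the same port. The sketch below assumes the `/v1/models` endpoint is enabled on your LM Studio server and that the HTTP API shares the host and port of the configured WebSocket URL; `modelsEndpoint` and `isServerReachable` are hypothetical helper names, not part of Houtini-Lite.

```typescript
// Minimal shape of the fetch-like function this sketch needs.
// The global fetch in Node.js 18+ satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Convert the configured WebSocket URL into the HTTP URL of the models endpoint.
function modelsEndpoint(lmStudioUrl: string): string {
  const httpBase = lmStudioUrl.replace(/^ws(s?):\/\//, "http$1://");
  return httpBase.replace(/\/+$/, "") + "/v1/models";
}

// Resolves to true when the server answers with a 2xx status, false otherwise.
async function isServerReachable(
  lmStudioUrl: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  try {
    const response = await fetchImpl(modelsEndpoint(lmStudioUrl));
    return response.ok;
  } catch {
    // Connection refused, DNS failure, and similar network errors land here.
    return false;
  }
}
```

For example, `isServerReachable("ws://localhost:1234", fetch)` resolves to `false` when nothing is listening on port 1234.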
"Tool not found" in Claude
- Restart Claude Desktop completely
- Check your claude_desktop_config.json syntax
- Ensure the path to index.js is absolute and correct
Token allocation seems wrong
- Different models have different context windows
- Check the health_check output to verify detected context size
- Some models may report incorrect context sizes
Configuration
Environment Variables
- LM_STUDIO_URL: WebSocket URL for LM Studio (default: ws://localhost:1234)
Default Settings
Edit these in the source code if needed:
- contextUsageRatio: 0.8 (use 80% of context)
- minOutputTokens: 1000 (minimum reserved for output)
- tokenEstimateRatio: 3 (characters per token estimate)
- defaultTemperature: 0.7
- timeout: 120000ms (2 minutes)
Development
Project Structure
houtini-lite/
├── src/
│ └── index.ts # Main server implementation
├── dist/ # Compiled JavaScript (git ignored)
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # This file
Building from Source
# Install dependencies
npm install
# Build once
npm run build
# Watch mode for development
npm run watch
Adding New Models
To add context window detection for new models, edit the knownContextSizes object in src/index.ts:
const knownContextSizes: Record<string, number> = {
'your-model': 32000, // Add your model here
// ... existing models
};
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Areas for Contribution
- Additional model context detection
- Token estimation improvements
- New prompt management features
- Performance optimisations
- Documentation improvements
License
MIT License - see LICENSE file for details
Acknowledgements
- Inspired by the original Houtini LM project
- Built with MCP SDK
- Powered by LM Studio
Version History
v2.1.0 (Current)
- Dynamic token allocation system
- Automatic context window detection
- Improved error handling
- Token diagnostics in responses
v2.0.0
- Initial standalone release
- Core prompt execution features
- Basic MCP integration
Note: This is a community project and is not officially affiliated with Anthropic, LM Studio, or the original Houtini project.
