@protectqa/tcl-nli-local
v0.1.0
Published
TCL local NLI scoring HTTP service.
Downloads
31
Readme
TCL NLI Service (Local Model)
Run NLI scoring using a local Mistral 7B model (or any local LLM) instead of API calls.
Why Use Local Models?
- ✅ No API costs - Free to run
- ✅ No rate limits - Process as many requests as you want
- ✅ Privacy - Data never leaves your server
- ✅ Full control - Use any model you want
Setup Options
Option 1: Using Ollama (Easiest) ⭐ Recommended
Ollama makes it easy to run Mistral locally - no API keys, no cloud costs!
Install Ollama:
# macOS brew install ollama # Or download from https://ollama.aiPull Mistral model (downloads ~4GB, one-time):
ollama pull mistral:7b # Or use quantized version (smaller, faster): ollama pull mistral:7b-instruct-q4_0Start Ollama (runs in background):
ollama serve # Or just run: ollama (starts server automatically)Run the local NLI service:
cd packages/tcl-nli-local npm install export OLLAMA_URL=http://localhost:11434 # default Ollama URL export MODEL=mistral:7b # or mistral:7b-instruct-q4_0 npm startPoint TCL Core to it:
# In Railway or local env export TCL_NLI_ENDPOINT=http://localhost:8081
That's it! You now have:
- ✅ No API keys needed
- ✅ Zero ongoing costs
- ✅ Excellent NLI quality (same as Mistral API)
- ✅ Full privacy (data never leaves your machine)
Option 2: Using llama.cpp
If you have Mistral running via llama.cpp:
- Run your llama.cpp server (e.g., on port 8080)
- Update the service to use your llama.cpp endpoint
- Set TCL_NLI_ENDPOINT to point to it
Option 3: Using Transformers (Python)
If you're using Python with transformers:
- Create a FastAPI service (see
server_python.pyexample) - Load Mistral 7B using transformers
- Expose the
/scoreendpoint - Point TCL_NLI_ENDPOINT to it
API Contract
The service must implement:
POST /score
Request:
{
"pairs": [
{
"task": "contradiction",
"a": "text A",
"b": "text B",
"key": "unique-key"
}
]
}Response:
{
"scores": [
{
"key": "unique-key",
"score": 0.85
}
]
}Comparison
| Approach | Cost | Speed | Setup | Best For | |----------|------|-------|-------|----------| | Local Model | Free | Medium | Medium | Production, privacy | | Mistral API | ~$0.60/1M tokens | Fast | Easy | Quick setup | | TokenHeuristic | Free | Very Fast | None | Development |
Recommendation
For production with local models:
- ✅ Use Ollama (easiest local setup)
- ✅ Use
mistral:7b-instruct-q4_0(quantized, faster, still excellent quality) - ✅ Point
TCL_NLI_ENDPOINTto your local service - ✅ No API keys needed!
Quality: Local Mistral 7B gives you excellent NLI quality - the same as using Mistral's API, but with:
- Zero ongoing costs
- No rate limits
- Full privacy
- Works offline
Performance tips:
- Use quantized model (
q4_0) for faster inference and less RAM - GPU acceleration works automatically if available
- Model loads once, then serves requests quickly
