@protectqa/tcl-nli-local

v0.1.0

Published

a month ago

TCL local NLI scoring HTTP service.

Downloads

0High
0Medium
0Low

immrlucky

TCL NLI Service (Local Model)

Run NLI scoring using a local Mistral 7B model (or any local LLM) instead of API calls.

Why Use Local Models?

✅ No API costs - Free to run
✅ No rate limits - Process as many requests as you want
✅ Privacy - Data never leaves your server
✅ Full control - Use any model you want

Setup Options

Option 1: Using Ollama (Easiest) ⭐ Recommended

Ollama makes it easy to run Mistral locally - no API keys, no cloud costs!

Install Ollama:

# macOS
brew install ollama
   
# Or download from https://ollama.ai

Pull Mistral model (downloads ~4GB, one-time):

ollama pull mistral:7b
   
# Or use quantized version (smaller, faster):
ollama pull mistral:7b-instruct-q4_0

Start Ollama (runs in background):

ollama serve
# Or just run: ollama (starts server automatically)

Run the local NLI service:

cd packages/tcl-nli-local
npm install
export OLLAMA_URL=http://localhost:11434  # default Ollama URL
export MODEL=mistral:7b  # or mistral:7b-instruct-q4_0
npm start

Point TCL Core to it:

# In Railway or local env
export TCL_NLI_ENDPOINT=http://localhost:8081

That's it! You now have:

✅ No API keys needed
✅ Zero ongoing costs
✅ Excellent NLI quality (same as Mistral API)
✅ Full privacy (data never leaves your machine)

Option 2: Using llama.cpp

If you have Mistral running via llama.cpp:

Run your llama.cpp server (e.g., on port 8080)
Update the service to use your llama.cpp endpoint
Set TCL_NLI_ENDPOINT to point to it

Option 3: Using Transformers (Python)

If you're using Python with transformers:

Create a FastAPI service (see server_python.py example)
Load Mistral 7B using transformers
Expose the /score endpoint
Point TCL_NLI_ENDPOINT to it

API Contract

The service must implement:

POST /score

Request:

{
  "pairs": [
    {
      "task": "contradiction",
      "a": "text A",
      "b": "text B",
      "key": "unique-key"
    }
  ]
}

Response:

{
  "scores": [
    {
      "key": "unique-key",
      "score": 0.85
    }
  ]
}

Comparison

| Approach | Cost | Speed | Setup | Best For | |----------|------|-------|-------|----------| | Local Model | Free | Medium | Medium | Production, privacy | | Mistral API | ~$0.60/1M tokens | Fast | Easy | Quick setup | | TokenHeuristic | Free | Very Fast | None | Development |

Recommendation

For production with local models:

✅ Use Ollama (easiest local setup)
✅ Use mistral:7b-instruct-q4_0 (quantized, faster, still excellent quality)
✅ Point TCL_NLI_ENDPOINT to your local service
✅ No API keys needed!

Quality: Local Mistral 7B gives you excellent NLI quality - the same as using Mistral's API, but with:

Zero ongoing costs
No rate limits
Full privacy
Works offline

Performance tips:

Use quantized model (q4_0) for faster inference and less RAM
GPU acceleration works automatically if available
Model loads once, then serves requests quickly

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme