@o-lang/llm-ollama
O-Lang resolver for local LLM inference via Ollama.
Run any Ollama-compatible model — Llama 3, Mistral, Gemma, Phi, DeepSeek — inside governed O-Lang workflows. Zero data leaves your infrastructure.
Why this resolver?
Every other O-Lang LLM resolver (@o-lang/llm-groq, @o-lang/llm-openai) sends data to external APIs. For teams with data residency requirements, air-gapped environments, or a preference for local inference, this resolver closes that gap — with the same O-Lang governance guarantees: allowlists, intent scope, PII redaction, cryptographic audit chain.
File structure
resolver-llm-ollama/
├── .gitignore
├── .npmignore
├── badges/
│ └── llm-ollama-badge.svg
├── capability.js ← core LLM logic (buildMessages, executeLLM, MODELS)
├── capability.test.js ← unit tests (mocked fetch — no Ollama required)
├── conformance.json ← R-005 → R-012 conformance declarations
├── index.js ← O-Lang kernel interface (resolver function + metadata)
├── package.json
├── README.md
├── resolver.js ← O-Lang resolver declaration (inputs, outputs, failures)
└── test-local-resolver.js ← integration tests (requires live Ollama)

Prerequisites
Install Ollama and pull a model:
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

ollama pull llama3.2   # fast — good for bank/ICU style demos
ollama pull llama3.1   # smart — good for RAG, multi-turn chat
ollama serve           # starts local API on :11434

Install the kernel and this resolver:
npm install @o-lang/olang
npm install @o-lang/llm-ollama
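Before wiring the resolver into a workflow, it helps to confirm that Ollama is reachable and the model is pulled. A minimal sketch (save as check-ollama.mjs, Node 18+) using Ollama's standard /api/tags endpoint; the OLLAMA_HOST fallback mirrors the default documented below:

// check-ollama.mjs: sanity-check that Ollama is up and llama3.2 is pulled
const host = process.env.OLLAMA_HOST || 'http://localhost:11434';

const res = await fetch(`${host}/api/tags`); // lists locally pulled models
if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);

const { models } = await res.json();
const names = models.map((m) => m.name);
console.log('Pulled models:', names.join(', ') || '(none)');

if (!names.some((n) => n.startsWith('llama3.2'))) {
  console.warn('llama3.2 not found. Run: ollama pull llama3.2');
}

If the connection is refused, start the server with ollama serve first.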
Usage in .ol workflows
v1 — simple prompt (bank / ICU style)
Workflow "Balance Check" with customer_id, user_question
Allow resolvers:
- @o-lang/bank-account-lookup
- @o-lang/llm-ollama
Step 1: Ask @o-lang/bank-account-lookup "{customer_id}"
Save as account_info
Step 2: Ask @o-lang/llm-ollama "{user_question}. Balance: {account_info.balance}"
Save as response
Return response

v2 — multi-turn RAG chatbot
Workflow "Private RAG" with user_question, document_path
Allow resolvers:
- @o-lang/semantic-doc-search
- @o-lang/llm-ollama
Prohibit actions:
- send_data_externally
Intent scope: "document_question_answering_only"
Step 1: Ask @o-lang/semantic-doc-search "{user_question}" from "{document_path}"
Save as context_chunks
Step 2: Ask @o-lang/llm-ollama "Answer using only this context:
{context_chunks.text}
Question: {user_question}"
Save as answer
Return answer.text

Environment variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | llama3.1 | Default model for llm_chat |
| OLLAMA_MAX_TOKENS | 1024 | Default max tokens for llm_chat |
| OLLAMA_NUM_CTX | 4096 | Default context window |
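The defaults can be overridden per environment, for example to point OLLAMA_HOST at a shared GPU host. A minimal sketch of how the resolver presumably reads these variables; the fallbacks shown are simply the documented defaults, and the actual handling in capability.js may differ:

// Assumed env handling; fallbacks match the defaults in the table above
const OLLAMA_HOST = process.env.OLLAMA_HOST || 'http://localhost:11434';
const OLLAMA_MODEL = process.env.OLLAMA_MODEL || 'llama3.1';
const OLLAMA_MAX_TOKENS = Number(process.env.OLLAMA_MAX_TOKENS || 1024);
const OLLAMA_NUM_CTX = Number(process.env.OLLAMA_NUM_CTX || 4096);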
Available models (MODELS constants)
| Constant | Model | Best for |
|---|---|---|
| MODELS.FAST | llama3.2 | Short answers, classification, v1 demos |
| MODELS.SMART | llama3.1 | RAG, multi-turn chat, document reasoning |
| MODELS.REASONING | deepseek-r1 | Complex synthesis, code, math |
Any model available via ollama list can be passed as model at call time.
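The constants are convenience aliases for the model names above. A sketch of picking a model per task; the import path is an assumption, since only the fact that MODELS lives in capability.js is documented here:

// Hypothetical import path; MODELS is documented in capability.js
const { MODELS } = require('@o-lang/llm-ollama/capability');

const classifierModel = MODELS.FAST;      // llama3.2: short answers, v1 demos
const ragModel        = MODELS.SMART;     // llama3.1: RAG, multi-turn chat
const mathModel       = MODELS.REASONING; // deepseek-r1: synthesis, code, math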
Input schema (v2 llm_chat)
| Field | Type | Default | Description |
|---|---|---|---|
| user_message | string | required | The current user turn |
| system_prompt | string | — | System instruction |
| history | array | [] | Prior {role, content} turns |
| model | string | OLLAMA_MODEL env | Ollama model name |
| temperature | number 0–2 | 0.7 | Sampling temperature |
| max_tokens | integer | 1024 | Max response tokens (num_predict) |
| num_ctx | integer | 4096 | Context window size |
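A hedged example of a complete llm_chat input built from the fields above. The field names come straight from the table; how the object reaches the resolver (via the kernel's interpolated prompt or a direct call) depends on your workflow:

// Example llm_chat input; every field except user_message is optional
const input = {
  user_message: 'Summarise the retention policy in three bullet points.',
  system_prompt: 'Answer only from the provided context.',
  history: [
    { role: 'user', content: 'What does the policy cover?' },
    { role: 'assistant', content: 'Data residency and retention periods.' },
  ],
  model: 'llama3.1',  // any name shown by `ollama list`
  temperature: 0.2,   // 0-2, defaults to 0.7
  max_tokens: 512,    // forwarded as Ollama's num_predict
  num_ctx: 4096,      // context window
};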
Output schema (v2 llm_chat)
| Field | Type | Description |
|---|---|---|
| reply | string | Model response |
| model | string | Model that generated the response |
| token_usage.prompt | integer | Prompt tokens used |
| token_usage.completion | integer | Completion tokens generated |
| token_usage.total | integer | Total tokens |
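For orientation, this is roughly how those fields line up with Ollama's non-streaming /api/chat response, which returns message.content plus the prompt_eval_count and eval_count token counters. The exact mapping inside executeLLM is an assumption:

// Assumed mapping from an Ollama /api/chat response to the output schema above
function toOutput(ollamaResponse) {
  const prompt = ollamaResponse.prompt_eval_count ?? 0;  // prompt tokens used
  const completion = ollamaResponse.eval_count ?? 0;     // completion tokens generated
  return {
    reply: ollamaResponse.message.content,
    model: ollamaResponse.model,
    token_usage: { prompt, completion, total: prompt + completion },
  };
}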
Error codes
| Code | Retries | Meaning |
|---|---|---|
| OLLAMA_NOT_RUNNING | 0 | Cannot connect — run ollama serve |
| MODEL_NOT_FOUND | 0 | Model not pulled — run ollama pull <model> |
| EMPTY_PROMPT | 0 | Prompt is blank |
| UNRESOLVED_PLACEHOLDERS | 0 | Prompt contains {variable} that wasn't interpolated |
| LLM_ERROR | 1 | Ollama API error (5xx) |
| TIMEOUT | 1 | Request exceeded timeout |
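How these errors surface to calling code depends on the kernel; the sketch below assumes the failure object carries the code as error.code, which is an assumption rather than a documented contract:

// Hypothetical error handling; assumes failures expose a `code` matching the table
try {
  // ... invoke the resolver here ...
} catch (error) {
  switch (error.code) {
    case 'OLLAMA_NOT_RUNNING':
      console.error('Start the server first: ollama serve');
      break;
    case 'MODEL_NOT_FOUND':
      console.error('Pull the model first: ollama pull <model>');
      break;
    case 'LLM_ERROR':
    case 'TIMEOUT':
      // Already retried once by the resolver (see Retries column)
      console.error('Ollama request failed after retry:', error.message);
      break;
    default:
      throw error;
  }
}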
Running tests
# Unit tests (no Ollama required — uses mocked fetch)
npm test
# Integration tests (requires: ollama serve && ollama pull llama3.2)
npm run test:local
# O-Lang conformance suite (R-005 → R-012)
npm run conformance

Publishing to the O-Lang registry
- Run conformance:
  npm run conformance → generates badge.txt
- Publish to npm:
  npm publish --access public
- Submit at olang.cloud/registry
- O-Lang team independently re-runs the conformance suite before approving
License
MIT — community resolver, not affiliated with Ollama or Anthropic.
