# Ordis
Ordis is a local-first tool and library that turns messy, unstructured text into clean, structured data using a schema-driven extraction pipeline powered by LLMs. You give it a schema that describes the fields you expect, point it at some raw text, and choose any OpenAI-compatible model. Ordis builds the prompt, calls the model, validates the output, and returns either a correct structured record or a clear error.
Ordis does for LLM extraction what Prisma does for databases: strict schemas, predictable output, and no more glue code.
## Status
- ✅ CLI functional: core extraction pipeline working with real LLMs. Ready for testing and feedback.
- ✅ Programmatic API: can be used as an npm package in Node.js applications.
## Features
- Local-first extraction: Supports Ollama, LM Studio, or any OpenAI-compatible endpoint
- Schema-first workflow: Define your data structure upfront
- Deterministic output: Returns validated records or structured failures (sketched after this list)
- Token budget awareness: Automatic token counting with warnings and limits
- Dual-purpose: Use as a CLI or import as a library
- TypeScript support: Full type definitions included
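
As a sketch of what "validated records or structured failures" means in practice, the result shape implied by the usage examples later in this README looks roughly like this (illustrative only; consult the package's exported type definitions for the real types):

```ts
// Illustrative only: a discriminated-union result shape inferred from the
// usage examples below, not the library's published type definitions.
type ExtractResult =
  | { success: true; data: Record<string, unknown>; confidence: number }
  | { success: false; errors: unknown[] };
```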
## Example
```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base http://localhost:11434/v1 \
  --model llama3.1:8b \
  --debug
```

Sample schema (`invoice.schema.json`):
```json
{
  "fields": {
    "invoice_id": { "type": "string" },
    "amount": { "type": "number" },
    "currency": { "type": "string", "enum": ["USD", "SGD", "EUR"] },
    "date": { "type": "string", "format": "date-time", "optional": true }
  }
}
```
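
For an input containing, say, `Invoice #INV-2024-0042 for $1,250.00 USD`, a successful extraction yields a validated record like the following (illustrative; the CLI's exact output formatting may differ):

```json
{
  "invoice_id": "INV-2024-0042",
  "amount": 1250,
  "currency": "USD"
}
```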
## Model Compatibility

Works with any service exposing an OpenAI-compatible API:
- Ollama
- LM Studio
- OpenRouter
- Mistral
- Groq
- OpenAI
- vLLM servers
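
For hosted providers you typically only need to change `--base` and pass `--api-key`. Here is a sketch using OpenRouter's OpenAI-compatible endpoint; the model name is illustrative, and the base URLs in the comments are the providers' usual defaults, so verify them for your setup:

```bash
# Common OpenAI-compatible base URLs (typical defaults; verify for your setup):
#   Ollama      http://localhost:11434/v1
#   LM Studio   http://localhost:1234/v1
#   OpenRouter  https://openrouter.ai/api/v1
#   Groq        https://api.groq.com/openai/v1
#   OpenAI      https://api.openai.com/v1
# OPENROUTER_API_KEY is a placeholder environment variable.
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base https://openrouter.ai/api/v1 \
  --model meta-llama/llama-3.1-8b-instruct \
  --api-key $OPENROUTER_API_KEY
```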
## Installation
### From npm (recommended)
Install globally to use the CLI anywhere:
```bash
npm install -g @ordis-dev/ordis
ordis --help
```

Or install locally in your project:
```bash
npm install @ordis-dev/ordis
```

### From Source
```bash
git clone https://github.com/ordis-dev/ordis
cd ordis
npm install
npm run build
node dist/cli.js --help
```

## Usage
### CLI Usage
Extract data from text using a schema:
```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base http://localhost:11434/v1 \
  --model llama3.1:8b \
  --debug
```

With an API key (for providers like OpenAI, DeepSeek, etc.):
```bash
ordis extract \
  --schema examples/invoice.schema.json \
  --input examples/invoice.txt \
  --base https://api.deepseek.com/v1 \
  --model deepseek-chat \
  --api-key your-api-key-here
```

### Programmatic Usage
Use Ordis as a library in your Node.js application:
```ts
import { extract, loadSchema, loadSchemaFromObject } from '@ordis-dev/ordis';

// Load a schema from a file
const schema = await loadSchema('./invoice.schema.json');

// Or build an equivalent schema from a plain object
const inlineSchema = loadSchemaFromObject({
  fields: {
    invoice_id: { type: 'string' },
    amount: { type: 'number' },
    currency: { type: 'string', enum: ['USD', 'EUR', 'SGD'] }
  }
});

// Configure the LLM endpoint
const llmConfig = {
  baseURL: 'http://localhost:11434/v1',
  model: 'llama3.2:3b'
};

// Extract data
const result = await extract({
  input: 'Invoice #INV-2024-0042 for $1,250.00 USD',
  schema,
  llmConfig
});

if (result.success) {
  console.log(result.data);
  // { invoice_id: 'INV-2024-0042', amount: 1250, currency: 'USD' }
  console.log('Confidence:', result.confidence);
} else {
  console.error('Extraction failed:', result.errors);
}
```

Using LLM Presets:
```ts
import { extract, loadSchema, LLMPresets } from '@ordis-dev/ordis';

const schema = await loadSchema('./schema.json');

// Use a preset configuration
const result = await extract({
  input: text,
  schema,
  llmConfig: LLMPresets.ollama('llama3.2:3b')
  // Or: LLMPresets.openai(apiKey, 'gpt-4o-mini')
  // Or: LLMPresets.lmStudio('local-model')
});
```

## What Works
- ✅ Schema loader and validator
- ✅ Prompt builder with confidence scoring
- ✅ Universal LLM client (OpenAI-compatible APIs)
- ✅ Token budget awareness with warnings and errors
- ✅ Structured error system
- ✅ CLI extraction command
- ✅ Programmatic API for library usage
- ✅ Field-level confidence tracking
- ✅ TypeScript type definitions
- ✅ Performance benchmarks
## Performance
Pipeline overhead is negligible (~1-2ms); LLM calls dominate execution time (1-10s depending on the model). See `benchmarks/README.md` for detailed metrics.
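
To sanity-check this on your own setup, you can time a single extraction with the programmatic API shown above (a minimal sketch; the input and model are placeholders):

```ts
import { extract, loadSchema, LLMPresets } from '@ordis-dev/ordis';

// Time one end-to-end extraction; almost all of it is the LLM call.
const schema = await loadSchema('./invoice.schema.json');
const start = performance.now();
const result = await extract({
  input: 'Invoice #INV-2024-0042 for $1,250.00 USD',
  schema,
  llmConfig: LLMPresets.ollama('llama3.2:3b')
});
console.log(`Total: ${(performance.now() - start).toFixed(0)} ms (success: ${result.success})`);
```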
Run benchmarks:
```bash
npm run benchmark
```

## Roadmap
Completed in v0.1.0:
- ✅ Core extraction pipeline with schema validation
- ✅ Token budget awareness and management
- ✅ Confidence scoring for extracted data
- ✅ Programmatic API for library usage
- ✅ CLI tool with debug mode
- ✅ Comprehensive test suite and benchmarks
- ✅ Support for any OpenAI-compatible API
Upcoming:
- [ ] Smart input truncation (#40)
- [ ] Multi-pass extraction for large inputs (#41)
- [ ] Config file support (#16)
- [ ] Output formatting options (#14)
- [ ] Batch extraction (#19)
- [ ] More example schemas (#13)
## Contributing
Contributions are welcome!
