@ztimson/ai-utils
v1.0.3
Published
AI Utility library
AI Utility Library - Unified interface for multiple AI providers
About
A TypeScript library that provides a unified interface for working with multiple AI providers, making it easy to integrate various AI capabilities into your applications.
Features
- Multi-Provider LLM Support: Seamlessly work with OpenAI, Anthropic (Claude), and self-hosted (Ollama) models
- Automatic Speech Recognition (ASR): Convert audio to text using Whisper models
- Optical Character Recognition (OCR): Extract text from images using Tesseract
- Semantic Similarity: Compare text similarity using tensor-based cosine similarity
- Provider Abstraction: Switch between AI providers without changing your code
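To make the semantic similarity feature concrete, here is a minimal sketch of cosine similarity over two embedding vectors. It illustrates the general technique only; the `cosineSimilarity` helper and the tiny three-dimensional vectors are hypothetical and are not taken from this library's API.

```typescript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // Treat zero vectors as "no similarity"
}

// Hypothetical embeddings; real embedder models produce hundreds of dimensions
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const car = [0.0, 0.1, 0.9];
console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```

Texts with related meaning should end up with embeddings that point in a similar direction, which is why the score depends on the angle between vectors rather than their length.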
Setup
Prerequisites
Instructions
- Install the package:
npm i @ztimson/ai-utils
- For speaker diarization:
pip install pyannote.audio
Prerequisites
- Node.js
- Whisper.cpp (ASR)
- Pyannote (ASR Diarization):
pip install pyannote.audio
Instructions
- Install the dependencies:
npm i
- For speaker diarization:
pip install pyannote.audio
- Build library:
npm run build
- Run unit tests:
npm test
Documentation
Setup
import {Ai, LLMRequest} from '@ztimson/ai-utils'; // Assumes both are named exports of the package

const ai = new Ai({
  path: '/ai-models',
  // Setup audio
  whisper: '/path/to/binary', // Required for ASR
  hfToken: '...', // Required for diarization
  asr: 'ggml-base.en.bin', // Override default ASR model
  // Setup LLM
  embedder: 'bge-small-en-v1.5', // Override default embedder model
  llm: {
    system: 'You are a helpful assistant.',
    compress: {max: 90_000, min: 50_000}, // Compress chat history to min tokens when max is reached
    temperature: 0.8,
    max_tokens: 100_000,
    memoryModel: 'gpt-4o', // Cheap model for managing memories in background, defaults to current model
    models: {
      'claude-3-5-sonnet': {proto: 'anthropic', token: process.env.ANTHROPIC_TOKEN},
      'gpt-4o': {proto: 'openai', token: process.env.OPENAI_TOKEN},
      'llama3': {proto: 'ollama', host: 'http://localhost:11434'},
    },
    mcp: [
      {name: 'files', url: 'https://mcp.example.com', token: process.env.MCP_TOKEN}
    ],
    skills: [
      {name: 'Tone of voice', description: 'Brand writing guidelines', content: '# Tone of Voice\n\nAlways be concise and friendly...'}
    ],
    tools: [{
      name: 'Marco?',
      description: 'Where is marco polo?',
      args: {
        shout: {type: 'boolean', description: 'Shout into the void?', default: false, required: false}
      },
      // Tools receive the parsed arguments, the request's stream callback, and the Ai instance
      fn: (args: any, stream: LLMRequest['stream'], ai: Ai) => {
        const {shout} = args;
        return shout ? 'Polo!' : 'Polo';
      }
    }],
  },
  // Setup Vision
  ocr: 'eng' // Override default OCR model
});
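The `compress` option above describes trimming chat history down to `min` tokens once `max` is reached. The sketch below shows one way such a policy could work; it is an illustration under stated assumptions, not the library's implementation. `Message`, `estimateTokens`, `compressHistory`, and the four-characters-per-token estimate are all hypothetical.

```typescript
type Message = {role: 'user' | 'assistant' | 'system'; content: string};

// Rough token estimate (assumption: about 4 characters per token)
const estimateTokens = (messages: Message[]): number =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// Hypothetical policy: once history exceeds `max` tokens, fold the oldest
// messages into one summary message until the rest fits within `min` tokens.
function compressHistory(
  history: Message[],
  limits: {max: number; min: number},
  summarize: (old: Message[]) => string,
): Message[] {
  if (estimateTokens(history) <= limits.max) return history; // Under budget, nothing to do
  const recent = [...history];
  const old: Message[] = [];
  // Move the oldest messages out until the remaining ones fit the `min` budget
  while (recent.length > 1 && estimateTokens(recent) > limits.min) old.push(recent.shift()!);
  return [{role: 'system', content: summarize(old)}, ...recent];
}

// Ten messages of 40 characters each, i.e. 10 estimated tokens per message
const history: Message[] = [];
for (let i = 0; i < 10; i++) {
  history.push({role: i % 2 ? 'assistant' : 'user', content: 'x'.repeat(40)});
}
const compressed = compressHistory(
  history,
  {max: 90, min: 50},
  old => `Summary of ${old.length} earlier messages`,
);
console.log(compressed.length); // 6 (one summary plus the five most recent messages)
```

Keeping a gap between `max` and `min` means compression runs occasionally rather than on every message once the limit is reached.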
Audio
// Create audio transcript
const transcript = await ai.audio.asr('./path/to/audio.mp3');
console.log(transcript);
// Break transcript into speakers
const bySpeaker = await ai.audio.asr('./path/to/audio.mp3', {diarization: true});
console.log(bySpeaker);
// Break transcript into named speakers
const byName = await ai.audio.asr('./path/to/audio.mp3', {diarization: 'llm'});
console.log(byName);
Language
const history = [], memory = [];
// Wait for entire response
const answer = await ai.language.ask("My favorite color is blue, what's yours?", {history, memory});
console.log(answer);
// Stream response
let chunks = ''; // Must be `let` since chunks are appended as they arrive
await ai.language.ask('Write me a poem', {
  history, memory,
  stream: chunk => chunks += chunk,
});
console.log(chunks);
// Manually compile history into memories at end of conversation
// Happens automatically when conversations are compressed
await ai.language.updateMemory(history, memory);
// Summarize text
const summary = await ai.language.summarize(longText, 200);
// Code response (no conversational filler)
const code = await ai.language.code('Write a fibonacci function');
// Structured JSON response
const data = await ai.language.json('Extract the name and age', `{
  "name": "string",
  "age": "number"
}`, {system: 'Extract from user input'});
Premade LLM Tools:
- cli: Run a shell command, returns its output
- get_datetime: Returns local date/time
- get_datetime_utc: Returns current UTC date/time
- exec: Execute code in cli, node, or python
- fetch: Make HTTP requests (GET/POST/PUT/DELETE)
- exec_javascript: Execute CommonJS JavaScript
- exec_python: Execute Python via python -c
- read_webpage: Scrape & clean content from a URL, handles HTML, JSON, CSV, media, PDFs etc.
- web_search: Anonymous DuckDuckGo search, returns a list of URLs
- wikipedia_lookup: Fetch a Wikipedia article (intro or full)
- wikipedia_search: Search Wikipedia and return matching articles
- get_weather: Fetch current weather + forecast for a location (just built!)
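Custom tools can sit alongside the premade ones. The sketch below shows roughly how a tool definition with typed arguments might be resolved before its function runs. The `Tool` and `ToolArg` types, the `word_count` tool, and the `callTool` dispatcher are hypothetical and only mirror the tool shape used in the Setup configuration; the library's internal handling may differ.

```typescript
// Hypothetical types mirroring the tool shape from the Setup configuration
type ToolArg = {
  type: 'string' | 'number' | 'boolean';
  description: string;
  default?: unknown;
  required?: boolean;
};
type Tool = {
  name: string;
  description: string;
  args: Record<string, ToolArg>;
  fn: (args: Record<string, unknown>) => unknown;
};

// A small deterministic custom tool
const wordCount: Tool = {
  name: 'word_count',
  description: 'Count the words in a piece of text',
  args: {
    text: {type: 'string', description: 'Text to count', required: true},
    unique: {type: 'boolean', description: 'Count distinct words only', default: false},
  },
  fn: ({text, unique}) => {
    const words = String(text).toLowerCase().split(/\s+/).filter(Boolean);
    return unique ? new Set(words).size : words.length;
  },
};

// Hypothetical dispatcher: fill in defaults, reject missing required args, then run the tool
function callTool(tool: Tool, input: Record<string, unknown>): unknown {
  const args: Record<string, unknown> = {};
  for (const [key, spec] of Object.entries(tool.args)) {
    if (input[key] !== undefined) args[key] = input[key];
    else if (spec.required) throw new Error(`Missing required argument: ${key}`);
    else args[key] = spec.default;
  }
  return tool.fn(args);
}

console.log(callTool(wordCount, {text: 'the quick the lazy'})); // 4
console.log(callTool(wordCount, {text: 'the quick the lazy', unique: true})); // 3
```

Clear `description` values matter here because the model relies on them to decide when to call a tool and which arguments to pass.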
Vision
// Extract text from image
const text = await ai.vision.ocr('./path/to/image.png');
console.log(text);
License
Copyright © 2023 Zakary Timson | Available under MIT Licensing
See the license for more information.
