
# Amallo

Amallo is a thin wrapper for the Ollama API. There is an official wrapper that is ~~probably~~ definitely more robust. This is a single JS file that works in the Browser, Node, Deno, and Bun and has ergonomics I particularly like. It is current as of Ollama v0.5.7.

## Installation

```js
// if you just cloned the repo and copied the file
import Amallo from './amallo.mjs'
// if you got it via `npm install amallo` or similar
import Amallo from 'amallo'
```

```html
<!-- if you want to use it in the browser -->
<script type="module" src="amallo.mjs"></script>
<!-- importing it the esm way also works; note that import statements need type="module",
     and a bare 'amallo' specifier needs an import map or bundler -->
<script type="module">
import Amallo from 'amallo'
</script>
```

You also need an Ollama instance running somewhere accessible; by default it listens on localhost. If you have a more complicated use case, check the Ollama FAQ; you may need to set some environment variables (`OLLAMA_HOST="0.0.0.0"` and/or `OLLAMA_ORIGINS="*"`). Don't set these blindly, but knowing about them will save you some googling if you end up with CORS or external-accessibility issues.

## Basic Usage

Amallo is a closure that keeps some state for convenience and returns an object with properties for the common API endpoints. Instantiate it like this; both parameters are optional.

```js
// url defaults to 'http://localhost:11434'
// model defaults to 'deepseek-r1:1.5b'
const amallo = Amallo(model, url)
```

By default, Amallo exposes a combined streaming + promise API.

```js
let generation = amallo.generate('Can you please tell me a joke?')
```

You can also just send a request object as detailed in the Ollama API docs (`model` is optional and defaults to whatever you set when you instantiated the closure).

```js
let generation = amallo.generate({prompt: 'Can you please tell me a joke?'})
```

Once the request is in flight, you can set the `ontoken` property to a callback that will be fired every time there's a new token.

```js
generation.ontoken = token => process.stdout.write(token)
```

Or you can simply await the response.

```js
let generated = await generation
console.log(generated.response)
> <think>
  Okay, so I need to tell a funny joke ...
```

If you change your mind before a request finishes, you can abort it; Ollama will cancel inference in response.

```js
generation.abort()
```

## Interface

You may freely get/set the instance url and model post instantiation.

```js
amallo.url = 'https://ollama.runninginthe.cloud:1337'
amallo.model = 'MidnightMiqu70b:iq3_xxs'
```

All methods default to a streaming request where possible; set `stream: false` in the request to avoid this behavior. You can set `.onchunk` if you want to process the raw response chunks as they come in, or `.ontoken` for token strings.
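
For instance, here is a minimal sketch of both styles, assuming nothing beyond what's described above (the exact shape of a raw chunk is whatever the underlying Ollama endpoint streams back):

```js
// non-streaming: set stream: false and the promise resolves with the full response object
const full = await amallo.generate({
  prompt: 'Can you please tell me a joke?',
  stream: false
})

// streaming (the default): inspect raw chunks and/or token strings as they arrive
const generation = amallo.generate('Can you please tell me a joke?')
generation.onchunk = chunk => console.log(chunk)            // raw response chunk
generation.ontoken = token => process.stdout.write(token)   // decoded token string
await generation
```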

### .generate( prompt_string | request_object )

```js
const generation = amallo.generate({prompt: 'Can you please tell me a joke?'})
let response = ''
generation.ontoken = token => response += token
const generated = await generation
generated.response === response
> true
console.log(generated.response)
> `Why did the chicken cross the road?
...`
```

### .chat( request_object )

```js
const generated = await amallo.chat({messages: [{role:'user', content:'Can you please tell me a joke?'}]})
console.log(generated.messages.at(-1))
> {
  role: 'assistant',
  response: 'Why did the tomato turn red? ...'
}
```

### .tags()

Also available as `.ls()`. Returns an array of models currently available.

### .ps()

Returns an array of models currently running.
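
A quick sketch of both listing calls; it assumes each returned entry carries a `name` field, as model entries do in the underlying Ollama API:

```js
const available = await amallo.tags()   // or: await amallo.ls()
const running = await amallo.ps()
console.log(available.map(m => m.name))
console.log(running.map(m => m.name))
```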

### .show( model_name_string | request_object )

Shows the model info. Specifying the model is optional; if omitted, it returns the info for the instance model.
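
For example (the model name below is just a placeholder):

```js
const instanceInfo = await amallo.show()                  // info for the instance model
const otherInfo = await amallo.show('llama3.2:latest')    // info for a specific model
```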

### .stop( model_name_string | request_object )

This isn't an official API endpoint, but it's a wrapper that works the same way as `ollama stop modelname` does on the CLI. Like `.show()`, the model is optional; if omitted, the instance model is stopped. N.b. this will return before the model is fully unloaded; the latency isn't large, but it's something to be aware of.

```js
await amallo.stop('llama3.2:latest')
```
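
Because of that unload latency, if you need to know the model is actually gone, one hedged approach is to poll `.ps()` until the model no longer shows up (again assuming each entry has a `name` field):

```js
await amallo.stop('llama3.2:latest')
while ((await amallo.ps()).some(m => m.name === 'llama3.2:latest')) {
  await new Promise(resolve => setTimeout(resolve, 100))   // brief pause between polls
}
```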

### .version()

```js
await amallo.version()
> '0.5.7'
```

### .embed( string_to_embed | array_of_strings | request_object )

Generate embeddings for a text or list of texts.
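
A short sketch of the two simpler input forms; the calls below only assume the signature above, not any particular shape for the returned embeddings:

```js
const single = await amallo.embed('an example sentence to embed')
const batch = await amallo.embed(['first text', 'second text'])
```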

## Additional Methods

I haven't ever needed to use these methods, so I haven't bothered to test them. Since this just wraps the relevant API endpoints in a very transparent way, all of these probably work fine.

- `.copy()`
- `.delete()`
- `.create()`
- `.pull()`
- `.push()`