
@lucas-bortoli/fluent-llama

v0.1.2

Client library for interacting with llama.cpp servers

fluent-llama

This package is currently in Alpha status. It is not yet suitable for production use. Breaking changes may occur without notice.

fluent-llama is a type-safe, fluent API client for interacting with llama-server (llama.cpp inference server). It provides a modern, expressive interface for chat completions, tool calling, vision tasks, and agent loops.

Features

  • Fluent Configuration 🧠: Builder pattern for Sampling and Toolset configurations.
  • Agent Loops 🤖: The act() method handles the multi-turn reasoning and tool execution cycle automatically.
  • Error Handling with supermacro's neverthrow 🛡️: All operations return Result<T, E> from the neverthrow library, so every error is handled explicitly with full TypeScript support. Highly recommend checking it out.
  • Vision Support 📷: Native handling of image attachments via Base64.
  • Reasoning 🔍: Supports reasoningContent (Chain of Thought) streams.
  • Advanced Sampling ⚙️: Fine-grained control over temperature, top-k, top-p, mirostat, DRY, XTC, and more.
  • Streaming 🔄: Full SSE (Server-Sent Events) support for real-time token streaming.

Prerequisites

  • Node.js (v20+ recommended)
  • llama-server: This client is designed to connect to the OpenAI-compatible API exposed by llama-server. Ensure your server is running a compatible version.

Installation

npm install @lucas-bortoli/fluent-llama
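
The client expects a llama-server instance to be reachable. Assuming you have llama.cpp installed, a typical way to start one looks like this (the model path is a placeholder for your own GGUF file):

```shell
# Start llama-server, exposing its OpenAI-compatible HTTP API on port 8080.
# ./models/model.gguf is a placeholder; point it at your own GGUF file.
llama-server -m ./models/model.gguf --port 8080
```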

Quick Start

1. Basic Chat Completion

import { Client, RandomSeed, Sampling } from "@lucas-bortoli/fluent-llama";

async function main() {
  // Connect to your local llama-server
  const client = await Client.from("http://localhost:8080");
  const llmResult = await client.createTextModel("Qwen3.5-35B-A3B");

  if (llmResult.isErr()) {
    console.error("Error opening client:", llmResult.error);
    process.exit(1);
  }

  const llm = llmResult.value;

  const result = await llm.respond({
    instructions: "You are a helpful assistant.",
    history: [{ role: "user", content: "Hello, who are you?", attachments: [] }],
    sampling: new Sampling().setSeed(RandomSeed).build(),
  });

  if (result.isOk()) {
    console.log(result.value.response.content);
  } else {
    // Always handle errors explicitly with neverthrow patterns
    console.error(result.error);
  }
}

main();

2. Tool Calling (Agent Mode)

Use the act() method to run autonomous agent loops where the model decides when to use tools.

import { Client, tool, Toolset, Sampling, RandomSeed } from "@lucas-bortoli/fluent-llama";
import * as v from "valibot";

// Define a tool using Valibot for schema validation
const weatherTool = tool({
  name: "get_weather",
  description: "Gets weather data for a location.",
  parameters: { location: v.string() },
  exec: async ({ location }) => {
    return { temp: 20, condition: "Sunny" };
  },
});

const client = await Client.from("http://localhost:8080");
const llmResult = await client.createTextModel("Qwen3.5-35B-A3B");

if (llmResult.isErr()) {
  console.error("Error opening client:", llmResult.error);
  process.exit(1);
}

const llm = llmResult.value;

// Run the agent loop
const response = await llm.act({
  instructions: "You are a helpful assistant. Use tools to answer.",
  history: [{ role: "user", content: "What's the weather in Tokyo?", attachments: [] }],
  sampling: new Sampling()
    .setSamplerTemperature(0.7)
    .setSamplerTopK(80)
    .setSamplerMinP(0.02)
    .setSeed(RandomSeed)
    .build(),
  toolset: new Toolset([weatherTool]).build(),
});

if (response.isOk()) {
  // The result contains the newly generated messages, including tool results
  console.log(response.value);
} else {
  // Handle errors explicitly
  console.error(response.error);
}

3. Vision Support

You can send images by attaching binary content to user messages.

import fs from "node:fs/promises";

// ...obtain a TextModel instance like before...

// Resolve the image relative to this module; __dirname is not available in ES modules.
const imageData = await fs.readFile(new URL("./image.jpg", import.meta.url));
const response = await llm.respond({
  instructions: "Describe this image.",
  history: [
    {
      role: "user",
      content: "What is in this picture?",
      attachments: [{ mimeType: "image/jpeg", content: imageData.buffer }],
    },
  ],
  sampling: new Sampling().build(),
});

Error Handling with Neverthrow

This library uses neverthrow for all error handling. Every fallible operation returns a Result<T, E> instead of throwing errors. This means you must handle errors explicitly.

Understanding Result<T, E>

  • Result.isOk() → Check if the operation succeeded
  • Result.isErr() → Check if the operation failed
  • Result.value → Access the successful result (only when isOk())
  • Result.error → Access the error (only when isErr())

Check neverthrow's documentation for more information.

Example: Handling Different Error Types

const response = await llm.respond({
  instructions: "You are a helpful assistant.",
  history: [
    /* ... */
  ],
  sampling: new Sampling().build(),
});

if (response.isErr()) {
  const error = response.error;

  switch (error.kind) {
    case "EmptyMessageHistory":
      console.error("No conversation history provided");
      break;
    case "RequestAborted":
      console.error("Request was cancelled before completion");
      break;
    case "ServerError":
      console.error("Server returned unexpected error:", error.cause);
      break;
    case "RequestError":
      console.error("API request failed:", {
        status: error.httpStatusCode,
        details: error.details,
      });
      break;
    case "InvalidParameter":
      console.error("Invalid parameters provided:", error.details);
      break;
  }
}

Configuration Reference

Sampling

The Sampling class allows you to configure generation parameters fluently.

const config = new Sampling()
  .setSamplerTemperature(0.7)
  .setSamplerTopP(0.95)
  .setSamplerTopK(40)
  .setSeed(RandomSeed) // or setSeed(42) for deterministic results
  .setSamplerPresencePenalty(1.0)
  .setGrammar({ type: "Json", schema: { ... } }) // For structured outputs
  .build();

Toolset

The Toolset class manages available functions for the LLM.

const tools = new Toolset([weatherTool, webSearchTool])
  .setWhitelist(["get_weather"]) // Only allow these tools, matched by tool name
  .setBatchMode("Parallel") // Run tools concurrently
  .setInvocationRequirement("AsNeeded") // Or "RequireOne"
  .build();

Compatibility

This package is built specifically for the API exposed by llama-server (llama.cpp). Although some endpoints go through the OpenAI compatibility layer, the client targets llama-server's API; do not use it with other OpenAI-compatible servers.

Disclaimer

This software is in Alpha.

  • Stability is not guaranteed.
  • API endpoints or types may change.
  • Do not use in production environments until a stable version is released.

License

MIT License. See LICENSE for details.