
llama-cpp-node

v1.0.10

Node.js addon to use llama.cpp

Llama-Cpp-Node

Llama-Cpp-Node is a Node.js binding for llama.cpp, a C/C++ library for running large language models (LLMs) such as LLaMA and its derivatives (e.g. Wizard-Vicuna). This module allows you to load a model file, create a context, encode strings into tokens, evaluate tokens on the context to predict the next token, and decode tokens back into strings.

Prerequisites

Before using llama-cpp-node, please ensure the following prerequisites are met:

  • C++ Compiler: A compatible C++ compiler is required to build the underlying llama.cpp library. On Linux, install build-essential or your distribution's equivalent (e.g. sudo apt-get install build-essential on Debian/Ubuntu). On Windows, you can use Visual Studio with C++ support.

Installation

To install llama-cpp-node, you can use npm:

npm install llama-cpp-node

Note: The latest llama.cpp source code is automatically downloaded from the upstream llama.cpp repository during installation.
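
Once installed, a quick way to check that the native addon built correctly is to load the module and print its system information (using the systemInfo() function documented in the API section below):

var llamaCppNode = require('llama-cpp-node');
console.log(llamaCppNode.systemInfo());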

Usage

To get started, require the module in your Node.js application:

var llamaCppNode = require('llama-cpp-node');
var { LLAMAModel, LLAMAContext } = llamaCppNode;

Loading a Model File and Creating a Context

Before you can use llama-cpp-node, you need to load a model file and create a context. The model file should be in the ggml format.

var model = llamaCppNode.createModel('C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin');
var ctx = model.createContext();

Encoding Strings into Tokens

To use the model for predictions, you need to encode input strings into tokens. Tokens are represented as Uint32Arrays.

var prompt = 'You are a 25-year-old human named ASSISTANT. What follows is a transcript between you and your wife named USER.\nASSISTANT:';
var tokens = ctx.encode(prompt);

Evaluating Tokens to Predict the Next Token

After encoding the input string, you can evaluate the tokens on the context to predict the next token. Note that eval is asynchronous, so the await below must appear inside an async function.

var nextToken = await ctx.eval(tokens);

Decoding Tokens into Strings

To decode tokens back into strings, use the decode method.

var tokens = Uint32Array.from([nextToken]);
var tokenStr = ctx.decode(tokens);
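
Putting encode, eval, and decode together, here is a minimal sketch of a greedy generation loop that streams a short completion. It uses only the functions documented in the API section, plus the tokenEos() helper from the chatbot example below; replace the model path with your own file.

var llamaCppNode = require('llama-cpp-node');

var generate = async () => {
  // Load the model and create a context.
  var model = llamaCppNode.createModel('C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin');
  var ctx = model.createContext();

  // Encode the prompt into tokens.
  var tokens = ctx.encode('The capital of France is');

  // Generate up to 32 tokens, stopping at the end-of-sequence token.
  for (var i = 0; i < 32; i++) {
    var nextToken = await ctx.eval(tokens);
    if (nextToken === llamaCppNode.tokenEos()) {
      break;
    }

    // Stream the decoded token and feed it back as the next input.
    tokens = Uint32Array.from([nextToken]);
    process.stdout.write(ctx.decode(tokens));
  }
};

generate();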

Chatbot Example

To create a conversational chatbot, you can use Node's readline/promises module (available in Node.js 17 and later) to communicate with the user.

var llamaCppNode = require('llama-cpp-node');
var { LLAMAModel, LLAMAContext } = llamaCppNode;
var readline = require('readline/promises');

// Print some system info.
console.log(llamaCppNode.systemInfo());

// Load the model and create a context.
var model = llamaCppNode.createModel('C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin');

var ctx = model.createContext();

// Specify the initial prompt and encode it as tokens.
var prompt = 'You are a 25-year-old human named ASSISTANT. What follows is a transcript between you and your wife named USER.\nASSISTANT:';
var tokens = ctx.encode(prompt);

// Add a BOS token at the beginning of the tokens.
tokens = Array.from(tokens);
tokens.unshift(llamaCppNode.tokenBos());
tokens = Uint32Array.from(tokens);

// Create a readline interface to communicate with the user.
var rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

var interact = async () => {
  // Print out the prompt line.
  var line = 'ASSISTANT:';
  process.stdout.write(line);

  // Loop until user input is needed.
  while (1) {
    // Evaluate to get the next token.
    var nextToken = await ctx.eval(tokens);

    // Handle EOS as the switch from the assistant to the user.
    if (nextToken === llamaCppNode.tokenEos()) {
      // Fix up the reverse prompt.
      tokenStr = '\nUSER: ';
      process.stdout.write(tokenStr);
      break;
    }

    // Create tokens for the next eval.
    tokens = Uint32Array.from([nextToken]);

    // Decode the next token to token string.
    var tokenStr = ctx.decode(tokens);

    // Remove additional spaces after the assistant colon (only on the terminal).
    if (tokenStr.startsWith(' ') && line === 'ASSISTANT: ') {
      tokenStr = tokenStr.slice(1);
    }

    // Output the new token.
    process.stdout.write(tokenStr);
    if (tokenStr === '') {
      process.stdout.write('[' + nextToken + ']');
    }

    // Track what is on the current line.
    if (tokenStr === '\n') {
      line = '';
    } else {
      line += tokenStr;
    }

    // Handle the reverse prompt as the switch from the assistant to the user.
    if (line.toUpperCase().startsWith('USER:')) {
      // Add a missing space if needed.
      if (!line.endsWith(' ')) {
        tokenStr += ' ';
        process.stdout.write(' ');
      }
      break;
    }
  }

  // Prompt the user for input and encode the tokens.
  var input = await rl.question('USER: ');
  tokens = ctx.encode(tokenStr + input + '\nASSISTANT:');
};

var main = async () => {
  try {
    // Endless loop for endless chatting.
    while (1) {
      await interact();
    }
  } catch (e) {
    console.log(e.stack);
  }
};

main();

Make sure to replace C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin with the actual path to your llama model file.

API

llamaCppNode.systemInfo()

This function returns information about the system where llama-cpp-node is running, such as available CPU and GPU features.
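
For example, to print this information at startup (as the chatbot example above does):

console.log(llamaCppNode.systemInfo());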

llamaCppNode.createModel(modelPath: string): LLAMAModel

This function loads a llama model from the given file path and returns a LLAMAModel instance, which can be used to create a context.

model.createContext(): LLAMAContext

This method creates a new LLAMAContext for the model. The context is used to encode strings into tokens, evaluate tokens, and decode them back into strings.

ctx.encode(input: string): Uint32Array

This method takes an input string and encodes it into tokens. It returns a Uint32Array representing the tokens.
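
A quick way to see the tokenizer at work is to round-trip a short string (a sketch, assuming a ctx created as shown in the Usage section; the exact token count depends on the model's vocabulary):

var tokens = ctx.encode('Hello, world!');
console.log(tokens.length);      // number of tokens in the string
console.log(ctx.decode(tokens)); // should reproduce the input text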

ctx.eval(tokens: Uint32Array): Promise<number>

This method takes a Uint32Array of tokens and evaluates them on the context to predict the next token. It returns a Promise that resolves to the predicted token id. Note that the context is stateful: as the chatbot example shows, each subsequent call only needs to pass the newly generated token, since previously evaluated tokens are retained by the context.

ctx.decode(tokens: Uint32Array): string

This method takes a Uint32Array of tokens and decodes them back into a string. It returns the decoded string.
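
The chatbot example above also uses two helpers for special tokens, documented here for completeness (return type inferred from their usage in that example):

llamaCppNode.tokenBos(): number

Returns the beginning-of-sequence (BOS) token id. The chatbot example prepends this token to the initial prompt tokens.

llamaCppNode.tokenEos(): number

Returns the end-of-sequence (EOS) token id, which the model emits when it has finished its turn.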

Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue on the GitHub repository.

License

This module is released under the MIT License. See the LICENSE file for more details.