tcnt

v1.0.0

Published 9 months ago

A CLI tool to count LLM tokens in text files.

kinlan

llm token counter cli

TokenCount CLI

A Node.js command-line tool to count LLM tokens in text files. It can process local files or read from stdin.

Features

Counts tokens using OpenAI's tiktoken library by default.
Supports input from file path or stdin.
Basic framework for adding other tokenizers.
Handles text files only.

Installation

1. Clone this repository (or ensure you have the tokencount.js and package.json files).
2. Navigate to the project directory in your terminal.
3. Install dependencies:
```
npm install
```
4. Make the script executable:
```
chmod +x tokencount.js
```
5. Link the package to make the tokencount command available globally:
```
npm link
```

Usage

Count tokens in a file:

tokencount /path/to/your/file.txt

Count tokens from piped input:

cat /path/to/your/file.txt | tokencount
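
As a rough illustration of how a CLI like this could choose between a file argument and piped input, here is a minimal sketch. The helper names `pickInputSource` and `readAll` are hypothetical and not taken from tokencount.js, and the rule that a file path takes precedence over stdin is an assumption.

```javascript
// Hypothetical helper: choose the input source from the CLI arguments.
// `filePath` is the optional positional argument; `stdinIsTTY` should be
// Boolean(process.stdin.isTTY), which is true only when nothing is piped in.
function pickInputSource(filePath, stdinIsTTY) {
  if (filePath) return { kind: 'file', path: filePath }; // assumption: file wins
  if (!stdinIsTTY) return { kind: 'stdin' };
  return { kind: 'none' }; // no file and no pipe: print usage instead
}

// Hypothetical helper: collect a readable stream (stdin by default)
// into a single UTF-8 string.
async function readAll(stream = process.stdin) {
  stream.setEncoding('utf8');
  let text = '';
  for await (const chunk of stream) text += chunk;
  return text;
}
```

Keeping the decision in a pure function makes it easy to check without a real terminal or pipe.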

Specify a tokenizer (default is openai-tiktoken):

tokencount --tokenizer openai-tiktoken /path/to/your/file.txt

openai-tiktoken is currently the only implemented tokenizer. Selecting gemini-text prints a warning and falls back to openai-tiktoken, because on-device Gemini tokenization is not yet implemented.
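
The fallback behaviour could be sketched roughly as follows. The function name `resolveTokenizer`, the warning text, and the choice to throw on unknown names are assumptions made for illustration; the actual script may handle these cases differently.

```javascript
const DEFAULT_TOKENIZER = 'openai-tiktoken';

// Hypothetical helper: map the requested tokenizer name to the one that
// will actually run, plus an optional warning for the caller to print.
function resolveTokenizer(requested = DEFAULT_TOKENIZER) {
  if (requested === 'openai-tiktoken') {
    return { name: requested, warning: null };
  }
  if (requested === 'gemini-text') {
    return {
      name: DEFAULT_TOKENIZER,
      warning: 'gemini-text is not implemented; falling back to openai-tiktoken.',
    };
  }
  // Assumption: unknown names are rejected rather than silently defaulted.
  throw new Error(`Unsupported tokenizer: ${requested}`);
}
```

A caller would typically write the warning to stderr (for example with `console.warn`) so that the token count on stdout stays easy to parse.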

Get help:

tokencount --help

Supported Tokenizers

openai-tiktoken: Uses the gpt2 encoding from OpenAI's tiktoken library.
gemini-text (Placeholder): Currently falls back to openai-tiktoken. On-device Gemini tokenization may be added once suitable libraries are available.
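
Counting with an encoder boils down to encoding the text and taking the length of the result. The sketch below assumes an encoder object with an `encode(text)` method; which npm package tokencount.js actually imports is not stated here, so the `tiktoken` package shown in the comments is an assumption, and `countTokens` is a hypothetical name.

```javascript
// Hypothetical helper: the token count is the length of the encoded sequence.
function countTokens(text, encoder) {
  return encoder.encode(text).length;
}

// With the `tiktoken` npm package (an assumption about the dependency),
// usage might look like this:
//
//   import { get_encoding } from 'tiktoken';
//   const enc = get_encoding('gpt2');
//   console.log(countTokens('hello world', enc));
//   enc.free(); // release the WASM-backed encoder when done

// Stand-in encoder so the sketch runs without installing anything.
// It splits on single spaces and is NOT a real LLM tokenizer.
const standInEncoder = { encode: (text) => text.split(' ') };
console.log(countTokens('count these four words', standInEncoder));
```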

Limitations

Text Files Only: This tool is designed for text files. Attempting to process binary files will result in an error or incorrect counts.
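On-Device Tokenization for Gemini: True on-device tokenization for Gemini models is not yet implemented. The gemini-text option falls back to openai-tiktoken, so its counts reflect the gpt2 encoding rather than Gemini's own tokenizer.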
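
For the text-files-only limitation, one common heuristic is to look for NUL bytes near the start of the input. The sketch below illustrates that idea only; it is not a description of how tokencount.js detects binary files, and `looksBinary` and its `sampleSize` default are hypothetical.

```javascript
// Heuristic: treat the input as binary if a NUL byte appears within the
// first `sampleSize` bytes. Works with a Buffer or any Uint8Array.
function looksBinary(bytes, sampleSize = 8000) {
  const limit = Math.min(bytes.length, sampleSize);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}
```

This check is cheap but imperfect: UTF-16 text contains NUL bytes and would be flagged as binary, while a binary file with no NUL byte in the sampled range would pass.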

Development

To contribute or modify:

The project uses ES Module syntax (import/export).
The main script is tokencount.js.
Tokenizer logic is handled within the .action(...) callback in tokencount.js.
To add a new tokenizer, you would typically:
1. Install any necessary Node.js package for that tokenizer.
2. Import necessary functions from the package using import.
3. Add a new else if condition for your tokenizer's name in tokencount.js.
4. Implement the token counting logic within that block.
5. Update this README and the help messages.
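
Steps 3 and 4 above might look something like the following. The `whitespace` tokenizer is a made-up example, `countWith` is a hypothetical name, and `countWithTiktoken` is a stand-in callback for the existing openai-tiktoken logic, whose real shape inside the .action(...) callback is not shown in this README.

```javascript
// Hypothetical dispatch: one branch per tokenizer name.
function countWith(tokenizerName, text, countWithTiktoken) {
  if (tokenizerName === 'openai-tiktoken') {
    // Existing branch, represented here by a callback.
    return countWithTiktoken(text);
  } else if (tokenizerName === 'whitespace') {
    // New branch: counts whitespace-separated words (illustrative only).
    const words = text.trim().split(/\s+/);
    return words[0] === '' ? 0 : words.length;
  } else {
    throw new Error(`Unknown tokenizer: ${tokenizerName}`);
  }
}
```

If the number of tokenizers grows, a lookup object keyed by tokenizer name may be easier to maintain than a long else if chain.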
