tcnt
v1.0.0
A CLI tool to count LLM tokens in text files.
TokenCount CLI
A Node.js command-line tool to count LLM tokens in text files. It can process local files or read from stdin.
Features
- Counts tokens using OpenAI's tiktoken library by default.
- Supports input from a file path or stdin.
- Basic framework for adding other tokenizers.
- Handles text files only.
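Here is a minimal sketch of how a Node.js script could read its input from either a file or stdin. The helper name readInput and the fail-fast check for an interactive terminal are assumptions for illustration, not code taken from tokencount.js. Reading file descriptor 0 with readFileSync is one common synchronous way to consume piped stdin.

```javascript
import { readFileSync } from "node:fs";

// Hypothetical helper: return the whole input as UTF-8 text, read either from
// the given file path or, when no path is given, from stdin (descriptor 0).
function readInput(filePath) {
  if (filePath !== undefined) {
    return readFileSync(filePath, "utf8");
  }
  if (process.stdin.isTTY) {
    // Assumption: with no file argument and nothing piped in, fail fast
    // instead of blocking while waiting for keyboard input.
    throw new Error("No input: pass a file path or pipe text on stdin.");
  }
  return readFileSync(0, "utf8");
}
```

In a CLI entry point it might be called as readInput(process.argv[2]). When the file argument is omitted, the call falls through to stdin.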
Installation
- Clone this repository (or ensure you have the tokencount.js and package.json files).
- Navigate to the project directory in your terminal.
- Install dependencies:
  npm install
- Make the script executable:
  chmod +x tokencount.js
- Link the package to make the tokencount command available globally:
  npm link
Usage
Count tokens in a file:
tokencount /path/to/your/file.txt
Count tokens from piped input:
cat /path/to/your/file.txt | tokencount
Specify a tokenizer (default is openai-tiktoken):
tokencount --tokenizer openai-tiktoken /path/to/your/file.txt
Currently, openai-tiktoken is the primary supported tokenizer. A placeholder for gemini-text exists, but it falls back to openai-tiktoken with a warning because on-device Gemini tokenization is not yet implemented.
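The --tokenizer option and its default could be parsed as in the sketch below. The mention of an .action(...) callback suggests tokencount.js uses an argument-parsing library. This sketch instead uses Node's built-in util.parseArgs (available in recent Node.js releases) so that it runs without dependencies. The function name parseCliArgs and the single-file assumption are hypothetical.

```javascript
import { parseArgs } from "node:util";

// Hypothetical sketch of the documented CLI surface, built on util.parseArgs
// rather than whatever library tokencount.js actually uses.
function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      tokenizer: { type: "string" },
      help: { type: "boolean" },
    },
    allowPositionals: true, // the optional file path is a positional argument
  });
  return {
    // The README gives openai-tiktoken as the default tokenizer.
    tokenizer: values.tokenizer ?? "openai-tiktoken",
    help: values.help ?? false,
    // Assumption: at most one file; undefined means "read from stdin".
    filePath: positionals[0],
  };
}
```

Called as parseCliArgs(process.argv.slice(2)), an unknown option makes parseArgs throw, because its strict mode is enabled by default.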
Get help:
tokencount --help
Supported Tokenizers
- openai-tiktoken: Uses the gpt2 encoding from OpenAI's tiktoken library.
- gemini-text (placeholder): Currently falls back to openai-tiktoken. On-device Gemini tokenization is a future consideration, pending suitable libraries.
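The placeholder fallback described above can be sketched as a small name-resolution step that runs before any counting. The function resolveTokenizer, the warning text, and the choice to throw on unrecognized names are all assumptions. The README does not say how unknown tokenizer names are handled.

```javascript
// Hypothetical name resolution for the tokenizers listed above.
const SUPPORTED = new Set(["openai-tiktoken"]);
const PLACEHOLDERS = new Set(["gemini-text"]); // accepted, but not implemented
const DEFAULT_TOKENIZER = "openai-tiktoken";

// Returns the tokenizer that will actually be used. `warn` defaults to
// writing on stderr and is injectable only so the sketch is testable.
function resolveTokenizer(requested, warn = (msg) => console.error(msg)) {
  if (SUPPORTED.has(requested)) {
    return requested;
  }
  if (PLACEHOLDERS.has(requested)) {
    warn(`Warning: ${requested} is not implemented yet; falling back to ${DEFAULT_TOKENIZER}.`);
    return DEFAULT_TOKENIZER;
  }
  // Assumption: reject names that are neither supported nor placeholders.
  throw new Error(`Unknown tokenizer: ${requested}`);
}
```

Writing the warning to stderr keeps it out of stdout, which helps if the count is piped into another command.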
Limitations
- Text Files Only: This tool is designed for text files. Attempting to process binary files will result in an error or incorrect counts.
- On-Device Tokenization for Gemini: True on-device tokenization for Gemini models is not yet implemented.
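For a clearer failure than an error or a wrong count, one option is to reject likely binary input before tokenizing. The sketch below is a hypothetical pre-check that the README does not describe as part of tokencount.js. It flags a buffer as binary when its first 8000 bytes contain a NUL byte, a heuristic similar to the one git uses.

```javascript
// Hypothetical pre-check: treat input as binary if a NUL byte appears in the
// first `sampleSize` bytes. Plain UTF-8 text never contains a 0x00 byte
// unless the text itself includes U+0000.
function looksBinary(buffer, sampleSize = 8000) {
  const sample = buffer.subarray(0, sampleSize); // no copy, just a view
  return sample.includes(0); // Buffer#includes accepts a byte value
}
```

To use it, read the file as a Buffer (readFileSync without an encoding), run looksBinary on it, and only then decode it with buffer.toString("utf8"). Note that UTF-16 text contains NUL bytes and would be flagged as binary by this check.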
Development
To contribute or modify the tool:
- The project uses ES Module syntax (import/export).
- The main script is tokencount.js.
- Tokenizer logic is handled within the .action(...) callback in tokencount.js.
- To add a new tokenizer, you would typically:
  - Install any Node.js package the tokenizer requires.
  - Import the necessary functions from that package using import.
  - Add a new else if condition for your tokenizer's name in tokencount.js.
  - Implement the token-counting logic within that block.
  - Update this README and the help messages.
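The else if pattern described in these steps can be sketched as follows. To keep the example dependency-free, countWithTiktoken is a hypothetical stand-in for the existing openai-tiktoken logic, passed in by the caller. The name whitespace is a made-up tokenizer used only to show where a new branch goes. A real branch would call functions imported from the new tokenizer's package.

```javascript
// Hypothetical dispatch sketch. Other existing branches (such as the
// gemini-text fallback) are omitted for brevity.
function countTokens(text, tokenizerName, countWithTiktoken) {
  if (tokenizerName === "openai-tiktoken") {
    // Stand-in for the existing tiktoken-based counting.
    return countWithTiktoken(text);
  } else if (tokenizerName === "whitespace") {
    // New branch for a hypothetical tokenizer: count whitespace-separated words.
    const trimmed = text.trim();
    return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  } else {
    throw new Error(`Unknown tokenizer: ${tokenizerName}`);
  }
}
```

Each new tokenizer adds one branch plus its import at the top of tokencount.js, which is why the README and the help messages also need to list the new name.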
