countokens
v1.3.1
Cloc for counting tokens
Countokens
A command-line tool similar to cloc (Count Lines of Code), but designed to count OpenAI tiktoken tokens, rather than lines, in the files within a directory.
Features
- Recursively scans directories to find files.
- Counts tokens using the tokenizer of a specified OpenAI model (defaults to `gpt-4o-mini`).
- Provides both human-readable and JSON output formats.
- Allows custom ignore patterns (globs).
- Ignores `node_modules` by default.
- Handles binary or unreadable files gracefully by skipping them.
- Supports tree view output (`--tree`).
- Limits tree depth with the `-d, --depth <number>` option.
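The scanning behaviour in the feature list (recursive traversal, skipping `node_modules`, skipping binary or unreadable files) could look roughly like the following. This is a minimal sketch using only Node's built-in `fs` and `path` modules, not countokens' actual source: the NUL-byte check is an assumed binary-detection heuristic, and `readTextFiles` and `looksBinary` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed heuristic (not countokens' documented rule): treat a file as
// binary if its first 8000 bytes contain a NUL byte.
function looksBinary(buf: Uint8Array): boolean {
  const limit = Math.min(buf.length, 8000);
  for (let i = 0; i < limit; i++) {
    if (buf[i] === 0) return true;
  }
  return false;
}

// Recursively read text files under `root`. Keys are paths relative to
// `root` with "/" separators. Directories named node_modules are skipped,
// as are unreadable and binary files. Symbolic links are not followed.
function readTextFiles(root: string): Map<string, string> {
  const result = new Map<string, string>();
  const walk = (dir: string): void => {
    let entries: fs.Dirent[];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      return; // unreadable directory: skip it
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (entry.name !== "node_modules") walk(full);
      } else if (entry.isFile()) {
        try {
          const buf = fs.readFileSync(full);
          if (!looksBinary(buf)) {
            const rel = path.relative(root, full).split(path.sep).join("/");
            result.set(rel, buf.toString("utf8"));
          }
        } catch {
          // unreadable file: skip it
        }
      }
    }
  };
  walk(root);
  return result;
}
```

Reading whole files into memory keeps the sketch short; a real scanner might stream large files or read only a prefix for the binary check.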
Installation
You can install countokens globally using npm:
```sh
npm install -g countokens
```
Alternatively, you can run it directly without installation using npx:
```sh
npx countokens [options] [path]
```
Usage
```sh
countokens [options] [path]
```
Arguments:
- `path`: The root directory to scan for files. Defaults to the current directory (`.`).
Options:
- `-m, --model <name>`: Specify the OpenAI model name for tokenization (default: `gpt-4o-mini`).
- `-i, --ignore <globs>`: Provide a comma-separated list of glob patterns to ignore. Example: `-i "*.log,dist/**"`
- `--json`: Output the results in JSON format (includes the total count and per-file counts).
- `--tree`: Display the token counts in a file tree structure.
- `-d, --depth <number>`: When used with `--tree`, limit how many directory levels to display (default: all levels).
- `-h, --help`: Display help information.
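The `--tree` and `--depth` options imply that per-file counts are rolled up into directory totals before rendering. One way to do that is sketched below, assuming per-file counts keyed by `/`-separated relative paths (as in the JSON output) and assuming `--depth 1` shows only top-level entries. `buildTree` and `renderTree` are hypothetical names, and the exact layout countokens prints may differ.

```typescript
// A node in the rolled-up tree: its token total and its children by name.
interface TreeNode {
  name: string;
  tokens: number;
  children: Map<string, TreeNode>;
}

// Roll per-file counts up so each directory's count is the sum of the
// files beneath it. The returned root holds the grand total.
function buildTree(files: Record<string, number>): TreeNode {
  const root: TreeNode = { name: ".", tokens: 0, children: new Map() };
  for (const [filePath, count] of Object.entries(files)) {
    let node = root;
    node.tokens += count;
    for (const part of filePath.split("/")) {
      let child = node.children.get(part);
      if (!child) {
        child = { name: part, tokens: 0, children: new Map() };
        node.children.set(part, child);
      }
      child.tokens += count;
      node = child;
    }
  }
  return root;
}

// Render with box-drawing connectors, largest entries first.
// `maxDepth` (assumed semantics of --depth) caps the printed levels;
// undefined means print every level.
function renderTree(root: TreeNode, maxDepth?: number): string[] {
  const lines: string[] = [];
  const visit = (node: TreeNode, prefix: string, depth: number): void => {
    if (maxDepth !== undefined && depth > maxDepth) return;
    const kids = Array.from(node.children.values()).sort((a, b) => b.tokens - a.tokens);
    kids.forEach((kid, i) => {
      const last = i === kids.length - 1;
      const label = `${kid.name} (${kid.tokens.toLocaleString("en-US")})`;
      lines.push(`${prefix}${last ? "└─ " : "├─ "}${label}`);
      visit(kid, prefix + (last ? "   " : "│  "), depth + 1);
    });
  };
  visit(root, "", 1);
  return lines;
}
```

For example, `renderTree(buildTree({ "src/cli.ts": 1700, "src/utils/a.ts": 300 }), 2)` yields `└─ src (2,000)` followed by `cli.ts (1,700)` and `utils (300)` nested beneath it, while omitting `a.ts` because it sits at the third level.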
Examples
1. Count tokens in the current directory using the default model:
```sh
npx countokens
```
Output (Example):
```
Token count (gpt-4o-mini): 15,823 tokens
src/cli.ts 1700
package.json 666
README.md 350
.gitignore 128
```
2. Count tokens in a specific directory (`./src`) using `gpt-4`:
```sh
npx countokens --model gpt-4 ./src
```
3. Count tokens, ignoring log files and the `build` directory:
```sh
npx countokens -i "*.log,build/**"
```
4. Get JSON output:
```sh
npx countokens --json
```
Output (Example):
```json
{
  "total": 15823,
  "files": {
    "src/cli.ts": 1700,
    "package.json": 666,
    "README.md": 350,
    ".gitignore": 128
  }
}
```
5. Display token counts as a file tree:
```sh
npx countokens --tree
```
Output (Example):
```
Token count (gpt-4o-mini): 2,500 tokens
└─ src (2,000)
   ├─ cli.ts (1,700)
   └─ utils (300)
```
6. Display the tree up to a maximum depth of 2:
```sh
npx countokens --tree --depth 2
```
Output (Example):
```
Token count (gpt-4o-mini): 2,500 tokens
└─ src (2,000)
   ├─ cli.ts (1,700)
   └─ utils (300)
```
Ignoring Files
By default, countokens ignores files and directories matching the node_modules/** pattern. You can add more patterns using the -i or --ignore flag, providing a comma-separated list of globs.
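To make the ignore semantics concrete, here is a simplified glob matcher. It is an assumption-laden sketch, not the matcher countokens uses (a CLI like this would more typically rely on a glob library). Assumed rules: `*` matches within one path segment, `**` matches across segments, patterns are tested against `/`-separated paths relative to the scan root, and a pattern without a `/` is also tried against the file's base name. `globToRegExp`, `parseIgnoreList`, and `isIgnored` are hypothetical names.

```typescript
// Convert a simple glob into an anchored RegExp. Assumed syntax:
// "**/" = zero or more directories, "**" = anything including "/",
// "*" = anything except "/", "?" = exactly one non-"/" character.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          re += "(?:.*/)?"; // "**/" may match zero or more directories
          i += 2;
        } else {
          re += ".*"; // trailing or bare "**"
          i += 1;
        }
      } else {
        re += "[^/]*";
      }
    } else if (c === "?") {
      re += "[^/]";
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

// Split a comma-separated --ignore value into trimmed, non-empty patterns.
function parseIgnoreList(value: string): string[] {
  return value.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
}

// Assumption: patterns without "/" are also tried against the base name,
// so "*.log" ignores "logs/app.log" as well as "app.log".
function isIgnored(relPath: string, patterns: string[]): boolean {
  const base = relPath.slice(relPath.lastIndexOf("/") + 1);
  return patterns.some((p) => {
    const re = globToRegExp(p);
    return re.test(relPath) || (!p.includes("/") && re.test(base));
  });
}
```

Under these rules, combining the default with `-i "*.log,dist/**"` gives `["node_modules/**", "*.log", "dist/**"]`, which skips `dist/a/b.js` and `logs/app.log` but keeps `src/cli.ts`.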
Tokenization Models
The tool uses the tiktoken library for tokenization. You can specify which model's tokenizer to use via the --model flag. Refer to the Tiktoken documentation for available model names. The default is gpt-4o-mini.
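The per-file counting and the `--json` result shape (a `total` plus a `files` map, as in the JSON example) can be sketched as follows. Because this README does not describe countokens' internals, the tokenizer is passed in as a plain function: in practice it would wrap a tiktoken encoder for the chosen `--model`. `Encode`, `TokenReport`, `countAll`, and `formatReport` are hypothetical names, and the text output only approximates the sample output.

```typescript
// Any function that turns text into a sequence of tokens. In practice this
// would wrap a tiktoken encoder for the selected model (assumption); only
// the resulting length is used here.
type Encode = (text: string) => ArrayLike<unknown>;

// Shape of the --json output: a grand total plus per-file counts.
interface TokenReport {
  total: number;
  files: Record<string, number>;
}

// Count tokens for each file's text and accumulate the total.
function countAll(contents: Map<string, string>, encode: Encode): TokenReport {
  const files: Record<string, number> = {};
  let total = 0;
  contents.forEach((text, file) => {
    const n = encode(text).length;
    files[file] = n;
    total += n;
  });
  return { total, files };
}

// Human-readable form: a header line, then files by descending count.
function formatReport(report: TokenReport, model: string): string {
  const header = `Token count (${model}): ${report.total.toLocaleString("en-US")} tokens`;
  const rows = Object.entries(report.files)
    .sort(([, a], [, b]) => b - a)
    .map(([file, n]) => `${file} ${n}`);
  return [header, ...rows].join("\n");
}
```

`JSON.stringify(report, null, 2)` then produces output in the same shape as the `--json` example. For a quick experiment without a real tokenizer, `(t) => t.split(/\s+/).filter(Boolean)` satisfies `Encode`, though it counts words rather than tokens.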
Contributing
Issues and pull requests are welcome! Please refer to the GitHub repository and the issue tracker.
License
This project is licensed under the ISC License - see the LICENSE file for details (or check package.json).
