@homy/linguist
v0.1.2
Published
A high-performance code language analysis tool based on github-linguist
Downloads
10
Maintainers
Readme
Linguisto
Introduction
Linguisto is a high-performance code language analysis tool based on github-linguist. Built with Rust and providing Node.js bindings via NAPI-RS, it quickly scans directories to count files, calculate byte sizes, and determine language percentages, while intelligently filtering out third-party dependencies and ignored files.
Features
- Superior Performance: Written in Rust, leveraging multi-threading for fast file system traversal.
- Smart Filtering: Automatically respects
.gitignore, skips hidden files, and excludes vendored files (e.g.,node_modules). - Precise Detection: Based on robust language detection algorithms, supporting filename, extension, and content-based disambiguation.
- Beautiful Output: Provides a colorful terminal UI with progress bars, supporting sorting by bytes or file count.
- Data Integration: Supports JSON output for easy integration with other tools.
- Cross-platform: Supports macOS, Linux, Windows, and WASI environments.
Table of Contents
Install
For CLI
If you have Rust installed, you can install it via Cargo:
cargo install linguistoOr install it globally via npm:
npm install -g @homy/linguistFor API
Install it as a dependency in your Node.js project:
npm install @homy/linguistUsage
CLI Usage
Run it in the current directory to see an intuitive language distribution chart (sorted by byte size by default):
linguistoAnalyze a specific directory:
linguisto /path/to/your/projectCommon Options
--json: Output results in JSON format.--all: Show all calculated language statistics. (Currently behaves the same as the default view, kept for compatibility.)--sort <type>: Sort results.typecan befile_count(descending) orbytes(descending, default).--max-lang <number>: Maximum number of languages to display individually. Remaining languages will be grouped into "Other" (default: 6).
Example
# Get JSON stats for the current project sorted by file count
linguisto . --json --sort=file_countProgrammatic Usage
You can call the API provided by @homy/linguist directly in your Node.js or TypeScript code.
const { analyzeDirectory, analyzeDirectorySync } = require('@homy/linguist');
// Asynchronous analysis (recommended for large directories)
async function run() {
const stats = await analyzeDirectory('./src');
console.log(stats);
}
run();
// Synchronous analysis
const syncStats = analyzeDirectorySync('./src');
console.log(syncStats);Benchmark
This project includes a simple benchmark comparing linguisto (native Rust + NAPI) with the pure JavaScript implementation linguist-js.
The benchmark script is located at benchmark/bench.ts and can be run with:
pnpm benchBelow is a sample result on this repository (macOS, Node.js, default settings):
Running benchmark...
┌─────────┬──────────────────────┬────────────────────┬─────────────────────┬────────────────────────┬────────────────────────┬─────────┐
│ (index) │ Task name │ Latency avg (ns) │ Latency med (ns) │ Throughput avg (ops/s) │ Throughput med (ops/s) │ Samples │
├─────────┼──────────────────────┼────────────────────┼─────────────────────┼────────────────────────┼────────────────────────┼─────────┤
│ 0 │ 'linguist-js' │ '87352945 ± 0.86%' │ '86415417 ± 955312' │ '11 ± 0.79%' │ '12 ± 0' │ 64 │
│ 1 │ 'linguisto (native)' │ '2903259 ± 2.20%' │ '3118667 ± 52375' │ '363 ± 2.64%' │ '321 ± 5' │ 345 │
└─────────┴──────────────────────┴────────────────────┴─────────────────────┴────────────────────────┴────────────────────────┴─────────┘From this run, linguisto (native) achieves roughly 30× higher throughput than linguist-js on this project, thanks to Rust's multi-threaded file system traversal and native execution.
References
analyzeDirectory(dir)
- Type:
(dir: string) => Promise<LanguageStat[]>
Asynchronously analyzes the target directory and returns an array of language statistics.
analyzeDirectorySync(dir)
- Type:
(dir: string) => LanguageStat[]
Synchronously analyzes the target directory and returns an array of language statistics.
LanguageStat
Each statistical object contains the following fields:
| Field | Type | Description |
| :--- | :--- | :--- |
| lang | string | Detected language name (e.g., "Rust", "TypeScript") |
| count | number | Number of files for this language |
| bytes | number | Total bytes occupied by files of this language |
| ratio | number | Percentage in the overall project (0.0 - 1.0) |
Credits
- github-linguist/linguist - The project that inspired this tool and provides the language detection logic.
- drshade/linguist - A Rust implementation of linguist that served as a reference.
