tree-sitter-ts
v0.1.4
Published
Pure TypeScript parser library with declarative language profiles. No WASM required.
Maintainers
Readme
tree-sitter-ts
Pure TypeScript parser library with declarative language profiles.
- No WASM runtime
- Works in Node.js and modern bundlers
- Supports tokenization and symbol extraction
Installation
npm install tree-sitter-tsRequires Node.js 18+
Quick start
import { tokenize, extractSymbols } from "tree-sitter-ts";
const source = `
export class Service {
run(): void {}
}
`;
const tokens = tokenize(source, "typescript");
const symbols = extractSymbols(source, "typescript");
console.log(tokens[0]);
// {
// type: "keyword",
// value: "export",
// category: "keyword",
// range: { start: { line: 2, column: 1, offset: 1 }, end: ... }
// }
console.log(symbols);
// [{
// name: "Service",
// kind: "class",
// nameRange: { startLine: 2, startCol: 13, endLine: 2, endCol: 20 },
// contentRange: { startLine: 2, startCol: 1, endLine: 4, endCol: 2 }
// }]CLI
After installing globally (or using npx), run:
tree-sitter-ts <source-file> <token|symbols> [--language <name-or-extension>]Examples:
tree-sitter-ts ./src/app.ts token
tree-sitter-ts ./src/app.ts symbols
tree-sitter-ts ./snippet.txt token --language typescriptCLI output is deterministic JSON so you can inspect by eye and parse by machine:
{
"ok": true,
"extract": "token",
"sourceFile": "./src/app.ts",
"language": ".ts",
"count": 42,
"result": []
}When using symbols, each symbol item includes only name, kind, nameRange, and contentRange.
On failure, CLI returns non-zero exit code and JSON error payload:
{
"ok": false,
"error": {
"code": "INVALID_EXTRACT",
"message": "Invalid extract mode: \"ast\". Use \"token\" or \"symbols\"."
}
}You can resolve language by:
- Profile name (for example,
"typescript") - File extension (for example,
".ts")
API
Core functions
import {
tokenize,
extractSymbols,
tokenizeWithProfile,
extractSymbolsWithProfile,
} from "tree-sitter-ts";tokenize(source, language): Token[]- Converts source text to a token stream using a registered profile.
extractSymbols(source, language): CodeSymbol[]- Extracts symbols like functions/classes (depending on profile structure rules).
tokenizeWithProfile(source, profile): Token[]- Tokenizes directly with a
LanguageProfileobject.
- Tokenizes directly with a
extractSymbolsWithProfile(source, profile): CodeSymbol[]- Extracts symbols directly with a
LanguageProfileobject.
- Extracts symbols directly with a
Registry utilities
import {
registerProfile,
getProfile,
getRegisteredLanguages,
getSupportedExtensions,
builtinProfiles,
} from "tree-sitter-ts";registerProfile(profile)- Registers a custom language profile at runtime.
getProfile(nameOrExt)- Gets a profile by language name or file extension.
getRegisteredLanguages()- Lists currently registered profile names.
getSupportedExtensions()- Lists registered file extensions.
builtinProfiles- Array of all built-in profiles.
Output types
Token
interface Token {
type: string;
value: string;
category: TokenCategory;
range: Range;
}CodeSymbol
interface CodeSymbol {
name: string;
kind: SymbolKind;
nameRange: Range;
contentRange: Range;
}
interface Range {
startLine: number;
startCol: number;
endLine: number;
endCol: number;
}Built-in languages
Current built-in profiles:
jsoncssscsspythongojavascripttypescriptcpphtmlmarkdownyamlxmljavacsharprustrubyphpkotlinswiftshellbashsqltoml
To inspect at runtime:
import { getRegisteredLanguages, getSupportedExtensions } from "tree-sitter-ts";
console.log(getRegisteredLanguages());
console.log(getSupportedExtensions());Custom language profile example
import {
registerProfile,
tokenize,
extractSymbols,
type LanguageProfile,
} from "tree-sitter-ts";
const toyProfile: LanguageProfile = {
name: "toytest",
displayName: "Toy Test",
version: "1.0.0",
fileExtensions: [".toy"],
lexer: {
charClasses: {
identStart: { union: [{ predefined: "letter" }, { chars: "_" }] },
identPart: { union: [{ predefined: "alphanumeric" }, { chars: "_" }] },
},
tokenTypes: {
keyword: { category: "keyword" },
identifier: { category: "identifier" },
punctuation: { category: "punctuation" },
whitespace: { category: "whitespace" },
newline: { category: "newline" },
},
initialState: "default",
skipTokens: ["whitespace", "newline"],
states: {
default: {
rules: [
{ match: { kind: "keywords", words: ["fn"] }, token: "keyword" },
{
match: {
kind: "charSequence",
first: { ref: "identStart" },
rest: { ref: "identPart" },
},
token: "identifier",
},
{
match: { kind: "string", value: ["{", "}", "(", ")", ",", ";"] },
token: "punctuation",
},
],
},
},
},
structure: {
blocks: [{ name: "braces", open: "{", close: "}" }],
symbols: [
{
name: "function_declaration",
kind: "function",
pattern: [
{ token: "keyword", value: "fn" },
{ token: "identifier", capture: "name" },
],
hasBody: true,
bodyStyle: "braces",
},
],
},
};
registerProfile(toyProfile);
const source = "fn add(a, b) {\n}\n";
console.log(tokenize(source, "toytest"));
console.log(extractSymbols(source, ".toy"));Advanced exports
For advanced use cases, the package also exports lexer/parser internals and schema types, including:
CompiledLexer,getCompiledLexerCharReader,compileMatcher,compileCharClassfindBlockSpans,extractSymbolsFromTokens- Schema and output type exports from
schema/*andtypes/*
Error behavior
If you pass an unknown language name/extension to tokenize or extractSymbols, the library throws an error:
Unknown language: "...". Use getRegisteredLanguages() to see available languages.License
MIT
