@sharc-code/splitter

v0.2.3

SHARC Splitter - AST and LangChain code chunking for semantic search

Code chunking library for semantic search, featuring AST-based splitting with context injection and LangChain fallback.

Overview

@sharc-code/splitter provides intelligent code chunking for RAG (Retrieval-Augmented Generation) and semantic code search applications. It extracts meaningful code units while preserving semantic context.

Key Features

  • AST-Based Splitting: Tree-sitter powered parsing for 9 languages
  • Context Injection: Automatically adds class/module context to extracted methods
  • Decorator/Annotation Support: Preserves decorators (@Get(), #[derive], etc.) in context
  • LangChain Fallback: Character-based splitting for unsupported languages
  • Syntax Error Detection: Validate code before indexing
  • Memory Safe: Proper cleanup of native tree-sitter resources

Installation

# npm
npm install @sharc-code/splitter

# bun
bun add @sharc-code/splitter

# pnpm
pnpm add @sharc-code/splitter

Note: This package includes native tree-sitter bindings. Ensure you have a C++ compiler available:

  • Windows: Visual Studio Build Tools
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Linux: build-essential package

Quick Start

import { AstCodeSplitter, LangChainCodeSplitter } from '@sharc-code/splitter';

// AST-based splitting (recommended for code)
const astSplitter = new AstCodeSplitter(3500, 0);

const chunks = await astSplitter.split(
  `class UserService {
    async authenticate(user: string): Promise<boolean> {
      return this.validateCredentials(user);
    }
  }`,
  'typescript',
  'src/services/user.ts'
);

console.log(chunks[0].content);
// Output:
// // Context: class UserService (services/user.ts)
// async authenticate(user: string): Promise<boolean> {
//   return this.validateCredentials(user);
// }

// LangChain splitting (for docs/config)
const markdownContent = '# Guide\n\nSome documentation text to split.'; // placeholder input
const langchainSplitter = new LangChainCodeSplitter(1500, 150);
const docChunks = await langchainSplitter.split(markdownContent, 'markdown', 'README.md');

Splitters

AstCodeSplitter

Tree-sitter based splitter that extracts complete semantic units (functions, classes, methods) with automatic context injection.

Supported Languages:

  • TypeScript / JavaScript
  • Python
  • Java
  • C++ / C
  • Go
  • Rust
  • C#
  • Scala

Features:

  • Extracts functions, classes, methods, interfaces as complete units
  • Injects context comments (e.g., // Context: class UserService > module auth)
  • Automatic fallback to LangChain for unsupported languages
  • Syntax error detection via checkSyntaxErrors()
  • Memory-safe with dispose() for cleanup

import { AstCodeSplitter } from '@sharc-code/splitter';

const splitter = new AstCodeSplitter(
  3500,  // chunkSize (max characters per chunk)
  0      // chunkOverlap (0 for AST - semantic units don't need overlap)
);

// Split code
const chunks = await splitter.split(code, 'typescript', 'src/auth.ts');

// Check for syntax errors before indexing
const { hasErrors, errorCount } = splitter.checkSyntaxErrors(code, 'typescript');

// Check if language is supported
if (AstCodeSplitter.isLanguageSupported('rust')) {
  // Use AST splitter
}

// Clean up when done (important for long-running processes)
splitter.dispose();
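
Because split() takes a language identifier, callers usually need to derive one from the file path first. The following is a minimal sketch of that step, not part of the package: languageFromPath and its extension table are hypothetical, and the table only covers identifiers that appear elsewhere in this README.

```typescript
// Hypothetical mapping from file extension to the language identifiers used in this README.
const EXTENSION_TO_LANGUAGE = new Map<string, string>([
  ['ts', 'typescript'],
  ['py', 'python'],
  ['rs', 'rust'],
  ['md', 'markdown'],
  ['vue', 'vue'],
]);

// Hypothetical helper (not part of @sharc-code/splitter): guess a language from a path.
function languageFromPath(filePath: string): string | undefined {
  const fileName = filePath.split('/').pop() ?? '';
  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".gitignore"
  return EXTENSION_TO_LANGUAGE.get(fileName.slice(dot + 1).toLowerCase());
}
```

An undefined result is a reasonable signal to skip the file or to hand it to a generic text splitter.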

LangChainCodeSplitter

Character-based splitter using LangChain's RecursiveCharacterTextSplitter. Best for documentation, configuration files, and languages without AST support.

Supported Languages:

  • JavaScript/TypeScript (as js)
  • Python, Java, C++, Go, Rust, PHP, Ruby, Swift, Scala
  • Markdown, HTML, LaTeX
  • Solidity

import { LangChainCodeSplitter } from '@sharc-code/splitter';

const splitter = new LangChainCodeSplitter(
  1500,  // chunkSize
  150    // chunkOverlap (overlap preserves context across chunks)
);

const chunks = await splitter.split(content, 'markdown', 'docs/README.md');
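
To make the chunkOverlap parameter concrete, here is a minimal sliding-window sketch. It is a plain fixed-size window, deliberately simpler than LangChain's RecursiveCharacterTextSplitter (which prefers to break on separators), and slidingWindows is a hypothetical helper rather than anything this package exports.

```typescript
// Hypothetical illustration of overlap: fixed-size character windows that share
// `chunkOverlap` characters with the previous window.
function slidingWindows(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const windows: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    windows.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reached the end
  }
  return windows;
}

// With chunkSize 4 and overlap 1, each window starts with the previous window's last character.
console.log(slidingWindows('abcdefghij', 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```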

API Reference

CodeChunk

interface CodeChunk {
  content: string;
  metadata: {
    startLine: number;
    endLine: number;
    language?: string;
    filePath?: string;
  };
}
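
As a sketch of how the metadata might be consumed, the helper below formats a chunk's location for logs or search results. describeChunk is a hypothetical name, not part of the package; the CodeChunk shape is copied from the interface above so the snippet stands alone.

```typescript
// Local copy of the CodeChunk shape documented above, so the sketch is self-contained.
interface CodeChunk {
  content: string;
  metadata: {
    startLine: number;
    endLine: number;
    language?: string;
    filePath?: string;
  };
}

// Hypothetical helper (not part of @sharc-code/splitter): format a chunk's location.
function describeChunk(chunk: CodeChunk): string {
  const { startLine, endLine, language, filePath } = chunk.metadata;
  const where = `${filePath ?? '<unknown>'}:${startLine}-${endLine}`;
  return language ? `${where} (${language})` : where;
}
```

Since language and filePath are optional, the helper falls back to a placeholder path and omits the language suffix when they are missing.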

Splitter Interface

interface Splitter {
  split(code: string, language: string, filePath?: string): Promise<CodeChunk[]>;
  setChunkSize(chunkSize: number): void;
  setChunkOverlap(chunkOverlap: number): void;
}

AstCodeSplitter

class AstCodeSplitter implements Splitter {
  constructor(chunkSize?: number, chunkOverlap?: number);

  // Split code into chunks
  split(code: string, language: string, filePath?: string): Promise<CodeChunk[]>;

  // Check for syntax errors
  checkSyntaxErrors(code: string, language: string): { hasErrors: boolean; errorCount: number };

  // Configuration
  setChunkSize(chunkSize: number): void;
  setChunkOverlap(chunkOverlap: number): void;

  // Cleanup native resources
  dispose(): void;

  // Static utilities
  static getSupportedLanguages(): string[];
  static isLanguageSupported(language: string): boolean;
}

LangChainCodeSplitter

class LangChainCodeSplitter implements Splitter {
  constructor(chunkSize?: number, chunkOverlap?: number);

  split(code: string, language: string, filePath?: string): Promise<CodeChunk[]>;
  setChunkSize(chunkSize: number): void;
  setChunkOverlap(chunkOverlap: number): void;
}

Context Injection

The AST splitter automatically adds context comments to extracted code chunks, improving search relevance:

TypeScript/JavaScript/Java/C++/Go/Rust/C#/Scala

// Context: class UserService > module auth (services/user.ts)
async authenticate(user: string): Promise<boolean> {
  return this.validateCredentials(user);
}

Python

# Context: class UserService (services/user.py)
def authenticate(self, user: str) -> bool:
    return self.validate_credentials(user)

Context Hierarchy

The splitter tracks nested containers and builds a context path:

// Input: deeply nested method
namespace App {
  module Auth {
    class UserService {
      authenticate() { ... }
    }
  }
}

// Output chunk:
// Context: namespace App > module Auth > class UserService (auth/user.ts)
authenticate() { ... }
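
One way to picture how such a comment could be assembled from a container path is sketched below. This is not the package's implementation: buildContextComment is a hypothetical function, and shortening the path to its last two segments is an assumption inferred from the examples in this README.

```typescript
// Hypothetical sketch (not the package's code): build a context comment from the
// list of enclosing containers, outermost first.
function buildContextComment(containers: string[], filePath: string, language: string): string {
  // The examples in this README show '#' for Python and '//' for the other languages.
  const prefix = language === 'python' ? '#' : '//';
  // Assumption: only the last two path segments are shown, as in "(services/user.ts)".
  const shortPath = filePath.split('/').slice(-2).join('/');
  return `${prefix} Context: ${containers.join(' > ')} (${shortPath})`;
}

console.log(
  buildContextComment(['namespace App', 'module Auth', 'class UserService'], 'src/auth/user.ts', 'typescript'),
);
// // Context: namespace App > module Auth > class UserService (auth/user.ts)
```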

Recommended Chunk Sizes

| Content Type | Chunk Size (chars) | Overlap (chars) | Splitter | Rationale |
|--------------|--------------------|-----------------|----------|-----------|
| Code (AST) | 3500 | 0 | AST | Complete semantic units, no overlap needed |
| Documentation | 1500 | 150 | LangChain | Prose flows between sections |
| Config/Data | 1500 | 100 | LangChain | Related keys grouped together |
| Fallback Code | 1500 | 100 | LangChain | Conservative chunking |
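
These recommendations can be kept in one place and applied through the setChunkSize() and setChunkOverlap() methods. The sketch below assumes hypothetical names (ContentKind, RECOMMENDED, applyRecommended) and accepts any object exposing those two setters, so it does not depend on the package itself.

```typescript
// Hypothetical labels for the four rows of the table above.
type ContentKind = 'code-ast' | 'documentation' | 'config' | 'fallback-code';

interface ChunkSettings {
  chunkSize: number;
  chunkOverlap: number;
  splitter: 'ast' | 'langchain';
}

// Values copied from the recommended chunk sizes table.
const RECOMMENDED: Record<ContentKind, ChunkSettings> = {
  'code-ast': { chunkSize: 3500, chunkOverlap: 0, splitter: 'ast' },
  documentation: { chunkSize: 1500, chunkOverlap: 150, splitter: 'langchain' },
  config: { chunkSize: 1500, chunkOverlap: 100, splitter: 'langchain' },
  'fallback-code': { chunkSize: 1500, chunkOverlap: 100, splitter: 'langchain' },
};

// The two setters shared by both splitters, per the Splitter interface.
interface Configurable {
  setChunkSize(chunkSize: number): void;
  setChunkOverlap(chunkOverlap: number): void;
}

// Hypothetical helper: apply the recommended settings for a content kind.
function applyRecommended(target: Configurable, kind: ContentKind): ChunkSettings {
  const settings = RECOMMENDED[kind];
  target.setChunkSize(settings.chunkSize);
  target.setChunkOverlap(settings.chunkOverlap);
  return settings;
}
```

The returned settings include which splitter the table recommends, so a caller can check that it is configuring the matching splitter type.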

Memory Management

Tree-sitter uses native C++ bindings that allocate memory outside Node.js's garbage collector. In long-running processes, call dispose() once you are finished with a splitter so that this memory is released:

const splitter = new AstCodeSplitter();

try {
  // Process many files...
  for (const file of files) {
    const chunks = await splitter.split(file.content, file.language, file.path);
    // ... use chunks
  }
} finally {
  // Free native memory when done
  splitter.dispose();
}

The splitter also automatically cleans up parse trees after each split() call.
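
If the try/finally pattern is needed in several places, it can be wrapped once. The following is a small sketch under that assumption: withDisposable and HasDispose are hypothetical names, and the helper works with any object that has a dispose() method.

```typescript
// Anything with a dispose() method, such as the AST splitter described above.
interface HasDispose {
  dispose(): void;
}

// Hypothetical helper: run async work with a resource and always dispose it afterwards,
// whether the work resolves or rejects.
async function withDisposable<R extends HasDispose, T>(
  resource: R,
  work: (resource: R) => Promise<T>,
): Promise<T> {
  try {
    return await work(resource);
  } finally {
    resource.dispose();
  }
}
```

With the package installed, a call might look like withDisposable(new AstCodeSplitter(), (s) => s.split(code, 'typescript', 'src/auth.ts')).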

Error Handling

Syntax Error Detection

const splitter = new AstCodeSplitter();

// Check for errors before indexing
const { hasErrors, errorCount } = splitter.checkSyntaxErrors(code, 'typescript');

if (hasErrors) {
  console.warn(`File has ${errorCount} syntax errors, skipping...`);
} else {
  const chunks = await splitter.split(code, 'typescript');
}
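
For a batch of files, the same check can be used to separate indexable files from ones to skip. This is a sketch only: triageFiles, InputFile, and SyntaxChecker are hypothetical names, and the checker parameter is anything with a checkSyntaxErrors() method of the documented shape.

```typescript
// Shape of the documented checkSyntaxErrors() method.
interface SyntaxChecker {
  checkSyntaxErrors(code: string, language: string): { hasErrors: boolean; errorCount: number };
}

// Hypothetical description of a file waiting to be indexed.
interface InputFile {
  path: string;
  language: string;
  content: string;
}

// Hypothetical helper: split a batch into files that parse cleanly and files to skip.
function triageFiles(checker: SyntaxChecker, files: InputFile[]) {
  const indexable: InputFile[] = [];
  const skipped: { path: string; errorCount: number }[] = [];
  for (const file of files) {
    const { hasErrors, errorCount } = checker.checkSyntaxErrors(file.content, file.language);
    if (hasErrors) {
      skipped.push({ path: file.path, errorCount });
    } else {
      indexable.push(file);
    }
  }
  return { indexable, skipped };
}
```

Collecting the skipped paths with their error counts makes it easier to report what was left out of the index.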

Automatic Fallback

If AST parsing fails or the language isn't supported, the AST splitter automatically falls back to LangChain:

const splitter = new AstCodeSplitter();

// Vue files aren't AST-supported, will use LangChain
const chunks = await splitter.split(vueCode, 'vue', 'App.vue');
// Console: "Language vue not supported by AST, using LangChain splitter for: App.vue"
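
The fallback behaviour described above can be modelled roughly as follows. This is an illustrative sketch, not the package's actual code: splitWithFallback and ChunkSource are hypothetical names, and the real splitter performs this routing internally.

```typescript
interface CodeChunk {
  content: string;
  metadata: { startLine: number; endLine: number; language?: string; filePath?: string };
}

// Anything with a split() method of the documented shape.
interface ChunkSource {
  split(code: string, language: string, filePath?: string): Promise<CodeChunk[]>;
}

// Hypothetical model of the fallback: use the primary splitter when the language is
// supported and parsing succeeds, otherwise use the fallback splitter.
async function splitWithFallback(
  primary: ChunkSource,
  fallback: ChunkSource,
  isSupported: (language: string) => boolean,
  code: string,
  language: string,
  filePath?: string,
): Promise<CodeChunk[]> {
  if (isSupported(language)) {
    try {
      return await primary.split(code, language, filePath);
    } catch {
      // Primary splitting failed: fall through to the fallback splitter.
    }
  }
  return fallback.split(code, language, filePath);
}
```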

Development

# Install dependencies
bun install

# Build
bun run build

# Type check
bun run typecheck

# Watch mode
bun run dev

# Clean build artifacts
bun run clean

Dependencies

| Package | Purpose |
|---------|---------|
| tree-sitter | AST parsing engine |
| tree-sitter-* | Language grammars (9 languages) |
| langchain | Text splitting utilities |

Use in SHARC

This package is used by:

  • @sharc-code/mcp - MCP server for AI assistants
  • @sharc/core - Core indexing engine (internal)

For end-to-end semantic code search, see the main SHARC documentation.

License

MIT - See LICENSE for details.

Contributing

All code modifications must be done via Pull Request. See CLAUDE.md for guidelines.