@silyze/kb-scanner-csv
v1.0.0
CSV implementation of DocumentScanner<T> for @silyze/kb, built on top of csv-parser and @silyze/kb-scanner-text.
Features
- Parses CSV strings into structured rows.
- Converts rows into readable text format.
- Uses token-based chunking via TextScanner for embedding compatibility.
- Fully async via AsyncReadStream and AsyncTransform.
Installation
npm install @silyze/kb-scanner-csv
Usage
import CsvScanner from "@silyze/kb-scanner-csv";
const scanner = new CsvScanner();
const csv = `name,age
Mihail,24
Simeon,24`;
async function run() {
const chunks = await scanner.scan(csv).transform().toArray();
console.log(chunks);
}
run().then();
Configuration
CsvScanner supports the same configuration options as TextScanner, plus any options from csv-parser:
type CsvScannerConfig = TextScannerConfig & csv.Options;
Examples:
- tokensPerPage: Number of tokens per chunk (default: 512).
- overlap: Chunk overlap (e.g., 0.5 for 50%).
- model: OpenAI model passed to tiktoken.
- separator: CSV column separator (e.g., ; or ,).
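To build intuition for how tokensPerPage and overlap might interact, here is a minimal sliding-window sketch. It is not TextScanner's implementation: the helper name chunkTokens is hypothetical, whitespace-separated words stand in for tiktoken tokens, and the rule that consecutive pages share about overlap × tokensPerPage tokens is an assumption this README does not confirm.

```typescript
// Hypothetical helper, not part of @silyze/kb-scanner-text.
// Splits tokens into pages of `tokensPerPage`; consecutive pages are assumed
// to share roughly `overlap * tokensPerPage` tokens.
function chunkTokens(
  tokens: string[],
  tokensPerPage: number,
  overlap: number
): string[][] {
  if (tokensPerPage < 1) {
    throw new RangeError("tokensPerPage must be at least 1");
  }
  if (overlap < 0 || overlap >= 1) {
    throw new RangeError("overlap must be in [0, 1)");
  }
  // Distance between page starts; at least 1 so the loop always advances.
  const step = Math.max(1, Math.floor(tokensPerPage * (1 - overlap)));
  const pages: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    pages.push(tokens.slice(start, start + tokensPerPage));
    // Stop once a page has reached the end of the input.
    if (start + tokensPerPage >= tokens.length) break;
  }
  return pages;
}

// Whitespace-separated words stand in for real tokenizer output.
const tokens = "a b c d e f g h".split(" ");
const pages = chunkTokens(tokens, 4, 0.5);
console.log(pages.map((page) => page.join(" ")));
```

Under these assumptions, an overlap of 0.5 makes each page repeat the second half of the previous one, which helps keep a row that straddles a page boundary intact in at least one chunk.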
How it works
- Accepts a CSV string as input.
- Parses rows using csv-parser.
- Converts each row to a key=value string.
- Joins all rows into a single text block.
- Scans and chunks it using TextScanner.
This allows structured tabular data to be processed and embedded similarly to raw text, enabling it to work seamlessly in the @silyze/kb pipeline.
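The row-to-text steps above can be sketched roughly as follows. This is a simplified illustration and not the package's source: parseCsv is a naive stand-in for csv-parser (no quoted fields, escapes, or embedded newlines), and the helper names parseCsv, rowToText, and csvToText are hypothetical. The key="value" format is taken from the example output in this README.

```typescript
// Naive stand-in for csv-parser: splits on newlines and a separator.
// It does not handle quoted fields, escapes, or embedded newlines.
function parseCsv(csv: string, separator = ","): Record<string, string>[] {
  const lines = csv.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const headers = lines[0].split(separator);
  return lines.slice(1).map((line) => {
    const cells = line.split(separator);
    const row: Record<string, string> = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      row[header] = cells[i] !== undefined ? cells[i] : "";
    });
    return row;
  });
}

// Render one row as comma-separated key="value" pairs.
function rowToText(row: Record<string, string>): string {
  return Object.keys(row)
    .map((key) => `${key}=${JSON.stringify(row[key])}`)
    .join(",");
}

// Join all rows into a single text block, one row per line.
function csvToText(csv: string, separator = ","): string {
  return parseCsv(csv, separator).map(rowToText).join("\n");
}

const text = csvToText("name,age\nMihail,24\nSimeon,24");
console.log(text);
```

The resulting text block is what would then be handed to the text scanner for token-based chunking.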
Example Output
For the input:
name,age
Mihail,24
Simeon,24
Output after scan and transform may be:
['name="Mihail",age="24"\nname="Simeon",age="24"'];
The text will be chunked based on your configured token size and overlap.
