# jsspacynlp Client

TypeScript/JavaScript client library for the jsspacynlp lemmatization service.
## Features
- 🚀 Promise-based async API
- 📦 TypeScript support with full type definitions
- 🔄 Automatic retry with exponential backoff
- 📊 Batch processing for large datasets
- 🌊 Streaming support for memory-efficient processing
- 🎯 NoSketchEngine vertical format export
- 📝 CSV and JSON export utilities
- 🌐 Works in Node.js and browsers
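The retry behavior is configurable via the constructor options shown below. As a generic illustration of what exponential backoff means here (a sketch only, not jsspacynlp's actual internals — the function name `withRetry` is invented for this example):

```typescript
// Generic exponential-backoff sketch (NOT jsspacynlp's internal code):
// retry a promise-returning function, doubling the delay after each failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  retryDelay = 1000
): Promise<T> {
  let delay = retryDelay;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // retries exhausted
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= 2; // exponential backoff: 1000 ms, 2000 ms, 4000 ms, ...
    }
  }
}
```

With `retries: 3` and `retryDelay: 1000`, a failing request would be re-attempted after roughly 1 s, 2 s, and 4 s before the error is surfaced.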
## Installation

```bash
npm install jsspacynlp
```

## Quick Start
```typescript
import { SpacyNLP } from 'jsspacynlp';

const nlp = new SpacyNLP({
  apiUrl: 'http://localhost:8000',
});

// Lemmatize a single text
const result = await nlp.lemmatize('The cats are running.', 'en_core_web_sm');
console.log(result);

// Lemmatize multiple texts
const results = await nlp.lemmatize(
  ['First text.', 'Second text.'],
  'fr_dep_news_trf'
);
```

## API Documentation
### SpacyNLP Client

#### Constructor
```typescript
const nlp = new SpacyNLP({
  apiUrl: 'http://localhost:8000', // API server URL
  timeout: 30000,                  // Request timeout in ms
  retries: 3,                      // Number of retry attempts
  retryDelay: 1000,                // Initial retry delay in ms
});
```

#### Methods
##### `lemmatize(texts, model, fields?)`

Lemmatize one or more texts using the specified model.
```typescript
const result = await nlp.lemmatize(
  'Hello world',
  'en_core_web_sm',
  ['text', 'lemma', 'pos'] // Optional: specify fields
);
```

Parameters:

- `texts`: `string | string[]` - Text or array of texts to process
- `model`: `string` - Name of the spaCy model
- `fields`: `string[]` (optional) - Fields to include in the response
Returns: `Promise<LemmatizeResponse>`
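The authoritative definition of `LemmatizeResponse` ships with the package's type declarations; as an assumption based on the fields used throughout this README, each response is roughly a list of documents, each holding a list of tokens (the interface names `TokenSketch`/`DocumentSketch` below are illustrative, not the library's):

```typescript
// Rough sketch of the response shape implied by the examples in this README.
interface TokenSketch {
  text: string;      // surface form (always included)
  lemma: string;     // lemmatized form (always included)
  pos?: string;      // coarse part-of-speech tag
  tag?: string;      // fine-grained POS tag
  dep?: string;      // dependency relation
  ent_type?: string; // named-entity type
  is_alpha?: boolean;
  is_stop?: boolean;
}

interface DocumentSketch {
  text: string;
  tokens: TokenSketch[];
}

// A response then carries one document per input text:
const example: DocumentSketch[] = [
  {
    text: 'Hello world',
    tokens: [
      { text: 'Hello', lemma: 'hello', pos: 'INTJ', tag: 'UH' },
      { text: 'world', lemma: 'world', pos: 'NOUN', tag: 'NN' },
    ],
  },
];
```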
##### `health()`

Check server health status.

```typescript
const health = await nlp.health();
console.log(health.status);         // "healthy"
console.log(health.models_loaded);  // ["en_core_web_sm", ...]
console.log(health.uptime_seconds); // 3600
```

##### `models()`
List available models.

```typescript
const { available_models } = await nlp.models();
for (const model of available_models) {
  console.log(model.name, model.language, model.type);
}
```

##### `info()`
Get server information.

```typescript
const info = await nlp.info();
console.log(info.version);       // "0.1.0"
console.log(info.spacy_version); // "3.7.2"
```

## Batch Processing
For processing large datasets efficiently:
```typescript
import { SpacyNLP, BatchProcessor } from 'jsspacynlp';

const nlp = new SpacyNLP();

// Create batch processor
const processor = new BatchProcessor(nlp, {
  model: 'fr_dep_news_trf',
  batchSize: 1000, // Texts per batch
  fields: ['text', 'lemma', 'pos'],
  onProgress: (processed, total) => {
    console.log(`Progress: ${processed}/${total}`);
  }
});

// Process large array
const texts = [...]; // Array of 10,000+ texts
const result = await processor.process(texts);

// Access results
for (const doc of result.documents) {
  console.log(doc.text);
  for (const token of doc.tokens) {
    console.log(token.text, token.lemma, token.pos);
  }
}
```

### Streaming (Memory Efficient)
For extremely large datasets:
```typescript
import fs from 'fs';

const processor = new BatchProcessor(nlp, {
  model: 'en_core_web_sm',
  batchSize: 1000,
});

// Process as a stream
for await (const batchResult of processor.processStream(hugeTextArray)) {
  // Process each batch as it arrives
  console.log(`Batch processed: ${batchResult.documents.length} documents`);

  // Append each batch to an output file
  fs.appendFileSync('output.vertical', batchResult.toVertical() + '\n');
}
```

## Result Utilities
### LemmatizationResult

The result object provides helper methods:
```typescript
const result = await nlp.lemmatize(['Hello world', 'Testing'], 'en_core_web_sm');

// Get all tokens from all documents
const allTokens = result.allTokens();

// Filter tokens
const nouns = result.filterTokens(token => token.pos === 'NOUN');
const stopWords = result.filterTokens(token => token.is_stop === true);

// Export formats
const vertical = result.toVertical(); // NoSketchEngine format
const csv = result.toCSV();           // CSV format
const json = result.toJSON();         // Array of token objects
```

### NoSketchEngine Vertical Format
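The vertical layout itself is simple: one token per line with tab-separated attributes, and a blank line between documents. As a standalone illustration of the format only (a hypothetical helper, not the library's `toVertical()` implementation):

```typescript
// Standalone sketch of NoSketchEngine vertical output (illustrative only).
type Tok = { text: string; lemma: string; pos: string; tag: string };

function toVerticalSketch(documents: Tok[][]): string {
  return documents
    .map((tokens) =>
      tokens.map((t) => [t.text, t.lemma, t.pos, t.tag].join('\t')).join('\n')
    )
    .join('\n\n'); // blank line separates documents
}
```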
```typescript
import fs from 'fs';

const result = await processor.process(texts);
const vertical = result.toVertical();

// Output format:
// word1\tlemma1\tpos1\ttag1
// word2\tlemma2\tpos2\ttag2
//
// word3\tlemma3\tpos3\ttag3  (new document)

fs.writeFileSync('corpus.vertical', vertical);
```

### CSV Export
```typescript
const csv = result.toCSV();
// text,lemma,pos,tag
// Hello,hello,INTJ,UH
// world,world,NOUN,NN
```

### JSON Export
```typescript
const json = result.toJSON();
// [
//   [
//     { text: "Hello", lemma: "hello", pos: "INTJ", tag: "UH" },
//     { text: "world", lemma: "world", pos: "NOUN", tag: "NN" }
//   ]
// ]
```

## Error Handling
```typescript
import { SpacyNLP, SpacyNLPError } from 'jsspacynlp';

try {
  const result = await nlp.lemmatize('test', 'invalid_model');
} catch (error) {
  if (error instanceof SpacyNLPError) {
    console.error('API Error:', error.message);
    console.error('Status Code:', error.statusCode);
    if (error.details?.available_models) {
      console.log('Available models:', error.details.available_models);
    }
  } else {
    console.error('Unexpected error:', error);
  }
}
```

## TypeScript Types
The library includes full TypeScript definitions:
```typescript
import {
  SpacyNLP,
  SpacyNLPConfig,
  LemmatizeResponse,
  Token,
  Document,
  LemmatizationResult,
  ModelInfo,
  BatchProcessorConfig,
  SpacyNLPError,
} from 'jsspacynlp';
```

## Available Fields
When calling `lemmatize()`, you can specify which fields to include:

- `text` - Token text (always included)
- `lemma` - Lemmatized form (always included)
- `pos` - Part-of-speech tag
- `tag` - Fine-grained POS tag
- `dep` - Dependency relation
- `ent_type` - Named entity type
- `is_alpha` - Is alphabetic (boolean)
- `is_stop` - Is stop word (boolean)
Default fields: `['text', 'lemma', 'pos', 'tag', 'dep']`
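Since `text` and `lemma` always come back, a request for optional fields effectively merges with those two. A small hypothetical helper (not part of jsspacynlp) that normalizes a requested field list under that rule:

```typescript
// Hypothetical helper: normalize a requested field list (not part of jsspacynlp).
const ALWAYS_INCLUDED = ['text', 'lemma'];
const OPTIONAL_FIELDS = ['pos', 'tag', 'dep', 'ent_type', 'is_alpha', 'is_stop'];
const DEFAULT_FIELDS = ['text', 'lemma', 'pos', 'tag', 'dep'];

function normalizeFields(requested?: string[]): string[] {
  if (!requested) return DEFAULT_FIELDS; // library default when fields is omitted
  const unknown = requested.filter(
    (f) => !ALWAYS_INCLUDED.includes(f) && !OPTIONAL_FIELDS.includes(f)
  );
  if (unknown.length) throw new Error(`Unknown fields: ${unknown.join(', ')}`);
  // text and lemma are returned regardless of what was asked for
  return [...new Set([...ALWAYS_INCLUDED, ...requested])];
}
```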
## Examples

### Basic Usage
```typescript
import { SpacyNLP } from 'jsspacynlp';

const nlp = new SpacyNLP({ apiUrl: 'http://localhost:8000' });

const result = await nlp.lemmatize(
  'Les chats courent dans le jardin.',
  'fr_dep_news_trf'
);

// Access parsed documents
for (const doc of result.documents) {
  for (const token of doc.tokens) {
    console.log(`${token.text} -> ${token.lemma} (${token.pos})`);
  }
}
```

### Batch Processing with Progress
```typescript
import { SpacyNLP, BatchProcessor } from 'jsspacynlp';

const nlp = new SpacyNLP();
const texts = loadTexts(); // Load 50,000 texts

const processor = new BatchProcessor(nlp, {
  model: 'fr_dep_news_trf',
  batchSize: 1000,
  onProgress: (processed, total) => {
    const percent = ((processed / total) * 100).toFixed(1);
    console.log(`Processing: ${percent}% (${processed}/${total})`);
  },
});

const result = await processor.process(texts);
console.log(`Processed ${result.documents.length} documents`);
```

### Export to NoSketchEngine
```typescript
import fs from 'fs';
import { SpacyNLP, BatchProcessor } from 'jsspacynlp';

const nlp = new SpacyNLP();
const processor = new BatchProcessor(nlp, {
  model: 'en_core_web_trf',
  batchSize: 1000,
});

const texts = loadCorpus();
const result = await processor.process(texts);

// Export to vertical format
const vertical = result.toVertical();
fs.writeFileSync('corpus.vertical', vertical, 'utf-8');
```

### Filter and Analyze Tokens
```typescript
const result = await nlp.lemmatize(texts, 'en_core_web_sm');

// Get all nouns
const nouns = result.filterTokens(t => t.pos === 'NOUN');

// Get unique lemmas
const uniqueLemmas = new Set(nouns.map(t => t.lemma));

// Count token frequencies
const frequencies = new Map<string, number>();
for (const token of result.allTokens()) {
  frequencies.set(token.lemma, (frequencies.get(token.lemma) || 0) + 1);
}
```

## Testing
```bash
# Run tests
npm test

# Run with coverage
npm run test:coverage

# Watch mode
npm run test:watch
```

## Building
```bash
# Build TypeScript to JavaScript
npm run build

# Output in dist/ directory
```

## License
MIT License - See LICENSE file for details.
