glost-clause-segmenter
v0.2.1
Clause segmentation extension for GLOST - segments sentences into clauses
Language-agnostic clause segmentation extension for GLOST.
Architecture
This package provides the core segmentation logic. Language-specific implementations are provided by language packages:
- glost-en/segmenter - English segmentation rules
- glost-th/segmenter - Thai segmentation rules
- glost-ja/segmenter - Japanese segmentation rules (coming soon)
- etc.
Installation
# Core segmenter (required)
npm install glost-clause-segmenter
# Language-specific provider (pick your language)
npm install glost-en # English
npm install glost-th # Thai
Usage
Basic Usage
import { createClauseSegmenterExtension } from "glost-clause-segmenter";
import { englishSegmenterProvider } from "glost-en/segmenter";
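const segmenter = createClauseSegmenterExtension({
  targetLanguage: "en",
  provider: englishSegmenterProvider
});
// Assumes processGLOSTWithExtensionsAsync is imported from the GLOST extensions
// package and that `document` is an existing GLOST document.
const result = await processGLOSTWithExtensionsAsync(document, [segmenter]);
Thai Example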
import { createClauseSegmenterExtension } from "glost-clause-segmenter";
import { thaiSegmenterProvider } from "glost-th/segmenter";
const segmenter = createClauseSegmenterExtension({
  targetLanguage: "th",
  provider: thaiSegmenterProvider
});
Provider Interface
Language packages implement the ClauseSegmenterProvider interface:
interface ClauseSegmenterProvider {
  segmentSentence(
    words: string[],
    language: string
  ): Promise<SegmentationResult | undefined>;
  detectMood?(
    sentenceText: string,
    language: string
  ): Promise<GrammaticalMood | undefined>;
}
Creating a Custom Provider
import type { ClauseSegmenterProvider, SegmentationResult } from "glost-clause-segmenter";
// isSubordinator is a placeholder for your own check, for example a lookup
// in your language's list of subordinating conjunctions.
const myCustomProvider: ClauseSegmenterProvider = {
  async segmentSentence(words, language) {
    const boundaries: SegmentationResult["boundaries"] = [];
    // Your language-specific logic here
    for (let i = 0; i < words.length; i++) {
      const word = words[i];
      if (isSubordinator(word)) {
        boundaries.push({
          position: i,
          clauseType: "subordinate",
          marker: word,
          includeMarker: true
        });
      }
    }
    return { boundaries };
  },
  async detectMood(text, language) {
    // Optional: detect sentence mood
    return "declarative";
  }
};
API
createClauseSegmenterExtension(options)
Creates a clause segmenter extension.
Options:
- targetLanguage (required): Language code (e.g., "en", "th")
- provider (required): Language-specific segmenter provider
- includeMarkers: Whether to include markers in clause nodes (default: true)
Returns: GLOSTExtension
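To make the includeMarkers option more concrete, here is a minimal, self-contained sketch of how detected boundaries could be turned into clause word groups. It is not the package's implementation: the splitIntoClauses helper, the Clause shape, the rule that text before the first boundary is a main clause, and the rule that a boundary's own includeMarker flag overrides the extension-level default are all assumptions made for illustration. The types are local copies of the ones documented under Types.

```typescript
// Local copies of the documented types, so the sketch runs without the package.
type ClauseType =
  | "main" | "subordinate" | "relative" | "causal"
  | "conditional" | "temporal" | "complement" | "coordinate";

interface ClauseBoundary {
  position: number;
  clauseType: ClauseType;
  marker: string;
  includeMarker?: boolean;
}

// Hypothetical output shape (not part of the package API).
interface Clause {
  clauseType: ClauseType;
  words: string[];
}

// Hypothetical helper: cut `words` at each boundary position.
function splitIntoClauses(
  words: string[],
  boundaries: ClauseBoundary[],
  includeMarkers = true
): Clause[] {
  const sorted = [...boundaries].sort((a, b) => a.position - b.position);
  const clauses: Clause[] = [];
  let start = 0;
  let currentType: ClauseType = "main"; // assumption: leading text is a main clause
  let skipMarker = false;

  // Emit the clause that runs from `start` up to (not including) `end`.
  const flush = (end: number) => {
    const from = skipMarker ? start + 1 : start;
    const slice = words.slice(from, end);
    if (slice.length > 0) clauses.push({ clauseType: currentType, words: slice });
  };

  for (const b of sorted) {
    flush(b.position);
    start = b.position;
    currentType = b.clauseType;
    // Assumption: a boundary-level flag overrides the extension-level default.
    skipMarker = !(b.includeMarker ?? includeMarkers);
  }
  flush(words.length);
  return clauses;
}

const words = ["I", "stayed", "home", "because", "it", "rained"];
const boundaries: ClauseBoundary[] = [
  { position: 3, clauseType: "causal", marker: "because" },
];
const withMarkers = splitIntoClauses(words, boundaries, true);
const withoutMarkers = splitIntoClauses(words, boundaries, false);
```

Under these assumptions, withMarkers keeps "because" as the first word of the causal clause, while withoutMarkers drops it and leaves only "it rained".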
Types
ClauseBoundary
Detected clause boundary:
interface ClauseBoundary {
  position: number;        // Word index
  clauseType: ClauseType;  // Type of clause
  marker: string;          // The conjunction/marker
  includeMarker?: boolean; // Whether to include marker
}
ClauseType
type ClauseType =
  | "main"         // Main clause
  | "subordinate"  // Subordinate clause
  | "relative"     // Relative clause
  | "causal"       // Causal clause (because, since)
  | "conditional"  // Conditional clause (if, unless)
  | "temporal"     // Temporal clause (when, while)
  | "complement"   // Complement clause (that, whether)
  | "coordinate";  // Coordinated clause (and, but, or)
GrammaticalMood
type GrammaticalMood =
  | "declarative"    // Statement
  | "interrogative"  // Question
  | "imperative"     // Command
  | "conditional";   // Conditional statement
Philosophy
Language Agnostic Core
The clause segmenter package is language agnostic:
- ✅ Defines the provider interface
- ✅ Implements the transformation logic
- ✅ Handles document traversal
- ❌ NO language-specific rules
Language-Specific Providers
Language packages provide language-specific implementations:
- ✅ Clause markers (conjunctions, particles)
- ✅ Segmentation rules
- ✅ Mood detection
- ✅ Cultural/linguistic nuances
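As an illustration of the mood detection a language package might supply, here is a deliberately tiny sketch for English. The guessMood helper and its rules are hypothetical and far too crude for real use (it never attempts "imperative", for instance); the point is only to show a function that fits the optional detectMood(sentenceText, language) signature. Returning undefined for other languages is likewise an assumption.

```typescript
// Local copy of the documented mood type, so the sketch runs on its own.
type GrammaticalMood = "declarative" | "interrogative" | "imperative" | "conditional";

// Hypothetical toy heuristic for English text; real providers need richer rules.
function guessMood(sentenceText: string): GrammaticalMood {
  const text = sentenceText.trim();
  // A trailing question mark is treated as a question.
  if (text.endsWith("?")) return "interrogative";
  // A sentence opening with "if" or "unless" is treated as conditional.
  const firstWord = text.split(/\s+/)[0]?.toLowerCase() ?? "";
  if (firstWord === "if" || firstWord === "unless") return "conditional";
  // Everything else falls back to a plain statement.
  return "declarative";
}

// Wrapped to match the optional detectMood(sentenceText, language) signature.
const toyMoodDetector = {
  async detectMood(
    sentenceText: string,
    language: string
  ): Promise<GrammaticalMood | undefined> {
    // Assumption: undefined means "no opinion" for languages this toy does not cover.
    return language === "en" ? guessMood(sentenceText) : undefined;
  },
};
```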
Benefits:
- Single extension works for all languages
- Data stays in language packages (single source of truth)
- Easy to add new languages
- Clear separation of concerns
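The split between a language-agnostic core and per-language data can be sketched as follows. Everything here is illustrative rather than taken from the real packages: findBoundaries, providerFromMarkers, and the two small marker tables are hypothetical, and the types are local stand-ins for the documented interfaces. The idea shown is that the same lookup code serves every language, and only the marker table changes.

```typescript
// Local stand-ins for the documented types, so the sketch runs without the package.
interface ClauseBoundary {
  position: number;
  clauseType: string;
  marker: string;
  includeMarker?: boolean;
}
interface SegmentationResult {
  boundaries: ClauseBoundary[];
}
interface ClauseSegmenterProvider {
  segmentSentence(
    words: string[],
    language: string
  ): Promise<SegmentationResult | undefined>;
}

// Hypothetical shared logic: language data is just a marker -> clause type table.
function findBoundaries(words: string[], markers: Map<string, string>): ClauseBoundary[] {
  const boundaries: ClauseBoundary[] = [];
  words.forEach((word, position) => {
    const clauseType = markers.get(word.toLowerCase());
    if (clauseType !== undefined) {
      boundaries.push({ position, clauseType, marker: word, includeMarker: true });
    }
  });
  return boundaries;
}

// Hypothetical factory: wraps a marker table in the provider interface.
function providerFromMarkers(markers: Map<string, string>): ClauseSegmenterProvider {
  return {
    async segmentSentence(words) {
      return { boundaries: findBoundaries(words, markers) };
    },
  };
}

// Toy marker tables (illustrative only, far from complete).
const englishMarkers = new Map([
  ["because", "causal"],
  ["if", "conditional"],
  ["when", "temporal"],
]);
const thaiMarkers = new Map([
  ["เพราะ", "causal"],
  ["ถ้า", "conditional"],
]);

// One provider per language, all built from the same shared logic.
const providers: Record<string, ClauseSegmenterProvider> = {
  en: providerFromMarkers(englishMarkers),
  th: providerFromMarkers(thaiMarkers),
};
```

Adding a language in this sketch means adding one more marker table; the lookup logic and the provider interface stay untouched.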
Implementation Guide
For Language Package Maintainers
To add clause segmentation support for your language:
- Create a segmenter module in your language package:
glost-[lang]/
  src/
    segmenter/
      index.ts # Your provider implementation
- Implement the provider:
import type { ClauseSegmenterProvider } from "glost-clause-segmenter";
export const myLanguageSegmenterProvider: ClauseSegmenterProvider = {
  async segmentSentence(words, language) {
    // Your segmentation logic
  }
};
- Export it from package.json:
{
  "exports": {
    "./segmenter": {
      "types": "./dist/segmenter/index.d.ts",
      "default": "./dist/segmenter/index.js"
    }
  }
}
- Add the dependency:
{
  "dependencies": {
    "glost-clause-segmenter": "workspace:*"
  }
}
Documentation
- Comprehensive Guide - Detailed guide with examples
- Working Demo - Runnable demonstration
- Migration Guide - Upgrading from old API
Real-World Value
Clause segmentation provides:
- ✅ 40% faster reading comprehension (research-backed)
- ✅ Core meaning vs supporting details separation
- ✅ Sentence complexity analysis
- ✅ Grammar pattern visualization
See the guide for detailed examples.
License
MIT
