@capsiynau/intelligence
v0.3.0
Bilingual (Welsh + English) intelligence layer: text normalisation, mutation handling, digraph splitting, spellcheck primitives, and glossary / correction utilities.
Pure functions, runs in Node or browser. No DOM dependencies, no
network calls, no Supabase coupling — the one export that writes to
Supabase (captureCorrections) takes the client as an argument so the
package itself stays universal.
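The client-as-argument pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `saveTermsSketch`, the in-memory `fakeClient`, and the `term` column are hypothetical names. Only the `word_boost_approved` table name (from the Quick start) and the `from(...).insert(...)` call shape of a Supabase client are taken as given.

```javascript
// Sketch only: a hypothetical writer that receives its client as an argument
// instead of importing one. Not the package's actual code.
async function saveTermsSketch(terms, { client, table }) {
  const written = []
  for (const term of terms) {
    try {
      // A Supabase client resolves to { data, error } rather than throwing on DB errors
      const { error } = await client.from(table).insert({ term }) // `term` column is assumed
      if (!error) written.push(term)
    } catch {
      // Swallow transport errors so the helper itself never throws
    }
  }
  return written
}

// In-memory stand-in for a Supabase client, so the sketch runs anywhere
const storedRows = []
const fakeClient = {
  from: () => ({
    insert: async (row) => {
      if (storedRows.some((r) => r.term === row.term)) {
        return { error: { message: 'duplicate' } }
      }
      storedRows.push(row)
      return { error: null }
    },
  }),
}

saveTermsSketch(['Aled Parry', 'Aled Parry'], {
  client: fakeClient,
  table: 'word_boost_approved',
}).then((written) => console.log(written)) // logs [ 'Aled Parry' ]
```

Because the module never imports a client of its own, the same function can run against a real client in production and a stub like this one in tests.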
Install
npm install @capsiynau/intelligence
# Spellcheck consumers also need the optional peer deps:
npm install nspell dictionary-cy dictionary-en-gb

The dictionary peer deps are optional: they are only required if you use the
/spellcheck subpath. Welsh normalisation, mutations, digraphs, and the
glossary helpers all work without them.
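Since the dictionaries are only needed for spellchecking, a consumer might defer building the checker until it is first used. The following is a minimal sketch of that idea, not something the package provides: `lazyOnce` and `getChecker` are hypothetical names, and the loader here returns a fake object so the example runs without the peer deps installed.

```javascript
// Sketch only: memoise an async loader so the checker is built at most once.
function lazyOnce(load) {
  let pending = null
  return function get() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(load)
        .catch((err) => {
          pending = null // allow a retry after a failed load
          throw err
        })
    }
    return pending
  }
}

// Stand-in loader so the sketch runs without nspell or dictionary-cy.
// A real consumer might instead do something like:
//   lazyOnce(async () => nspell((await import('dictionary-cy')).default))
let loads = 0
const getChecker = lazyOnce(async () => {
  loads += 1
  return { correct: (word) => word === 'Cymraeg' } // fake checker object
})

Promise.all([getChecker(), getChecker()]).then(([a, b]) => {
  console.log(a === b, loads) // true 1
})
```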
Quick start
// Welsh normalisation (whitespace, casing, common typos for downstream
// comparison — see SCHEMA.md for the exact rules)
import { normaliseWelsh } from '@capsiynau/intelligence/welsh'
normaliseWelsh(' Gymraeg ') // → 'Gymraeg'
// Mutation helpers — possibleRoots returns every plausible root form
// for a possibly-mutated surface word, tagged with the mutation kind
import { possibleRoots } from '@capsiynau/intelligence/welsh'
possibleRoots('Gymraeg')
// → [{ root: 'Gymraeg', kind: null }, { root: 'Cymraeg', kind: 'soft' }]
// Digraph-aware letter splitting (Welsh treats `ch`/`dd`/`ll`/... as single letters)
import { splitIntoLetters, welshLength } from '@capsiynau/intelligence/welsh'
splitIntoLetters('llyfr') // → ['ll', 'y', 'f', 'r']
welshLength('llyfr') // → 4 (not 5)
// Spellcheck (caller provides the nspell instance — dictionary-cy v2+ is
// already a parsed `{aff, dic}` object, so the construction is synchronous)
import nspell from 'nspell'
import cyDict from 'dictionary-cy'
import { checkSentence, suggest } from '@capsiynau/intelligence/spellcheck'
const cy = nspell(cyDict)
// `accept` is a Set of proper nouns / glossary terms / learned-vocab
// (lowercased) that must never be flagged. `welsh: true` enables
// mutation-aware fallback so `gath` resolves to `cath` via soft-mutation
// reversal before the word is declared misspelled.
const errors = checkSentence('Mae cymareg yn iath pwysig', {
nspell: cy,
accept: new Set(),
welsh: true,
})
// → [{ word: 'cymareg', start: 4, end: 11 }, { word: 'iath', start: 15, end: 19 }]
// Suggestions per misspelled token
const fixes = suggest('cymareg', { nspell: cy, max: 6 })
// → ['Cymraeg']
// Proper-noun extraction (correction-learning heuristic)
import { extractProperNounCorrections } from '@capsiynau/intelligence/corrections'
const corrections = extractProperNounCorrections(
'Allard Perry spoke at the event',
'Aled Parry spoke at the event',
)
// → ['Aled Parry']
// And to persist into word_boost_approved (caller provides the Supabase
// client; the package itself has zero Supabase coupling):
import { captureCorrections } from '@capsiynau/intelligence/corrections'
import { supabase } from './your-supabase-client' // your own client
const persisted = await captureCorrections(before, after, { sessionId, supabase })
// → array of terms written (silent on duplicates; never throws)

Subpaths
| Import | Provides |
|--------|----------|
| @capsiynau/intelligence/welsh | normaliseWelsh, possibleRoots + mutation helpers, splitIntoLetters + digraph utilities |
| @capsiynau/intelligence/spellcheck | checkWord, checkSentence, tokenise, suggest — mutation-aware, accepts caller's nspell instance |
| @capsiynau/intelligence/corrections | extractProperNounCorrections (pure), captureCorrections (accepts caller-provided Supabase client) |
| @capsiynau/intelligence/glossary | Glossary Builder pipeline helpers: parse, prompts, tools, url, import, persist |
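To make the digraph utilities in the /welsh row more concrete, here is a minimal sketch of digraph-aware splitting. It is an assumption about the general technique, not the package's source: `splitLettersSketch` is a hypothetical name, and the greedy match cannot tell the letter `ng` from an `n` followed by `g` (as in Bangor), which a real implementation would need to handle.

```javascript
// Sketch only: greedy digraph-aware letter splitting for Welsh.
// The eight digraphs of the Welsh alphabet count as single letters.
const WELSH_DIGRAPHS = new Set(['ch', 'dd', 'ff', 'ng', 'll', 'ph', 'rh', 'th'])

function splitLettersSketch(word) {
  const letters = []
  let i = 0
  while (i < word.length) {
    const pair = word.slice(i, i + 2)
    if (pair.length === 2 && WELSH_DIGRAPHS.has(pair.toLowerCase())) {
      letters.push(pair) // consume both characters as one letter
      i += 2
    } else {
      letters.push(word[i])
      i += 1
    }
  }
  return letters
}

console.log(splitLettersSketch('llyfr')) // [ 'll', 'y', 'f', 'r' ]
console.log(splitLettersSketch('chwech').length) // 4
```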
Design
- Caller-provided dictionaries. The package doesn't bundle nspell or hunspell data — those are ~700 KB Welsh + ~250 KB English. Consumers bring their own so they can lazy-load, cache in IndexedDB, or use WebAssembly variants without the package dictating choice.
- Pure where possible. Welsh / spellcheck / glossary modules are side-effect-free. Easy to test, easy to compose, runs in any JS runtime.
- Mutation-aware spellcheck. A Welsh spellchecker that doesn't
understand initial consonant mutations is useless on real Welsh text.
checkWord reverses soft / nasal / aspirate mutations before looking up the root, so Gymraeg and Nghymraeg both resolve to Cymraeg.
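The mutation-reversal idea in the last bullet can be sketched as below. This is a simplified illustration, not the package's algorithm: `candidateRootsSketch` and `resolveRootSketch` are hypothetical names, the soft mutation that deletes an initial `g` is omitted, and a plain `Set` stands in for a real dictionary lookup. The sketch deliberately over-generates candidates and relies on the lookup to discard implausible ones.

```javascript
// Sketch only: [mutated prefix, root prefix, mutation kind]
const REVERSALS = [
  ['ngh', 'c', 'nasal'], ['mh', 'p', 'nasal'], ['nh', 't', 'nasal'],
  ['ng', 'g', 'nasal'], ['m', 'b', 'nasal'], ['n', 'd', 'nasal'],
  ['ph', 'p', 'aspirate'], ['th', 't', 'aspirate'], ['ch', 'c', 'aspirate'],
  ['b', 'p', 'soft'], ['d', 't', 'soft'], ['g', 'c', 'soft'],
  ['f', 'b', 'soft'], ['f', 'm', 'soft'], ['dd', 'd', 'soft'],
  ['l', 'll', 'soft'], ['r', 'rh', 'soft'],
]

// List the unmutated reading first, then every prefix reversal that applies
function candidateRootsSketch(word) {
  const lower = word.toLowerCase()
  const capitalised = word.length > 0 && word[0] !== lower[0]
  const out = [{ root: word, kind: null }]
  for (const [surface, root, kind] of REVERSALS) {
    if (!lower.startsWith(surface)) continue
    let candidate = root + word.slice(surface.length)
    if (capitalised) candidate = candidate[0].toUpperCase() + candidate.slice(1)
    out.push({ root: candidate, kind })
  }
  return out
}

// Return the first candidate the dictionary accepts, or null if none match
function resolveRootSketch(word, isKnown) {
  for (const candidate of candidateRootsSketch(word)) {
    if (isKnown(candidate.root)) return candidate
  }
  return null
}

const known = new Set(['Cymraeg', 'cath']) // stand-in for a dictionary
const isKnown = (w) => known.has(w)
console.log(resolveRootSketch('Gymraeg', isKnown)) // { root: 'Cymraeg', kind: 'soft' }
console.log(resolveRootSketch('Nghymraeg', isKnown)) // { root: 'Cymraeg', kind: 'nasal' }
console.log(resolveRootSketch('chath', isKnown)) // { root: 'cath', kind: 'aspirate' }
```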
Versioning
- Patch (0.1.x): bug fixes, no API change
- Minor (0.x.0): additive API, no consumer changes required
- Major (x.0.0): breaking changes
Pre-1.0: the API is firming up. Pin a minor version in production
consumers ("@capsiynau/intelligence": "~0.1.0") and watch the
CHANGELOG for breaking moves.
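For readers unsure what a tilde pin such as "~0.1.0" admits, the rule can be approximated as below. This is a rough sketch for plain x.y.z versions only, not npm's actual resolver: `parseVersion` and `satisfiesTilde` are hypothetical helpers, and pre-release tags are ignored.

```javascript
// Sketch only: approximates npm's tilde range for plain x.y.z versions.
// Pre-release tags and build metadata are deliberately ignored.
function parseVersion(version) {
  const [major, minor, patch] = version.split('.').map(Number)
  return { major, minor, patch }
}

function satisfiesTilde(version, base) {
  const v = parseVersion(version)
  const b = parseVersion(base)
  // "~0.1.0" means: same major and minor, patch at or above the base
  return v.major === b.major && v.minor === b.minor && v.patch >= b.patch
}

console.log(satisfiesTilde('0.1.7', '0.1.0')) // true: patch release is accepted
console.log(satisfiesTilde('0.2.0', '0.1.0')) // false: minor bump is held back
```

Pinning this way picks up bug fixes automatically while leaving minor upgrades as a deliberate step.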
Supabase schema contract
See SCHEMA.md for the public contract on the tables the glossary +
corrections modules interact with (when wired up by the caller).
Cross-repo sync (Capsiynau ecosystem)
Capsiynau consumes this package directly via the monorepo path.
Nodiadau consumes via either npm install @capsiynau/intelligence
(once published) or git subtree pull (canonical pattern documented
at docs/publish-intelligence-package.md).
License
MIT. See LICENSE.
