@pictalk-speech-made-easy/conllu-parser

v1.1.0

conllu-parser

A TypeScript parser for CoNLL-U and CoNLL-UL (Universal Dependencies / Universal Lattices) formats with full type definitions.

Features

  • 🔷 Full TypeScript support with comprehensive type definitions
  • 📖 Parse CoNLL-U and CoNLL-UL files into structured objects
  • ✍️ Serialize back to either format
  • 🔀 Auto-detect format from input (10 columns → CoNLL-U, 8-9 → CoNLL-UL)
  • 🔍 Query utilities for finding tokens by POS, dependency relation, etc.
  • 🌳 Tree navigation helpers for dependency trees
  • 🕸️ Lattice utilities for navigating morphological ambiguity graphs
  • 🔄 Bidirectional conversion between CoNLL-U sentences and CoNLL-UL lattices
  • 🌍 Universal Dependencies compliant — supports all UD features

Installation

npm install conllu-parser

Quick Start — CoNLL-U

import { parseConllu, reconstructText, findByUpos, getRoot } from 'conllu-parser';

const conllu = `# sent_id = example
# text = The cat sat.
1	The	the	DET	_	Definite=Def|PronType=Art	2	det	_	_
2	cat	cat	NOUN	_	Number=Sing	3	nsubj	_	_
3	sat	sit	VERB	_	Tense=Past	0	root	_	SpaceAfter=No
4	.	.	PUNCT	_	_	3	punct	_	_
`;

const doc = parseConllu(conllu);

console.log(doc.sentences[0].metadata.text); // "The cat sat."

const nouns = findByUpos(doc.sentences[0], 'NOUN');
console.log(nouns.map(t => t.form)); // ["cat"]

const root = getRoot(doc.sentences[0]);
console.log(root?.form); // "sat"

console.log(reconstructText(doc.sentences[0])); // "The cat sat."

Quick Start — CoNLL-UL

import {
  parseConllul,
  groupArcsByForm,
  getArcsFrom,
  findPaths,
  isLinearLattice,
  latticeToSentence,
} from 'conllu-parser';

// Morphological lexicon with ambiguity
const conllul = `# sent_id = en-book
0	1	book	book	NOUN	N#s	Number=Sing	_
0	1	book	book	AUX	V#inf	VerbForm=Inf	_
0	1	book	book	VERB	V#inf	VerbForm=Inf	_
`;

const doc = parseConllul(conllul);
const lattice = doc.lattices[0];

// See all competing analyses for a surface form
const grouped = groupArcsByForm(lattice);
console.log(grouped.get('book')?.length); // 3 (NOUN, AUX, VERB)

// Navigate the lattice graph
const arcs = getArcsFrom(lattice, 0);
console.log(arcs.map(a => a.upos)); // ["NOUN", "AUX", "VERB"]

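// Conversion to a CoNLL-U sentence only applies to linear lattices.
// This lattice has three arcs leaving vertex 0, so it is not linear.
if (isLinearLattice(lattice)) {
  const sentence = latticeToSentence(lattice);
  console.log(sentence.tokens.length);
} else {
  // Enumerate the competing analyses instead
  const paths = findPaths(lattice);
  console.log(paths.length);
}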

Auto-detection

import { parse, serialize } from 'conllu-parser';

// Automatically detects CoNLL-U (10 cols) vs CoNLL-UL (8-9 cols)
const doc = parse(input);

if (doc.format === 'conllu') {
  console.log(doc.sentences.length);
} else {
  console.log(doc.lattices.length);
}

// Serialize back to the original format
const output = serialize(doc);

API Reference

Auto-detecting Functions

parse(input: string): ParsedDocument

Parse either format, auto-detecting from column count. Returns a ConlluDocument or ConllulDocument discriminated by the format field.

serialize(doc: ParsedDocument): string

Serialize either document type back to its format.

detectFormat(input: string): 'conllu' | 'conllul'

Detect the format without parsing.
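
To make the column-count rule concrete, here is a minimal sketch of how such detection could work. It is not the package's implementation: `sketchDetectFormat` is a hypothetical name, and ignoring lines with any other column count is an assumption.

```typescript
type SketchFormat = 'conllu' | 'conllul';

// Hypothetical helper (not the package's detectFormat): guess the format
// from the first line whose column count matches one of the two formats.
function sketchDetectFormat(input: string): SketchFormat {
  for (const rawLine of input.split('\n')) {
    const line = rawLine.replace(/\r$/, '');
    // Skip blank lines and "#" comment lines.
    if (line.trim() === '' || line.startsWith('#')) continue;
    const columns = line.split('\t').length;
    if (columns === 10) return 'conllu';
    if (columns === 8 || columns === 9) return 'conllul';
    // Assumption: any other line (e.g. a source span line) is ignored.
  }
  throw new Error('No recognizable data line found');
}

console.log(sketchDetectFormat('0\t1\tbook\tbook\tNOUN\tN#s\tNumber=Sing\t_\n')); // conllul
```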

CoNLL-U Functions

parseConllu(input: string): ConlluDocument

Parse a CoNLL-U formatted string into a structured document.

serializeConllu(doc: ConlluDocument): string

Serialize back to CoNLL-U format.

findByUpos(sentence: Sentence, upos: UPOS): Token[]

Find all tokens with a specific Universal POS tag.

findByDeprel(sentence: Sentence, deprel: DepRel): Token[]

Find all tokens with a specific dependency relation.

getRoot(sentence: Sentence): Token | undefined

Get the root token of a sentence's dependency tree.

getChildren(sentence: Sentence, tokenId: string | number): Token[]

Get all tokens that depend on a given token.

getHead(sentence: Sentence, token: Token): Token | undefined

Get the head (parent) of a token in the dependency tree.
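
The tree helpers above boil down to lookups on the HEAD column. The sketch below shows the idea on a reduced token shape; `sketchChildren` and `sketchHead` are hypothetical stand-ins that take a plain token array, not the package's functions.

```typescript
// Reduced token shape: only the fields this sketch needs.
interface SketchDepToken {
  id: string;
  form: string;
  head: number | '_';
}

// Tokens whose HEAD equals the given token ID.
function sketchChildren(tokens: SketchDepToken[], tokenId: string | number): SketchDepToken[] {
  const target = Number(tokenId);
  return tokens.filter(t => t.head === target);
}

// The token whose ID matches this token's HEAD (undefined for the root).
function sketchHead(tokens: SketchDepToken[], token: SketchDepToken): SketchDepToken | undefined {
  if (token.head === '_' || token.head === 0) return undefined;
  const headId = String(token.head);
  return tokens.find(t => t.id === headId);
}

const catSat: SketchDepToken[] = [
  { id: '1', form: 'The', head: 2 },
  { id: '2', form: 'cat', head: 3 },
  { id: '3', form: 'sat', head: 0 },
  { id: '4', form: '.', head: 3 },
];

console.log(sketchChildren(catSat, 3).map(t => t.form)); // ["cat", "."]
```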

reconstructText(sentence: Sentence): string

Reconstruct the original text from tokens, respecting SpaceAfter annotations.
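
As a rough illustration of what respecting SpaceAfter means, the sketch below joins word forms and omits the space after any token marked `SpaceAfter=No`. The name `sketchReconstructText`, the `Record<string, string>` shape for MISC, and the handling of multi-word ranges and empty nodes are all assumptions made for this example.

```typescript
// Reduced token shape; MISC is assumed to be a key/value record or '_'.
interface SketchTextToken {
  id: string;
  form: string;
  misc: Record<string, string> | '_';
}

// Hypothetical re-implementation: join forms with single spaces, except
// after tokens annotated with SpaceAfter=No.
function sketchReconstructText(tokens: SketchTextToken[]): string {
  let text = '';
  let pendingSpace = false;
  for (const token of tokens) {
    // Simplification: multi-word ranges ("1-2") and empty nodes ("1.1")
    // are skipped, so only plain word lines contribute to the text.
    if (!/^\d+$/.test(token.id)) continue;
    if (pendingSpace) text += ' ';
    text += token.form;
    pendingSpace = !(token.misc !== '_' && token.misc['SpaceAfter'] === 'No');
  }
  return text;
}

const catTokens: SketchTextToken[] = [
  { id: '1', form: 'The', misc: '_' },
  { id: '2', form: 'cat', misc: '_' },
  { id: '3', form: 'sat', misc: { SpaceAfter: 'No' } },
  { id: '4', form: '.', misc: '_' },
];

console.log(sketchReconstructText(catTokens)); // The cat sat.
```

A fuller version would likely emit the surface form of a multi-word range instead of its component words.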

CoNLL-UL Functions

parseConllul(input: string): ConllulDocument

Parse a CoNLL-UL formatted string into lattice structures.

serializeConllul(doc: ConllulDocument, options?): string

Serialize back to CoNLL-UL format. Pass { includeAnchors: false } to omit the 9th column.

getVertices(lattice: Lattice): number[]

Get all unique vertex indices in a lattice, sorted ascending.

getArcsFrom(lattice: Lattice, vertex: number): LatticeArc[]

Get all arcs leaving a vertex — these are competing morphological analyses.

getArcsTo(lattice: Lattice, vertex: number): LatticeArc[]

Get all arcs arriving at a vertex.

getFormsAtVertex(lattice: Lattice, vertex: number): string[]

Get distinct surface forms at a given vertex.

groupArcsByForm(lattice: Lattice): Map<string, LatticeArc[]>

Group all arcs by surface form. Useful for displaying ambiguity (e.g., "tapping" → VERB/Ger, VERB/Part, NOUN).
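
Grouping by surface form is a plain bucket-by-key pass. A minimal sketch on a reduced arc shape follows; `sketchGroupByForm` is a hypothetical name and takes an arc array rather than a `Lattice`.

```typescript
// Reduced arc shape: only the fields this sketch needs.
interface SketchFormArc {
  from: number;
  to: number;
  form: string;
  upos: string;
}

// Bucket arcs by surface form, preserving input order inside each bucket.
function sketchGroupByForm(arcs: SketchFormArc[]): Map<string, SketchFormArc[]> {
  const groups = new Map<string, SketchFormArc[]>();
  for (const arc of arcs) {
    const bucket = groups.get(arc.form);
    if (bucket) bucket.push(arc);
    else groups.set(arc.form, [arc]);
  }
  return groups;
}

const bookArcs: SketchFormArc[] = [
  { from: 0, to: 1, form: 'book', upos: 'NOUN' },
  { from: 0, to: 1, form: 'book', upos: 'AUX' },
  { from: 0, to: 1, form: 'book', upos: 'VERB' },
];

const byForm = sketchGroupByForm(bookArcs);
console.log(byForm.get('book')?.map(a => a.upos)); // ["NOUN", "AUX", "VERB"]
```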

findPaths(lattice, start?, end?, maxPaths?): LatticeArc[][]

Enumerate all paths through the lattice. Each path is a possible morphological analysis. Use maxPaths (default 1000) to limit computation on highly ambiguous lattices.
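
One way to picture path enumeration is a depth-first walk that stops once `maxPaths` results have been collected. The sketch below is an assumption about the general approach, not the package's code: `sketchFindPaths` is a hypothetical name, it takes explicit start and end vertices, and it assumes every arc points forward (`from < to`).

```typescript
interface SketchPathArc {
  from: number;
  to: number;
  form: string;
  upos: string;
}

// Hypothetical depth-first enumeration of arc sequences from `start` to `end`.
// Only forward arcs are followed, so the walk always terminates.
function sketchFindPaths(
  arcs: SketchPathArc[],
  start: number,
  end: number,
  maxPaths = 1000,
): SketchPathArc[][] {
  const paths: SketchPathArc[][] = [];
  const walk = (vertex: number, soFar: SketchPathArc[]): void => {
    if (paths.length >= maxPaths) return; // cap reached, stop exploring
    if (vertex === end) {
      paths.push(soFar);
      return;
    }
    for (const arc of arcs) {
      if (arc.from === vertex && arc.to > vertex) walk(arc.to, [...soFar, arc]);
    }
  };
  walk(start, []);
  return paths;
}

const ambiguousArcs: SketchPathArc[] = [
  { from: 0, to: 1, form: 'book', upos: 'NOUN' },
  { from: 0, to: 1, form: 'book', upos: 'VERB' },
  { from: 1, to: 2, form: 'it', upos: 'PRON' },
];

const allPaths = sketchFindPaths(ambiguousArcs, 0, 2);
console.log(allPaths.map(p => p.map(a => a.upos).join(' '))); // ["NOUN PRON", "VERB PRON"]
```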

isLinearLattice(lattice: Lattice): boolean

Check if a lattice has no ambiguity (each vertex has at most one outgoing arc). Linear lattices map directly to CoNLL-U.
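
The linearity condition stated above (at most one outgoing arc per vertex) can be checked in a single pass. This is a sketch on a bare arc array; `sketchIsLinear` is a hypothetical name.

```typescript
interface SketchLinArc {
  from: number;
  to: number;
}

// True when no vertex has more than one outgoing arc.
function sketchIsLinear(arcs: SketchLinArc[]): boolean {
  const seenSources = new Set<number>();
  for (const arc of arcs) {
    if (seenSources.has(arc.from)) return false; // second arc from this vertex
    seenSources.add(arc.from);
  }
  return true;
}

console.log(sketchIsLinear([{ from: 0, to: 1 }, { from: 1, to: 2 }])); // true
console.log(sketchIsLinear([{ from: 0, to: 1 }, { from: 0, to: 1 }])); // false
```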

getAmbiguityCount(lattice: Lattice, maxPaths?): number

Count distinct paths through the lattice. Returns -1 if maxPaths is exceeded.
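
Counting paths does not require enumerating them. The sketch below uses a memoized count per vertex and then applies the `-1` convention described above. It is an illustration only: `sketchAmbiguityCount` is a hypothetical name, the start and end vertices are assumed to be the smallest `from` and largest `to`, arcs are assumed to point forward, and returning 0 for an empty lattice is an assumption.

```typescript
interface SketchCountArc {
  from: number;
  to: number;
}

// Count start-to-end paths with memoization; -1 signals "more than maxPaths".
function sketchAmbiguityCount(arcs: SketchCountArc[], maxPaths = 1000): number {
  if (arcs.length === 0) return 0;
  const start = Math.min(...arcs.map(a => a.from));
  const end = Math.max(...arcs.map(a => a.to));
  const memo = new Map<number, number>();
  const countFrom = (vertex: number): number => {
    if (vertex === end) return 1;
    const cached = memo.get(vertex);
    if (cached !== undefined) return cached;
    let total = 0;
    for (const arc of arcs) {
      if (arc.from === vertex && arc.to > vertex) total += countFrom(arc.to);
    }
    memo.set(vertex, total);
    return total;
  };
  const total = countFrom(start);
  return total > maxPaths ? -1 : total;
}

const twoByTwo: SketchCountArc[] = [
  { from: 0, to: 1 }, { from: 0, to: 1 },
  { from: 1, to: 2 }, { from: 1, to: 2 },
];

console.log(sketchAmbiguityCount(twoByTwo));    // 4
console.log(sketchAmbiguityCount(twoByTwo, 3)); // -1
```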

findArcsByUpos(lattice: Lattice, upos: UPOS): LatticeArc[]

Find all arcs with a specific POS tag.

findArcsByLemma(lattice: Lattice, lemma: string): LatticeArc[]

Find all arcs with a specific lemma.

Conversion Functions

latticeToSentence(lattice: Lattice): Sentence

Convert a linear (unambiguous) CoNLL-UL lattice to a CoNLL-U sentence. Vertex IDs are converted from 0-based to 1-based. Throws if the lattice has ambiguity.
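
The following sketch illustrates the conversion on reduced shapes. It is not the package's `latticeToSentence`: `sketchLatticeToWords` is a hypothetical name, and assigning 1-based IDs in path order is an assumption (it coincides with `from + 1` when every arc spans a single step). Dependency fields are left out because CoNLL-UL carries none.

```typescript
interface SketchUlArcLite {
  from: number;
  to: number;
  form: string;
  lemma: string;
  upos: string;
}

interface SketchWord {
  id: string;
  form: string;
  lemma: string;
  upos: string;
}

// Convert an unambiguous arc chain into 1-based word entries.
function sketchLatticeToWords(arcs: SketchUlArcLite[]): SketchWord[] {
  const seenSources = new Set<number>();
  for (const arc of arcs) {
    if (seenSources.has(arc.from)) {
      throw new Error(`Ambiguous lattice: several arcs leave vertex ${arc.from}`);
    }
    seenSources.add(arc.from);
  }
  return [...arcs]
    .sort((a, b) => a.from - b.from)
    .map((arc, index) => ({
      id: String(index + 1), // 0-based vertex order becomes 1-based token ID
      form: arc.form,
      lemma: arc.lemma,
      upos: arc.upos,
    }));
}

const linearArcs: SketchUlArcLite[] = [
  { from: 0, to: 1, form: 'cats', lemma: 'cat', upos: 'NOUN' },
  { from: 1, to: 2, form: 'sleep', lemma: 'sleep', upos: 'VERB' },
];

console.log(sketchLatticeToWords(linearArcs).map(w => `${w.id}:${w.form}`)); // ["1:cats", "2:sleep"]
```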

sentenceToLattice(sentence: Sentence): Lattice

Convert a CoNLL-U sentence to a linear CoNLL-UL lattice. Multi-word tokens become source spans.
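
In the other direction, a word with ID `n` can be pictured as an arc from vertex `n - 1` to vertex `n`, and a multi-word range such as `1-2` as a span over vertices 0 through 2. The sketch below follows that reading; `sketchSentenceToLattice` is a hypothetical name, and dropping empty nodes is an assumption.

```typescript
interface SketchInToken {
  id: string;
  form: string;
}

interface SketchOutArc {
  from: number;
  to: number;
  form: string;
}

interface SketchSpan {
  fromVertex: number;
  toVertex: number;
  sourceForm: string;
}

function sketchSentenceToLattice(
  tokens: SketchInToken[],
): { arcs: SketchOutArc[]; sourceSpans: SketchSpan[] } {
  const arcs: SketchOutArc[] = [];
  const sourceSpans: SketchSpan[] = [];
  for (const token of tokens) {
    const range = /^(\d+)-(\d+)$/.exec(token.id);
    if (range) {
      // "1-2" covers words 1 and 2, i.e. vertices 0 through 2.
      sourceSpans.push({
        fromVertex: Number(range[1]) - 1,
        toVertex: Number(range[2]),
        sourceForm: token.form,
      });
    } else if (/^\d+$/.test(token.id)) {
      const n = Number(token.id);
      arcs.push({ from: n - 1, to: n, form: token.form });
    }
    // Assumption: empty nodes such as "1.1" are dropped.
  }
  return { arcs, sourceSpans };
}

// Spanish "del" is a multi-word token made of "de" + "el".
const delLattice = sketchSentenceToLattice([
  { id: '1-2', form: 'del' },
  { id: '1', form: 'de' },
  { id: '2', form: 'el' },
]);

console.log(delLattice.sourceSpans); // one span: vertices 0 to 2, sourceForm "del"
```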

Type Definitions

CoNLL-U Types

interface ConlluDocument {
  format: 'conllu';
  metadata: DocumentMetadata;
  sentences: Sentence[];
}

interface Sentence {
  metadata: SentenceMetadata;
  tokens: Token[];
}

interface Token {
  id: string;           // "1", "1-2", or "1.1"
  form: string;         // Word form
  lemma: string;        // Lemma
  upos: UPOS | '_';     // Universal POS tag
  xpos: string;         // Language-specific POS tag
  feats: MorphFeatures | '_';
  head: number | '_';   // Head token ID (0 for root)
  deprel: DepRel | '_';
  deps: EnhancedDep[] | '_';
  misc: MiscFeatures | '_';
}
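
The three ID shapes noted on `Token.id` correspond to plain words, multi-word token ranges, and empty nodes. A small classifier makes the distinction explicit; `sketchClassifyId` and the `SketchIdKind` labels are hypothetical names for this example.

```typescript
type SketchIdKind = 'word' | 'multiword' | 'empty';

// Classify a CoNLL-U ID string by its shape.
function sketchClassifyId(id: string): SketchIdKind {
  if (/^\d+$/.test(id)) return 'word';        // "1"
  if (/^\d+-\d+$/.test(id)) return 'multiword'; // "1-2"
  if (/^\d+\.\d+$/.test(id)) return 'empty';    // "1.1"
  throw new Error(`Unrecognized token ID: ${id}`);
}

console.log(['1', '1-2', '1.1'].map(sketchClassifyId)); // ["word", "multiword", "empty"]
```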

CoNLL-UL Types

interface ConllulDocument {
  format: 'conllul';
  metadata: DocumentMetadata;
  lattices: Lattice[];
}

interface Lattice {
  metadata: SentenceMetadata;
  sourceSpans: SourceTokenSpan[];  // Multi-segment surface forms
  arcs: LatticeArc[];             // All edges in the lattice
}

interface LatticeArc {
  from: number;         // Start vertex (0-based)
  to: number;           // End vertex (0-based)
  form: string;         // Word form
  lemma: string;        // Lemma
  upos: UPOS | '_';     // Universal POS tag
  xpos: string;         // Language-specific POS tag
  feats: MorphFeatures | '_';
  misc: MiscFeatures | '_';
  anchors: Anchor | '_'; // Link to gold disambiguation (e.g. goldid=3)
}

interface SourceTokenSpan {
  fromVertex: number;   // Start vertex of the span
  toVertex: number;     // End vertex of the span
  sourceForm: string;   // Surface form of the source token
  misc: string;
}

POS Tags (UPOS)

All 17 Universal POS tags are supported:

type UPOS =
  | 'ADJ' | 'ADP' | 'ADV' | 'AUX' | 'CCONJ' | 'DET'
  | 'INTJ' | 'NOUN' | 'NUM' | 'PART' | 'PRON' | 'PROPN'
  | 'PUNCT' | 'SCONJ' | 'SYM' | 'VERB' | 'X';

Morphological Features

Full support for Universal Dependencies morphological features:

interface MorphFeatures {
  Gender?: string;     // Masc, Fem, Neut, Com
  Number?: string;     // Sing, Plur, Dual
  Case?: string;       // Nom, Acc, Dat, Gen, Voc, Loc, Ins
  Definite?: string;   // Def, Ind, Spec, Cons
  VerbForm?: string;   // Fin, Inf, Part, Conv, Ger, Sup
  Mood?: string;       // Ind, Imp, Cnd, Sub, Opt
  Tense?: string;      // Past, Pres, Fut, Imp, Pqp
  Person?: string;     // 1, 2, 3
  Voice?: string;      // Act, Pass, Mid
  Degree?: string;     // Pos, Cmp, Sup
  PronType?: string;   // Prs, Art, Int, Rel, Dem, Ind
  // Layered features
  'Gender[lex]'?: string;
  'Number[ctxt]'?: string;
  // And more...
  [key: string]: string | undefined;
}
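
In the file format, FEATS is a single column of `Name=Value` pairs joined by `|`, or `_` when empty. The sketch below shows how such a column could be turned into an object of this shape; `sketchParseFeats` is a hypothetical helper, and silently skipping malformed pairs is an assumption.

```typescript
type SketchFeats = { [key: string]: string | undefined };

// Parse a FEATS column such as "Definite=Def|PronType=Art".
function sketchParseFeats(column: string): SketchFeats | '_' {
  if (column === '_') return '_';
  const feats: SketchFeats = {};
  for (const pair of column.split('|')) {
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // assumption: malformed pairs are skipped
    feats[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return feats;
}

console.log(sketchParseFeats('Definite=Def|PronType=Art')); // { Definite: 'Def', PronType: 'Art' }
console.log(sketchParseFeats('Gender[lex]=Masc'));          // { 'Gender[lex]': 'Masc' }
```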

Format Reference

CoNLL-U (10 columns)

ID  FORM  LEMMA  UPOS  XPOS  FEATS  HEAD  DEPREL  DEPS  MISC

Supported features: document-level comments, sentence metadata, all 10 token columns, multi-word tokens (1-2), empty nodes (2.1), enhanced dependencies, layered morphological features.
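
For orientation, a token line is ten tab-separated cells in the order listed above. The sketch below maps one raw line onto named fields and leaves every value as a string; `sketchSplitTokenLine` and `SKETCH_COLUMNS` are hypothetical names, not part of the package.

```typescript
const SKETCH_COLUMNS = [
  'id', 'form', 'lemma', 'upos', 'xpos',
  'feats', 'head', 'deprel', 'deps', 'misc',
] as const;

type SketchColumn = typeof SKETCH_COLUMNS[number];
type SketchRow = Record<SketchColumn, string>;

// Split one CoNLL-U token line into its ten named columns.
function sketchSplitTokenLine(line: string): SketchRow {
  const cells = line.split('\t');
  if (cells.length !== SKETCH_COLUMNS.length) {
    throw new Error(`Expected 10 columns, got ${cells.length}`);
  }
  const row = {} as SketchRow;
  SKETCH_COLUMNS.forEach((column, i) => {
    row[column] = cells[i] ?? '_';
  });
  return row;
}

const satRow = sketchSplitTokenLine('3\tsat\tsit\tVERB\t_\tTense=Past\t0\troot\t_\tSpaceAfter=No');
console.log(satRow.lemma, satRow.deprel); // sit root
```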

CoNLL-UL (8-9 columns)

FROM  TO  FORM  LEMMA  UPOS  XPOS  FEATS  MISC  [ANCHORS]

CoNLL-UL extends CoNLL-U to represent morphological ambiguity via lattice structures. Key differences from CoNLL-U:

  • Vertex-based indexing (0-based FROM/TO) instead of linear token IDs
  • Multiple arcs from the same vertex represent competing analyses
  • Source token spans (e.g., 0-3 BCLM) declare surface forms spanning multiple vertices
  • ANCHORS column (optional 9th column) links arcs to gold disambiguation via goldid=N
  • No dependency columns (HEAD, DEPREL, DEPS) — CoNLL-UL is pre-syntactic

A linear (unambiguous) CoNLL-UL lattice maps directly to CoNLL-U.
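
An arc line can be read the same way as a CoNLL-U token line, with numeric FROM/TO and an optional ninth cell. The sketch below is illustrative only: `sketchParseArcLine` is a hypothetical name, FEATS, MISC and ANCHORS are kept as raw strings, and a missing ANCHORS cell is represented as `_` by assumption.

```typescript
interface SketchRawArc {
  from: number;
  to: number;
  form: string;
  lemma: string;
  upos: string;
  xpos: string;
  feats: string;
  misc: string;
  anchors: string;
}

// Parse one CoNLL-UL arc line (8 columns, or 9 with ANCHORS).
function sketchParseArcLine(line: string): SketchRawArc {
  const cells = line.split('\t');
  if (cells.length < 8 || cells.length > 9) {
    throw new Error(`Expected 8 or 9 columns, got ${cells.length}`);
  }
  const cell = (i: number): string => cells[i] ?? '_';
  const from = Number(cell(0));
  const to = Number(cell(1));
  if (!Number.isInteger(from) || !Number.isInteger(to)) {
    throw new Error('FROM and TO must be integers');
  }
  return {
    from, to,
    form: cell(2), lemma: cell(3), upos: cell(4), xpos: cell(5),
    feats: cell(6), misc: cell(7),
    anchors: cell(8), // '_' when the 9th column is absent
  };
}

const nounArc = sketchParseArcLine('0\t1\tbook\tbook\tNOUN\tN#s\tNumber=Sing\t_');
console.log(nounArc.from, nounArc.to, nounArc.upos); // 0 1 NOUN
```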

License

MIT