@ade_oshineye/yaket

v0.5.3

Published

TypeScript port of the YAKE keyword extraction core pipeline

Downloads

110

Readme

Yaket

Yaket is a TypeScript keyword extraction library that ports the core YAKE pipeline into a form that works in Node, browser-style bundles, and Cloudflare Workers.

It is designed for teams that want upstream-like YAKE behavior, deterministic results, and a typed API that can plug into ingestion pipelines such as Bobbin or future consumers such as flux-search.

Attribution

Yaket is an independent TypeScript port/reimplementation of the YAKE approach, based on the upstream Python YAKE project.

The underlying research is:

  • Ricardo Campos, Vitor Mangaravite, Arian Pasquali, Alipio Jorge, Celine N. S. Santos, and Adam Jatowt. YAKE! Keyword Extraction from Single Documents using Multiple Local Features. Information Sciences 509 (2020), 257-289. DOI: 10.1016/j.ins.2019.09.013

Yaket aims to preserve the core YAKE behavior where practical, while adapting the implementation to JavaScript/TypeScript runtimes and edge-compatible packaging.

Why use it

  • Upstream-shaped YAKE core: KeywordExtractor, DataCore, SingleWord, ComposedWord, and the core scoring/dedup flow are implemented in TypeScript.
  • Edge-safe extraction path: stopwords are bundled, and the extraction core avoids Node-only runtime dependencies.
  • Pipeline-friendly API: one-shot extraction, reusable extractor instances, Bobbin-compatible adapter output, and document-oriented helpers are all available.
  • Surface-form preserving results: returned keyword values keep the observed case from the source text, while normalizedKeyword carries the normalized matching form for downstream logic.
  • Verification-heavy: regression fixtures, Python parity checks, property-based tests, Cloudflare runtime tests, and a benchmark harness are checked in.

30-Second Summary

Yaket extracts weighted keywords from a single document using a YAKE-style local-feature pipeline.

It is designed for cases where you want:

  1. a deterministic keyword extractor
  2. no LLM dependency
  3. browser/edge compatibility
  4. a practical JavaScript alternative to the Python YAKE package

Quick Start

Requires Node.js 20+

npm install @ade_oshineye/yaket

import { extract } from "@ade_oshineye/yaket";

const keywords = extract(
  "Google is acquiring data science community Kaggle.",
  { language: "en", n: 3, top: 5 },
);

console.log(keywords);

Expected shape:

[
  ["science community Kaggle", 0.022868570857866696],
  ["community Kaggle", 0.04778970771086575],
]
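The sample above lists [keyword, score] tuples in ascending score order. In upstream YAKE, a lower score marks a more relevant keyword. Assuming Yaket follows the same convention, a consumer could keep only the strongest results with a small filter. The helper name keepBelowScore and the 0.03 cut-off are illustrative and not part of Yaket.

```typescript
// Tuple shape taken from the sample output above: [keyword, score].
type KeywordTuple = [string, number];

// Hypothetical helper (not part of Yaket): keep tuples whose score is at or
// below `maxScore`, assuming lower YAKE scores mean more relevant keywords.
function keepBelowScore(tuples: KeywordTuple[], maxScore: number): KeywordTuple[] {
  return tuples
    .filter(([, score]) => score <= maxScore)
    .sort((a, b) => a[1] - b[1]); // ascending: most relevant first
}

// Values copied from the sample output above, deliberately out of order.
const sampleTuples: KeywordTuple[] = [
  ["community Kaggle", 0.04778970771086575],
  ["science community Kaggle", 0.022868570857866696],
];

const strongest = keepBelowScore(sampleTuples, 0.03);
console.log(strongest);
```

A sensible cut-off depends on document length and n, so a fixed threshold like this is best treated as a starting point to tune per corpus.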

Installation

Install from npm:

npm install @ade_oshineye/yaket

The package ships ESM output and exposes Worker/browser-safe entry points:

  • @ade_oshineye/yaket
  • @ade_oshineye/yaket/browser
  • @ade_oshineye/yaket/worker

Usage

Algorithm Summary

At a high level, Yaket:

  1. preprocesses text into sentences and tokens
  2. generates single-word and multi-word candidates up to n
  3. scores single words using local YAKE-style features such as frequency, spread, position, casing, and co-occurrence-derived relations
  4. scores composed phrases from those single-word scores
  5. deduplicates the ranked list with seqm, levs, or jaro

See docs/architecture.md for the pipeline structure and docs/algorithm-drift.md for known deviations from upstream YAKE.
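To make step 2 concrete, here is a minimal sketch of candidate generation up to n words. It is a simplification for illustration and not Yaket's tokenizer or DataCore. The tiny stopword set, the ASCII-only splitting, and the rule of dropping candidates that start or end with a stopword are all assumptions of this sketch.

```typescript
// Simplified candidate generation; NOT Yaket's actual implementation.
// Assumptions: ASCII word splitting, a tiny demo stopword set, and dropping
// any candidate whose first or last token is a stopword.
const DEMO_STOPWORDS = new Set(["is", "the", "a", "of", "to"]);

function generateCandidates(sentence: string, maxN: number): string[] {
  const tokens = sentence
    .split(/[^A-Za-z0-9]+/)
    .filter((token) => token.length > 0);

  const candidates: string[] = [];
  for (let start = 0; start < tokens.length; start++) {
    for (let size = 1; size <= maxN && start + size <= tokens.length; size++) {
      const gram = tokens.slice(start, start + size);
      // Reject n-grams with a stopword at either edge.
      const stopwordAtEdge = gram.some(
        (token, index) =>
          (index === 0 || index === gram.length - 1) &&
          DEMO_STOPWORDS.has(token.toLowerCase()),
      );
      if (!stopwordAtEdge) candidates.push(gram.join(" "));
    }
  }
  return candidates;
}

const demoCandidates = generateCandidates("Google is acquiring Kaggle", 3);
console.log(demoCandidates);
```

Stopwords are still allowed inside a candidate here (for example "Google is acquiring"). The real pipeline then scores each candidate, which this sketch does not attempt.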

Options Reference

Common options:

| Option | Meaning | Default |
|---|---|---|
| language | language code | en |
| n | maximum n-gram size | 3 |
| top | number of results to return | 20 |
| dedupFunc | dedup function (seqm, levs, jaro) | seqm |
| dedupLim | dedup threshold | 0.9 |
| windowSize | co-occurrence window | 1 |
| stopwords | explicit stopword iterable override | bundled set for language |

For the complete public API, see docs/api-reference.md.
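To show how dedupFunc and dedupLim interact, the sketch below applies a Levenshtein-ratio similarity to an already ranked list. A candidate is dropped when it is more similar than the threshold to a keyword that was kept earlier. This mirrors the general approach in upstream YAKE, but it is not Yaket's internal code. The function names levenshtein, similarity, and dedupRanked are made up for this example.

```typescript
// Illustrative Levenshtein-ratio dedup; not Yaket's internal implementation.

// Classic dynamic-programming edit distance, keeping one row at a time.
function levenshtein(a: string, b: string): number {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost, // substitution
        ),
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Similarity in [0, 1]: 1 means identical strings.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Walk the ranked list in order and skip near-duplicates of kept entries.
function dedupRanked(ranked: string[], dedupLim: number): string[] {
  const kept: string[] = [];
  for (const candidate of ranked) {
    const isDuplicate = kept.some((k) => similarity(candidate, k) > dedupLim);
    if (!isDuplicate) kept.push(candidate);
  }
  return kept;
}

const dedupedDemo = dedupRanked(
  ["machine learning", "machine learnings", "software delivery"],
  0.9,
);
console.log(dedupedDemo);
```

Lowering the threshold removes more near-duplicates, while raising it toward 1 keeps almost everything.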

Canonical option names are:

  • language
  • dedupLim
  • dedupFunc
  • windowSize

Legacy aliases such as lan, dedup_lim, windowsSize, and window_size are still accepted for backward compatibility, but new code should prefer the canonical names.
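When migrating older call sites, it can help to normalize option objects in one place. The sketch below maps the legacy aliases listed above onto the canonical names. toCanonicalOptions is a hypothetical helper, not a Yaket export. The precedence rule (a canonical name wins when both spellings are present) is an assumption of this sketch, not documented Yaket behavior.

```typescript
// Hypothetical migration helper; Yaket's own alias handling may differ.
type CanonicalOptions = {
  language?: string;
  dedupLim?: number;
  windowSize?: number;
};

type LegacyOptions = CanonicalOptions & {
  lan?: string;
  dedup_lim?: number;
  windowsSize?: number;
  window_size?: number;
};

function toCanonicalOptions(input: LegacyOptions): CanonicalOptions {
  return {
    // Assumption: the canonical spelling wins when both are supplied.
    language: input.language ?? input.lan,
    dedupLim: input.dedupLim ?? input.dedup_lim,
    windowSize: input.windowSize ?? input.windowsSize ?? input.window_size,
  };
}

const migrated = toCanonicalOptions({ lan: "pt", dedup_lim: 0.8, window_size: 2 });
console.log(migrated);
```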

If you prefer the most concise one-shot API, extract() is an alias for extractKeywords().

Reusable extractor

import { KeywordExtractor } from "@ade_oshineye/yaket";

const extractor = new KeywordExtractor({
  language: "en",
  n: 3,
  top: 10,
});

const keywords = extractor.extractKeywords(
  "Cloudflare Workers process requests close to users.",
);

Detailed keyword results

import { extractKeywordDetails } from "@ade_oshineye/yaket";

const details = extractKeywordDetails("Machine learning improves software delivery.", {
  language: "en",
  n: 2,
  top: 5,
});

extractKeywordDetails() returns:

type KeywordResult = {
  keyword: string;
  normalizedKeyword: string;
  score: number;
  ngramSize: number;
  occurrences: number;
  sentenceIds: number[];
};

keyword preserves the source-text surface form; normalizedKeyword is the normalized comparison key used for deduplication and downstream matching.
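One way to use the two fields together is to merge results from several documents by normalizedKeyword while keeping a readable surface form for display. The sketch below is self-contained: KeywordLike is a local subset of the type above, mergeByNormalizedKeyword is a hypothetical helper, and the sample values are hand-written rather than real Yaket output. It also assumes that a lower score is better, as in upstream YAKE.

```typescript
// Local subset of the KeywordResult fields used in this sketch.
type KeywordLike = {
  keyword: string;
  normalizedKeyword: string;
  score: number;
};

// Hypothetical helper: one entry per normalized key, keeping the entry with
// the lowest score (assumed to be the most relevant).
function mergeByNormalizedKeyword(results: KeywordLike[]): Map<string, KeywordLike> {
  const merged = new Map<string, KeywordLike>();
  for (const item of results) {
    const existing = merged.get(item.normalizedKeyword);
    if (existing === undefined || item.score < existing.score) {
      merged.set(item.normalizedKeyword, item);
    }
  }
  return merged;
}

// Hand-written sample values for illustration; not real Yaket output.
const mergedDemo = mergeByNormalizedKeyword([
  { keyword: "Machine Learning", normalizedKeyword: "machine learning", score: 0.05 },
  { keyword: "machine learning", normalizedKeyword: "machine learning", score: 0.02 },
  { keyword: "Kaggle", normalizedKeyword: "kaggle", score: 0.1 },
]);
console.log(mergedDemo.size);
```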

Document-oriented pipelines

import { extractFromDocument, serializeDocumentKeywordResult } from "@ade_oshineye/yaket";

const result = extractFromDocument({
  id: "doc-1",
  language: "en",
  title: "Edge runtimes",
  body: "Cloudflare Workers process requests close to users.",
});

const serialized = serializeDocumentKeywordResult(result);

Document helpers also support lightweight pipeline hooks:

  • beforeExtractText(text, context) for pre-normalization before extraction
  • afterExtractKeywords(keywords, context) for post-ranking pipeline shaping

Bobbin-compatible adapter

import { extractYakeKeywords } from "@ade_oshineye/yaket";

const keywords = extractYakeKeywords(
  "Platform ecosystems reward integration.",
  5,
  3,
);

This preserves Bobbin's current output shape:

type BobbinYakeResult = {
  keyword: string;
  score: number;
};
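If you already hold tuple-style results and need this object shape elsewhere, the conversion is a one-line map. The sketch below is independent of Yaket. BobbinLike and toBobbinShape are illustrative names, and the tuple layout is taken from the Quick Start sample.

```typescript
// Same fields as the Bobbin-style result shown above.
type BobbinLike = { keyword: string; score: number };

// Hypothetical converter from [keyword, score] tuples to keyword/score objects.
function toBobbinShape(tuples: Array<[string, number]>): BobbinLike[] {
  return tuples.map(([keyword, score]) => ({ keyword, score }));
}

const bobbinDemo = toBobbinShape([
  ["science community Kaggle", 0.022868570857866696],
  ["community Kaggle", 0.04778970771086575],
]);
console.log(bobbinDemo);
```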

Custom hooks

import { extractKeywordDetails } from "@ade_oshineye/yaket";

const details = extractKeywordDetails("models model models", {
  n: 1,
  candidateNormalizer: {
    normalize(token) {
      return token.endsWith("s") ? token.slice(0, -1) : token;
    },
  },
  lemmatizer: {
    lemmatize(token) {
      return token;
    },
  },
});

Available extension points:

  • TextProcessor
  • StopwordProvider
  • SimilarityStrategy
  • CandidateNormalizer
  • Lemmatizer
  • SingleWordScorer
  • MultiWordScorer
  • KeywordScorer
  • candidateFilter

lemmatizer stays hook-based in Yaket. Upstream-style string backends such as "spacy" or "nltk" are intentionally not implemented in the extraction core.

Yaket also exports:

  • YakeResult
  • YakeOptions

The two first-class internal scoring hooks are:

  1. singleWordScorer for replacing the internal YAKE single-word score formula
  2. multiWordScorer for replacing the internal YAKE multi-word score formula

Example:

import { extractKeywordDetails } from "@ade_oshineye/yaket";

const details = extractKeywordDetails("agent swarms coordinate teams", {
  language: "en",
  n: 2,
  multiWordScorer: {
    score(candidate) {
      return candidate.size === 2 ? 0.001 : 10;
    },
  },
});

Stopwords and languages

import { STOPWORDS, getStopwordText, supportedLanguages } from "@ade_oshineye/yaket";

console.log(supportedLanguages.includes("en"));
console.log(getStopwordText("en").split("\n").length > 0);
console.log(STOPWORDS.en.includes("the"));

Language lookup uses the first two letters of the requested language code. If a specific stopword list is unavailable, Yaket currently resolves to an empty stopword list.
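The lookup rule described above can be pictured with a small sketch. DEMO_LISTS is a stand-in for the bundled lists and resolveStopwords is a hypothetical function, so neither is a Yaket export. Lowercasing the code before the lookup is an assumption of this sketch.

```typescript
// Hypothetical registry standing in for the bundled stopword lists.
const DEMO_LISTS: Record<string, string[]> = {
  en: ["the", "a", "of"],
  pt: ["um", "uma"],
};

// Use the first two letters of the code; fall back to an empty list.
function resolveStopwords(languageCode: string): string[] {
  const key = languageCode.slice(0, 2).toLowerCase(); // lowercasing is assumed
  return DEMO_LISTS[key] ?? [];
}

console.log(resolveStopwords("en-GB")); // resolves to the "en" list
console.log(resolveStopwords("xx")); // unknown language: empty list
```

The empty-list fallback means that extraction for an unsupported language runs without stopword filtering, so passing an explicit stopwords override is worth considering in that case.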

To extend or replace stopwords without mutating global state:

import { createStaticStopwordProvider, createStopwordSet } from "@ade_oshineye/yaket";

const stopwords = createStopwordSet("en", { add: ["yaket"], remove: ["the"] });

const provider = createStaticStopwordProvider({
  en: stopwords,
  pt: ["um", "uma"],
});

STOPWORDS is exported as a frozen map of bundled raw stopword text for users who want direct access to the packaged lists.

Highlighting

import { TextHighlighter, extractKeywords } from "@ade_oshineye/yaket";

const keywords = extractKeywords("Machine learning improves software delivery.");
const highlighted = new TextHighlighter().highlight(
  "Machine learning improves software delivery.",
  keywords,
);
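For readers curious what keyword highlighting involves, here is a self-contained sketch that wraps matches in kw tags. The tag name is borrowed from upstream YAKE's highlighter. Whether Yaket's TextHighlighter emits the same markup or matches text the same way is not stated here, so treat highlightKeywords as a hypothetical stand-in rather than a description of the library.

```typescript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical highlighter: case-insensitive, whole-word matches only.
function highlightKeywords(text: string, keywords: string[]): string {
  // Longest first so a multi-word phrase wins over a shorter overlapping one.
  const sorted = keywords
    .filter((keyword) => keyword.length > 0)
    .sort((a, b) => b.length - a.length);
  if (sorted.length === 0) return text;

  const alternatives = sorted.map(escapeRegExp).join("|");
  const pattern = new RegExp("\\b(" + alternatives + ")\\b", "gi");
  return text.replace(pattern, "<kw>$1</kw>");
}

const highlightedDemo = highlightKeywords(
  "Machine learning improves software delivery.",
  ["machine learning", "software delivery"],
);
console.log(highlightedDemo);
// <kw>Machine learning</kw> improves <kw>software delivery</kw>.
```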

CLI

yaket --text-input "Google is acquiring Kaggle" --language en --ngram-size 3 --top 5 --verbose

Supported flags:

  • --text-input
  • --input-file
  • --language
  • --ngram-size
  • --dedup-func
  • --dedup-lim
  • --window-size
  • --top
  • --verbose
  • --help

Cloudflare Compatibility

Yaket keeps the extraction core free of runtime filesystem access and Node-only extraction dependencies.

Verification currently includes:

  • source guards for extraction modules
  • browser-target bundling smoke tests
  • a real Cloudflare Workers test lane via @cloudflare/vitest-pool-workers

Run it with:

npm run test:cloudflare

Benchmarks

The repository includes a benchmark harness that compares:

  • Yaket
  • upstream Python YAKE
  • the original Bobbin YAKE-like implementation
  • a simple TF-IDF baseline

Current checked-in report:

  • docs/benchmarks/komoroske-2026-04-06.md

Run it with:

npm run benchmark

Additional dataset-oriented benchmark support is available for Inspec and SemEval-style evaluation via scripts/benchmark-datasets.ts. Run it with:

npm run benchmark:datasets

Architecture

  • Architecture overview: docs/architecture.md
  • API reference: docs/api-reference.md
  • Use cases: docs/use-cases.md
  • Algorithm drift: docs/algorithm-drift.md
  • Dataset benchmarks: docs/benchmarks/inspec-semeval.md
  • Bobbin integration guide: docs/integrations/bobbin.md
  • Generic pipeline guide: docs/integrations/pipelines.md
  • Releasing guide: docs/releasing.md
  • Contributing: CONTRIBUTING.md
  • Roadmap: docs/roadmap.md
  • Deferred work: TODO.md
  • Audit notes: docs/audits/implementation-audit-2026-04-16.md

Limitations

  • The tokenizer is close to YAKE, but still not a literal segtok port.
  • Dedup seqm behavior is still approximate rather than a byte-for-byte Python clone.
  • Multilingual support exists through bundled stopwords, but broad multilingual parity coverage is deferred.
  • Bobbin adapter validation now covers the Bobbin YAKE, topic-extractor, topic-system, and extraction-quality tests in the reference Bobbin checkout, but that validation still needs to be kept current as Bobbin evolves.

Comparison To Alternatives

| Tool | Strength | Tradeoff vs Yaket |
|---|---|---|
| TF-IDF | simple, cheap, corpus-aware | less phrase-aware and less YAKE-like on single documents |
| RAKE | simple phrase extraction | weaker local-feature scoring and usually cruder ranking |
| KeyBERT | embedding-based semantic relevance | larger dependency/runtime cost and often slower |
| Yaket | deterministic YAKE-style local-feature extraction in JS | still has some drift from upstream Python YAKE in tokenization and some heuristic edge cases |

For a concrete checked-in comparison, see the Komoroske benchmark report.

Main Use Cases

Yaket is especially well-suited for:

  1. blog/CMS/knowledge-base tagging
  2. newsletter and article topic extraction
  3. search indexing and hybrid retrieval metadata
  4. RAG chunk enrichment without an LLM call
  5. browser extensions and client-side page analysis
  6. chat and Slack bot topic tagging

See docs/use-cases.md for more detail.

Live Demo

An interactive demo page lives in demo/index.html and is intended to be served through GitHub Pages.

GitHub Pages URL:

  • https://adewale.github.io/yaket/

Development

npm install
npm run typecheck
npm test
npm run test:cli:coverage
npm run test:cloudflare
npm run build
npm run check:package
npm run benchmark
npm run verify

test/python-parity.test.ts performs a live comparison against upstream Python YAKE when PYTHONPATH points at a YAKE checkout. The default path used during local development is /tmp/yake.

Mutation testing is configured via Stryker and can be run separately when you want a slower audit-focused pass:

npm run test:mutation

When Not To Use Yaket

  • If you need corpus-wide topic modeling rather than single-document keyword extraction.
  • If you need production-grade lemmatization out of the box today.
  • If exact upstream Python tokenization parity across all languages is a hard requirement right now.

License

MIT