npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

ssfst

v2.0.0

Published

Subsequential Finite State Transducer. Given an input text, produces a new text by applying a fixed set of rewrite rules.

Downloads

25

Readme

Subsequential Finite State Transducer

Build Status Coverage Status Code Climate Dependencies

Given an input text, produces a new text by applying a fixed set of rewrite rules. The algorithm builds a minimal subsequential transducer and uses the "leftmost largest match" replacement strategy with skips. No overlap between the replaced parts is possible. The time needed to compute the transducer is linear in the size of the input dictionary. For any text t of length |t| the time it takes to perform a rewrite is also linear O(|t|+|t'|) where t' denotes the resulting output string.
Check out the Online Sandbox.

Usage

npm i --save ssfst

Example: Text Rewriting

const ssfst = require('ssfst');

const spellingCorrector = new ssfst.init([
    { input: 'acheive', output: 'achieve'},
    { input: 'arguement', output: 'argument'},
    { input: 'independant', output: 'independent'},
    { input: 'posession', output: 'possession'},
    { input: 'mercy less', output: 'merciless' }
]);

spellingCorrector.process('independant'); // => "independent"
spellingCorrector.process('mercy less arguement'); // => "merciless argument"
spellingCorrector.process('they acheived a lot'); // => "they achieved a lot"

The init factory function takes a collection of pairs and returns a transducer. The transducer can be initialized by any iterable object.

function* dictGen() {
    yield { input: 'dog', output: '<a href="https://en.wikipedia.org/wiki/Dog">dog</a>' };
    yield { input: 'fox', output: '<a href="https://en.wikipedia.org/wiki/Fox">fox</a>' };
}

const transducer = ssfst.init(dictGen());
transducer.process('The quick brown fox jumped over the lazy dog.');
/* => The quick brown <a href="https://en.wikipedia.org/wiki/Fox">fox</a> jumped over the lazy <a href="https://en.wikipedia.org/wiki/Dog">dog</a>. */

Working with large datasets

Loading the full rewrite dictionary in memory is not optimal when working with large datasets. In this case we want to build the transducer by adding the entries asynchronously one at a time. This is achieved by using an async iterable.

For example, if our dataset is stored in a file, we can read its contents one line at a time.

Berlin,Germany
Buenos Aires,Argentina
London,United Kingdom
Sofia,Bulgaria
Tokyo,Japan

This is the dictionary text file. Each line contains an entry and its input and output values are separated by a comma. We implement a generator function which reads it asynchronously line by line and yields an object which is consumed by the initialization of the transducer.

const fs = require('fs');
const readline = require('readline');
const ssfst = require('ssfst');

async function* readLinesGenAsync() {
    const lineReader = readline.createInterface({
        input: fs.createReadStream(__dirname + '/capitals.txt')
    });

    for await (const line of lineReader) {
        const [input, output] = line.split(',');
        yield { input, output };
    }
}

We pass the async iterable to the initAsync factory function.

const transducer = await ssfst.initAsync(readLinesGenAsync());

Example: Key-Value Store

Due to its minimality, the subsequential transducer can also be used to efficiently store key-value pairs.

const val = transducer.process('Sofia'); // => Bulgaria
const invalid = transducer.process('Unknown Key'); // => Unknown Key

If there's no value for a given key, it will return the key itself, which simply reduces to processing a text without applying any rewrite rules.

Use with TypeScript

import * as ssfst from 'ssfst';

Run Locally

git clone https://github.com/deniskyashif/ssfst.git
cd ssfst
npm i

Sample implementations can be found at examples/.

Run the Tests

npm t

References

This implementation follows the construction presented in "Efficient Dictionary-Based Text Rewriting using Subsequential Transducers" by S. Mihov, K. Schulz