@milicazm/selfies-js

v2.0.1

JavaScript/TypeScript implementation of SELFIES - a molecular string representation for machine learning in chemistry

selfies-js

JavaScript/TypeScript implementation of SELFIES (SELF-referencIng Embedded Strings), a molecular string representation designed for machine learning applications in chemistry.

About

SELFIES is a string-based representation of molecules where every SELFIES string corresponds to a chemically valid molecule. This property makes SELFIES particularly useful for generative models and optimization algorithms in computational chemistry, where traditional representations like SMILES can produce invalid outputs.

This library provides a complete JavaScript/TypeScript implementation compatible with the Python SELFIES library (v2.2.0). It has been validated on 597,707 molecules from 14 datasets with 99.997% structure equivalence.

Use Cases

  • Generative models for drug discovery and materials science
  • Molecular optimization algorithms
  • Variational autoencoders (VAEs) for molecular design
  • Reinforcement learning applications
  • Web-based molecular tools and visualizations
  • Machine learning pipelines requiring molecular representations

Features

  • Complete encoder and decoder with aromatic molecule support
  • Automatic kekulization of aromatic SMILES
  • TypeScript type definitions
  • Zero runtime dependencies
  • Compatible with Node.js 14+
  • Semantic constraint checking for valid molecular structures

Installation

npm install @milicazm/selfies-js

or

yarn add @milicazm/selfies-js

Quick Start

import { encoder, decoder } from '@milicazm/selfies-js';

// SMILES to SELFIES
const selfies = encoder('CCO');
console.log(selfies); // [C][C][O]

// SELFIES to SMILES
const smiles = decoder('[C][C][O]');
console.log(smiles); // CCO

// Aromatic molecules
const benzene = encoder('c1ccccc1');
console.log(benzene); // [C][=C][C][=C][C][=C][Ring1][=Branch1]

API Reference

Encoding & Decoding

encoder(smiles, strict?, attribute?)

Converts a SMILES string to SELFIES.

import { encoder } from '@milicazm/selfies-js';

// Basic encoding
const selfies = encoder('CCO');
// Returns: '[C][C][O]'

// With strict checking passed explicitly (the default; validates bond constraints)
const selfies2 = encoder('CCO', true);

// With attribution tracking
const [selfies3, attribution] = encoder('CCO', true, true);

Parameters:

  • smiles (string): Input SMILES string
  • strict (boolean, optional): Check semantic constraints (default: true)
  • attribute (boolean, optional): Return attribution information (default: false)

Returns: SELFIES string, or [selfies, attribution] if attribute=true
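
Because the return shape depends on attribute, code that accepts either form may want to normalize it first. The helper below is a hypothetical sketch: unpackEncoderResult is not part of the library, and the attribution value is treated as opaque because its structure is not described here.

```javascript
// Hypothetical helper (not part of the library): normalize the two
// documented return shapes into a single object.
function unpackEncoderResult(result) {
  if (Array.isArray(result)) {
    // attribute=true: [selfies, attribution]
    const [selfies, attribution] = result;
    return { selfies, attribution };
  }
  // attribute=false: plain SELFIES string
  return { selfies: result, attribution: null };
}

// Stand-in values shaped like the two documented return forms.
console.log(unpackEncoderResult('[C][C][O]').selfies);       // [C][C][O]
console.log(unpackEncoderResult(['[C][C][O]', []]).selfies); // [C][C][O]
```

In real use the argument would be the value returned by encoder(...).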

decoder(selfies, printError?)

Converts a SELFIES string to SMILES.

import { decoder } from '@milicazm/selfies-js';

const smiles = decoder('[C][C][O]');
// Returns: 'CCO'

// With error printing disabled
const smiles2 = decoder('[C][C][O]', false);

Parameters:

  • selfies (string): Input SELFIES string
  • printError (boolean, optional): Print errors to console (default: true)

Returns: SMILES string

Utility Functions

lenSelfies(selfies)

Returns the number of symbols in a SELFIES string.

import { lenSelfies } from '@milicazm/selfies-js';

lenSelfies('[C][=C][F]'); // Returns: 3

splitSelfies(selfies)

Tokenizes a SELFIES string into individual symbols.

import { splitSelfies } from '@milicazm/selfies-js';

const symbols = Array.from(splitSelfies('[C][=C][F]'));
// Returns: ['[C]', '[=C]', '[F]']
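
To illustrate what tokenization amounts to, here is a minimal sketch that splits a string on bracketed symbols with a regular expression. splitSymbols is a hypothetical stand-in, not the library's splitSelfies, and it simply ignores anything outside brackets.

```javascript
// Hypothetical stand-in for illustration; not the library's implementation.
function splitSymbols(selfies) {
  // Each SELFIES symbol is a bracketed token such as "[C]" or "[=Branch1]".
  return selfies.match(/\[[^\[\]]*\]/g) || [];
}

const symbols = splitSymbols('[C][=C][F]');
console.log(symbols.join(' ')); // [C] [=C] [F]
console.log(symbols.length);    // 3
```

Counting the resulting tokens gives the same number that lenSelfies reports for this input.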

getAlphabetFromSelfies(selfiesList)

Extracts the alphabet of symbols from a collection of SELFIES strings.

import { getAlphabetFromSelfies } from '@milicazm/selfies-js';

const alphabet = getAlphabetFromSelfies(['[C][C][O]', '[C][=C][F]']);
// Returns: Set(['[C]', '[O]', '[=C]', '[F]'])
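
Conceptually, the alphabet is the set of distinct symbols across all input strings. The sketch below shows that idea in plain JavaScript; buildAlphabet is a hypothetical name, and the library's own function may differ in details such as ordering.

```javascript
// Hypothetical sketch: collect every distinct bracketed symbol into a Set.
function buildAlphabet(selfiesList) {
  const alphabet = new Set();
  for (const selfies of selfiesList) {
    for (const symbol of selfies.match(/\[[^\[\]]*\]/g) || []) {
      alphabet.add(symbol);
    }
  }
  return alphabet;
}

const alphabet = buildAlphabet(['[C][C][O]', '[C][=C][F]']);
console.log(alphabet.size);                     // 4
console.log(Array.from(alphabet).sort().join(' ')); // [=C] [C] [F] [O]
```

Sorting the symbols before assigning indices keeps a vocabulary stable across runs, regardless of the order in which the dataset is read.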

Machine Learning Utilities

selfiesEncoder.label_encode(selfiesList)

Encodes SELFIES strings as sequences of integers (label encoding).

import { selfiesEncoder } from '@milicazm/selfies-js';

const encoded = selfiesEncoder.label_encode(['[C][C][O]', '[C][=C][F]']);
// Returns integer sequences suitable for ML models
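
For readers unfamiliar with label encoding, the sketch below shows the general technique: build a vocabulary, then map each symbol to its index and pad to a fixed length. It is not the library's implementation; labelEncode, its parameters, and the choice of index 0 for the '[nop]' padding symbol are assumptions made for illustration.

```javascript
// Hypothetical sketch of label encoding with right-padding.
function labelEncode(selfiesList, padToLen, padSymbol = '[nop]') {
  const tokenized = selfiesList.map(s => s.match(/\[[^\[\]]*\]/g) || []);

  // Assumed convention: padding symbol gets index 0, other symbols are
  // numbered in order of first appearance.
  const vocab = new Map([[padSymbol, 0]]);
  for (const tokens of tokenized) {
    for (const token of tokens) {
      if (!vocab.has(token)) vocab.set(token, vocab.size);
    }
  }

  const labels = tokenized.map(tokens => {
    const padCount = Math.max(0, padToLen - tokens.length);
    const padded = tokens.concat(new Array(padCount).fill(padSymbol));
    return padded.map(token => vocab.get(token));
  });
  return { labels, vocab };
}

const { labels, vocab } = labelEncode(['[C][C][O]', '[C][=C][F]'], 4);
console.log(JSON.stringify(labels)); // [[1,1,2,0],[1,3,4,0]]
console.log(vocab.size);             // 5
```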

selfiesEncoder.one_hot_encode(selfiesList)

Encodes SELFIES strings as one-hot matrices.

import { selfiesEncoder } from '@milicazm/selfies-js';

const encoded = selfiesEncoder.one_hot_encode(['[C][C][O]', '[C][=C][F]']);
// Returns one-hot encoded matrices
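
A one-hot matrix has one row per symbol position and one column per vocabulary entry, with a single 1 in each row. The sketch below illustrates that layout from integer labels; oneHot is a hypothetical helper, and the exact shape returned by the library may differ.

```javascript
// Hypothetical sketch: expand integer label sequences into one-hot rows.
function oneHot(labelSequences, vocabSize) {
  return labelSequences.map(sequence =>
    sequence.map(index => {
      const row = new Array(vocabSize).fill(0);
      row[index] = 1; // mark the column for this symbol
      return row;
    })
  );
}

// Labels for '[C][C][O]' padded to length 4, with a 3-symbol vocabulary
// (0 = padding, 1 = [C], 2 = [O]); this index assignment is illustrative.
const matrices = oneHot([[1, 1, 2, 0]], 3);
console.log(JSON.stringify(matrices[0])); // [[0,1,0],[0,1,0],[0,0,1],[1,0,0]]
```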

Bond Constraints

Manage semantic constraints for valid molecular structures.

import {
  getSemanticConstraints,
  setSemanticConstraints
} from '@milicazm/selfies-js';

// Get current constraints
const constraints = getSemanticConstraints();

// Set preset constraints
setSemanticConstraints('octet_rule');  // Strict octet rule
setSemanticConstraints('hypervalent'); // Allow hypervalent atoms
setSemanticConstraints('default');     // Default settings

// Set custom constraints
setSemanticConstraints({
  'C': 4,  // Carbon: max 4 bonds
  'N': 3,  // Nitrogen: max 3 bonds
  'O': 2,  // Oxygen: max 2 bonds
  '?': 8   // Default for other elements
});
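
To make the meaning of such a table concrete, the sketch below checks a bond count against it, falling back to the '?' entry for elements without their own limit. The helper names are hypothetical, and how the library applies these limits internally is not described here.

```javascript
// Hypothetical sketch of a valence check against a constraint table.
const constraints = { C: 4, N: 3, O: 2, '?': 8 };

function maxBonds(element, table) {
  // Use the element's own limit if present, otherwise the '?' default.
  return Object.prototype.hasOwnProperty.call(table, element)
    ? table[element]
    : table['?'];
}

function violates(element, bondOrders, table) {
  // Sum the bond orders (1 = single, 2 = double, 3 = triple).
  const total = bondOrders.reduce((sum, order) => sum + order, 0);
  return total > maxBonds(element, table);
}

console.log(violates('C', [2, 2, 1], constraints)); // true  (5 > 4)
console.log(violates('O', [1, 1], constraints));    // false (2 <= 2)
console.log(violates('S', [2, 2, 2], constraints)); // false (6 <= 8, '?' fallback)
```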

Examples

Encoding Complex Structures

import { encoder, decoder } from '@milicazm/selfies-js';

// Caffeine
const caffeine = encoder('CN1C=NC2=C1C(=O)N(C(=O)N2C)C');
console.log(caffeine);
// [C][N][C][=N][C][=C][Ring1][Branch1][C][=Branch1][C][=O][N]...

// Aromatic molecules
const toluene = encoder('Cc1ccccc1');
const pyridine = encoder('c1ccncc1');
const naphthalene = encoder('c1ccc2ccccc2c1');

// Roundtrip verification
const roundtrip = decoder(encoder('CCO'));
console.log(roundtrip); // CCO

Building a Molecular Dataset

import { encoder, getAlphabetFromSelfies } from '@milicazm/selfies-js';

// Convert SMILES dataset to SELFIES
const smilesDataset = ['CCO', 'C=CF', 'c1ccccc1'];
const selfiesDataset = smilesDataset.map(smiles => encoder(smiles));

// Extract alphabet for ML model
const alphabet = getAlphabetFromSelfies(selfiesDataset);
console.log(alphabet.size); // Number of unique symbols

// Create symbol-to-index mapping
const vocab = Array.from(alphabet);
const symbolToIdx = new Map(vocab.map((sym, i) => [sym, i]));

Using in Machine Learning

import { selfiesEncoder } from '@milicazm/selfies-js';

const molecules = [
  '[C][C][O]',
  '[C][=C][F]',
  '[C][C][Branch1][C][C][C]' // branches use [Branch] symbols, not SMILES parentheses
];

// Label encoding for RNNs/LSTMs
const labelEncoded = selfiesEncoder.label_encode(molecules, {
  pad_to_len: 10,
  pad_with: '[nop]'
});

// One-hot encoding for CNNs
const oneHotEncoded = selfiesEncoder.one_hot_encode(molecules, {
  pad_to_len: 10,
  pad_with: '[nop]'
});

// Decode back from labels
const decoded = selfiesEncoder.label_decode(labelEncoded);
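
Decoding labels is the inverse lookup: map each index back to its symbol and drop the padding. The sketch below shows the idea with a hypothetical labelDecode helper and an illustrative vocabulary array; it is not the library's label_decode.

```javascript
// Hypothetical sketch: turn integer labels back into SELFIES strings.
function labelDecode(labelSequences, vocab, padSymbol = '[nop]') {
  return labelSequences.map(sequence =>
    sequence
      .map(index => vocab[index])           // index -> symbol
      .filter(symbol => symbol !== padSymbol) // strip padding
      .join('')
  );
}

// Illustrative vocabulary: position in the array is the label.
const vocabList = ['[nop]', '[C]', '[O]', '[=C]', '[F]'];
const restored = labelDecode([[1, 1, 2, 0], [1, 3, 4, 0]], vocabList);
console.log(restored.join(' ')); // [C][C][O] [C][=C][F]
```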

CLI Tool

For quick testing, use the included CLI tool:

# Roundtrip test (encode + decode)
node selfies-cli.js "CCO"

# Encode only
node selfies-cli.js encode "c1ccccc1"

# Decode only
node selfies-cli.js decode "[C][=C][F]"

# Using npm script
npm run selfies -- "CCO"

Examples Directory

Check out the examples/ directory for:

  • basic-usage.js - Simple encoding/decoding examples
  • ml-encoding.js - Preparing data for machine learning
  • pharmaceutical-molecules.js - Complex drug-like structures

Run examples:

node examples/basic-usage.js
node examples/ml-encoding.js
node examples/pharmaceutical-molecules.js

Validation

This implementation has been tested against the Python SELFIES library (v2.2.0) on 597,707 molecules from 14 chemical datasets:

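  • Encoder success rate: 100% of the 597,420 molecules that satisfy the semantic constraints (the other 287 of the 597,707 inputs were rejected as constraint violations)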
  • Structure equivalence: 99.997% (597,404/597,420 successful encodings)
  • Semantic constraints: 100% agreement (287 violations correctly identified)
  • Decoder accuracy: 100% (identical SMILES output to Python version)
  • Exact SELFIES match: 69.6% (415,700/597,420 molecules)

The lower exact match rate is due to kekulization variants in aromatic molecules, where both implementations produce valid but different arrangements of double bonds. The structure equivalence rate (99.997%) shows that the decoded molecules are chemically identical in all but 16 of the 597,420 encoded cases.
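
The reported percentages can be re-derived from the counts in the list above. The short script below is only an arithmetic check; the variable names are illustrative.

```javascript
// Counts as reported in the validation list.
const total = 597707;      // molecules tested
const encoded = 597420;    // successful encodings
const equivalent = 597404; // structurally equivalent after decoding
const exactMatch = 415700; // identical SELFIES strings

// Percentage of `part` in `whole`, formatted to a fixed number of decimals.
const percent = (part, whole, digits) => (100 * part / whole).toFixed(digits);

console.log(total - encoded);                 // 287
console.log(percent(equivalent, encoded, 3)); // 99.997
console.log(percent(exactMatch, encoded, 1)); // 69.6
console.log(encoded - equivalent);            // 16
```

The difference of 287 matches the number of semantic constraint violations reported in the same list.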

Compatibility Notes

This implementation maintains API compatibility with Python SELFIES v2.2.0. The decoder produces identical output. The encoder may produce different kekulization patterns for aromatic molecules, but these are chemically equivalent to the Python output.

Known Limitations

Complex fused aromatic systems (<0.1% of molecules) may exhibit different kekulization patterns compared to the Python implementation. While this results in different SELFIES encodings, the decoded structures are chemically equivalent and canonical SMILES comparison confirms structural identity. Examples include certain naphthalene derivatives with multiple fused rings. This does not affect chemical correctness or the validity of the SELFIES representation.

TypeScript Support

Full TypeScript definitions are included:

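import { encoder } from '@milicazm/selfies-js';
import type {
  ConstraintType,
  EncodingType,
  AttributionMap
} from '@milicazm/selfies-js';

function processMolecule(smiles: string): string {
  // Called without attribute=true, so a plain SELFIES string is expected
  const selfies: string = encoder(smiles);
  return selfies;
}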

Browser Usage

SELFIES-JS works in browsers via bundlers (webpack, rollup, vite):

import { encoder, decoder } from '@milicazm/selfies-js';

// Use in React, Vue, Angular, etc.
function MoleculeConverter({ smiles }) {
  const selfies = encoder(smiles);
  return <div>{selfies}</div>;
}

Citation

If you use this library in your research, please cite the original SELFIES paper:

@article{krenn2020self,
  title={Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation},
  author={Krenn, Mario and H{\"a}se, Florian and Nigam, AkshatKumar and Friederich, Pascal and Aspuru-Guzik, Al{\'a}n},
  journal={Machine Learning: Science and Technology},
  volume={1},
  number={4},
  pages={045024},
  year={2020},
  publisher={IOP Publishing}
}

License

Apache License 2.0

Acknowledgments

This library is a JavaScript/TypeScript implementation based on the Python SELFIES library by Mario Krenn, Alston Lo, and the Aspuru-Guzik group.