tokensize

v1.0.0

Published 3 years ago


Vulnerabilities: 0 High, 0 Medium, 0 Low

charlkruger

NPM Module Documentation

The tokenizer function takes a string as input and returns an object with the following properties:

count: the number of tokens in the input string
characters: the number of characters in the input string
text: the original input string
tokens: an array of objects, where each object represents a token and its position in the input string. Each token object has the following properties:
- token: the token string
- start: the starting index of the token in the input string
- end: the ending index of the token in the input string
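
The shape described above can be written down as TypeScript types. This is only a sketch inferred from the property list: the names `TokenPosition`, `TokenizerResult`, and `isTokenizerResult` are hypothetical and are not known to be exported by the package.

```typescript
// Hypothetical type names inferred from the property list above;
// the package is not known to export them.
interface TokenPosition {
  token: string; // the token string
  start: number; // starting index of the token in the input string
  end: number;   // ending index of the token in the input string
}

interface TokenizerResult {
  count: number;           // number of tokens in the input string
  characters: number;      // number of characters in the input string
  text: string;            // the original input string
  tokens: TokenPosition[]; // one entry per token
}

// Runtime check that an unknown value has the documented shape.
function isTokenizerResult(value: unknown): value is TokenizerResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.count === "number" &&
    typeof v.characters === "number" &&
    typeof v.text === "string" &&
    Array.isArray(v.tokens) &&
    v.tokens.every((t) => {
      if (typeof t !== "object" || t === null) return false;
      const p = t as Record<string, unknown>;
      return (
        typeof p.token === "string" &&
        typeof p.start === "number" &&
        typeof p.end === "number"
      );
    })
  );
}
```

A guard like this may be useful when the result crosses a boundary where types are lost, for example after being serialized to JSON and parsed again.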

The tokenizer function uses the js-tiktoken library to encode the input string into tokens using the GPT-2 encoding scheme. It then decodes the tokens back into strings, maps the tokens to their positions in the input string using the mapTokensToChunks function, and returns the resulting object.
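
The position-mapping step can be illustrated with a small sketch. The package's own `mapTokensToChunks` is not shown in this documentation, so `locateTokens` below is a hypothetical stand-in, not the actual implementation. It assumes the tokens have already been decoded to strings (for instance by decoding each token id separately), that the decoded strings concatenate back to the input exactly, that `end` is inclusive, and that leading whitespace is excluded from a token's span.

```typescript
interface TokenPosition {
  token: string;
  start: number;
  end: number;
}

// Hypothetical stand-in for the mapping step: walk the input with a cursor
// and record where each decoded token string sits.
function locateTokens(text: string, decoded: string[]): TokenPosition[] {
  const positions: TokenPosition[] = [];
  let cursor = 0;
  for (const token of decoded) {
    if (token.length === 0) continue; // nothing to locate
    if (!text.startsWith(token, cursor)) {
      throw new Error(
        `Token ${JSON.stringify(token)} does not match input at index ${cursor}`
      );
    }
    // Assumption: a leading space belongs to the token but not to its span,
    // unless the token is whitespace only.
    const trimmed = token.trimStart();
    const lead = trimmed.length > 0 ? token.length - trimmed.length : 0;
    positions.push({
      token,
      start: cursor + lead,
      end: cursor + token.length - 1, // inclusive end (assumed)
    });
    cursor += token.length;
  }
  return positions;
}

const demo = locateTokens("This is a sample", ["This", " is", " a", " sample"]);
console.log(demo);
```

Under these assumptions the token " is" is reported at start 5 and end 6. Tokens that split a multi-byte character would not decode to clean substrings, so this sketch throws rather than guess a position in that case.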

Usage

To use this module, import the tokenizer function and call it with a string argument. Here's an example:

import { tokenizer } from 'tokensize';

const input = 'This is a sample input string.';
const result = await tokenizer(input);

console.log(result);
/*
{
  count: 7,
  characters: 30,
  text: 'This is a sample input string.',
  tokens: [
    { token: 'This', start: 0, end: 3 },
    { token: 'Ġis', start: 5, end: 6 },
    { token: 'Ġa', start: 8, end: 8 },
    { token: 'Ġsample', start: 10, end: 15 },
    { token: 'Ġinput', start: 17, end: 21 },
    { token: 'Ġstring', start: 23, end: 28 },
    { token: '.', start: 29, end: 29 }
  ]
}
*/
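
One way the returned positions might be used is to cut a string down to a token budget. The helper below is a sketch, not part of the package: `truncateToTokens` and the two interfaces are hypothetical names, and the code assumes that `end` is an inclusive index into `text`.

```typescript
interface TokenPosition {
  token: string;
  start: number;
  end: number;
}

interface TokenizerResult {
  count: number;
  characters: number;
  text: string;
  tokens: TokenPosition[];
}

// Hypothetical helper: keep only the text covered by the first `maxTokens` tokens.
function truncateToTokens(result: TokenizerResult, maxTokens: number): string {
  const limit = Math.floor(maxTokens);
  if (limit <= 0) return "";
  if (result.tokens.length <= limit) return result.text;
  const last = result.tokens[limit - 1];
  // Assumption: `end` is inclusive, so the slice stops one past it.
  return result.text.slice(0, last.end + 1);
}
```

If `end` turned out to be exclusive in the real output, the `+ 1` would need to be dropped.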
