tokeniser-package

v1.0.3, published 6 months ago

A simple tokenizer for encoding/decoding text into numeric tokens.

Vulnerabilities: 0 high, 0 medium, 0 low

Maintainer: adi1007

Keywords: tokenizer, encode, decode, text

Hey Reader 📗🤓!

This is going to be a fun journey into learning Agentic AI. There is excitement in the air, and I can't wait to contribute more to this repo 🔥🔥

The repo contents are listed below:

✅ Tokeniser

What is a Tokeniser? 🤔

-> A tokenizer can be considered the initial step in the input -> GPT -> output process. Unlike humans, computers only understand numbers, so before a transformer (GPT) can process human input, that input has to be converted into something the transformer understands. This is where a tokenizer comes in. A tokenizer is a program that takes an input and converts (encodes) it into a format the transformer is trained to understand. It can be as simple as converting GPT -> "71620" by mapping each character to its position in the alphabet (G = 7, P = 16, T = 20).
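To make the alphabet-position example concrete, here is a tiny sketch. `alphabetPositions` is a hypothetical helper written for illustration, not part of tokeniser-package, and it assumes the input contains only Latin letters.

```javascript
// Hypothetical helper (not part of tokeniser-package): maps each letter
// to its 1-based position in the English alphabet and concatenates them.
// Assumes the input contains only the Latin letters A-Z (any case).
function alphabetPositions(text) {
  return [...text.toUpperCase()]
    .map((ch) => {
      const pos = ch.charCodeAt(0) - "A".charCodeAt(0) + 1;
      if (pos < 1 || pos > 26) {
        throw new Error(`Not a Latin letter: ${ch}`);
      }
      return String(pos);
    })
    .join("");
}

console.log(alphabetPositions("GPT")); // "71620"
```

Note that gluing the numbers together without a separator is ambiguous: "71620" could also be read as 7, 1, 6, 20 (G, A, F, T). Keeping tokens as separate array entries, as the Tokenizer class below does, avoids that problem.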

Tokenizer Class Explanation:

💥💥 Logic: The fundamental logic behind the Tokenizer class is incremental chunking of each word, and the underlying character-to-number conversion is ASCII-based. E.g. "Cat chases a dog" will be encoded as ['67','97|116','32','99','104|97','115|101|115','32','97','32','100','111|103']

Encoding is done by taking each word and dividing it into chunks of increasing size, e.g. Cat -> 'C' (chunk of size 1) -> 'at' (chunk of size 2), and chases -> 'c' -> 'ha' -> 'ses'. Each space becomes its own token ('32'). For chunks longer than one character, the ASCII codes of the characters are separated by a | (pipe).
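The chunking scheme above can be sketched in a few lines. This is a minimal sketch inferred from the "Cat chases a dog" example, not the package's own source: `chunkWord`, `encodeChunk` and `encode` are hypothetical names. It assumes ASCII input, that only spaces separate words (each space becoming its own '32' token), and that a word's final chunk is simply shorter when the word runs out of characters (e.g. "door" -> 'd', 'oo', 'r').

```javascript
// Minimal sketch of the chunked ASCII encoding (hypothetical, not the
// package source). Assumes ASCII input and that only " " separates words;
// a word's last chunk may be shorter than the next size in the sequence.

// Split one word into chunks of size 1, 2, 3, ... ("chases" -> c, ha, ses).
function chunkWord(word) {
  const chunks = [];
  let start = 0;
  let size = 1;
  while (start < word.length) {
    chunks.push(word.slice(start, start + size));
    start += size;
    size += 1;
  }
  return chunks;
}

// Encode one chunk as its ASCII codes joined by "|" ("at" -> "97|116").
function encodeChunk(chunk) {
  return [...chunk].map((ch) => String(ch.charCodeAt(0))).join("|");
}

function encode(text) {
  const tokens = [];
  // Each match is either a single space or a run of non-space characters.
  for (const part of text.match(/ |[^ ]+/g) ?? []) {
    if (part === " ") {
      tokens.push(String(" ".charCodeAt(0))); // "32"
    } else {
      tokens.push(...chunkWord(part).map(encodeChunk));
    }
  }
  return tokens;
}

console.log(encode("Cat chases a dog"));
// returns ['67','97|116','32','99','104|97','115|101|115','32','97','32','100','111|103']
```

Splitting on single spaces keeps runs of spaces intact as repeated '32' tokens, so the original spacing could be restored on decode.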

For better performance, the encoding is also memoized 🔥🔥
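Memoization here means that once a word has been encoded, the result is cached and reused the next time the same word appears. The package does not document how its cache works, so the following is only a minimal sketch assuming a per-word `Map` cache; `encodeWord`, `encodeWordUncached` and `cacheMisses` are hypothetical names.

```javascript
// Minimal sketch of memoized word encoding (hypothetical; the package's
// actual caching strategy is not documented). Encoded words are stored in
// a Map so each distinct word is converted only once.
const wordCache = new Map();
let cacheMisses = 0; // counts how often the encoding work actually runs

// Chunk the word into sizes 1, 2, 3, ... and join each chunk's ASCII codes with "|".
function encodeWordUncached(word) {
  cacheMisses += 1;
  const tokens = [];
  let start = 0;
  for (let size = 1; start < word.length; size += 1) {
    const chunk = word.slice(start, start + size);
    tokens.push([...chunk].map((ch) => String(ch.charCodeAt(0))).join("|"));
    start += size;
  }
  return tokens;
}

function encodeWord(word) {
  if (!wordCache.has(word)) {
    wordCache.set(word, encodeWordUncached(word));
  }
  // Return a copy so callers cannot mutate the cached entry.
  return wordCache.get(word).slice();
}

encodeWord("dog");
encodeWord("dog"); // served from the cache
console.log(cacheMisses); // 1
```

Caching per word (rather than per whole input) pays off because natural text repeats the same words often; the trade-off is memory that grows with the vocabulary seen.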

How to Get Started?? 🤔🤔

SIMPLE!!

Step 1: install the package:

   npm i tokeniser-package

Step 2: import the Tokenizer class from tokeniser-package:

   import { Tokenizer } from "tokeniser-package";

Step 3: create an instance of the class and call its exposed "encode" / "decode" methods.
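The README does not document the exact signatures of `encode` / `decode`, so the following is not the package's implementation. It is a minimal sketch, assuming the token array format shown earlier, of how decoding can reverse the encoding: split each token on "|", turn each ASCII code back into a character, and concatenate everything in order. The `decode` function here is a hypothetical standalone helper.

```javascript
// Minimal sketch of decoding the token format described above (not the
// package's actual decode implementation): each token is one or more
// ASCII codes separated by "|", and tokens are concatenated in order.
function decode(tokens) {
  return tokens
    .map((token) =>
      token
        .split("|")
        .map((code) => String.fromCharCode(Number(code)))
        .join("")
    )
    .join("");
}

const tokens = ['67','97|116','32','99','104|97','115|101|115','32','97','32','100','111|103'];
console.log(decode(tokens)); // "Cat chases a dog"
```

Because spaces are kept as their own '32' tokens, no extra separator has to be inserted between words when decoding.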
