big-file-tokenizer
Tokenize the text in a big file using streams in Node.js and get a token count for every unique word in the file. It can handle large files; a 100MB file takes approximately 25 to 30 seconds to process.
Usage:
import readText, { tokenEventEmitter } from "big-file-tokenizer";

// Fired with how many times each word has appeared in the whole file
tokenEventEmitter.on("tokens_count_complete", (data) => {
  console.log(data);
});

// Fired with only the unique words in the whole file
tokenEventEmitter.on("tokens_unique_complete", (data) => {
  console.log(data);
});

readText("data/newdata.txt");
Two files will be auto-generated in the project root:
1. a token count file
2. a unique tokens file
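
For readers curious how stream-based counting avoids loading the whole file into memory, here is a minimal illustrative sketch in plain Node.js. It is not this package's actual implementation; the whitespace-based splitting and the chunk-boundary handling are assumptions made for the example.

import { createReadStream } from "node:fs";

// Illustrative only: count word occurrences in a large file chunk by
// chunk, so memory usage stays proportional to the vocabulary size,
// not the file size. Assumes whitespace-delimited text.
function countWords(path) {
  return new Promise((resolve, reject) => {
    const counts = new Map();
    let carry = ""; // a word may be split across two chunks

    const stream = createReadStream(path, { encoding: "utf8" });

    stream.on("data", (chunk) => {
      const words = (carry + chunk).split(/\s+/);
      carry = words.pop() ?? ""; // the last piece may be incomplete
      for (const w of words) {
        if (w) counts.set(w, (counts.get(w) ?? 0) + 1);
      }
    });

    stream.on("end", () => {
      if (carry) counts.set(carry, (counts.get(carry) ?? 0) + 1);
      resolve(counts); // Map of word -> occurrence count
    });

    stream.on("error", reject);
  });
}

The unique-word list then falls out for free: it is just the keys of the resulting Map.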
