web-tokenizer

v0.0.1

A Chinese word segmenter for the web

Downloads

6

Readme

Web Tokenizer

A Chinese word segmenter (tokenizer) for the web, built on jieba.js.

📦 Install

pnpm add web-tokenizer

🔨 Usage

ES6

import Tokenizer from 'web-tokenizer';

let loading = true;

new Tokenizer({
    onInit(instance) {
        loading = false; // initialization has finished, so the tokenizer is ready
        instance.extract("自然语言处理( Natural Language Processing, NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系,但又有重要的区别。自然语言处理并不是一般地研究自然语言,而在于研制能有效地实现自然语言通信的计算机系统,特别是其中的软件系统。因而它是计算机科学的一部分。自然语言处理主要应用于机器翻译、舆情监测、自动摘要、观点提取、文本分类、问题回答、文本语义对比、语音识别、中文OCR等方面。").then((res) => {
            console.log(res);
        });
    },
});

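Loading worker threads from a custom location

By default the workers are fetched from unpkg. Set workersPath to load them from a different address.

Copy the node_modules/web-tokenizer/dist/workers directory to a static server, then set workersPath to that server's address.

For example, in a Vite project, copy the workers directory into the public directory and set workersPath: '/workers'.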

new Tokenizer({
    workersPath: '/workers',
    onInit(instance) {
      instance.extract("自然语言处理( Natural Language Processing, NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系,但又有重要的区别。自然语言处理并不是一般地研究自然语言,而在于研制能有效地实现自然语言通信的计算机系统,特别是其中的软件系统。因而它是计算机科学的一部分。自然语言处理主要应用于机器翻译、舆情监测、自动摘要、观点提取、文本分类、问题回答、文本语义对比、语音识别、中文OCR等方面。").then((res) => {
        console.log(res);
      });
    },
});

Browser

<script src="https://unpkg.com/web-tokenizer@latest/dist/web-tokenizer.iife.js"></script>

Options

workersPath

type: string

The address from which the worker threads are loaded. See the section on loading worker threads from a custom location.

workerType

type: "specific" | "share"

Whether to use a dedicated Worker or a shared Worker. Defaults to a dedicated Worker (specific).
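If a shared Worker is preferred where the browser supports it, the value can be chosen with feature detection. The following is a minimal sketch, not part of the package: pickWorkerType is a hypothetical helper, and it assumes the library does not already fall back on its own when SharedWorker is unavailable (the README does not say either way).

```javascript
// Hypothetical helper: pick a workerType value based on SharedWorker support.
// `scope` defaults to the global object; it is a parameter only so the
// function can be exercised outside a browser.
function pickWorkerType(scope = globalThis) {
  // SharedWorker is a standard browser constructor, but some browsers lack it.
  return typeof scope.SharedWorker === 'function' ? 'share' : 'specific';
}

// Usage (in a module that imports the package):
//   new Tokenizer({ workerType: pickWorkerType(), onInit(instance) { /* ... */ } });
```

The helper only returns the two values listed above, so its result can be passed straight into the options object.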

onInit

type: (instance: Tokenizer) => void;

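Callback invoked when initialization completes. The workers need some time to initialize jieba.js, and segmentation is only available after initialization has finished.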
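Because readiness is signalled through a callback, code that prefers async/await may want to wrap it in a Promise. This is a rough sketch under stated assumptions: createTokenizer is a hypothetical helper, not something the package exports, and the Tokenizer class is passed in as an argument so the snippet stays self-contained.

```javascript
// Hypothetical helper: resolve with the Tokenizer instance once onInit fires.
// `TokenizerClass` would be the default export of 'web-tokenizer' in an app.
function createTokenizer(TokenizerClass, options = {}) {
  return new Promise((resolve) => {
    new TokenizerClass({
      ...options,
      onInit(instance) {
        // Still call any onInit the caller supplied, then hand back the instance.
        if (typeof options.onInit === 'function') options.onInit(instance);
        resolve(instance);
      },
    });
  });
}

// Usage (in a module that imports the package):
//   const tokenizer = await createTokenizer(Tokenizer, { workersPath: '/workers' });
//   const result = await tokenizer.extract('自然语言处理');
```

The README documents no error callback, so this Promise never rejects. If initialization can fail in your setup, a timeout around the call may be worth adding.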

Instance

A Tokenizer instance provides two asynchronous segmentation methods, extract and cut. For the difference in what they return, see jieba.js#result.
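To see how the two methods differ on your own text, both can be run on the same input and compared. The sketch below is illustrative only: compareSegmentation is a hypothetical helper, and it assumes cut takes the text as its only argument, as extract does in the usage example. The README does not show a cut call, so check the signature against jieba.js before relying on it.

```javascript
// Hypothetical helper: run both methods on one text and return the results together.
// Assumption: cut(text) mirrors the extract(text) call shown in the usage section.
async function compareSegmentation(instance, text) {
  // Both methods are asynchronous, so start them together and wait for both.
  const [extracted, cut] = await Promise.all([
    instance.extract(text),
    instance.cut(text),
  ]);
  return { extracted, cut };
}

// Usage inside onInit:
//   compareSegmentation(instance, '自然语言处理是一门科学').then(console.log);
```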