@tamim.jabr/parser

v1.0.6

Published

4 years ago

It is a package to help you parse a string into different types of sentences


tamim.jabr

Parser

This package parses a string into sentences and classifies each one as a regular sentence, a question, or an exclamation.

How to install it?

npm i @tamim.jabr/parser

How to import it?

import { TokenizerFactory, Document } from '@tamim.jabr/parser'

How to use it?

Create a tokenizer by passing the string you want to parse to the tokenizer factory, then hand the tokenizer to a Document:

import { TokenizerFactory, Document } from '@tamim.jabr/parser'

const doc = new Document()
const tokenizerFactory = new TokenizerFactory()
const tokenizer = tokenizerFactory.getTokenizer(
  'Hello! it is the string that will be parsed! did you know that? really? good for you.'
)
doc.parse(tokenizer)
// getSentences() returns an array of Sentence objects
const sentences = doc.getSentences()

for (const sentence of sentences) {
  console.log(sentence.getWordTokens()) // word objects, each with tokenType and tokenValue
  console.log(sentence.getEndType()) // DOT, EXCLAMATION_MARK or QUESTION_MARK
  console.log(sentence.toString()) // the sentence as a string
}

// to get only one type of sentence, use one of the following methods:
const regularSentences = doc.getRegularSentences()
const questionSentences = doc.getQuestionSentences()
const exclamationSentences = doc.getExclamationSentences()

Public Interface (Methods to use):

On the document object:

parse(tokenizer) takes a tokenizer as its parameter. Get the tokenizer from the tokenizer factory so that it is compatible with the parser, because the parser only supports sentences that end with one of the following: ! ? .
getSentences() returns an array of Sentence objects
getRegularSentences() returns an array with only RegularSentence objects
getExclamationSentences() returns an array with only ExclamationSentence objects
getQuestionSentences() returns an array with only QuestionSentence objects
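
The relationship between getSentences() and the three type-specific getters can be pictured with a small self-contained sketch. It does not use the package: BaseSentence, SketchDocument and the text field are hypothetical stand-ins, and filtering with instanceof is an assumption about how such getters could work, not the package's actual implementation.

```javascript
// Hypothetical stand-ins, for illustration only (not the package's classes).
class BaseSentence {
  constructor (text) {
    this.text = text
  }
}
class RegularSentence extends BaseSentence {}
class QuestionSentence extends BaseSentence {}
class ExclamationSentence extends BaseSentence {}

class SketchDocument {
  constructor (sentences) {
    this.sentences = sentences
  }

  // All sentences, in their original order.
  getSentences () {
    return [...this.sentences]
  }

  // Each getter keeps only the sentences of one class.
  getRegularSentences () {
    return this.sentences.filter(s => s instanceof RegularSentence)
  }

  getQuestionSentences () {
    return this.sentences.filter(s => s instanceof QuestionSentence)
  }

  getExclamationSentences () {
    return this.sentences.filter(s => s instanceof ExclamationSentence)
  }
}

// The example string from the usage section, already split by hand.
const sketchDoc = new SketchDocument([
  new ExclamationSentence('Hello!'),
  new ExclamationSentence('it is the string that will be parsed!'),
  new QuestionSentence('did you know that?'),
  new QuestionSentence('really?'),
  new RegularSentence('good for you.')
])

console.log(sketchDoc.getSentences().length) // 5
console.log(sketchDoc.getRegularSentences().length) // 1
console.log(sketchDoc.getQuestionSentences().length) // 2
console.log(sketchDoc.getExclamationSentences().length) // 2
```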

On sentence object:

getWordTokens() returns the word tokens as objects, each with a tokenType and a tokenValue
getEndType() returns the end type of the sentence, which is one of the following: DOT, EXCLAMATION_MARK or QUESTION_MARK
toString() returns the sentence as a string with one space between words and the end type character at the end.
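
To make the three sentence methods concrete, here is a minimal sketch of an object that behaves as described above. SketchSentence is a hypothetical stand-in rather than the package's Sentence class, and the 'WORD' value used for tokenType is an assumption; the README only states that each token has a tokenType and a tokenValue.

```javascript
// Maps each documented end type to its character.
const END_CHARACTERS = {
  DOT: '.',
  EXCLAMATION_MARK: '!',
  QUESTION_MARK: '?'
}

// Hypothetical stand-in for a sentence object, for illustration only.
class SketchSentence {
  constructor (words, endType) {
    this.words = words
    this.endType = endType
  }

  // One object per word; the 'WORD' label is an assumed tokenType value.
  getWordTokens () {
    return this.words.map(word => ({ tokenType: 'WORD', tokenValue: word }))
  }

  getEndType () {
    return this.endType
  }

  // Words joined by a single space, followed by the end character.
  toString () {
    return this.words.join(' ') + END_CHARACTERS[this.endType]
  }
}

const sentence = new SketchSentence(['did', 'you', 'know', 'that'], 'QUESTION_MARK')
console.log(sentence.getWordTokens())
console.log(sentence.getEndType()) // QUESTION_MARK
console.log(sentence.toString()) // did you know that?
```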

Errors:

parse(tokenizer) throws an error of the type InvalidEndtypeError when a sentence has no end character. Example:

      const tokenizer = tokenizerFactory.getTokenizer('hello  ')
      doc.parse(tokenizer)
      // error: Invalid end type of a sentence

parse(tokenizer) throws an error of the type InvalidSentenceError when it detects an end type character with no words before it. Example:

      const tokenizer = tokenizerFactory.getTokenizer('hello. !')
      doc.parse(tokenizer)
      // error: ! is an invalid sentence
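
The two error conditions can be reproduced with a short self-contained sketch. It does not call the package: sketchParse and the two error classes below are local, hypothetical stand-ins that only reuse the documented error names and messages, and the regular expression is one assumed way to split the input into words and end characters.

```javascript
// Local stand-ins named after the documented error types.
class InvalidEndtypeError extends Error {}
class InvalidSentenceError extends Error {}

const END_TYPES = new Map([
  ['.', 'DOT'],
  ['!', 'EXCLAMATION_MARK'],
  ['?', 'QUESTION_MARK']
])

// Hypothetical parser: groups words into sentences closed by . ! or ?
function sketchParse (text) {
  // Each token is either a single end character or a run of other non-space characters.
  const tokens = text.match(/[.!?]|[^\s.!?]+/g) || []
  const sentences = []
  let words = []
  for (const token of tokens) {
    if (END_TYPES.has(token)) {
      if (words.length === 0) {
        // An end character with no words before it.
        throw new InvalidSentenceError(`${token} is an invalid sentence`)
      }
      sentences.push({ words, endType: END_TYPES.get(token) })
      words = []
    } else {
      words.push(token)
    }
  }
  if (words.length > 0) {
    // Trailing words that were never closed by an end character.
    throw new InvalidEndtypeError('Invalid end type of a sentence')
  }
  return sentences
}

// Handle the two known error types and rethrow anything unexpected.
for (const input of ['hello  ', 'hello. !']) {
  try {
    sketchParse(input)
  } catch (error) {
    if (error instanceof InvalidEndtypeError || error instanceof InvalidSentenceError) {
      console.log(`${error.constructor.name}: ${error.message}`)
    } else {
      throw error
    }
  }
}
```

Catching by error type, as in the loop above, lets calling code tell an unterminated sentence apart from a stray end character.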
