@tamim.jabr/parser
v1.0.6
A package that parses a string into different types of sentences (regular, question and exclamation sentences).
Parser
How to install it?
npm i @tamim.jabr/parser
How to import it?
import { TokenizerFactory, Document } from '@tamim.jabr/parser'
How to use it?
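Create a tokenizer with the TokenizerFactory (which supplies a grammar compatible with the parser) by passing the string to parse to getTokenizer(), then hand the tokenizer to the document's parse() method: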
import { TokenizerFactory, Document } from '@tamim.jabr/parser'
const doc = new Document()
const tokenizerFactory = new TokenizerFactory()
const tokenizer = tokenizerFactory.getTokenizer(
'Hello! it is the string that will be parsed! did you know that? really? good for you.'
)
doc.parse(tokenizer)
// sentences is an array with objects of the type Sentence
const sentences = doc.getSentences()
for (let i = 0; i < sentences.length; i++) {
  const singleSentence = sentences[i]
  console.log(singleSentence.getWordTokens())
  console.log(singleSentence.getEndType())
  console.log(singleSentence.toString())
}
// To get only one type of sentence, use one of the following methods:
const regularSentences = doc.getRegularSentences()
const questionSentences = doc.getQuestionSentences()
const exclamationSentences = doc.getExclamationSentences()
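The package's own tokenizer is not shown in this README. As a rough, self-contained illustration of that first step, the sketch below splits a string into word tokens and end-type tokens with the same { tokenType, tokenValue } shape that getWordTokens() is described as returning. It is not the package's implementation: the tokenize() function and the 'WORD' token type are hypothetical, and any character other than letters, digits, apostrophes and ! ? . is simply dropped.

```javascript
// Hypothetical stand-alone tokenizer, NOT the package's implementation.
// End-type names (DOT, EXCLAMATION_MARK, QUESTION_MARK) come from the README;
// the 'WORD' token type is an assumption made for this sketch.
const END_TYPES = new Map([
  ['.', 'DOT'],
  ['!', 'EXCLAMATION_MARK'],
  ['?', 'QUESTION_MARK']
])

function tokenize (text) {
  const tokens = []
  // Match a run of letters, digits or apostrophes, or a single ! ? . character.
  // Whitespace and any other characters (commas, quotes, ...) are skipped.
  const pattern = /[\p{L}\p{N}']+|[.!?]/gu
  for (const match of text.matchAll(pattern)) {
    const value = match[0]
    // A Map lookup avoids false hits on inherited keys such as "constructor".
    const endType = END_TYPES.get(value)
    tokens.push({ tokenType: endType || 'WORD', tokenValue: value })
  }
  return tokens
}

const tokens = tokenize('Hello! did you know that?')
console.log(tokens.map(token => token.tokenType).join(' '))
// prints: WORD EXCLAMATION_MARK WORD WORD WORD WORD QUESTION_MARK
```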
Public Interface (Methods to use):
On the document object:
- parse(tokenizer) takes a tokenizer as its parameter and parses its text into sentences. Use the tokenizer factory to get a tokenizer that is compatible with the parser, because the parser only supports sentences that end with one of the following characters: ! ? .
- getSentences() returns an array of Sentence objects
- getRegularSentences() returns an array with only RegularSentence objects
- getExclamationSentences() returns an array with only ExclamationSentence objects
- getQuestionSentences() returns an array with only QuestionSentence objects
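To picture what getSentences() and the per-type getters return, here is a minimal sketch, assuming sentences are formed by cutting the token stream at each end-type token. The groupIntoSentences() and sentencesOfType() helpers are hypothetical and return plain objects rather than the package's Sentence classes; input validation is left out (see the Errors section for how the package handles it).

```javascript
// Hypothetical sketch, not the package's code: group a flat token list into
// sentences, then filter them by end type, similar in spirit to
// getRegularSentences(), getQuestionSentences() and getExclamationSentences().
const END_TOKEN_TYPES = new Set(['DOT', 'EXCLAMATION_MARK', 'QUESTION_MARK'])

function groupIntoSentences (tokens) {
  const sentences = []
  let words = []
  for (const token of tokens) {
    if (END_TOKEN_TYPES.has(token.tokenType)) {
      // An end-type token closes the current sentence.
      sentences.push({ wordTokens: words, endType: token.tokenType })
      words = []
    } else {
      words.push(token)
    }
  }
  // Words after the last end-type token are ignored here; the real package
  // throws an InvalidEndtypeError in that case instead.
  return sentences
}

function sentencesOfType (sentences, endType) {
  return sentences.filter(sentence => sentence.endType === endType)
}

const sample = [
  { tokenType: 'WORD', tokenValue: 'Hello' },
  { tokenType: 'EXCLAMATION_MARK', tokenValue: '!' },
  { tokenType: 'WORD', tokenValue: 'really' },
  { tokenType: 'QUESTION_MARK', tokenValue: '?' },
  { tokenType: 'WORD', tokenValue: 'good' },
  { tokenType: 'DOT', tokenValue: '.' }
]
const all = groupIntoSentences(sample)
console.log(all.length) // 3
console.log(sentencesOfType(all, 'QUESTION_MARK').length) // 1
```

A "regular" sentence in this sketch is simply one whose endType is DOT.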
On a sentence object:
- getWordTokens() returns the sentence's word objects, each with a tokenType and a tokenValue
- getEndType() returns the end type of the sentence, which is one of the following: DOT, EXCLAMATION_MARK or QUESTION_MARK
- toString() returns the sentence as a string: the words separated by single spaces, with the end-type character at the end.
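A minimal sketch of the toString() format, assuming no space is inserted before the end-type character (the README does not say either way). The sentenceToString() helper and its { wordTokens, endType } input are hypothetical; only the end-type names come from the list above.

```javascript
// Hypothetical sketch of the toString() format: word values joined by single
// spaces, followed directly by the character that matches the end type.
const END_CHARACTERS = new Map([
  ['DOT', '.'],
  ['EXCLAMATION_MARK', '!'],
  ['QUESTION_MARK', '?']
])

function sentenceToString ({ wordTokens, endType }) {
  const words = wordTokens.map(token => token.tokenValue).join(' ')
  return words + END_CHARACTERS.get(endType)
}

console.log(sentenceToString({
  wordTokens: [
    { tokenType: 'WORD', tokenValue: 'did' },
    { tokenType: 'WORD', tokenValue: 'you' },
    { tokenType: 'WORD', tokenValue: 'know' },
    { tokenType: 'WORD', tokenValue: 'that' }
  ],
  endType: 'QUESTION_MARK'
}))
// prints: did you know that?
```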
Errors:
- parse(tokenizer) throws an InvalidEndtypeError when a sentence has no end-type character, for example:
const tokenizer = tokenizerFactory.getTokenizer('hello ')
doc.parse(tokenizer)
// error: Invalid end type of a sentence
- parse(tokenizer) throws an InvalidSentenceError when it finds an end-type character with no words before it, for example:
const tokenizer = tokenizerFactory.getTokenizer('hello. !')
doc.parse(tokenizer)
// error: ! is an invalid sentence
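The two error conditions above can be sketched as a single pass over a token list. This is a hypothetical, self-contained illustration: InvalidEndtypeError and InvalidSentenceError are local stand-in classes that only borrow the package's names and messages, and validateTokens() is not part of the package's API.

```javascript
// Hypothetical stand-ins that only borrow the package's error names.
class InvalidEndtypeError extends Error {}
class InvalidSentenceError extends Error {}

const END_TYPE_NAMES = new Set(['DOT', 'EXCLAMATION_MARK', 'QUESTION_MARK'])

// Hypothetical validator: walks the tokens once, counting words since the
// last end-type token.
function validateTokens (tokens) {
  let wordsSinceLastEnd = 0
  for (const token of tokens) {
    if (END_TYPE_NAMES.has(token.tokenType)) {
      // An end-type character with no words before it, e.g. the "!" in "hello. !"
      if (wordsSinceLastEnd === 0) {
        throw new InvalidSentenceError(`${token.tokenValue} is an invalid sentence`)
      }
      wordsSinceLastEnd = 0
    } else {
      wordsSinceLastEnd++
    }
  }
  // Words left over after the last end-type character, e.g. "hello "
  if (wordsSinceLastEnd > 0) {
    throw new InvalidEndtypeError('Invalid end type of a sentence')
  }
}

try {
  validateTokens([
    { tokenType: 'WORD', tokenValue: 'hello' },
    { tokenType: 'DOT', tokenValue: '.' },
    { tokenType: 'EXCLAMATION_MARK', tokenValue: '!' }
  ])
} catch (error) {
  console.log(error instanceof InvalidSentenceError, error.message)
  // prints: true ! is an invalid sentence
}
```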