tif-parser

v0.3.3

Published

4 years ago

Tab Indented Format Parser used as first stage parser.

Downloads

0High
0Medium
0Low

cronvel

tif tab indented format parser stage csk

tifParser: Tab Indented Format Parser

This is a simple parser for Tab Indented Format (TIF, the spec of this format is currently in progress).

It transforms a text file into nested objects.

While it is too general to produce anything by itself, it is actually the first stage of more complicated and specific parsers.

License: MIT
Current status: alpha / unstable
Platform: Node.js only

Full BDD spec generated by Mocha:

Single line

when empty, should return a root node without child.

expect( tifParser.parse( '' ) ).to.eql( {
	depth: 0,
	children: []
} ) ;

with a single word, should return a root node with one child, no comment and the word as content.

expect( tifParser.parse( 'Hello' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello',
		comment: undefined,
		children: []
	} ]
} ) ;

with a many words, should return a root node with one child, no comment and all words as content.

expect( tifParser.parse( 'Hello world! How are you?' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello world! How are you?',
		comment: undefined,
		children: []
	} ]
} ) ;

with a comment, should return a root node with one child, no content and all words as comment.

expect( tifParser.parse( '# This is a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: undefined,
		comment: 'This is a comment',
		children: []
	} ]
} ) ;

with content and comment, should return a root node with one child, having content and comment.

expect( tifParser.parse( 'Hello world! # This is a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello world!',
		comment: 'This is a comment',
		children: []
	} ]
} ) ;

should trim whitespace in both content and comment.

expect( tifParser.parse( '  \t Hello world! \t   #   \t  This is a comment \t   ' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello world!',
		comment: 'This is a comment',
		children: []
	} ]
} ) ;

should thow error if the line is indented.

try {
	tifParser.parse( '\tHello world!' ) ;
	expect().fail( 'An exception must have been thrown' ) ;
}
catch ( error ) {
	expect( error.message ).to.match( /Indentation error/ ) ;
}

Custom comment mark

with a comment, should return a root node with one child, no content and all words as comment.

expect( tifParser.parse( '# This is *NOT* a comment' , { commentMark: '//' } ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: '# This is *NOT* a comment',
		comment: undefined,
		children: []
	} ]
} ) ;

with a comment, should return a root node with one child, no content and all words as comment.

expect( tifParser.parse( '// This is a comment' , { commentMark: '//' } ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: undefined,
		comment: 'This is a comment',
		children: []
	} ]
} ) ;

with content and comment, should return a root node with one child, having content and comment.

expect( tifParser.parse( 'Hello world! # This is *NOT* a comment' , { commentMark: '//' } ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello world! # This is *NOT* a comment',
		comment: undefined,
		children: []
	} ]
} ) ;

with content and comment, should return a root node with one child, having content and comment.

expect( tifParser.parse( 'Hello world! // This is a comment' , { commentMark: '//' } ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Hello world!',
		comment: 'This is a comment',
		children: []
	} ]
} ) ;

Escape sequences

\n should be replaced by a newline.

expect( tifParser.parse( 'This content have\\na newline' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This content have\na newline',
		comment: undefined,
		children: []
	} ]
} ) ;

\r should be replaced by a carriage return.

expect( tifParser.parse( 'This content have\\ra carriage return' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This content have\ra carriage return',
		comment: undefined,
		children: []
	} ]
} ) ;

\t should be replaced by a tab.

expect( tifParser.parse( 'This content have\\ta tab' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This content have\ta tab',
		comment: undefined,
		children: []
	} ]
} ) ;

> should be replaced by a >.

expect( tifParser.parse( 'Parent\n\t\\> Not\n\ta block' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'Parent',
		comment: undefined,
		children: [
			{
				depth: 2,
				line: 2,
				content: '> Not',
				comment: undefined,
				children: []
			} ,
			{
				depth: 2,
				line: 3,
				content: 'a block',
				comment: undefined,
				children: []
			}
		]
	} ]
} ) ;

\ should be replaced by a single backslash.

expect( tifParser.parse( 'This content have \\\\ a single backslash' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This content have \\ a single backslash',
		comment: undefined,
		children: []
	} ]
} ) ;

# should escape the comment.

expect( tifParser.parse( 'This is not \\# a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This is not # a comment',
		comment: undefined,
		children: []
	} ]
} ) ;

\# should NOT escape the comment.

expect( tifParser.parse( 'This is \\\\# a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This is \\',
		comment: 'a comment',
		children: []
	} ]
} ) ;

\# should escape the comment.

expect( tifParser.parse( 'This is not \\\\\\# a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This is not \\# a comment',
		comment: undefined,
		children: []
	} ]
} ) ;

\\# should NOT escape the comment.

expect( tifParser.parse( 'This is \\\\\\\\# a comment' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'This is \\\\',
		comment: 'a comment',
		children: []
	} ]
} ) ;

should be able to process a mixing of escape sequences.

expect( tifParser.parse( 'mixing\\nmultiple\\rescape\\t\\t\\tsequences\\#\\#\\\\\\t\\n\\t' ) ).to.eql( {
	depth: 0,
	children: [ {
		depth: 1,
		line: 1,
		content: 'mixing\nmultiple\rescape\t\t\tsequences##\\\t\n\t',
		comment: undefined,
		children: []
	} ]
} ) ;

Sample file

#1 comparison.

var parsed = tifParser.parse( fs.readFileSync( 'sample1.txt' ).toString() ) ;

var expected = {
	depth: 0,
	children: [
		{
			depth: 1,
			line: 2,
			content: undefined,
			comment: "Starting comment",
			children: []
		},
		{
			depth: 1,
			content: "Big One",
			line: 4,
			comment: undefined,
			children: [
				{
					depth: 2,
					line: 5,
					content: "medium1",
					comment: undefined,
					children: []
				},
				{
					depth: 2,
					line: 6,
					content: "medium2",
					comment: undefined,
					children: [
						{
							depth: 3,
							line: 7,
							content: undefined,
							comment: "Comment #1",
							children: []
						},
						{
							depth: 3,
							line: 8,
							content: "little1",
							comment: undefined,
							children: []
						},
						{
							depth: 3,
							line: 9,
							content: undefined,
							comment: "Comment #2",
							children: []
						},
						{
							depth: 3,
							line: 10,
							content: "little2",
							comment: undefined,
							children: []
						}
					]
				},
				{
					depth: 2,
					line: 11,
					content: "medium3",
					comment: "with a comment",
					children: [
						{
							depth: 3,
							line: 12,
							content: undefined,
							comment: "Comment #3",
							children: []
						}
					]
				},
				{
					depth: 2,
					line: 13,
					content: "medium4 with many words",
					comment: undefined,
					children: []
				},
				{
					depth: 2,
					line: 14,
					content: "medium5 with many words",
					comment: "and a comment",
					children: []
				}
			]
		},
		{
			depth: 1,
			line: 18,
			content: "Big2",
			comment: undefined,
			children: [
				{
					depth: 2,
					line: 22,
					content: "not a #comment here",
					comment: undefined,
					children: []
				}
			]
		},
		{
			depth: 1,
			line: 20,
			content: undefined,
			comment: "The next node is not a child of this comment",
			children: []
		} ,
		{
			depth: 1,
			line: 25,
			content: "Big3",
			comment: undefined,
			children: [
				{
					depth: 2,
					line: 26,
					content: "This is\na multi-line\nblock of\ntext",
					comment: undefined,
					children: []
				} ,
				{
					depth: 2,
					line: 31,
					content: "This is\n  another one\n\tand this # is not a comment\n> This is still the same block of text",
					comment: undefined,
					children: []
				} ,
				{
					depth: 2,
					line: 36,
					content: "A regular text-node",
					comment: undefined,
					children: []
				} ,
				{
					depth: 2,
					line: 38,
					content: "One line",
					comment: undefined,
					children: []
				} ,
			]
		} ,
		{
			depth: 1,
			line: 39,
			content: "This does not belong to the previous block",
			comment: undefined,
			children: []
		}
	]
} ;

expect( tree.diff( parsed , expected ) ).to.equal( null ) ;
expect( parsed ).to.eql( expected ) ;

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

tifParser: Tab Indented Format Parser

TOC

Single line

Custom comment mark

Escape sequences

Sample file