ched-shred

v0.3.0

Published

2 years ago

Lightweight C syntax parser intended for reading header files.

duncancross

C headers syntax parser programming metaprogramming language

Lightweight C syntax parser intended for reading .h header files.

Still in early development, not yet ready for any real use!

Interfaces

// a token is an individual unit of C syntax
// 'int' | '{' | '}' | ...
type TokenString = string;

interface Macro extends Array<TokenString> {
  macroArgumentNames? : string[];
  isVariadic? : boolean;
}

interface MacroSet {
  [macroName:string] : Macro;
}
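
As an illustration of how these interfaces fit together, the following sketch builds one object-like and one function-like macro by hand. It assumes that a MacroSet maps each macro name to a Macro, and that a function-like macro is marked by the presence of macroArgumentNames; the macro names and bodies are made up for the example.

```typescript
// Local copies of the interfaces, so that the example is self-contained.
// Assumption: a MacroSet maps each macro name to a Macro.
type TokenString = string;

interface Macro extends Array<TokenString> {
  macroArgumentNames?: string[];
  isVariadic?: boolean;
}

interface MacroSet {
  [macroName: string]: Macro;
}

// Object-like macro, as in: #define BUFFER_SIZE 1024
const bufferSize: Macro = ["1024"];

// Function-like macro, as in: #define ADD(a, b) ((a) + (b))
// Assumption: the argument names are what marks a macro as function-like.
const add: Macro = ["(", "(", "a", ")", "+", "(", "b", ")", ")"];
add.macroArgumentNames = ["a", "b"];
add.isVariadic = false;

const exampleMacros: MacroSet = { BUFFER_SIZE: bufferSize, ADD: add };
```

Because a Macro is itself an array of token strings, the replacement list can be read with ordinary array operations.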

Preprocessor

Main Class

The default export of the 'ched-shred/preprocessor' module is a Readable stream class named Preprocessor. The output of this stream is a series of token strings representing the preprocessed code.

Output (Object Mode): TokenStrings
Constructor parameters:
- initialPath : string
- options (optional) : {...}
  - resolvePath(string, string) => string
    - Default: (p1, p2) => path.resolve(path.dirname(p1), p2)
  - createReadStream(string) => stream.Readable
    - Default: (p) => fs.createReadStream(p, 'utf8')
  - trigraphMode: 'replace' | 'ignore' | 'error'
    - Default: 'replace'
  - initialMacros: MacroSet
Properties:
- .macros : MacroSet

Helper Functions

createMacroSet()

Creates and returns a new MacroSet. If a plain object is passed as a parameter, the properties of this object will be added to the new macro set.
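
The behavior described above could be approximated as follows. This is a hypothetical stand-in named createMacroSetSketch, not the library's actual code; in particular, the use of a prototype-less object is an assumption.

```typescript
type TokenString = string;

interface Macro extends Array<TokenString> {
  macroArgumentNames?: string[];
  isVariadic?: boolean;
}

interface MacroSet {
  [macroName: string]: Macro;
}

// Hypothetical stand-in for createMacroSet(); the real implementation may differ.
function createMacroSetSketch(initial?: { [macroName: string]: Macro }): MacroSet {
  // Assumption: a prototype-less object, so that names such as "toString"
  // or "constructor" are never mistaken for defined macros.
  const set: MacroSet = Object.create(null);
  if (initial) {
    for (const name of Object.keys(initial)) {
      const macro = initial[name];
      if (macro !== undefined) {
        set[name] = macro;
      }
    }
  }
  return set;
}

// Seed a macro set with one predefined object-like macro.
const seededMacros = createMacroSetSketch({ DEBUG: ["1"] });
```

A set built this way could then be passed wherever a MacroSet is expected, for example as the initialMacros option.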

Low-Level Transforms

The 'ched-shred/preprocessor' module also contains a number of Transform stream classes that replicate each phase of the C preprocessor.

TrigraphReplaceTransform

Replace the nine trigraphs (??=, ??/, ??', ??(, ??), ??!, ??<, ??>, ??-) with the symbols they represent.

Input (Chunked): Text
Output (Chunked): Text
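
For reference, the nine replacements can be sketched as a lookup table plus a single regular expression. The helper name replaceTrigraphsSketch is hypothetical, and it works on a complete string; a chunked transform would also have to cope with a trigraph that is split across two chunks.

```typescript
// Trigraph sequences and the characters they stand for in standard C.
const TRIGRAPHS: { [sequence: string]: string } = {
  "??=": "#",
  "??/": "\\",
  "??'": "^",
  "??(": "[",
  "??)": "]",
  "??!": "|",
  "??<": "{",
  "??>": "}",
  "??-": "~",
};

// Hypothetical helper: replaces every trigraph in a complete string.
function replaceTrigraphsSketch(text: string): string {
  return text.replace(/\?\?[=\/'()!<>-]/g, (sequence) => TRIGRAPHS[sequence] ?? sequence);
}

// "??=define FIRST(x) x??(0??)" becomes "#define FIRST(x) x[0]"
const replaced = replaceTrigraphsSketch("??=define FIRST(x) x??(0??)");
```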

TrigraphErrorTransform

Pass through input to output unchanged, but throw an error if a trigraph is detected in the input.

Input (Chunked): Text
Output (Chunked): Text

ContinuedLineTransform

Join a line with the next one when the first ends in a backslash (\), optionally followed by trailing whitespace.

Input (Chunked): Text
Output (Chunked): Text
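
The joining rule can be sketched as one regular expression over a complete string. The helper name joinContinuedLinesSketch is hypothetical, and treating only spaces and tabs as the optional trailing whitespace is an assumption.

```typescript
// Hypothetical helper: removes each backslash that ends a line, any spaces
// or tabs that follow it, and the line terminator itself, so that the next
// line is appended directly to the current one.
function joinContinuedLinesSketch(text: string): string {
  return text.replace(/\\[ \t]*\r?\n/g, "");
}

// The two physical lines of the #define become a single logical line.
const joined = joinContinuedLinesSketch("#define SUM(a, b) \\\n  ((a) + (b))\nint x;\n");
```

Whitespace before the backslash and at the start of the following line is kept as-is, so the joined line can contain a run of several spaces.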

CommentToWhitespaceTransform

Replace comments (of the style /*...*/ and //...) with a single space character, ignoring those inside string literals.

Input (Chunked): Text
Output (Chunked): Text

Note that /*...*/-style comments do not nest.
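
A minimal sketch of this replacement, working on a complete string rather than on chunks, is shown below. The helper name commentsToWhitespaceSketch is hypothetical, and skipping character literals as well as string literals is an assumption made for the example.

```typescript
// Hypothetical helper: replaces each comment with one space, copying
// string and character literals through untouched.
function commentsToWhitespaceSketch(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const ch = src.charAt(i);
    const next = src.charAt(i + 1);
    if (ch === '"' || ch === "'") {
      // Scan to the matching quote, stepping over backslash escapes.
      let j = i + 1;
      while (j < src.length && src.charAt(j) !== ch && src.charAt(j) !== "\n") {
        if (src.charAt(j) === "\\") j++;
        j++;
      }
      const end = Math.min(j + 1, src.length);
      out += src.slice(i, end);
      i = end;
    } else if (ch === "/" && next === "*") {
      // Block comment: ends at the first "*/", so comments do not nest.
      const close = src.indexOf("*/", i + 2);
      out += " ";
      i = close === -1 ? src.length : close + 2;
    } else if (ch === "/" && next === "/") {
      // Line comment: runs up to, but does not include, the newline.
      const eol = src.indexOf("\n", i + 2);
      out += " ";
      i = eol === -1 ? src.length : eol;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

// The first "*/" closes the comment, so " c */" is left behind as code.
const notNested = commentsToWhitespaceSketch("/* a /* b */ c */");
```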

DirectiveSplitTransform

Split text content into preprocessor directives and sections of code. Each code section only includes complete lines of code, so they can be tokenized independently. Directives and code sections are both guaranteed to end with a whitespace character (an extra space will be appended to the final line if there is no whitespace before the end), so that the text will always produce a set of complete tokens if processed by TokenizeTransform.

Input (Chunked): Text
Output (Object Mode):
- ["", "...code section\n"]
- ["#directive", "...directive parameters\n"]

Note that any whitespace between # and the name of a directive is removed in the output.

Comments and line continuations must be handled before this transform is applied, or the results are likely to be mangled.

Depending on the input stream, there may sometimes be a run of two or more consecutive code sections with no directive between them.
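
To make the output shape concrete, here is a rough sketch of the same split applied to a complete string. The helper name splitDirectivesSketch is hypothetical; unlike the chunked transform, it always merges adjacent code lines into a single section, and whether the real transform keeps the whitespace that follows the directive name is an assumption.

```typescript
// [directive name, text]; the name is "" for a section of plain code.
type Section = [string, string];

// Hypothetical helper working on a complete string instead of chunks.
function splitDirectivesSketch(text: string): Section[] {
  const sections: Section[] = [];
  let code = "";
  const flushCode = (): void => {
    if (code !== "") {
      sections.push(["", code]);
      code = "";
    }
  };
  // Break the text into lines, keeping each line terminator.
  const lines = text.match(/[^\n]*\n|[^\n]+$/g) ?? [];
  for (const rawLine of lines) {
    // Guarantee trailing whitespace, so that the last token is complete.
    const line = /\s$/.test(rawLine) ? rawLine : rawLine + " ";
    // A directive is a line whose first non-blank character is '#'.
    const match = /^[ \t]*#[ \t]*([A-Za-z_]\w*)?([\s\S]*)$/.exec(line);
    if (match) {
      flushCode();
      sections.push(["#" + (match[1] ?? ""), match[2] ?? ""]);
    } else {
      code += line;
    }
  }
  flushCode();
  return sections;
}

const sections = splitDirectivesSketch("int a;\n#  include <x.h>\nint b;");
```

In this sketch the whitespace between # and include is dropped, and a space is appended to the unterminated final line.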

TokenizeTransform

Split the incoming text stream into a series of atomic token strings.

Input (Chunked): Text
Output (Object Mode): TokenStrings
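
As a rough illustration of what a token string looks like, the sketch below splits a complete string with one regular expression. The helper name tokenizeSketch is hypothetical, and the token rules are a simplification: prefixed literals such as L"..." and several less common punctuators are not treated specially.

```typescript
// Alternatives are tried in order: identifiers, preprocessing numbers,
// string and character literals, multi-character punctuators, and finally
// any other single non-whitespace character.
const TOKEN_PATTERN =
  /[A-Za-z_]\w*|\.?\d(?:[eEpP][+-]|[\w.])*|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'|\.\.\.|<<=|>>=|->|\+\+|--|<<|>>|<=|>=|==|!=|&&|\|\||[-+*\/%&|^]=|##|\S/g;

// Hypothetical helper: whitespace between tokens is discarded.
function tokenizeSketch(text: string): string[] {
  return text.match(TOKEN_PATTERN) ?? [];
}

const tokens = tokenizeSketch("int x = a->b + 0x1F;");
// tokens: "int", "x", "=", "a", "->", "b", "+", "0x1F", ";"
```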

MacroExpansionTransform

Pass through a stream of token strings, expanding macros as they are encountered. Throws an error if .end() is called while a function-like macro invocation is still incomplete.

Input (Object Mode): TokenStrings
Output (Object Mode): TokenStrings
Constructor parameters:
- macros : MacroSet
Properties:
- macros : MacroSet
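
The core idea can be sketched for object-like macros only. The helper name expandObjectMacrosSketch is hypothetical, it works on a complete token array rather than a stream, and it leaves function-like macros (assumed here to be those with macroArgumentNames set) unexpanded.

```typescript
type TokenString = string;

interface Macro extends Array<TokenString> {
  macroArgumentNames?: string[];
  isVariadic?: boolean;
}

interface MacroSet {
  [macroName: string]: Macro;
}

// Hypothetical helper: expands object-like macros, rescanning each
// replacement list for further macros. Names listed in `active` are
// currently being expanded and are passed through, which prevents
// endless recursion on self-referential macros.
function expandObjectMacrosSketch(
  tokens: TokenString[],
  macros: MacroSet,
  active: string[] = []
): TokenString[] {
  const out: TokenString[] = [];
  for (const token of tokens) {
    const macro = Object.prototype.hasOwnProperty.call(macros, token)
      ? macros[token]
      : undefined;
    const skip =
      macro === undefined ||
      macro.macroArgumentNames !== undefined ||
      active.indexOf(token) !== -1;
    if (skip || macro === undefined) {
      out.push(token);
    } else {
      out.push(...expandObjectMacrosSketch(macro, macros, active.concat(token)));
    }
  }
  return out;
}

const demoMacros: MacroSet = { A: ["B", "+", "1"], B: ["2"], SELF: ["SELF", "x"] };
const expanded = expandObjectMacrosSketch(["A", "*", "SELF"], demoMacros);
// expanded: "2", "+", "1", "*", "SELF", "x"
```

Function-like macros additionally require collecting the argument tokens up to the matching closing parenthesis, which is why a stream that ends in the middle of an invocation cannot be completed.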

Syntax Parser

The 'ched-shred/c-syntax' module contains one main Transform class:

TokenParseTransform

This transform stream takes in a series of preprocessed tokens and emits one object for each complete top-level declaration that it has read.

Input (Object Mode): TokenStrings
Output (Object Mode): Declaration objects, one of the following:
- InitDeclaration
- FunctionDefinition
- LegacyFunctionDefinition
