tree-sitter-sysml

v0.1.0

SysML v2 grammar for tree-sitter

Downloads

99

Badges: pipeline status, parse coverage, npm, crates

Tree-sitter grammar for SysML v2, the next-generation systems modeling language from the OMG.

SysML v2 replaces the diagram-centric SysML v1 with a textual notation designed for Model-Based Systems Engineering (MBSE). This parser turns that textual notation into concrete syntax trees that editors, linters, and developer tools can consume.

Why This Exists

SysML v2 is a large language — roughly 120 grammar rules covering packages, definitions, usages, constraints, requirements, state machines, actions, flows, views, and more. The only existing parser is the Xtext-based pilot implementation from the OMG, which is tightly coupled to Eclipse.

This tree-sitter grammar provides a standalone, incremental parser with no IDE dependency. Our primary use case is embedding it in Rust CLI tools and MCP (Model Context Protocol) servers for AI-assisted systems engineering — but it works anywhere tree-sitter does: Neovim, Helix, Zed, VS Code, Emacs, and any application using the tree-sitter C library.

Status

Parse coverage is tested on every push against 393 real-world SysML v2 files from 8 independent sources (see badge above).

| Metric | Value |
|--------|-------|
| Corpus Tests | 192 passing |
| Negative Tests | 18 (12 syntactic, 6 structural) |
| External File Coverage | 393 files across 8 corpora |
| Bindings | C, Rust, Go, Python, Node.js, Swift |
| Queries | highlights, tags, locals, folds, indents |

See parse-coverage.md for per-corpus breakdown and details on any unparseable files.

How the Corpus Was Assembled

Most tree-sitter grammars have the luxury of millions of open-source files to test against. SysML v2 does not — the language was published in 2023 and adoption is early. We assembled test material from every public source we could find:

| Source | Files | Description |
|--------|-------|-------------|
| OMG Training sysml/src/training/ | 100 | Official tutorial files covering all major constructs |
| OMG Examples sysml/src/examples/ | 95 | Additional worked examples from the spec authors |
| OMG Validation sysml/src/validation/ | 56 | Validation suite from the reference implementation |
| OMG Standard Library sysml.library/ | 58 | Library definitions (KerML + SysML base types) |
| Sensmetry Advent | 44 | Community examples from "Advent of SysML v2" |
| GfSE Models | 36 | German systems engineering society models |
| SYSMOD | 1 | SYSMOD methodology example |
| Sensmetry SmartHome | 3 | Smart home hub example |
| Total | 393 | |

The training files were the development target — every grammar change was validated against all 100 training files. The remaining corpora serve as independent validation: the grammar was never specifically tuned to pass them, so their pass rate reflects genuine generalization.

We need more corpus files. If you have SysML v2 files (from coursework, research, industry projects, or personal experiments), we would love to test against them. Files that break the parser are the most valuable of all. See Contributing.

Grammar Approach

The Brute-Force Strategy

This grammar was developed empirically, not derived from the SysML v2 KEBNF specification. The approach:

  1. Start with the simplest possible grammar rules
  2. Try to parse a training file
  3. When it fails, look at the error, add or modify the rule
  4. Regenerate, re-test all files, repeat

This "brute-force" loop ran for hundreds of iterations. The result is a grammar that reliably parses real SysML v2, but makes pragmatic trade-offs that a spec-derived grammar would not.
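The re-test step of that loop can be sketched in a few lines of Python. This is an illustration, not the project's own tooling (the repository ships its own scripts for this). It assumes the `tree-sitter parse` command exits with a non-zero status when a file contains parse errors and that it accepts a `--quiet` flag; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def cli_parses_cleanly(path: Path) -> bool:
    """Return True if `tree-sitter parse` exits with status 0 for this file.

    Assumption: the CLI exits non-zero when the resulting tree contains errors.
    """
    result = subprocess.run(
        ["npx", "tree-sitter", "parse", "--quiet", str(path)],
        capture_output=True,
    )
    return result.returncode == 0


def failing_files(
    files: Iterable[Path],
    parses_cleanly: Callable[[Path], bool] = cli_parses_cleanly,
) -> list[Path]:
    """Re-test every file and return, in sorted order, the ones that fail."""
    return [f for f in sorted(files) if not parses_cleanly(f)]
```

Calling `failing_files(Path(some_dir).rglob("*.sysml"))` after each `tree-sitter generate` gives the list of files to look at next. The parser check is passed in as a parameter so the loop itself can be exercised without the CLI installed.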

Trade-offs

Over-acceptance (deliberate). The grammar does not enforce context-sensitive body rules. For example, a control_node (only valid inside action bodies) will parse without error inside a part body. This keeps the grammar simpler and more resilient to spec evolution, at the cost of accepting some invalid programs. Editors and linters should handle semantic validation — the parser's job is to produce a usable tree.
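As a rough illustration of what such a downstream check might look like, the sketch below walks a syntax tree and reports every control_node whose nearest enclosing definition or usage is not an action. It uses a small stand-in `Node` class rather than a real tree-sitter binding so that it runs on its own. Apart from control_node, the node-type names (`action_definition`, `action_usage`, `part_usage`) are hypothetical and would need to be checked against node-types.json.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a syntax-tree node: a type name plus child nodes."""
    type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical node-type names; only control_node is named in this README.
ACTION_CONTEXTS = {"action_definition", "action_usage"}
CONTEXT_SUFFIXES = ("_definition", "_usage")


def misplaced_control_nodes(node: Node, in_action: bool = False) -> list[Node]:
    """Collect control_node nodes that are not enclosed by an action context."""
    found = []
    if node.type == "control_node" and not in_action:
        found.append(node)
    # Entering any definition or usage resets the context for its children.
    if node.type.endswith(CONTEXT_SUFFIXES):
        in_action = node.type in ACTION_CONTEXTS
    for child in node.children:
        found.extend(misplaced_control_nodes(child, in_action))
    return found
```

The same traversal would work on real binding nodes, since they also expose a type and a list of children.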

Flat member lists. Rather than maintaining separate member type lists for structural vs. behavioral contexts (which the spec requires), every body accepts a unified _usage_member rule. This avoids exponential conflict growth in the LR parse table.

Expression precedence is approximated. Binary operators use prec.left following standard mathematical convention, which may not match the SysML v2 spec in edge cases.
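To make "left association" concrete, here is a small precedence-climbing parser in Python. It is not how tree-sitter implements prec.left, and the operator table is illustrative rather than taken from the grammar or the SysML v2 spec. It only shows the tree shape that left association produces for chained operators.

```python
import re

# Illustrative precedence levels (higher binds tighter); not from the spec.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text: str) -> list[str]:
    """Split into identifiers/numbers and single-character operators."""
    return re.findall(r"\w+|[-+*/]", text)


def parse_expression(tokens: list[str], min_prec: int = 1):
    """Precedence climbing; consumes tokens from the front of the list."""
    left = tokens.pop(0)
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Left association: the right operand may only contain tighter operators,
        # so an operator of equal precedence is handled by this loop instead.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


print(parse_expression(tokenize("a - b - c")))
# ('-', ('-', 'a', 'b'), 'c')
```

With left association, `a - b - c` groups as `(a - b) - c`. An operator that the spec defines as right-associative or non-associative would come out with a different shape under this scheme, which is the kind of edge case the paragraph above refers to.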

Could This Be Done Better?

Almost certainly. Some ideas we haven't tried:

  • Derive the grammar from the KEBNF — The SysML v2 specification includes a formal grammar in KEBNF notation. A careful translation to tree-sitter rules could produce a more precise parser, but KEBNF uses features (like ordered alternation) that don't map directly to tree-sitter's GLR parser.
  • Use an external scanner — For constructs like implicit action bodies (brace-less blocks), an external scanner could maintain context state. We avoided this to keep the grammar self-contained.
  • Context-sensitive body rules — Separate member lists per body type (structural, behavioral, etc.) would reject more invalid syntax but at significant grammar complexity cost.
  • Hybrid approach — Use the empirical grammar as a baseline, then systematically tighten it against the KEBNF rule by rule.

If you have experience with tree-sitter grammars for large languages and want to suggest improvements to the approach, we'd welcome the discussion. Open an issue.

Construct Coverage

| Category | Status | Constructs |
|----------|--------|------------|
| Packages | ✅ | package, library package, import, alias |
| Definitions | ✅ | part, item, port, action, state, constraint, requirement, use case, interface, allocation, analysis, case, verification, occurrence, individual, connection, flow, attribute, enumeration, metadata |
| Usages | ✅ | All definition types as usages, plus ref, end, connect, bind, event occurrence, timeslice, snapshot, variant, exhibit, concern, stakeholder, actor, objective |
| Specialization | ✅ | :>, specializes, :>>, redefines, subsets, references |
| Multiplicity | ✅ | [n], [n..m], [n..*], ordered, nonunique |
| Comments | ✅ | //, /* */, //* */, doc, comment about, locale |
| Connections | ✅ | connect, bind, interface, allocation |
| Flows | ✅ | flow, flow def, message, succession flow |
| Actions | ✅ | first/then, perform, accept/send, if/while/for/loop, assign, terminate |
| States | ✅ | state def/state, entry/do/exit, transitions with triggers and guards |
| Constraints | ✅ | constraint def/constraint, assert, require, assume |
| Requirements | ✅ | requirement def/requirement, satisfy, verify, subject, actor, stakeholder |
| Expressions | ✅ | Arithmetic, comparison, logical operators, invocations, select (.?), collect (.), index (#) |
| Views | ✅ | view, viewpoint, rendering, expose, filter |
| Metadata | ✅ | @metadata, #prefixAnnotation |
| Variations | ✅ | variation, variant |
| Calculations | ✅ | calc def/calc, return |

Installation

Rust

[dependencies]
tree-sitter-sysml = "0.1"

Node.js

npm install tree-sitter-sysml

Go

import "github.com/nomograph-ai/tree-sitter-sysml/bindings/go"

Python

pip install tree-sitter-sysml

Usage

Rust

use tree_sitter::Parser;

fn main() {
    let mut parser = Parser::new();
    parser.set_language(&tree_sitter_sysml::LANGUAGE.into()).unwrap();

    let source = r#"
        package Vehicle {
            part def Engine {
                attribute horsePower : Real;
            }
            part engine : Engine;
        }
    "#;

    let tree = parser.parse(source, None).unwrap();
    println!("{}", tree.root_node().to_sexp());
}

Node.js

const Parser = require('tree-sitter');
const SysML = require('tree-sitter-sysml');

const parser = new Parser();
parser.setLanguage(SysML);

const tree = parser.parse(`
package Vehicle {
    part def Engine {
        attribute horsePower : Real;
    }
    part engine : Engine;
}
`);

console.log(tree.rootNode.toString());

Python

import tree_sitter_sysml as tssysml
from tree_sitter import Language, Parser

SYSML_LANGUAGE = Language(tssysml.language())
parser = Parser(SYSML_LANGUAGE)

tree = parser.parse(b"""
package Vehicle {
    part def Engine {
        attribute horsePower : Real;
    }
    part engine : Engine;
}
""")

# str() yields the S-expression; Node.sexp() is deprecated in recent py-tree-sitter releases
print(str(tree.root_node))

Project Layout

tree-sitter-sysml/
├── grammar.js              # The grammar definition (~2400 lines)
├── src/
│   ├── parser.c            # Generated parser (do not edit)
│   ├── grammar.json        # Generated grammar metadata
│   ├── node-types.json     # Generated node type definitions
│   └── tree_sitter/        # Tree-sitter C library headers
├── queries/
│   ├── highlights.scm      # Syntax highlighting queries
│   ├── tags.scm            # Code navigation (symbol tags)
│   ├── locals.scm          # Scope-aware variable resolution
│   ├── folds.scm           # Code folding regions
│   └── indents.scm         # Auto-indentation rules
├── test/
│   ├── corpus/             # 192 tree-sitter corpus tests
│   │   ├── actions.txt     #   Control flow, send, accept, assign
│   │   ├── attributes.txt  #   Attribute definitions and usages
│   │   ├── calculations.txt#   Calc definitions with return
│   │   ├── connections.txt #   Connect, bind, interface, allocation
│   │   ├── constraints.txt #   Constraint definitions and assertions
│   │   ├── definitions.txt #   All definition types
│   │   ├── expressions.txt #   Operators, invocations, special exprs
│   │   ├── flows.txt       #   Flow definitions and messages
│   │   ├── metadata.txt    #   Metadata annotations
│   │   ├── packages.txt    #   Packages, imports, aliases, comments
│   │   ├── requirements.txt#   Requirements, satisfy, verify
│   │   ├── states.txt      #   State machines, transitions
│   │   ├── successions.txt #   First/then succession chains
│   │   ├── usages.txt      #   All usage types
│   │   └── views.txt       #   Views, viewpoints, rendering
│   └── invalid/            # 18 negative tests (should fail to parse)
│       ├── syntactic/      #   12 tests: bad tokens, missing delimiters
│       └── structural/     #   6 tests: wrong nesting contexts
├── examples/               # 5 curated SysML v2 example files
│   ├── vehicle.sysml       #   Part definitions, attributes, ports
│   ├── requirements.sysml  #   Requirements with satisfy/verify
│   ├── state-machine.sysml #   State definitions with transitions
│   ├── use-cases.sysml     #   Use case with actors and objectives
│   └── verification.sysml  #   Verification with test cases
├── bindings/               # Language bindings
│   ├── c/                  #   C header and pkg-config
│   ├── rust/               #   Rust crate (lib.rs)
│   ├── go/                 #   Go module
│   ├── node/               #   Node.js addon + binding test
│   ├── python/             #   Python package + binding test
│   └── swift/              #   Swift package
├── scripts/
│   ├── fetch-corpora.sh    # Download external test corpora
│   ├── test-corpus.sh      # Run parser against external files
│   ├── check-test-count.sh # Verify corpus test count
│   ├── validate-external.sh# Validate against external corpora
│   └── validate-training.js# Validate against OMG training files
├── docs/
│   ├── parse-coverage.md   # Detailed coverage report and edge cases
│   └── prd-pre-submission.md # Development planning document
├── tree-sitter.json        # Tree-sitter configuration
├── package.json            # Node.js package metadata
├── Cargo.toml              # Rust crate metadata
├── pyproject.toml          # Python package metadata
├── go.mod / go.sum         # Go module metadata
├── CMakeLists.txt          # CMake build system
├── Makefile                # Make build system
├── binding.gyp             # Node.js native addon build
├── Package.swift           # Swift package definition
└── eslint.config.mjs       # ESLint config (tree-sitter conventions)

Development

Prerequisites

  • Node.js 18+
  • tree-sitter CLI: npm install -g tree-sitter-cli

Quick Start

git clone https://gitlab.com/nomograph/tree-sitter-sysml.git
cd tree-sitter-sysml
npm install
npx tree-sitter generate   # ~2 minutes — the grammar is large
npx tree-sitter test        # 192 tests

Parse a File

npx tree-sitter parse examples/vehicle.sysml

Test Against External Corpora

bash scripts/fetch-corpora.sh          # Clone all external repos
bash scripts/test-corpus.sh all        # Parse every .sysml file
bash scripts/test-corpus.sh all --errors-only  # Show only failures

Lint

npx eslint grammar.js

Editor Support

Neovim (nvim-treesitter)

require('nvim-treesitter.parsers').get_parser_configs().sysml = {
  install_info = {
    url = 'https://gitlab.com/nomograph/tree-sitter-sysml',
    files = { 'src/parser.c' },
    branch = 'master',
  },
  filetype = 'sysml',
}

Helix

The grammar can be added to languages.toml once published to the tree-sitter org.

Zed

Tree-sitter grammars in the tree-sitter org are automatically available in Zed.

Intended Use: Rust CLI and MCP Tooling

This grammar was built to power a Rust-based CLI and Model Context Protocol (MCP) server for AI-assisted systems engineering. The intended workflow:

  1. Parse SysML v2 models into concrete syntax trees using tree-sitter-sysml
  2. Extract structured information (definitions, relationships, requirements, constraints) via tree-sitter queries
  3. Serve that information to LLMs through MCP, enabling AI assistants to understand and reason about system models
  4. Generate SysML v2 from natural language descriptions, with the parser validating output

The Rust binding (tree-sitter-sysml crate) is the primary integration point. The grammar's over-accepting nature is actually an advantage here — when AI generates SysML, a lenient parser that produces a usable tree (even for slightly malformed output) is more useful than a strict parser that rejects it entirely.

Contributing

Contributions are welcome. See CONTRIBUTING.md for detailed guidelines.

What We Need Most

More corpus files. The biggest risk to this grammar is constructs we haven't seen. If you have SysML v2 files — from any source — please share them (or point us to public repositories). Files that break the parser are especially valuable.

To test your files against the grammar:

npx tree-sitter parse your-file.sysml

If it produces an ERROR node, please open an issue with the file (or a minimal reproducing snippet).
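When checking many files, it can help to pull the error locations out of the CLI output automatically. The sketch below scans the text printed by `tree-sitter parse` for ERROR and MISSING nodes. It assumes the usual output shape, in which each node is followed by a zero-based `[row, column] - [row, column]` range; the function name is hypothetical.

```python
import re

# Matches nodes such as: (ERROR [1, 4] - [1, 9])  or  (MISSING ";" [2, 0] - [2, 0])
NODE_RE = re.compile(
    r"\((ERROR|MISSING)\b[^\[\n]*\[(\d+), (\d+)\] - \[(\d+), (\d+)\]"
)


def error_nodes(cli_output: str) -> list[tuple[str, int, int]]:
    """Return (kind, line, column) for each ERROR or MISSING node.

    Rows and columns in the CLI output are zero-based, so 1 is added to
    both to match what most editors display.
    """
    return [
        (m.group(1), int(m.group(2)) + 1, int(m.group(3)) + 1)
        for m in NODE_RE.finditer(cli_output)
    ]
```

The resulting line and column pairs are a convenient starting point for cutting a file down to a minimal reproducing snippet.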

Negative tests. We have 18 tests for syntax that should be rejected. We need more — especially for:

  • Invalid nesting (definitions inside usages, behavioral constructs in structural contexts)
  • Malformed expressions
  • Edge cases around keyword-as-identifier ambiguity

Grammar approach feedback. If you've built tree-sitter grammars for large languages and see a better way to structure ours, we want to hear it. The brute-force empirical approach got us to roughly 98% parse coverage on the external files, but there may be architectural improvements that would make the grammar more maintainable or more precise.

Query improvements. The highlight, tag, and local queries cover all node types, but the fold and indent queries are minimal. Contributions to improve editor integration are welcome.

Priority Areas

| Area | Impact | Effort |
|------|--------|--------|
| Corpus contributions | High | Low |
| Negative test cases | High | Low |
| Query improvements (folds, indents) | Medium | Low |
| Specification alignment documentation | Medium | Medium |
| Context-sensitive body rules | High | High |

Known Limitations

  • Over-acceptance: Any member type parses in any body context (see Grammar Approach)
  • 6 unparseable files: 2 intentionally unsupported, 4 regressions from OMG 2026-02 release (see parse-coverage.md)
  • No semantic validation: The parser checks syntax, not type correctness or constraint satisfaction
  • Expression precedence: Approximated with left-association, may differ from spec in edge cases
  • Keyword-as-identifier: Most cases handled, but some ambiguity remains (see parse-coverage.md)
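
The headline coverage figure can be reproduced from numbers already given in this README. The snippet below adds up the per-corpus file counts and subtracts the 6 unparseable files. It assumes those 6 files are counted among the 393 external files; parse-coverage.md remains the authoritative breakdown.

```python
# File counts per corpus, copied from the corpus table in this README.
CORPUS_FILES = {
    "OMG Training": 100,
    "OMG Examples": 95,
    "OMG Validation": 56,
    "OMG Standard Library": 58,
    "Sensmetry Advent": 44,
    "GfSE Models": 36,
    "SYSMOD": 1,
    "Sensmetry SmartHome": 3,
}
UNPARSEABLE = 6  # from the Known Limitations list; assumed to be part of the total

total = sum(CORPUS_FILES.values())
coverage = (total - UNPARSEABLE) / total

print(f"{total} files, {total - UNPARSEABLE} parsed, {coverage:.1%} coverage")
# 393 files, 387 parsed, 98.5% coverage
```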

Changelog

See CHANGELOG.md for release history.

License

MIT

Author

Andrew Dunn — Nomograph Labs