al-sem
v0.0.12
Static semantic analysis engine for Microsoft Business Central AL code
al-sem — Static semantic analyzer for AL (Microsoft Business Central)
A static semantic analysis engine for Microsoft Business Central AL code: it builds a
cross-file SemanticModel (symbol index, call graph, event graph, per-routine effect
summaries) and runs evidence-backed queries over it — finding cross-file bugs per-file
linters can't catch, and answering absence-safety questions for tooling and LLM reviewers.
| Metric | Value |
|--------|-------|
| Language | TypeScript (Bun runtime — runs .ts directly, no build step) |
| Version | al-sem on npm — published with provenance |
| Detectors | 35 (34 default + 1 opt-in) |
| Surfaces | analyze · digest · prove · policy · events · diff · fingerprint · MCP (9 tools) |
| Runtime dependencies | 5 (@modelcontextprotocol/sdk, cbor-x, commander, fflate, yaml) |
| Platforms | win32-x64 · linux-x64 · darwin-arm64 |
| Tests | 2717 passing · tsc + Biome clean |
al-sem targets the bugs that per-file linters miss because finding them requires the whole call graph: DB-ops-inside-loops walked across procedures, Commit inside a posting transaction span, event-subscriber cycles, integration events with no subscribers anywhere, MinVersion drift against actual call sites, and dozens more. It is pure static analysis (no profile, no runtime), but accurate enough to be actionable in CI and in an editor. It was tuned on a real Continia BC extension; see precision study.
Surfaces
Every surface runs the same pipeline over one SemanticModel and reuses the dependency cache.
| Surface | Description |
|---------|-------------|
| analyze | Cross-file performance / correctness / compatibility findings (35 detectors), terminal / JSON / SARIF / HTML. |
| digest | Scoped, transitive, cited summary of what changed code does — external effects + the witness paths that reach them. Built for PR review. |
| prove | Tristate absence-safety queries over a routine (may-commit, commits-on-success-path, writes-table:<name>, publishes-event:<name>, reaches-ui, throws-error) with proof obligations. |
| policy | Declarative YAML rules over capability facts (Kleene tri-state), 8 bundled defaults. |
| events | Event blast-radius — fanout (per-event subscriber counts) and chains (relay trees). |
| diff | Cross-version delta across ABI, schema, events, capabilities, permissions. |
| fingerprint | Emit a CapabilitySnapshot for later diff. |
| MCP | 9 tools over the same model (findings/hotspots/path explain + al_sem_digest, al_sem_prove). |
diff and fingerprint accept either a workspace directory or a raw .app symbol package, so
two .app versions can be compared with no source checkout.
Status: code-complete through Phase 4 (record-flow framework) plus the L6 policy layer, the
behavior digest + prove surfaces, event blast-radius, snapshot diff/fingerprint, HTML report,
and JSON surface contracts (versioned envelopes + JSON Schemas + CI drift guard). Published to npm
with provenance attestation. See STATUS.
Install
bun add al-sem

al-sem ships a native tree-sitter parser per platform that downloads via a postinstall hook. Bun requires you to opt in:

// package.json
{ "trustedDependencies": ["al-sem"] }

Supported platforms: win32-x64, linux-x64, darwin-arm64 shipped today;
darwin-x64 (Intel macOS) is pending the next tree-sitter-al release. Other
platforms fail at first parse with a clear NativeParserUnavailableError.
Install-time environment overrides (for air-gapped / mirrored environments):
- AL_SEM_NATIVE_PARSER_PATH=/abs/path/to/lib (use a preseeded artifact, skip download).
- AL_SEM_NATIVE_PARSER_OFFLINE=1 (require the canonical artifact to already exist; never download).
- AL_SEM_NATIVE_PARSER_BASE_URL=https://internal-mirror/... (fetch from an internal mirror instead of GitHub).
If trustedDependencies is not configured, preseed the artifact via AL_SEM_NATIVE_PARSER_PATH.
Quick start
bunx al-sem analyze . --min-severity high --format terminal

Or run the bundled demo against a small intentionally-buggy workspace:

bash demo/run-demos.sh all

The demo walks the cross-file detectors the standard AL cops can't replicate
and writes a sample HTML report to demo/report.html.
Sample output:
Analysed 1234 routines (1230 with bodies, 4 parse-incomplete); 251/251 source units parsed; 0 opaque app(s).
HIGH (12):
[d1-db-op-in-loop] Database operation inside a loop — A loop in PostSalesDoc reaches FindSet on Sales Line.
ws:src/Codeunit/SalesPostHelper.Codeunit.al:204:13 in Sales-Post Helper :: PostSalesDoc
confidence: likely
fix (medium): Move the database operation outside the loop, or batch it into a set-based operation.
[d3-missing-setloadfields] Missing SetLoadFields ...

By default --format auto emits terminal on a TTY and compact JSON on a pipe.
--format json always emits the compact summary; --format sarif emits SARIF
2.1.0 for GitHub code-scanning; --format html emits a self-contained visual
report (per-finding interprocedural evidence-path flows + a publisher→subscriber
event graph, no external assets) for sharing or blog embedding; --dump-model
opts into the legacy full-model dump (debug-only, can exceed 500 MB).
CI integration
- name: al-sem
  run: |
    bunx al-sem analyze . \
      --baseline .al-sem-baseline.json \
      --fail-on high \
      --format sarif > al-sem.sarif
- uses: github/codeql-action/upload-sarif@v3
  with: { sarif_file: al-sem.sarif }

The first run is noisy by design, so generate a baseline once and commit it:

bunx al-sem analyze . --baseline .al-sem-baseline.json --update-baseline

Subsequent runs report only NEW findings; the baseline survives nearby edits
because fingerprints exclude line numbers. --update-baseline without
--baseline is a no-op and writes a warning to stderr.
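
To illustrate why a line-independent fingerprint lets a baseline survive nearby edits, here is a minimal sketch. The finding shape, the choice of identity fields, and the small FNV-1a-style hash are all assumptions made for illustration; al-sem's own fingerprintOf may use different inputs and a different hash.

```typescript
// Hypothetical finding shape for illustration; not al-sem's actual Finding type.
interface SketchFinding {
  detectorId: string;
  file: string;
  routine: string;
  line: number; // deliberately NOT part of the fingerprint
}

// Small FNV-1a-style hash over UTF-16 code units (a stand-in for a real hash).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hash only the position-independent identity of a finding.
function sketchFingerprint(f: SketchFinding): string {
  return fnv1a([f.detectorId, f.file, f.routine].join("\u0000"));
}

// Keep only findings whose fingerprint is absent from the baseline set.
function onlyNewFindings(current: SketchFinding[], baseline: Set<string>): SketchFinding[] {
  return current.filter((f) => !baseline.has(sketchFingerprint(f)));
}

const recorded: SketchFinding = {
  detectorId: "d1-db-op-in-loop",
  file: "src/Codeunit/SalesPostHelper.Codeunit.al",
  routine: "PostSalesDoc",
  line: 204,
};
// The same finding after an unrelated edit pushed it down a few lines.
const shifted: SketchFinding = { ...recorded, line: 211 };
const baselineSet = new Set([sketchFingerprint(recorded)]);
console.log(onlyNewFindings([shifted], baselineSet).length);
```

Because the line number never enters the hash, the shifted finding still matches its baseline entry and is not reported as new.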
Machine-readable output
Every --format json output is a versioned, self-describing document with a published JSON
Schema — see docs/CONTRACTS.md for the envelope format, document kinds,
and how to validate them.
CLI options
| Flag | Default | Description |
|------|---------|-------------|
| --alpackages <dir> | <ws>/.alpackages if present | Explicit path to the dependency .alpackages directory. |
| --format <fmt> | auto | auto \| terminal \| json \| sarif \| html. html emits a self-contained visual report (evidence-path flows + event graph). |
| --deterministic | off | Pin timestamps for byte-stable output. |
| --no-dep-summaries | off | Skip behavioral dependency cold run (structural ABI only). Cached separately from the full-mode cache, so the second run with this flag is warm. |
| --dep-cache-dir <dir> | ~/.al-sem/cache/ | Override the dependency cache directory. |
| --dump-model | off | Emit the full SemanticModel (debug-only, can be >500 MB). |
| --min-severity <sev> | none | Drop findings below critical \| high \| medium \| low \| info. |
| --detector <ids> | all | Comma-separated allow-list of detector ids. |
| --scope <scope> | primary | primary drops findings whose actionable anchor is in a dependency. |
| --limit <n> | unlimited | Cap output at the first N findings (after filtering and scope). |
| --group-by <by> | off | Terminal-only grouped output: object \| routine \| table \| detector \| file. |
| --baseline <file> | none | Suppress fingerprints present in the baseline file. |
| --update-baseline | off | Rewrite the baseline file from this run's findings. |
| --fail-on <sev> | none | Exit 1 if any finding at this severity or above (after baseline / filters). |
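
The --fail-on threshold can be read as a comparison on an ordered severity scale. The sketch below assumes the ordering given for --min-severity (critical, high, medium, low, info) and uses hypothetical function names; it is not the package's computeExitCode.

```typescript
// Severity names as listed for --min-severity, most severe first.
const SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

// True when `actual` is at least as severe as `threshold`.
function atOrAbove(actual: Severity, threshold: Severity): boolean {
  return SEVERITY_ORDER.indexOf(actual) <= SEVERITY_ORDER.indexOf(threshold);
}

// Hypothetical gate: 1 if any remaining finding meets the threshold, else 0.
// With no threshold (the documented default of "none") the gate never fails.
function sketchExitCode(severities: Severity[], failOn?: Severity): 0 | 1 {
  if (failOn === undefined) return 0;
  return severities.some((s) => atOrAbove(s, failOn)) ? 1 : 0;
}

console.log(sketchExitCode(["medium", "low"], "high"));
console.log(sketchExitCode(["critical", "info"], "high"));
```

The input here stands for the findings left after baseline suppression and filtering, matching the table's note that the gate applies "after baseline / filters".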
Cache maintenance
bunx al-sem cache prune # remove stale dep-cache entries
bunx al-sem cache prune --dry-run # classify without deleting

Stale = version-stamp mismatch with this build, corrupt file, mis-named file, or
tampered content hash. Valid current-version artifacts are kept untouched.
--dep-cache-dir <dir> overrides the cache location for both analyze and
cache prune.
Other commands
All of these run the same pipeline as analyze and reuse the dependency cache.
digest — transitive behavior summary for changed code
bunx al-sem digest . --changed-files src/Sales.Codeunit.al --format json
bunx al-sem digest . --diff pr.diff # scope from a unified diff (or "-" for stdin)

Scopes to the changed roots (--changed-files / --changed-routines / --diff, or the
auto-detecting --changed alias) and returns the external effects each reachable routine causes
(COMMIT, DB writes, events, HTTP, UI, …), each backed by a witness path to the file:line that
produces it and a conditionality (unconditional-on-success / conditional / …). Unresolved
callsites in the cone are surfaced explicitly, so a "no effect" answer is never a silent gap.
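
A witness path can be pictured as a concrete call chain from a changed root to the routine that directly produces an effect. The sketch below finds the shortest such chain with a breadth-first search over a hypothetical call graph; the graph shape and routine names are illustrative assumptions, not al-sem's model types or its actual path-walker.

```typescript
// Hypothetical call graph: routine name -> resolved callees.
type CallGraph = Map<string, string[]>;

// Breadth-first search from a changed root. Returns the shortest call chain that
// ends at a routine directly producing an effect, or undefined if none is reachable.
function findWitnessPath(
  graph: CallGraph,
  root: string,
  hasEffect: (routine: string) => boolean,
): string[] | undefined {
  const parent = new Map<string, string | undefined>();
  parent.set(root, undefined);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (hasEffect(current)) {
      // Walk parent links back to the root to rebuild the chain.
      const path: string[] = [];
      let node: string | undefined = current;
      while (node !== undefined) {
        path.unshift(node);
        node = parent.get(node);
      }
      return path;
    }
    for (const callee of graph.get(current) ?? []) {
      if (!parent.has(callee)) {
        parent.set(callee, current);
        queue.push(callee);
      }
    }
  }
  return undefined;
}

const demoGraph: CallGraph = new Map([
  ["OnRun", ["ValidateInput", "Post"]],
  ["Post", ["WriteLedger"]],
]);
console.log(findWitnessPath(demoGraph, "OnRun", (r) => r === "WriteLedger"));
```

A real digest would also have to report unresolved callsites met along the way, which this sketch leaves out.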
prove — tristate absence-safety query
bunx al-sem prove . --routine "Sales-Post::PostDocument" --question may-commit
bunx al-sem prove . --routine MyHandler --question writes-table:"Sales Header"

Answers yes / no / unknown for one routine and one question, with the proof obligations that
back the answer. The answer is unknown (never a confident wrong no) when an unresolved callsite
or analysis gap in the cone makes absence unprovable.
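
The tristate rule can be sketched as a reachability walk in which positive evidence wins, and any gap downgrades a would-be "no" to "unknown". The per-routine facts below are a hypothetical shape chosen for illustration, and the walk ignores path conditions; it is not al-sem's proof engine.

```typescript
type Tristate = "yes" | "no" | "unknown";

// Hypothetical per-routine facts for illustration.
interface RoutineFacts {
  commitsDirectly: boolean;
  callees: string[];          // resolved call targets
  hasUnresolvedCall: boolean; // at least one callsite could not be resolved
}

// Walk every routine reachable from `root`.
// "yes" as soon as a reachable routine commits; otherwise "unknown" if any gap
// was seen in the cone; "no" only when the whole cone was covered without gaps.
function mayCommit(facts: Map<string, RoutineFacts>, root: string): Tristate {
  const seen = new Set<string>();
  const stack: string[] = [root];
  let gapInCone = false;
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (seen.has(name)) continue;
    seen.add(name);
    const routine = facts.get(name);
    if (routine === undefined) {
      gapInCone = true; // a callee with no facts is an analysis gap
      continue;
    }
    if (routine.commitsDirectly) return "yes";
    if (routine.hasUnresolvedCall) gapInCone = true;
    stack.push(...routine.callees);
  }
  return gapInCone ? "unknown" : "no";
}

const demoFacts = new Map<string, RoutineFacts>([
  ["MyHandler", { commitsDirectly: false, callees: ["Helper"], hasUnresolvedCall: false }],
  ["Helper", { commitsDirectly: false, callees: [], hasUnresolvedCall: true }],
]);
console.log(mayCommit(demoFacts, "MyHandler"));
```

In the demo data no routine commits, but Helper contains an unresolved call, so absence cannot be proven.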
policy — declarative rules over capability facts
bunx al-sem policy check . # workspace's own app(s), bundled default rules
bunx al-sem policy check . --scope all # include dependency-anchored findings
bunx al-sem policy explain no-commit-in-event-subscribers

A policy is a YAML file of rules whose when/except predicates match capability facts
(op, resource, root kind, confidence, …) under Kleene tri-state semantics. al-sem auto-detects
al-sem.policy.yaml in the workspace, else applies the 8 bundled defaults (no Commit in event
subscribers / triggers, no interactive UI or ledger writes from API roots, etc.). --format
human | json | sarif. Like analyze, policy check defaults to --scope primary (the
workspace's own app); --scope all reports model-wide.
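
For readers unfamiliar with Kleene tri-state semantics, the sketch below implements the standard strong Kleene connectives over true / false / "unknown". How al-sem actually combines a rule's when and except predicates is not specified here; the ruleMatches helper assumes "when AND NOT except" purely for illustration.

```typescript
// Three-valued truth: a definite boolean or "unknown".
type K3 = boolean | "unknown";

function k3Not(a: K3): K3 {
  return a === "unknown" ? "unknown" : !a;
}

// AND: a definite false on either side decides the result, even next to "unknown".
function k3And(a: K3, b: K3): K3 {
  if (a === false || b === false) return false;
  if (a === "unknown" || b === "unknown") return "unknown";
  return true;
}

// OR: a definite true on either side decides the result, even next to "unknown".
function k3Or(a: K3, b: K3): K3 {
  if (a === true || b === true) return true;
  if (a === "unknown" || b === "unknown") return "unknown";
  return false;
}

// Assumed combination for illustration: a rule matches when `when` holds
// and `except` does not.
function ruleMatches(when: K3, except: K3): K3 {
  return k3And(when, k3Not(except));
}

console.log(ruleMatches(true, false));
console.log(ruleMatches(true, "unknown"));
console.log(ruleMatches(false, "unknown"));
```

The useful property is that uncertainty only propagates when it could change the outcome: a rule whose when predicate is definitely false stays a non-match no matter how uncertain the exception is.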
events — event blast-radius reports
bunx al-sem events fanout . # per-event publisher → subscriber counts + coverage
bunx al-sem events chains . # publisher → subscriber relay trees (cycle/depth-bounded)

Both default to --scope primary ("primary participates": the publisher or any subscriber is in
the workspace's own app); --scope all enumerates the entire merged event graph. --format
human | json.
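
As a rough picture of what a fanout report computes, the sketch below counts subscribers per published event and keeps events with zero subscribers in the result. The Subscription shape and the event names are hypothetical; this is not al-sem's event-graph model or its report format.

```typescript
// Hypothetical subscription edge for illustration; not al-sem's event-graph type.
interface Subscription {
  event: string;      // published event name
  subscriber: string; // subscribing routine
}

// Count subscribers per published event, including events nobody subscribes to,
// and return rows in a fixed sorted order so the output is stable across runs.
function eventFanout(published: string[], subscriptions: Subscription[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const event of published) counts.set(event, 0);
  for (const sub of subscriptions) {
    const current = counts.get(sub.event);
    if (current !== undefined) counts.set(sub.event, current + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

const fanoutRows = eventFanout(
  ["OnAfterPost", "OnBeforePost", "OnAfterArchive"],
  [
    { event: "OnAfterPost", subscriber: "AuditLog::HandlePost" },
    { event: "OnAfterPost", subscriber: "Mailer::HandlePost" },
    { event: "OnBeforePost", subscriber: "Validator::Check" },
  ],
);
console.log(fanoutRows);
```

Rows with a count of zero correspond to the situation d12-dead-integration-event reports: an event that is published but has no subscribers.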
diff — compare two snapshots / workspaces / .app files
bunx al-sem diff old.app new.app # cross-version .app diff (no checkout needed)
bunx al-sem diff ./baseline.cbor.gz . # persisted snapshot vs live workspace

Each side may be a workspace directory, a persisted snapshot artifact, or a raw .app. Reports
deltas across five axes (ABI/contract, schema, events, capabilities, permissions) with
--format human | json | sarif and --fail-on <sev>.
fingerprint — emit a CapabilitySnapshot
bunx al-sem fingerprint . --format cbor.gz --out snapshot.cbor.gz
bunx al-sem fingerprint some.app --format json # snapshot a raw .app directly

Persist a snapshot for later diff (the CI-friendly path: snapshot each release, diff the
artifacts) or inspect per-root capability fingerprints (--format human). Accepts a workspace
directory or a .app file.
Library usage
import {
analyzeWorkspace,
projectFinding,
filterFindings,
applyBaseline,
loadBaseline,
computeExitCode,
} from "al-sem";
const result = await analyzeWorkspace({ workspaceRoot: "./", deterministic: true });
const compact = result.findings.map((f) => projectFinding(f, result.model));
const high = filterFindings(compact, { minSeverity: "high" });
// CI gate: load a baseline, drop known findings, fail on remaining "high" or worse.
const baseline = loadBaseline(".al-sem-baseline.json");
const newOnly = applyBaseline(high, baseline);
process.exitCode = computeExitCode(newOnly, "high");

Re-exports from the package root, by area:
| Area | Exports |
|------|---------|
| Pipeline | analyzeWorkspace, indexWorkspace, AnalyzeWorkspaceOptions, AnalyzeWorkspaceResult, IndexWorkspaceResult |
| Model types | Finding, FindingSummary, FindingLocation, Diagnostic, DetectorStats, SemanticModel, Routine, ObjectDecl, Table, SourceAnchor, … (everything from ./model/index.ts) |
| Projection | projectFinding, filterFindings, FilterOptions, groupFindings, FindingGroup, GroupBy, fingerprintOf |
| Output | buildCompactReport, CompactReport, formatCompactJson, formatSarif |
| Baseline / CI | loadBaseline, saveBaseline, applyBaseline, BaselineFile, computeExitCode, parseFailOn |
| Sources | SourceUnit, SourceProvider, ExternalSourceProvider |
indexWorkspace(options) stops after L2 (discovery + indexing only), for callers
that drive resolveModel themselves. analyzeWorkspace runs the full pipeline.
MCP server
al-sem also ships an MCP server (bunx al-sem-mcp or bun run src/mcp/server.ts)
exposing nine tools. Seven are progressive-disclosure views over analyze —
list_findings, list_rollups (multi-detector view), get_finding, list_hotspots,
get_routine_summary, explain_path, get_analysis_health — plus al_sem_digest
(behavior digest for changed code) and al_sem_prove (absence-safety query). See
docs/MCP.md for wiring instructions.
Detectors
| Detector | Category | What it flags |
|----------|----------|-------|
| d1-db-op-in-loop | Performance | Database operation reachable inside a loop, interprocedurally; severity by op class. |
| d2-event-fanout-in-loop | Performance | Event raised inside a loop whose subscribers touch the database. |
| d3-missing-setloadfields | Performance | Record retrieval whose loaded field set doesn't cover the fields accessed (same routine + directly-resolved callees). |
| d4-repeated-lookup-in-loop | Performance | Identical Get/FindFirst/FindLast called repeatedly in a loop with a literal key. |
| d5-set-based-opportunity | Performance | Loop body is a single Modify on the iterating record — ModifyAll candidate. |
| d7-recursive-event-expansion | Correctness | Event subscriber chain forms a cycle (runtime infinite recursion). |
| d8-commit-in-transaction | Correctness | Commit inside a posting transaction span — breaks atomicity. |
| d9-transaction-span-summary | Info | Transaction span describes its routine / table / event reach. |
| d10-self-modifying-loop | Correctness | Modify/Validate/Delete on the loop-iterating record. |
| d11-modify-without-get | Correctness | Modify/Validate on a record that was never loaded (no Get/Find/Init/Insert) in this routine. |
| d12-dead-integration-event | Hygiene | Published IntegrationEvent has no subscribers anywhere. |
| d13-cross-app-internal-call | Hygiene | Calls a routine marked Access=Internal in another app. |
| d14-dead-routine | Hygiene | local procedure unreachable from any entry-point or non-local procedure. |
| d16-obsolete-routine-call | Compatibility | Calls a routine marked [Obsolete(...)] (info Pending, high Removed). |
| d17-min-version-drift | Compatibility | Calls into a dependency whose installed version exceeds the declared MinVersion (app-level precision; per-routine pending upstream metadata). |
| d18-constant-filter-in-loop | Performance | SetRange/SetFilter with literal-only arguments inside a loop — the same filter is applied every iteration; hoist it out. |
| d19-unused-parameter | Hygiene | Procedure parameter declared but never referenced in the body. Skips triggers and event-subscribers (signatures dictated by the publisher). |
| d20-unreachable-after-exit | Correctness | Statement that follows Exit;, Error(...), or CurrReport.Quit at the same nesting level — control leaves the routine before it can run. |
| d21-read-without-load | Correctness | TestField / CalcFields / CalcSums on a record never loaded earlier in the routine — read returns the AL default. D11's read-side sibling. |
| d22-flowfield-without-calcfields | Correctness | Reads a FlowField with no prior CalcFields(<that field>) on the same record-var — silent zero/empty result. |
| d29-subscriber-modify-on-event-record | Correctness | Subscriber to an OnAfter*Modify / OnBefore*Delete event mutates the inbound record parameter — re-fires the same event, recursive-trigger risk. |
| d32-constant-boolean-parameter | Hygiene | local procedure Boolean parameter where every resolved primary-app caller passes the same literal — dead parameter, candidate for flattening. |
| d33-unfiltered-bulk-write | Correctness | DeleteAll (critical) or ModifyAll (high) on a local non-temp record with no prior SetRange/SetFilter since the last Reset — whole-table impact. |
| d34-commit-in-loop | Correctness | Commit inside a loop, direct or transitive via callee summary. Per-iteration commits break atomicity; nested-loop case escalates to critical. |
| d35-commit-in-event-subscriber | Correctness | Commit reachable from an [EventSubscriber] routine. Publisher cannot roll back what the subscriber committed. |
| d36-late-setloadfields | Performance | SetLoadFields / AddLoadFields placed AFTER a Get/Find, with no later load — the partial-record optimisation cannot apply. |
| d37-validate-without-persist | Correctness | Validate on a record with no subsequent Modify/Insert before the record is reloaded or the routine exits — the field write is silently discarded. |
| d38-subscriber-to-obsolete-event | Upgrade | [EventSubscriber] bound to a publisher routine carrying [Obsolete(...)]. Pending → info (plan migration); Removed → high (subscriber will stop firing). |
| d39-record-left-dirty-across-chain | Correctness | Caller forwards a record to a helper that exits dirty (path-proven Validate with no subsequent Modify/Insert on at least one exit path), and the caller never persists after the call — the field write is silently discarded across the chain. Strictly interprocedural; only fires on path-proven dirtyAtExit === "yes" from the P6.T2 walker. |
| d40-transitive-load-missing (opt-in) | Correctness | Caller forwards a record to a helper that reads or mutates without loading. Strictly interprocedural — closes D11/D21's by-var-parameter precision gap. Currently opt-in (Phase 4 straight-line walker; Phase 6's full walker re-enables by default after the loop-loaded false-positive class is closed). Enable via --detector d40-transitive-load-missing. |
| d41-transitive-filter-loss | Correctness | Caller sets filters on a record, forwards it by-var to a helper that calls Reset, and then performs a filter-sensitive op (FindFirst/FindLast/FindSet/Find/Next/CalcSums/DeleteAll/ModifyAll/Count/IsEmpty) on the record without re-filtering — the filters are silently lost and the subsequent op runs on the unfiltered set. Strictly interprocedural; the post-call-use requirement prevents flagging intentional reset helpers. |
| d42-cross-call-wrong-setloadfields | Performance | Caller narrowed a record's load via SetLoadFields/AddLoadFields then forwards it to a helper that reads a field outside the narrow — the runtime issues an extra SQL round-trip to fetch the missing field, defeating the partial-load optimisation. Strictly interprocedural; only fires when both sides are concrete (caller narrow and callee requiredLoadedFieldsAtEntry from the Phase 6 walker). |
| d43-event-ishandled-skip | Correctness | Invoker raises an IsHandled-guarded integration event whose subscriber set may set the guard, skipping the invoker's own guarded table writes — the writes are silently bypassed. Dispatch-site (invoker-centric) analysis. |
| d44-event-multi-subscriber-overlap | Correctness | Multiple subscribers to one event write the same table (execution-order-dependent outcome), plus a read-after-write hazard class across subscribers. |
| d45-event-transitive-table-exposure | Correctness | A primary publisher's event reaches, via an N-hop subscriber→publisher relay chain, a subscriber that writes a sensitive table — transitive table exposure the publisher doesn't see locally. |
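
Several of these detectors reduce to graph queries. As one example, the cycle condition behind d7-recursive-event-expansion can be sketched as a depth-first search over an event-to-event graph. The edge model here (an edge from E to F means some subscriber of E raises F) and the event names are assumptions for illustration, not the detector's implementation.

```typescript
// Hypothetical event graph: an edge E -> F means some subscriber of E raises F.
type EventEdges = Map<string, string[]>;

// Depth-first search that tracks the events on the current path.
// Returns one cycle (first event repeated at the end) or undefined if acyclic.
function findEventCycle(edges: EventEdges): string[] | undefined {
  const finished = new Set<string>();
  const onPath: string[] = [];

  const visit = (event: string): string[] | undefined => {
    const at = onPath.indexOf(event);
    if (at !== -1) return [...onPath.slice(at), event]; // back edge closes a cycle
    if (finished.has(event)) return undefined;
    onPath.push(event);
    for (const next of edges.get(event) ?? []) {
      const cycle = visit(next);
      if (cycle !== undefined) return cycle;
    }
    onPath.pop();
    finished.add(event);
    return undefined;
  };

  // Sorted start order keeps the reported cycle stable across runs.
  for (const start of Array.from(edges.keys()).sort()) {
    const cycle = visit(start);
    if (cycle !== undefined) return cycle;
  }
  return undefined;
}

const demoEdges: EventEdges = new Map([
  ["OnAfterPost", ["OnAfterLog"]],
  ["OnAfterLog", ["OnAfterPost"]],
  ["OnAfterArchive", []],
]);
console.log(findEventCycle(demoEdges));
```

Returning the cycle itself, rather than a boolean, mirrors the README's emphasis on evidence-backed findings: the path is what a reviewer needs to confirm the report.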
Architecture (advanced)
A layered pipeline, each layer a pure transform over the previous:
L0    parser / symbols   parse AL + read .app symbol packages
L1    providers          discover workspace + external sources
L1.5  deps               cached dependency artifacts merged into the index
L2    index              → SemanticIndex (objects, routines, tables, features)
L3    resolve            → SemanticModel (call graph, event graph, coverage)
L4    engine             combined graph → Tarjan SCC → fixed-point RoutineSummary
L5    detectors          walk the model + summaries → Finding[] (scoped to primary)
L6    projection         compact FindingSummary + filter + group + fingerprint

analyzeWorkspace runs the whole pipeline:

discoverSources → buildSemanticIndex → resolveModel
→ buildCombinedGraph → computeSummaries → runDetectors
→ { model, findings, diagnostics, detectorStats }

Key design principles
- The engine never throws. Failures (unparseable files, missing symbols, resolution gaps) surface as Diagnostic[], never exceptions. There is no "silent clean".
- Determinism is a contract. With deterministic: true, output is byte-stable: timestamps are pinned, every derived collection has a canonical sort, Map/Set iteration never leaks into output unsorted. test/e2e.test.ts guards this.
- Detectors are pure queries over the SemanticModel + summaries. They prune via RoutineSummary, then use the shared path-walker with a detector-specific policy to build evidence-backed Findings. Each detector dedupes findings by id before sorting.
- L4 summaries compose per-routine effects bottom-up over the call graph's SCC condensation, using a finite monotone fixed-point so recursive cycles converge.
- Dependency direction is one-way: al-sem knows nothing of al-perf. al-perf consumes al-sem as a library.
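
The fixed-point idea behind the L4 summaries can be illustrated with a deliberately simplified sketch. It models a summary as a plain set of effect names and swaps the SCC-condensation ordering for a naive iterate-until-stable loop, which reaches the same result for a set-union lattice but does more work. The RoutineNode shape and effect names are assumptions, not al-sem's RoutineSummary.

```typescript
// Hypothetical inputs: each routine's direct effects and its resolved callees.
interface RoutineNode {
  direct: string[];
  callees: string[];
}

// Iterate until no summary grows. Effect sets only ever gain members and the
// universe of effects is finite, so the loop terminates even with recursion.
function computeEffectSummaries(routines: Map<string, RoutineNode>): Map<string, Set<string>> {
  const summaries = new Map<string, Set<string>>();
  for (const [name, node] of Array.from(routines)) {
    summaries.set(name, new Set(node.direct));
  }
  let changed = true;
  while (changed) {
    changed = false;
    for (const [name, node] of Array.from(routines)) {
      const own = summaries.get(name)!;
      for (const callee of node.callees) {
        const calleeSummary = summaries.get(callee);
        if (calleeSummary === undefined) continue; // unknown callee: nothing to merge here
        for (const effect of Array.from(calleeSummary)) {
          if (!own.has(effect)) {
            own.add(effect);
            changed = true;
          }
        }
      }
    }
  }
  return summaries;
}

// A and B are mutually recursive; C commits.
const demoRoutines = new Map<string, RoutineNode>([
  ["A", { direct: ["DB_WRITE"], callees: ["B"] }],
  ["B", { direct: [], callees: ["A", "C"] }],
  ["C", { direct: ["COMMIT"], callees: [] }],
]);
const demoSummaries = computeEffectSummaries(demoRoutines);
console.log(Array.from(demoSummaries.get("A")!).sort());
```

Processing strongly connected components in reverse topological order, as the README describes, confines the iteration to each recursive cycle instead of rescanning every routine on every pass.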
Source layout
src/
parser/ AL parsing (native bun:ffi tree-sitter) + AST helpers
symbols/ .app symbol-package reader
providers/ workspace + external source discovery
deps/ L1.5 — dependency artifact types, cache, pipeline orchestration
index/ SemanticIndex construction (objects, routines, intraprocedural features)
resolve/ call resolution, event graph, record types, coverage → SemanticModel
engine/ L4 — combined graph, SCC, effect lattice, summary engine, path-walker,
reverse call graph, entry points, transaction spans, attribute parser
detectors/ L5 — 34 default detectors plus 1 opt-in (D40), 35 total + shared
DetectorContext, confidence mapping, registry
policy/ L6 — declarative capability-fact rules (Kleene tri-state evaluator)
snapshot/ CapabilitySnapshot compose + serialize (json/cbor/cbor.gz)
diff/ cross-snapshot delta engine (ABI, schema, events, capabilities, permissions)
model/ shared types — entities, graph, summary, finding, identity, ids, analysisRole
cli/ commander CLI + terminal / JSON / SARIF formatters
mcp/ MCP server (nine tools)
index.ts public library entry point

Key files
| File | Purpose |
|------|---------|
| src/index.ts | Public library entry point (analyzeWorkspace, indexWorkspace, model + projection re-exports) |
| src/cli/index.ts | Commander CLI — analyze, digest, prove, policy, events, diff, fingerprint, cache |
| src/mcp/server.ts | MCP server (nine tools) |
| src/detectors/registry.ts | Detector registry (35 detectors) |
| docs/CONTRACTS.md | Public JSON document contracts (envelopes + schemas) |
| docs/superpowers/STATUS.md | Phase status + roadmap (source of truth) |
Development
bun install
bun test # run all tests
bun run typecheck # bunx tsc --noEmit
bun run lint # bunx biome check src test
bun run format # bunx biome format --write src test

Tech stack: Bun · TypeScript · bun:ffi + native tree-sitter-al shared library ·
commander · fflate (.app package extraction) · bun:test · Biome.
Design specs and implementation plans live under docs/superpowers/ —
specs/ for designs, plans/ for the phased TDD implementation plans.
Status
See docs/superpowers/STATUS.md for the current phase status and roadmap.
Author: Torben Leth
License: MIT (see LICENSE)
