fossilizer
v0.1.0
Evidence-based code documentation. Every claim traceable to source.
fossilize
evidence-based code documentation. every claim traceable to source.
fossilize generates documentation from three sources of truth: what the AST says (function signatures, types, imports, exports), what the tests prove (behavioral contracts, error paths, edge cases), and what git says (who changed what, when, how often). every statement in the output is backed by evidence you can verify.
self-hosted: fossilize documents its own codebase -- 50 source files, 532 definitions, 1647 resolved cross-file links, 91 functions indexed, 29 with test evidence, high confidence overall. 431 tests across 21 test suites.
install
npm install fossilize
quick start
npx fossilize .
this parses your source and test files, builds a scope graph, maps tests to functions, loads coverage data, and generates evidence-backed documentation.
output goes to .fossilize/ by default:
- docs.md -- markdown documentation
- docs.json -- structured index (for agents, dashboards, tooling)
- docs.html -- standalone HTML report with navigation and search
- snapshot.json -- index snapshot for drift detection
- architecture.mmd -- mermaid module dependency diagram
commands
fossilize [dir] -- generate documentation for a project
fossilize drift [dir] -- compare current code against the last snapshot, flag changes
fossilize arch [dir] -- show module dependency graph
fossilize init [dir] -- create a .fossilize.json config file (auto-detects directories)
fossilize index [dir] -- build and save the function index without generating docs
flags
--output, -o -- output format: markdown, json, html, both (md+json), all
--coverage, -c -- path to coverage file (lcov or istanbul json)
--test-dir, -t -- additional test directory (repeatable)
--out-dir, -d -- output directory (default .fossilize)
--internal -- include non-exported functions
--watch, -w -- watch for changes and regenerate (debounced, incremental)
--quiet, -q -- suppress human-readable output, emit machine-readable JSON to stdout. useful for CI pipelines, scripting, and tooling integration
config
create .fossilize.json in your project root (or run fossilize init):
{
  "sourceDirs": ["src"],
  "testDirs": ["tests"],
  "coveragePath": "coverage/lcov.info",
  "outDir": ".fossilize",
  "output": "both"
}
CLI flags override config file values. unknown fields in config produce warnings to stderr so typos are caught early.
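the two rules above (CLI flags win, unknown fields warn) can be sketched as follows. this is a minimal illustration, not fossilize's actual implementation: the `FossilizeConfig` type and the `findUnknownKeys` and `mergeConfig` helpers are hypothetical names, and only the five fields shown in the example config are assumed to be valid.

```typescript
// Hypothetical type mirroring the fields in the example .fossilize.json.
interface FossilizeConfig {
  sourceDirs?: string[];
  testDirs?: string[];
  coveragePath?: string;
  outDir?: string;
  output?: string;
}

// Assumption: these are the only recognized fields.
const KNOWN_KEYS = ["sourceDirs", "testDirs", "coveragePath", "outDir", "output"];

// Returns one warning message per unrecognized key (e.g. a typo like "outDri").
function findUnknownKeys(raw: Record<string, unknown>): string[] {
  return Object.keys(raw)
    .filter((key) => KNOWN_KEYS.indexOf(key) === -1)
    .map((key) => `unknown config field "${key}"`);
}

// A value passed on the command line replaces the config file value;
// fields the CLI leaves undefined fall back to the file.
function mergeConfig(file: FossilizeConfig, cli: FossilizeConfig): FossilizeConfig {
  return {
    sourceDirs: cli.sourceDirs ?? file.sourceDirs,
    testDirs: cli.testDirs ?? file.testDirs,
    coveragePath: cli.coveragePath ?? file.coveragePath,
    outDir: cli.outDir ?? file.outDir,
    output: cli.output ?? file.output,
  };
}
```

a caller would write each string from `findUnknownKeys` to stderr and continue with the merged config.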
what it generates
for each public function, fossilize produces:
- signature with params, return type, async status
- call graph -- who calls this function, what it calls
- middleware -- decorators (NestJS, Flask, Django) and Express/Koa/Hono middleware chains detected from route registrations
- test evidence -- which tests exercise it, what assertions they make, which describe block they belong to
- behavioral contract -- synthesized from test names and assertions: what inputs are handled, what outputs are verified, what errors are tested
- coverage -- runtime execution count, branch coverage
- git history -- last author, modification date, commit count
- confidence level -- high (signature + tests + coverage + git), medium (two evidence types), low (signature only)
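the confidence levels in the last item can be read as a count of evidence types. a minimal sketch under stated assumptions: the `Evidence` type and `scoreConfidence` function are hypothetical, the signature is treated as always present, and the case the list does not cover (signature plus two other evidence types) is assumed to be medium.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical evidence flags; the signature itself is assumed to always exist.
interface Evidence {
  hasTests: boolean;
  hasCoverage: boolean;
  hasGitHistory: boolean;
}

function scoreConfidence(evidence: Evidence): Confidence {
  // Count evidence types beyond the signature.
  const extra = [evidence.hasTests, evidence.hasCoverage, evidence.hasGitHistory]
    .filter(Boolean).length;
  if (extra === 3) return "high"; // signature + tests + coverage + git
  if (extra >= 1) return "medium"; // assumption: anything in between is medium
  return "low"; // signature only
}
```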
drift detection
npx fossilize drift .
compares the current index against the last saved snapshot. detects:
- new functions added
- functions removed
- signature changes
- file moves
- coverage loss
- middleware chain changes
- export status changes
exits with code 1 if drift is detected. useful in CI to catch undocumented changes.
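a rough sketch of how a snapshot comparison like this could work, covering four of the checks listed above (added, removed, signature changes, file moves). the `Snapshot` shape, `diffSnapshots`, and `exitCodeFor` are hypothetical and are not the format of snapshot.json or the library's own `detectDrift`; functions are assumed to be keyed by a unique name.

```typescript
// Hypothetical snapshot entry; the real snapshot.json format is not documented here.
interface FunctionEntry {
  file: string;
  signature: string;
}
type Snapshot = Record<string, FunctionEntry>; // keyed by function name

interface DriftReport {
  added: string[];
  removed: string[];
  signatureChanged: string[];
  moved: string[];
}

function diffSnapshots(previous: Snapshot, current: Snapshot): DriftReport {
  const report: DriftReport = { added: [], removed: [], signatureChanged: [], moved: [] };
  const has = (snap: Snapshot, name: string): boolean =>
    Object.prototype.hasOwnProperty.call(snap, name);

  for (const name of Object.keys(current)) {
    if (!has(previous, name)) {
      report.added.push(name);
      continue;
    }
    if (previous[name].signature !== current[name].signature) report.signatureChanged.push(name);
    if (previous[name].file !== current[name].file) report.moved.push(name);
  }
  for (const name of Object.keys(previous)) {
    if (!has(current, name)) report.removed.push(name);
  }
  return report;
}

// Mirrors the documented behavior: exit code 1 when any drift is found.
function exitCodeFor(report: DriftReport): number {
  const total = report.added.length + report.removed.length +
    report.signatureChanged.length + report.moved.length;
  return total > 0 ? 1 : 0;
}
```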
architecture graph
npx fossilize arch .
outputs a mermaid diagram showing module dependencies, entry points, leaf modules, circular dependencies, and max dependency depth.
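to make the terms concrete, here is a small sketch that derives entry points, leaf modules, and cycles from a dependency map and renders it as mermaid. the `DependencyMap` type and all four function names are hypothetical, module names are assumed to be simple identifiers, and the exact diagram fossilize emits may differ.

```typescript
// Hypothetical input: module name -> modules it imports.
type DependencyMap = Record<string, string[]>;

// Entry points: modules that no other module imports.
function findEntryPoints(deps: DependencyMap): string[] {
  const imported: string[] = [];
  for (const name of Object.keys(deps)) {
    for (const target of deps[name]) imported.push(target);
  }
  return Object.keys(deps).filter((name) => imported.indexOf(name) === -1);
}

// Leaf modules: modules that import nothing.
function findLeafModules(deps: DependencyMap): string[] {
  return Object.keys(deps).filter((name) => deps[name].length === 0);
}

// Depth-first search: reaching a module that is still on the current path closes a cycle.
function hasCycle(deps: DependencyMap): boolean {
  const state: Record<string, "visiting" | "done"> = {};
  const visit = (name: string): boolean => {
    if (state[name] === "visiting") return true;
    if (state[name] === "done") return false;
    state[name] = "visiting";
    for (const target of deps[name] || []) {
      if (visit(target)) return true;
    }
    state[name] = "done";
    return false;
  };
  return Object.keys(deps).some((name) => visit(name));
}

// Renders one mermaid edge per import.
function toMermaid(deps: DependencyMap): string {
  const lines = ["graph TD"];
  for (const name of Object.keys(deps)) {
    for (const target of deps[name]) lines.push(`  ${name} --> ${target}`);
  }
  return lines.join("\n");
}
```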
behavioral contracts
derived entirely from test evidence. for each tested function, fossilize synthesizes:
- accepts -- parameter types plus what input patterns are tested (null handling, empty input, boundary values, invalid input)
- returns -- return type plus what output properties are asserted (exact values, truthiness, collection membership, numeric bounds)
- errors -- what error conditions are tested (throws, rejects, graceful handling)
- side effects -- what mutations are verified (writes, logs, event emissions, external calls)
contracts are scored by completeness: a function with signature + tests + error handling + input validation gets "high" confidence.
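one way to picture the synthesis step is a keyword pass over test names. the sketch below is an illustrative heuristic only: the keyword lists, the `ContractSection` type, and `classifyTestName` are assumptions, and fossilize's actual synthesis also uses assertion patterns, which this sketch ignores.

```typescript
type ContractSection = "accepts" | "returns" | "errors" | "sideEffects";

// Hypothetical keyword rules, checked in order; the first match wins.
const SECTION_RULES: Array<[ContractSection, RegExp]> = [
  ["errors", /\b(throws?|rejects?|fails?)\b/i],
  ["accepts", /\b(null|undefined|empty|invalid|boundary)\b/i],
  ["sideEffects", /\b(writes?|logs?|emits?)\b/i],
];

// Maps a test name to the contract section it most likely documents.
// Names that match no rule are assumed to describe a return value.
function classifyTestName(testName: string): ContractSection {
  for (const [section, pattern] of SECTION_RULES) {
    if (pattern.test(testName)) return section;
  }
  return "returns";
}
```

under these rules, a test named "throws on invalid input" lands in errors because that rule is checked first, even though "invalid" would also match accepts.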
ci integration
use --quiet for machine-readable output in pipelines:
# fail CI if docs have drifted from code
npx fossilize drift . --quiet
# generate docs and capture summary
RESULT=$(npx fossilize . --quiet)
echo "$RESULT" | jq .totalFunctions
the drift command exits non-zero when changes are detected, making it a natural CI gate.
languages
typescript and python, via tree-sitter. the parser detects language from file extension and selects the appropriate grammar automatically.
how it works
- parse -- tree-sitter parses source and test files into concrete syntax trees
- extract -- language-specific extractors pull definitions, references, exports, decorators, middleware chains, and test cases from the CSTs
- scope graph -- cross-file resolution of imports, calls, re-exports through barrel files, and type references
- coverage -- lcov and istanbul/v8 JSON coverage files are parsed, normalized, and matched to functions by name or line range
- git -- blame and log commands extract historical evidence per file
- index -- all evidence is collected per function into a structured index with confidence scoring
- contracts -- test names and assertion patterns are synthesized into behavioral descriptions
- generate -- pluggable renderers produce markdown, JSON, or HTML output
- drift -- index snapshots are compared to detect undocumented changes
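the coverage step above mentions matching by line range. a minimal sketch of that idea, assuming lcov input with standard `DA:<line>,<hits>` records for a single file: `FunctionRange`, `parseLineHits`, and `executionCount` are hypothetical names, and taking the highest hit count in the range as the execution count is an assumption of this sketch.

```typescript
// Hypothetical function location, as an extractor might report it.
interface FunctionRange {
  name: string;
  startLine: number;
  endLine: number;
}

// Collects "DA:<line>,<hits>" records from one lcov file section into a line -> hits map.
function parseLineHits(lcov: string): Record<number, number> {
  const hits: Record<number, number> = {};
  for (const raw of lcov.split("\n")) {
    const match = /^DA:(\d+),(\d+)/.exec(raw.trim());
    if (match) hits[Number(match[1])] = Number(match[2]);
  }
  return hits;
}

// Assumption: the highest hit count on any line inside the range
// stands in for the function's execution count.
function executionCount(fn: FunctionRange, hits: Record<number, number>): number {
  let max = 0;
  for (let line = fn.startLine; line <= fn.endLine; line++) {
    const count = hits[line] || 0;
    if (count > max) max = count;
  }
  return max;
}
```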
programmatic API
import {
  SourceParser,
  ScopeGraphBuilder,
  buildIndex,
  generateMarkdown,
  generateHtml,
  synthesizeContracts,
  buildArchitectureGraph,
  detectDrift,
} from "fossilize";
all pipeline stages are independently importable and composable. types are exported for consumers building custom tooling on top of the index.
license
MIT
