@kitschpatrol/codemeta
v0.6.0
A CLI tool and TypeScript library to discover and map software project metadata from various ecosystems to the CodeMeta standard.
Overview
Discover, parse, and merge metadata from a variety of project manifests and files into a single codemeta.json file describing the software.
The CodeMeta vocabulary provides a standard way to describe software using JSON-LD and schema.org terms. Most software projects already have rich metadata in manifests and other files (e.g. package.json, Cargo.toml, pyproject.toml, LICENSE, etc.), but the name and structure of semantically equivalent metadata is often inconsistent across ecosystems.
This tool reads those manifests and merges metadata from the software development diaspora into one canonical CodeMeta v3.1 JSON-LD document.
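As a rough illustration of what that mapping involves, here is a minimal sketch that translates a handful of package.json fields into CodeMeta terms. The function name `mapPackageJson` and the types are hypothetical, the context URL is assumed to follow the w3id.org pattern of earlier CodeMeta releases, and the real tool covers far more fields and formats than this.

```typescript
// Hypothetical sketch, not the mapping code used by @kitschpatrol/codemeta.
type PackageJsonSketch = {
  name?: string
  version?: string
  description?: string
  license?: string
  homepage?: string
  keywords?: string[]
}

type CodeMetaSketch = {
  '@context': string
  '@type': string
  name?: string
  version?: string
  description?: string
  license?: string
  url?: string
  keywords?: string[]
}

function mapPackageJson(pkg: PackageJsonSketch): CodeMetaSketch {
  const meta: CodeMetaSketch = {
    // Assumed context URL for CodeMeta v3.1.
    '@context': 'https://w3id.org/codemeta/3.1',
    '@type': 'SoftwareSourceCode',
  }
  // Same-named fields carry over directly.
  if (pkg.name) meta.name = pkg.name
  if (pkg.version) meta.version = pkg.version
  if (pkg.description) meta.description = pkg.description
  if (pkg.keywords?.length) meta.keywords = [...pkg.keywords]
  // Differently named or shaped fields need translation.
  // Assumes `license` is a single SPDX identifier, not an expression.
  if (pkg.license) meta.license = `https://spdx.org/licenses/${pkg.license}`
  if (pkg.homepage) meta.url = pkg.homepage
  return meta
}
```

The same idea repeats per ecosystem: each manifest has its own names for the same concepts, and each needs its own translation into the shared vocabulary.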
More mature Python-based tools like codemetapy and codemeta-harvester perform a similar task, and either of these is emphatically recommended over this project for any use case not limited to a Node.js runtime.
This project should be considered "unofficial" in the sense that its author is not affiliated with the CodeMeta project / governing bodies. The package is released under the @kitschpatrol namespace on NPM to leave the codemeta package name available for the CodeMeta project core contributors.
Getting started
Dependencies
Node 22.17 or newer.
Installation
Invoke directly in a local project repository directory:
npx @kitschpatrol/codemeta
Or, install globally for access across your system:
npm install --global @kitschpatrol/codemeta
Or, install locally to access the CLI commands in a single project or to import the provided TypeScript APIs:
npm install @kitschpatrol/codemeta
Running
Navigate to the root of a local project and run the CLI to generate and emit CodeMeta JSON to stdout:
codemeta
Or save directly to a file:
codemeta -o codemeta.json
Supported metadata formats
This tool leverages the crosswalk data generously compiled by CodeMeta contributors to assist in automating the mapping of various metadata formats to the CodeMeta standard. Where crosswalk data is unavailable or incomplete, heuristics are used instead.
The green-checked entries below indicate metadata file formats and sources that @kitschpatrol/codemeta can discover, parse, and merge into a codemeta.json file for a given directory:
| Status | Ecosystem | Organization or Registry | Specifications | Crosswalk |
| ------ | --------------- | --------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| ✅ | Agnostic | CodeMeta (v1) | codemeta.json | Yes |
| ✅ | Agnostic | CodeMeta (v2) | codemeta.json | Yes |
| ✅ | Agnostic | CodeMeta (v3) | codemeta.json | No |
| ✅ | Agnostic | CodeMeta (v3.1) | codemeta.json | No |
| ✅ | Go | Go Modules | go.mod | No |
| ✅ | Go | GoReleaser | .goreleaser.yaml (Also matches .yml) | No |
| ✅ | Java | Maven | pom.xml | Yes |
| ✅ | JavaScript | NPM | package.json | Yes |
| ✅ | Agnostic | Public Code | publiccode.yml (Also matches .yaml) | Yes |
| ✅ | Python | PyPI (Distutils) | setup.py, setup.cfg | Yes |
| ✅ | Python | PyPI (PKG-INFO) | .egg-info/PKG-INFO | Yes |
| ✅ | Python | PyPI (PEP 621) | pyproject.toml | No |
| ✅ | Ruby | Ruby Gems | *.gemspec | Yes |
| ✅ | Rust | Crates | Cargo.toml | Yes |
| ✅ | Agnostic | | README.md (and variants) | No |
| ✅ | Agnostic | Documented below | metadata.json (and .yaml/.yml variants) | No |
| ✅ | Agnostic | SPDX | LICENSE, LICENCE, COPYING, UNLICENSE (and .md/.txt variants) | No |
| ❌ | .NET | NuGet | *.nuspec | Yes |
| ❌ | Scholarly | Citation File Format (v1.2.0) | CITATION.cff | Yes |
| ❌ | Scholarly | DOAP (Description of a Project) | doap.rdf | Yes |
| ❌ | Astronomy | ASCL | pom.xml | Yes |
| ❌ | Biomedical | SciCrunch Registry | platform metadata | Yes |
| ❌ | Clojure | Leiningen | project.clj | Yes |
| ❌ | Dart | pub.dev | pubspec.yaml | Yes |
| ❌ | Data Catalog | W3C DCAT-2 | *.ttl, *.rdf, *.jsonld | Yes |
| ❌ | Data Catalog | W3C DCAT-3 | *.ttl, *.rdf, *.jsonld | Yes |
| ❌ | Debian | Debian Package | debian/control | Yes |
| ❌ | Earth Science | CSDMS Model Metadata | model_metadata.xml | Yes |
| ❌ | Geoscience | OntoSoft Software Repository | *.json, *.xml | Yes |
| ❌ | Geoscience | USGS Model Catalog | portal metadata | Yes |
| ❌ | Geospatial | ISO 19115-1:2014 | *.xml | Yes |
| ❌ | Haskell | Hackage | *.cabal | Yes |
| ❌ | Julia | Pkg | Project.toml | Yes |
| ❌ | Knowledge Graph | Wikidata | Wikidata entity model | Yes |
| ❌ | Library | MODS | *.xml | Yes |
| ❌ | Licensing | SPDX 2.3 | *.spdx, *.spdx.json, *.spdx.rdf | Yes |
| ❌ | Life Sciences | bio.tools | biotools.json | Yes |
| ❌ | Mathematics | swMATH | portal metadata | Yes |
| ❌ | Octave | Octave Package | DESCRIPTION | Yes |
| ❌ | Perl | CPAN::Meta | META.json, META.yml | Yes |
| ❌ | R | R Package Description | DESCRIPTION | Yes |
| ❌ | Scholarly | BibTeX | *.bib | Yes |
| ❌ | Scholarly | DataCite Metadata Schema | datacite.xml | Yes |
| ❌ | Scholarly | Dublin Core | *.xml, *.rdf | Yes |
| ❌ | Scholarly | Figshare Metadata | platform metadata | Yes |
| ❌ | Scholarly | Software Discovery Index | no public format spec | Yes |
| ❌ | Bioinformatics | Software Ontology | *.owl, *.rdf | Yes |
| ❌ | Scholarly | Trove Software Map | portal metadata | Yes |
| ❌ | Scholarly | VIVO | *.rdf | Yes |
| ❌ | Scholarly | Zenodo Metadata | *.zenodo.json | Yes |
| ❌ | Space Physics | SPASE | *.xml | Yes |
| ❌ | Agnostic | GitHub Repository Metadata | GitHub REST metadata | Yes |
metadata.json
A minimal metadata.json (or .yaml / .yml) file is also supported. It captures just the metadata needed to populate the description, homepage, and topics on a GitHub repository page.
| Key | Key Aliases | CodeMeta Property | Notes |
| ------------- | ---------------------------- | ----------------- | ----------------------------------------------------------------------------- |
| description | None | description | String description of project |
| homepage | url, repository, website | url | For repository values, the git+ prefix and .git suffix are automatically stripped |
| keywords | tags, topics | keywords | Array of strings, or a single comma-delimited string |
If multiple key aliases are present in the object, priority for populating the associated codemeta.json goes to the key, then falls through to key aliases in the order shown above. (E.g. homepage takes priority over url.)
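The fall-through described above can be sketched roughly as follows. This is an illustration under assumptions, not the library's parser: the names `resolveUrl`, `resolveKeywords`, and `resolveMetadataFile` are hypothetical, and details such as whitespace trimming are guesses.

```typescript
// Hypothetical sketch of alias resolution for metadata.json.
type MetadataFile = Record<string, unknown>

// Keys are listed in priority order: the primary key first, then its aliases.
const URL_KEYS = ['homepage', 'url', 'repository', 'website']
const KEYWORD_KEYS = ['keywords', 'tags', 'topics']

function resolveUrl(source: MetadataFile): string | undefined {
  for (const key of URL_KEYS) {
    const value = source[key]
    if (typeof value !== 'string' || value.trim() === '') continue
    const trimmed = value.trim()
    // Only repository values get the git+ prefix and .git suffix stripped.
    return key === 'repository'
      ? trimmed.replace(/^git\+/, '').replace(/\.git$/, '')
      : trimmed
  }
  return undefined
}

function resolveKeywords(source: MetadataFile): string[] | undefined {
  for (const key of KEYWORD_KEYS) {
    const value = source[key]
    // Accept an array of strings as-is.
    if (Array.isArray(value)) {
      return value.filter((item): item is string => typeof item === 'string')
    }
    // Accept a single comma-delimited string.
    if (typeof value === 'string') {
      return value
        .split(',')
        .map((item) => item.trim())
        .filter((item) => item !== '')
    }
  }
  return undefined
}

function resolveMetadataFile(source: MetadataFile) {
  const description = source['description']
  return {
    description: typeof description === 'string' ? description : undefined,
    url: resolveUrl(source),
    keywords: resolveKeywords(source),
  }
}
```

With this sketch, an object containing both `homepage` and `url` resolves to the `homepage` value, and a `repository` value of `git+https://github.com/user/repo.git` resolves to `https://github.com/user/repo`.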
This is a non-standard format that exists primarily for use in combination with github-action-repo-sync.
Usage
Library
API
The library exports the following functions and types:
generate(paths, options?)
Main entry point. Discovers metadata files in the given paths (files or directories), parses them, and returns a single composed CodeMeta object.
function generate(paths: string | string[], options?: GenerateOptions): Promise<CodeMeta>
This command is idempotent. By default, an existing codemeta.json file in the scanned directory is treated as a generated artifact and excluded from input if primary metadata sources are present (e.g. package.json, Cargo.toml, pyproject.toml, etc.). This means the output is always a pure function of your project's existing source metadata files.
If no primary sources are found, an existing codemeta.json file is automatically kept as the source of truth.
In the rare case that you're maintaining parts of the codemeta.json by hand alongside other primary metadata sources, you can protect your additions to the file while still merging updates from the other sources by setting the retain option to true.
GenerateOptions:
| Option | Type | Default | Description |
| ----------- | ------------------- | ------- | -------------------------------------------------------------------------------- |
| baseUri | string | | Base URI for @id. Auto-detected from codemeta.json if present. |
| enrich | boolean | false | Infer missing properties from existing metadata. |
| exclude | string[] | | Glob patterns to exclude during directory discovery. |
| retain | boolean | false | Include existing codemeta.json as input even when primary sources are present. |
| overrides | Partial<CodeMeta> | | Property values to set, overriding anything parsed from files. |
| recursive | boolean | false | Scan subdirectories when a path is a directory. |
When a directory is provided, generate calls discover() internally to find parseable files, then merges them in priority order. By default, existing codemeta.json files are excluded when primary metadata sources are present to ensure idempotent generation.
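One plausible reading of "merges them in priority order" is sketched below. The actual merge rules are not documented here, so treat everything in this block as an assumption: the `ParsedSource` shape, the idea that a larger `priority` number wins, and the shallow key-by-key merge are all hypothetical. The only part taken from the options table is that `overrides` beat anything parsed from files.

```typescript
// Hypothetical sketch of a priority-ordered merge.
type MetaSketch = Record<string, unknown>
type ParsedSource = { priority: number; meta: MetaSketch }

function mergeByPriority(sources: ParsedSource[], overrides: MetaSketch = {}): MetaSketch {
  // Sort a copy so higher-priority sources are applied last and win conflicts.
  const ordered = sources.slice().sort((a, b) => a.priority - b.priority)
  const merged: MetaSketch = {}
  for (const { meta } of ordered) {
    for (const key of Object.keys(meta)) {
      const value = meta[key]
      // Skip empty values so they never erase data from another source.
      if (value !== undefined && value !== null) merged[key] = value
    }
  }
  // Explicit overrides take precedence over everything parsed from files.
  return { ...merged, ...overrides }
}
```

A shallow merge like this is the simplest possible policy. A real implementation would likely need per-property rules, for example concatenating author lists rather than replacing them.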
discover(directory, recursive?, ignore?, retain?)
Auto-detect metadata files in a directory. Returns an array of discovered files sorted by parser priority.
function discover(
  directory: string,
  recursive?: boolean,
  ignore?: string[],
  retain?: boolean,
): Promise<DiscoveredFile[]>
By default, codemeta.json files are excluded from discovery when primary metadata sources (project manifests like package.json, Cargo.toml, etc.) are also found. Set retain to true to always include them.
Common build artifacts and dot-directories (node_modules, dist, target, __pycache__, venv, etc.) are ignored by default.
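The two filtering rules described for discovery (default ignores, and the codemeta.json exclusion) can be sketched as a pure function over a list of relative paths. This is a simplified illustration: `isIgnoredPath` and `selectInputs` are hypothetical names, the ignore list contains only the directories named above, and `PRIMARY_MANIFESTS` is a shortened stand-in for whatever the library treats as primary sources.

```typescript
// Hypothetical sketch of discovery filtering, assuming "/"-separated relative paths.
const IGNORED_DIRECTORIES = ['node_modules', 'dist', 'target', '__pycache__', 'venv']
const PRIMARY_MANIFESTS = ['package.json', 'Cargo.toml', 'pyproject.toml']

function isIgnoredPath(path: string): boolean {
  // Look only at directory segments, not the file name itself.
  const directories = path.split('/').slice(0, -1)
  return directories.some(
    (part) =>
      IGNORED_DIRECTORIES.includes(part) ||
      // Dot-directories such as .git, but not "." or "..".
      (part.startsWith('.') && part !== '.' && part !== '..'),
  )
}

function selectInputs(paths: string[], retain = false): string[] {
  const fileName = (path: string) => path.split('/').pop() ?? path
  const candidates = paths.filter((path) => !isIgnoredPath(path))
  const hasPrimary = candidates.some((path) => PRIMARY_MANIFESTS.includes(fileName(path)))
  // Keep codemeta.json when asked to, or when it is the only source of truth.
  if (retain || !hasPrimary) return candidates
  return candidates.filter((path) => fileName(path) !== 'codemeta.json')
}
```

Given `['package.json', 'codemeta.json', 'node_modules/x/package.json']`, this sketch returns `['package.json']`. With `retain` set to `true`, it returns `['package.json', 'codemeta.json']`.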
validate(meta)
Validate a CodeMeta object for completeness and consistency.
function validate(meta: Partial<CodeMeta>): ValidationResult
Returns a ValidationResult with valid (boolean) and warnings (array). Checks for missing codeRepository, author, and license, and detects license conflicts.
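To make the shape of these checks concrete, here is a small stand-alone sketch. It is not the library's `validate`: the name `validateSketch` is hypothetical, the warning fields mirror the `severity`, `property`, and `message` fields used in the examples in this README, and both the meaning of `valid` (no warnings at all) and the definition of a license conflict (more than one distinct license value) are assumptions.

```typescript
// Hypothetical sketch of completeness checks on a CodeMeta-like object.
type ValidationWarning = { severity: string; property: string; message: string }
type ValidationSketchResult = { valid: boolean; warnings: ValidationWarning[] }

function validateSketch(meta: Record<string, unknown>): ValidationSketchResult {
  const warnings: ValidationWarning[] = []

  // Completeness: flag recommended properties that are absent or empty.
  for (const property of ['codeRepository', 'author', 'license']) {
    const value = meta[property]
    const missing =
      value === undefined ||
      value === null ||
      value === '' ||
      (Array.isArray(value) && value.length === 0)
    if (missing) {
      warnings.push({ severity: 'warning', property, message: `Missing ${property}` })
    }
  }

  // Consistency: assume several distinct license values indicate a conflict.
  const license = meta['license']
  if (Array.isArray(license) && new Set(license).size > 1) {
    warnings.push({
      severity: 'warning',
      property: 'license',
      message: 'Multiple distinct licenses found',
    })
  }

  return { valid: warnings.length === 0, warnings }
}
```

Note that dual-licensed projects would trip the conflict check in this sketch, so a real implementation presumably applies a more careful rule.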
Examples
Generate metadata from the current directory:
import { generate } from '@kitschpatrol/codemeta'
const meta = await generate('.')
console.log(JSON.stringify(meta, null, 2))
Compose metadata from multiple specific files:
import { generate } from '@kitschpatrol/codemeta'
const meta = await generate(['package.json', 'codemeta.json'])
console.log(meta.name, meta.version, meta.license)
Enrich, override, and validate:
import { generate, validate } from '@kitschpatrol/codemeta'
const meta = await generate('/path/to/project', {
  baseUri: 'https://github.com/user/my-project',
  enrich: true,
  overrides: { name: 'My Project' },
  recursive: true,
})
const { valid, warnings } = validate(meta)
for (const w of warnings) {
  console.warn(`[${w.severity}] ${w.property}: ${w.message}`)
}
Discover files without parsing them:
import { discover } from '@kitschpatrol/codemeta'
const files = await discover('/path/to/project')
for (const f of files) {
  console.log(`${f.parserName}: ${f.filePath}`)
}
CLI
Command: codemeta
Discover and parse software metadata from files and directories into CodeMeta JSON-LD.
Usage:
codemeta [paths..]
| Positional Argument | Description | Type | Default |
| ------------------- | --------------------------------------------------- | -------- | ------- |
| paths | Paths to files or directories to scan for metadata. | string | ["."] |
| Option | Description | Type | Default |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- |
| --verbose, -V | Enable verbose logging | boolean | |
| --output, -o | Write output to file | string | |
| --basic | Output simplified metadata with predictable types (no JSON-LD boilerplate) | boolean | false |
| --enrich | Enable automatic inference and enrichment | boolean | false |
| --validate | Validate and report on metadata quality | boolean | false |
| --exclude | Filenames or globs to exclude from automatic discovery in directories | array | |
| --retain | Retain existing codemeta.json as input alongside primary metadata sources. Without this flag, an existing codemeta.json is only used when no primary sources (package.json, Cargo.toml, etc.) are found. | boolean | false |
| --recursive, -r | Scan subdirectories for metadata | boolean | false |
| --set, -s | Override a property (e.g. --set name="My Project") | array | |
| --base-uri | Base URI for identifiers | string | |
| --help, -h | Show help | boolean | |
| --version, -v | Show version number | boolean | |
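The `--set` option takes repeated `property=value` pairs. How the CLI parses them internally is not documented here, so the following is only a sketch of one reasonable approach: split each pair on the first `=` so that values may themselves contain `=`. The function name `parseSetOptions` is hypothetical, and the shell is assumed to have already removed any surrounding quotes.

```typescript
// Hypothetical sketch: turn ["name=My Project", ...] into an overrides object.
function parseSetOptions(pairs: string[]): Record<string, string> {
  const overrides: Record<string, string> = {}
  for (const pair of pairs) {
    // Split on the first "=" only, so URLs with query strings survive intact.
    const separator = pair.indexOf('=')
    if (separator <= 0) {
      throw new Error(`Expected property=value, got "${pair}"`)
    }
    const key = pair.slice(0, separator).trim()
    overrides[key] = pair.slice(separator + 1)
  }
  return overrides
}
```

An object built this way has the same shape as the `overrides` option accepted by the library's `generate` function, at least for string-valued properties.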
Examples
Generate codemeta.json from the current directory, emitting to stdout:
codemeta
Scan a project recursively with enrichment, writing to a file:
codemeta /path/to/project -r --enrich -o codemeta.json
Compose from specific files:
codemeta package.json pyproject.toml
Override a property:
codemeta --set name="My Project" --set version="2.0.0"
Validate the output:
codemeta --validate
Set a base URI for the @id field:
codemeta --base-uri https://github.com/user/my-project
Exclude files from discovery:
codemeta -r --exclude "examples/**" --exclude "vendor/**"
Background
Motivation
Having a native JavaScript/TypeScript tool for generating codemeta.json makes it easy to integrate into Node.js-based CI pipelines or toolchains without introducing a Python dependency or requiring containerization.
My MetaScope and github-action-repo-sync projects both needed a Node-based tool for generating codemeta.json files.
Implementation notes
The behavior and output of the codemetapy binary served as a functional reference during development. This project is an independent clean-room implementation of similar functionality. Correctness was validated by comparing CLI output from this tool and codemetapy against representative test fixtures. Where the behavior of codemetapy is inconsistent with the CodeMeta spec, the spec takes precedence. One example is the treatment of devDependencies in package.json files as softwareSuggestions instead of softwareDependencies.
This tool always outputs CodeMeta v3.1 files. When ingesting codemeta.json files defined in the older CodeMeta v1 and v2 contexts, all simple key re-mappings defined in the crosswalk table are applied. However, some more nuanced conditional transformations (like the reassignment of copyright holding agents in v1) are not implemented.
For development and building the project itself, we're stuck on Node.js version ^22 specifically until Node Tree-sitter issues related to more recent versions of Node get resolved.
Related projects
- codemetapy
  Translate software metadata into the CodeMeta vocabulary (Python)
- codemeta-harvester
  Aggregate software metadata into the CodeMeta vocabulary from source repositories and service endpoints (Python)
- bibliothecary
  Manifest discovery and parsing for libraries.io (Ruby)
- diggity
  Generates SBOMs for container images, filesystems, archives, and more (Go)
- SOMEF
  Software Metadata Extraction Framework (Python)
- Upstream Ontologist
  A common interface for finding metadata about upstream software projects (Rust)
Slop factor
Medium.
The architecture, test fixture curation, and documentation required manual care and feeding, but the implementation was driven pretty heavily by Claude Code and has been subject to only moderate post-facto human scrutiny.
Maintainers
@kitschpatrol
Acknowledgments
Thank you to the CodeMeta Project Management Committee and contributors for their development and stewardship of the standard.
Jacob Peddicord's askalono project inspired the Dice-Sørensen scoring strategy used for classifying arbitrary license text.
Contributing
Issues and pull requests are welcome.
License
Apache-2.0 © Eric Mika
