metascope
v0.6.0
A CLI tool and TypeScript library to easily extract metadata from all kinds of software repositories.
> [!NOTE]
> Metascope is under development. Expect breaking changes until a 1.0 release.
Overview
Metascope aggregates metadata from a local code repository into a single monolithic JSON object. Given a project directory, it checks multiple sources in parallel — local git history, package manifests, the GitHub API, the NPM registry, lines of code analysis, and more — and returns a JSON object containing everything it could find.
From there, an (optional) template system lets you refine and transform the output to reflect exactly which fields you need, useful for archival purposes, populating dashboards, or feeding data into other tools. The template system also provides a spec-compliant implementation of the CodeMeta vocabulary, allowing easy generation of codemeta.json files for a semantically normalized view of a variety of project types.
Highlights:
A wide net
Metascope pulls project metadata from many available sources: package.json, pyproject.toml, NPM, PyPI, GitHub, git, filesystem stats, and more.
Graceful degradation
Each source checks its own availability before extraction. Missing tools, unavailable APIs, or absent credentials are silently skipped, so you always get back whatever data is available within the constraints of the calling context.
Parallel extraction
After an initial codemeta pass for discovery hints (package name, repository URL, keywords), all remaining sources are checked and extracted concurrently.
Typed templates
The defineTemplate() helper provides full autocomplete on available fields. TypeScript infers the return type from your template function, so getMetadata() returns exactly the shape you need.
CLI and library
Use it as a command-line tool for quick inspection or pipe-friendly JSON output, or import it as a library for programmatic access with full type safety.
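The graceful degradation and parallel extraction highlights can be pictured with a minimal sketch. The `Source` interface, the `extractAll` function, and the sample sources are hypothetical names chosen for illustration and are not Metascope's actual internals; the sketch only shows the general pattern of checking availability first, running extractions concurrently, and dropping whatever is unavailable or fails.

```typescript
// Hypothetical shape for a metadata source; not Metascope's real interface.
interface Source {
  key: string
  isAvailable: () => Promise<boolean>
  extract: () => Promise<unknown>
}

// Run every available source concurrently; skip unavailable or failing ones.
async function extractAll(sources: Source[]): Promise<Record<string, unknown>> {
  const settled = await Promise.allSettled(
    sources.map(async (source) => {
      if (!(await source.isAvailable())) return undefined
      return [source.key, await source.extract()] as const
    }),
  )

  const result: Record<string, unknown> = {}
  for (const outcome of settled) {
    // Rejected promises (extraction errors) and unavailable sources are skipped silently.
    if (outcome.status === 'fulfilled' && outcome.value !== undefined) {
      const [key, data] = outcome.value
      result[key] = data
    }
  }
  return result
}

// Sample sources, made up for the demo.
const demoSources: Source[] = [
  { key: 'local', isAvailable: async () => true, extract: async () => ({ name: 'demo' }) },
  { key: 'offlineApi', isAvailable: async () => false, extract: async () => ({ stars: 1 }) },
  {
    key: 'broken',
    isAvailable: async () => true,
    extract: async () => {
      throw new Error('boom')
    },
  },
]

extractAll(demoSources).then((result) => console.log(result))
```

Only the `local` source contributes to the result here: `offlineApi` reports itself unavailable, and the error thrown by `broken` is absorbed by `Promise.allSettled` rather than failing the whole run.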
Getting started
Dependencies
Metascope requires Node.js 22.17+. It is implemented in TypeScript, ships as ESM, and bundles complete type definitions.
Metascope also requires a recent version of git on your path for quickly identifying ignored files and aggregating repository statistics.
Optional external tools:
- GitHub CLI
Used as a fallback for GitHub API authentication if no token is provided via --github-token or $GITHUB_TOKEN. It's trivially installed from Homebrew: brew install gh.
Installation
Invoke directly on the current directory:
npx metascope
...or install locally:
npm install metascope
...or install globally:
npm install --global metascope
If you're using PNPM, you can safely ignore the build scripts for the tree-sitter dependencies, since we're only interested in their bundled WASM implementations.
In your pnpm-workspace.yaml:
ignoredBuiltDependencies:
- tree-sitter-python
- tree-sitter-ruby
Usage
CLI
Command: metascope
Extract metadata from a code repository.
Usage:
metascope [path]
| Positional Argument | Description            | Type     | Default |
| ------------------- | ---------------------- | -------- | ------- |
| path | Project directory path | string | "." |
| Option | Description | Type | Default |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ------- |
| --template, -t | Built-in template name (codemeta, codemetaJson, frontmatter, metadata, project) or path to a custom template file | string | |
| --github-token | GitHub API token (or set $GITHUB_TOKEN) | string | |
| --author-name | Optional author name(s) for ownership checks in templates | array | |
| --github-account | Optional GitHub account name(s) for ownership checks in templates | array | |
| --absolute | Output absolute paths. Use --no-absolute for relative paths. | boolean | true |
| --offline | Skip sources requiring network requests | boolean | false |
| --sources, -s | Only run specific metadata sources (defaults to all) | array | |
| --no-ignore | Include files ignored by .gitignore in the file tree | boolean | false |
| --recursive, -r | Search for metadata files recursively in subdirectories | boolean | false |
| --workspaces, -w | Include workspace-specific metadata in monorepos; pass a boolean to enable or disable auto-detection, or pass one or more strings to explicitly define workspace paths | | true |
| --verbose | Run with verbose logging | boolean | false |
| --help, -h | Show help | boolean | |
| --version, -v | Show version number | boolean | |
Examples
Basic metadata extraction
Extract all available metadata from the current directory:
metascope
Output is pretty-printed JSON when writing to a terminal, and compact JSON when piped.
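A common way to get this terminal-versus-pipe behavior in Node.js is to check `process.stdout.isTTY`. The sketch below assumes that approach; `formatJson` is a hypothetical helper and may not match how Metascope does it internally.

```typescript
// Hypothetical helper: choose JSON formatting based on whether output goes to a terminal.
function formatJson(data: unknown, isTerminal: boolean): string {
  return isTerminal ? JSON.stringify(data, undefined, 2) : JSON.stringify(data)
}

const sample = { name: 'demo', version: '1.0.0' }

// process.stdout.isTTY is true for an interactive terminal and undefined when piped.
console.log(formatJson(sample, Boolean(process.stdout.isTTY)))
```

Keeping the decision in a small pure function makes both output modes easy to exercise without a real terminal.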
Scan a specific directory
metascope /path/to/project
Use a built-in template
metascope --template project
Pass template data for ownership checks
Some preset templates return information based on the (relative) ownership status of a repo. This requires additional context data, which can be passed in via additional CLI flags:
metascope --template project --author-name "Jane Doe" --github-account janedoe
Multiple values are supported:
metascope --template project --author-name "Jane Doe" "John Doe" --github-account janedoe johndoe
Use a custom template file
metascope --template ./my-template.ts
Where my-template.ts might look like:
import { defineTemplate, helpers } from 'metascope'
export default defineTemplate(({ codemetaJson, github, gitStats }) => {
  const codemeta = helpers.firstOf(codemetaJson)
  const git = helpers.firstOf(gitStats)
  const gh = helpers.firstOf(github)
  return {
    commits: git?.data.commitCount,
    name: codemeta?.data.name,
    stars: gh?.data.stargazerCount,
    version: codemeta?.data.version,
  }
})
Run only specific sources
Extract metadata from only the sources you need, skipping everything else for faster results:
metascope --sources nodePackageJson gitStats
Pipe compact JSON to another tool
metascope | jq '.github.stargazerCount'
Provide a GitHub token
An optional GitHub token can allow access to metadata about private repositories, and raises the request limit if you're operating on a large collection of repositories:
metascope --github-token ghp_xxxxxxxxxxxx
Or set the GITHUB_TOKEN environment variable, or authenticate via gh auth login. Metascope will attempt to find a credential without bothering you.
Verbose logging
metascope --verbose
Logs source availability checks, extraction durations, and other diagnostics to stderr.
API
The metascope library exports getMetadata as its primary function, defineTemplate for type-safe template authoring, and a helpers namespace with utility functions for working with metadata in templates.
getMetadata
// Without a template — returns full MetadataContext
function getMetadata(options: GetMetadataOptions): Promise<MetadataContext>
// With a template — returns the template's return type
function getMetadata<T>(options: GetMetadataTemplateOptions<T>): Promise<T>
The function accepts a project directory path, optional credentials, and an optional template (a built-in name or a template function). It returns a promise resolving to either the full MetadataContext or the shaped output of your template.
All undefined values and empty source objects are deep-stripped from the output before returning.
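As a rough illustration of what deep-stripping means, the sketch below recursively removes undefined values and any objects left empty afterwards. `deepStrip` is a hypothetical helper written for this example; Metascope's own logic may differ in detail, for instance in how it treats arrays or which empty objects it removes.

```typescript
// Hypothetical helper for plain JSON-like data; not Metascope's actual implementation.
function deepStrip(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Strip each element, then drop elements that ended up undefined.
    return value.map((item) => deepStrip(item)).filter((item) => item !== undefined)
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .map(([key, child]) => [key, deepStrip(child)] as const)
      .filter(([, child]) => child !== undefined)
    // An object with no remaining keys becomes undefined, so its parent drops it too.
    return entries.length > 0 ? Object.fromEntries(entries) : undefined
  }
  return value
}

const stripped = deepStrip({
  name: 'demo',
  github: {},
  stats: { stars: undefined, forks: 3 },
  tags: ['a', undefined],
})
console.log(JSON.stringify(stripped))
// prints {"name":"demo","stats":{"forks":3},"tags":["a"]}
```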
To run only a subset of sources, pass a sources array with the desired source key names. When omitted, all sources run (the default). This is useful for faster extraction when you only need specific data:
const result = await getMetadata({
  path: '.',
  sources: ['nodePackageJson', 'gitStats'],
})
Templates can be combined with the sources option, but note that some built-in templates may return incomplete data if they rely on sources you have excluded.
defineTemplate
function defineTemplate<T>(
  transform: (context: MetadataContext, templateData: TemplateData) => T,
): Template<T>
An identity wrapper that provides autocomplete and type inference when authoring templates. The optional second templateData argument provides user-supplied values (like author names or GitHub accounts) for parameterized ownership checks. Templates that don't need it can simply ignore the argument. Template developers can pass additional values as needed.
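To show why an identity wrapper is enough to get autocomplete and inference, here is a self-contained sketch with simplified stand-in types. `DemoContext`, `DemoTemplateData`, `DemoTemplate`, and `defineDemoTemplate` are hypothetical; the real MetadataContext, TemplateData, and Template types are richer and may be shaped differently.

```typescript
// Simplified stand-ins for illustration only.
type DemoContext = { name?: string; stars?: number }
type DemoTemplateData = { authorName?: string }
type DemoTemplate<T> = (context: DemoContext, templateData: DemoTemplateData) => T

// Identity wrapper: returns its argument unchanged, but pins the parameter types
// so editors can autocomplete context fields and TypeScript can infer T.
function defineDemoTemplate<T>(transform: DemoTemplate<T>): DemoTemplate<T> {
  return transform
}

// No type annotations needed here: the parameters are typed by the wrapper.
const summary = defineDemoTemplate(({ name, stars }) => ({
  label: `${name ?? 'unknown'} (${stars ?? 0} stars)`,
}))

// The inferred result type is { label: string }.
const shaped = summary({ name: 'demo', stars: 42 }, {})
console.log(shaped.label) // prints "demo (42 stars)"
```

The wrapper adds no runtime behavior; its only job is to give the compiler a typed position for the callback.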
Examples
Get all metadata
import { getMetadata, helpers } from 'metascope'
const metadata = await getMetadata({ path: '.' })
console.log(helpers.firstOf(metadata.codemetaJson)?.data.name)
console.log(helpers.firstOf(metadata.github)?.data.stargazerCount)
console.log(helpers.firstOf(metadata.gitStats)?.data.commitCount)
See output sample for this repository.
Get metadata from specific sources only
import { getMetadata, helpers } from 'metascope'
const metadata = await getMetadata({
  path: '.',
  sources: ['nodePackageJson', 'licenseFile'],
})
// Only the requested sources are populated
console.log(helpers.firstOf(metadata.nodePackageJson)?.data.name)
console.log(helpers.firstOf(metadata.licenseFile)?.data.spdxId)
// Other sources are undefined
console.log(metadata.github) // Undefined
Get shaped metadata via a template
import { defineTemplate, getMetadata, helpers } from 'metascope'
const template = defineTemplate(({ codemetaJson, github }) => ({
  name: helpers.firstOf(codemetaJson)?.data.name,
  stars: helpers.firstOf(github)?.data.stargazerCount,
}))
// Result is typed as { name: ..., stars: ... }
const result = await getMetadata({ path: '.', template })
Provide credentials
import { getMetadata } from 'metascope'
const metadata = await getMetadata({
  credentials: { githubToken: 'ghp_xxxxxxxxxxxx' },
  path: '.',
})
Credential resolution follows a precedence chain: explicit options > environment variables > CLI tool fallbacks (e.g. gh auth token). This makes metascope work in both CI environments and local development without configuration.
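The precedence chain can be sketched as a small resolver. `resolveGithubToken` and `readTokenFromGhCli` are hypothetical names used for illustration, and treating empty strings as absent is an assumption of this sketch; only the order (explicit option, then environment variable, then `gh auth token`) comes from the description above.

```typescript
import { execFileSync } from 'node:child_process'

// Hypothetical resolver illustrating the precedence chain; not Metascope's actual code.
function resolveGithubToken(
  explicitToken: string | undefined,
  env: Record<string, string | undefined> = process.env,
  cliFallback: () => string | undefined = readTokenFromGhCli,
): string | undefined {
  // 1. Explicit option, 2. environment variable, 3. CLI tool fallback.
  // Empty strings are treated as absent (an assumption of this sketch).
  return explicitToken || env['GITHUB_TOKEN'] || cliFallback()
}

// Ask the GitHub CLI for a token; return undefined if gh is missing or not logged in.
function readTokenFromGhCli(): string | undefined {
  try {
    const output = execFileSync('gh', ['auth', 'token'], {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    })
    return output.trim() || undefined
  } catch {
    return undefined
  }
}

// Demo with injected values, so nothing is read from the real environment or gh.
console.log(resolveGithubToken(undefined, { GITHUB_TOKEN: 'ghp_env' }, () => 'ghp_cli'))
// prints "ghp_env"
```

Injecting the environment and the fallback as parameters keeps the precedence logic checkable without touching real credentials.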
Pass template data
import { defineTemplate, getMetadata, helpers } from 'metascope'
const template = defineTemplate(({ codemetaJson }, { authorName }) => {
  const codemeta = helpers.firstOf(codemetaJson)
  return {
    isAuthoredByMe: codemeta?.data.author?.some((a) => a.name === authorName),
    name: codemeta?.data.name,
  }
})
const result = await getMetadata({
  path: '.',
  template,
  templateData: { authorName: 'Jane Doe' },
})
Use a built-in template
import { getMetadata } from 'metascope'
const result = await getMetadata({ path: '.', template: 'frontmatter' })
Sources
Metascope extracts data from a wide range of data sources:
Local Files
| Ecosystem | Organization | Metascope Key | Source Specifications |
| ---------- | ------------------------------------------------------------------------------------------------------- | ----------------------------- | --------------------------------------------------------------------------------------------------- |
| Agnostic | | readmeFile | README.md (and variants) |
| Agnostic | CodeMeta (v1) | codemetaJson | codemeta.json |
| Agnostic | CodeMeta (v2) | codemetaJson | codemeta.json |
| Agnostic | CodeMeta (v3.1) | codemetaJson | codemeta.json |
| Agnostic | CodeMeta (v3) | codemetaJson | codemeta.json |
| Agnostic | Documented below | metadataFile | metadata.json (and .yaml / .yml variants) |
| Agnostic | Git | gitConfig | .git/config |
| Agnostic | Public Code | publiccodeYaml | publiccode.yml (Also matches .yaml) |
| Agnostic | SPDX | licenseFile | LICENSE, LICENCE, COPYING, UNLICENSE (and .md/.txt variants) |
| Apple | Apple Info.plist | xcodeInfoPlist | Info.plist |
| Apple | Xcode Project | xcodeProjectPbxproj | *.xcodeproj/project.pbxproj |
| C++ | Arduino Library | arduinoLibraryProperties | library.properties |
| C++ | Cinder CinderBlock | cinderCinderblockXml | cinderblock.xml |
| C++ | openFrameworks Addon (Legacy) | openframeworksInstallXml | install.xml (Legacy format, replaced by addon_config.mk) |
| C++ | openFrameworks Addon | openframeworksAddonConfigMk | addon_config.mk |
| Go | Go Modules | goGoMod | go.mod |
| Go | GoReleaser | goGoreleaserYaml | .goreleaser.yaml (Also matches .yml) |
| Java | Maven | javaPomXml | pom.xml |
| Java | Processing Library | processingLibraryProperties | library.properties |
| Java | Processing Sketch | processingSketchProperties | sketch.properties (Not really specified...) |
| JavaScript | NPM | nodePackageJson | package.json |
| Obsidian | Obsidian | obsidianPluginManifestJson | manifest.json |
| Python | PyPi (Distutils) | pythonSetupCfg | setup.cfg |
| Python | PyPi (Distutils) | pythonSetupPy | setup.py |
| Python | PyPi (pep-0621) | pythonPyprojectToml | pyproject.toml |
| Python | PyPi (PKG-INFO) | pythonPkgInfo | .egg-info/PKG-INFO |
| Ruby | Ruby Gems | rubyGemspec | *.gemspec |
| Rust | Crates | rustCargoToml | Cargo.toml |
Local Tools
| Ecosystem | Organization | Metascope Key | Source Specifications |
| --------- | --------------------------- | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| Agnostic | | dependencyUpdates | Dependency freshness (outdated packages, libyears) |
| Agnostic | | fileStats | Filesystem metadata (file counts, directory counts, total size) |
| Agnostic | Git | gitStats | Git CLI statistics (commits, branches, tags, contributors) |
| Agnostic | None | codeStats | Lines of code analysis from tokei via bundled native bindings |
Remote Sources
You can skip network calls by passing --offline to the CLI.
| Ecosystem | Organization | Metascope Key | Source Specifications |
| ---------- | --------------------------------------------------------------------------------------- | ------------------------ | -------------------------------------------------------------------- |
| Agnostic | GitHub Repository Metadata | github | GitHub GraphQL metadata |
| JavaScript | NPM Registry | nodeNpmRegistry | NPM registry API (download counts, publish dates, latest version) |
| Obsidian | Obsidian Community Plugins | obsidianPluginRegistry | Obsidian community plugin stats (download counts) |
| Python | PyPI Registry | pythonPypiRegistry | PyPI registry API (download counts, publish dates, latest version) |
About metadata.json
Metascope supports a minimalist metadata.json (or .yaml) file, which can capture the minimal metadata required to populate a GitHub repository page's description, homepage, and topics.
This is a non-standard format that exists primarily for use in combination with github-action-repo-sync.
| Key | Key Aliases | CodeMeta Property | Notes |
| ------------- | ---------------------------- | ----------------- | ----------------------------------------------------------------------------- |
| description | None | description | String description of project |
| homepage | url, repository, website | url | For repository values, git+ prefix and .git suffix are automatically stripped |
| keywords | tags, topics | keywords | Array of strings, or a single comma-delimited string |
If multiple key aliases are present in the object, priority for populating the associated codemeta.json goes to the key, then falls through to key aliases in the order shown above. (E.g. homepage takes priority over url.)
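The alias priority in the table can be sketched as follows. `normalizeMetadata`, `firstString`, and `cleanRepositoryUrl` are hypothetical helpers written for this example, not Metascope's API. As a simplification, the sketch strips the git+ prefix and .git suffix from whichever value wins, whereas the table describes that cleanup only for repository values.

```typescript
// Hypothetical normalizer for the metadata.json keys described above.
type RawMetadata = Record<string, unknown>

// Return the first non-empty string found among the keys, in priority order.
function firstString(raw: RawMetadata, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = raw[key]
    if (typeof value === 'string' && value.trim() !== '') return value.trim()
  }
  return undefined
}

// Strip the git+ prefix and .git suffix from repository-style URLs.
function cleanRepositoryUrl(url: string): string {
  return url.replace(/^git\+/, '').replace(/\.git$/, '')
}

function normalizeMetadata(raw: RawMetadata) {
  // The key comes first, then its aliases in the order listed in the table.
  const homepage = firstString(raw, ['homepage', 'url', 'repository', 'website'])
  const keywordSource = raw['keywords'] ?? raw['tags'] ?? raw['topics']
  const keywords = Array.isArray(keywordSource)
    ? keywordSource.filter((k): k is string => typeof k === 'string')
    : typeof keywordSource === 'string'
      ? keywordSource.split(',').map((k) => k.trim()).filter((k) => k !== '')
      : undefined

  return {
    description: typeof raw['description'] === 'string' ? raw['description'] : undefined,
    url: homepage === undefined ? undefined : cleanRepositoryUrl(homepage),
    keywords,
  }
}

console.log(
  normalizeMetadata({
    repository: 'git+https://github.com/user/repo.git',
    tags: 'cli, metadata',
  }),
)
```

With this input, `url` resolves to `https://github.com/user/repo` and `keywords` to `['cli', 'metadata']`; adding a `homepage` key would take priority over `repository`.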
If you have more metadata to define but your project lacks a canonical package specification format, then creating a codemeta.json file is recommended over the non-standard metadata.json.
Templates
Metascope provides basic templating and output transformation to compose its output into more compact and focused representations.
Built-in templates
Five built-in templates are available by name. Pass the name as the template option on the CLI or in the API.
codemeta
The CodeMeta template provides a standard way to describe software using JSON-LD and schema.org terms. Most software projects already have rich metadata in manifests and other files (e.g. package.json, Cargo.toml, pyproject.toml, LICENSE, etc.), but the name and structure of semantically equivalent metadata is often inconsistent across ecosystems.
It leverages the crosswalk data generously compiled by CodeMeta contributors to assist in automating the mapping of various metadata formats to the CodeMeta standard. Where crosswalk data is unavailable or incomplete, heuristics are used instead.
This tool always outputs CodeMeta v3.1 files. When ingesting codemeta.json files defined in the older CodeMeta v1 and v2 contexts, all simple key re-mappings defined in the crosswalk table are applied. However, some more nuanced conditional transformations (like the reassignment of copyright holding agents in v1) are not implemented.
More mature Python-based tools like codemetapy and codemeta-harvester perform a similar task, and either is recommended if you need codemeta.json output and aren't limited to a Node.js runtime.
Note that Metascope and its author are not affiliated with the CodeMeta project or its governing bodies.
metascope --template codemeta
See an output sample from the codemeta template run against this repository.
codemetaJson
A JSON-friendly derivation of the codemeta template. Produces the same aggregated metadata but parses it through a strict schema, stripping JSON-LD artifacts (like @context and @type) to yield plain JSON suitable for consumption by tools that don't understand JSON-LD.
metascope --template codemetaJson
See an output sample from the codemetaJson template run against this repository.
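To give a feel for what stripping JSON-LD artifacts means, the sketch below drops every key that starts with "@" at any depth. This is a deliberately crude stand-in: the codemetaJson template is described as parsing through a strict schema, which this example does not attempt, and `stripJsonLd` is a hypothetical helper. Blindly dropping keys such as @id or @value can also discard real data in general JSON-LD documents.

```typescript
// Hypothetical simplification: remove all "@"-prefixed keys, recursively.
function stripJsonLd(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => stripJsonLd(item))
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .filter(([key]) => !key.startsWith('@'))
        .map(([key, child]) => [key, stripJsonLd(child)] as const),
    )
  }
  return value
}

// Sample input, made up for the demo.
const linkedData = {
  '@context': 'https://w3id.org/codemeta/3.1',
  '@type': 'SoftwareSourceCode',
  name: 'demo',
  author: [{ '@type': 'Person', name: 'Jane Doe' }],
}

console.log(JSON.stringify(stripJsonLd(linkedData)))
// prints {"name":"demo","author":[{"name":"Jane Doe"}]}
```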
frontmatter
A compact, non-nested, polyglot overview of the project. Designed for Obsidian frontmatter — flat keys with natural language names, blending all available sources into a single trackable snapshot. Uses null for missing values to ensure stable keys.
metascope --template frontmatter
See an output sample from the frontmatter template run against this repository.
metadata
A minimal template that outputs the three fields used by metadata.json / metadata.yaml: description, homepage, and topics. Designed for use with github-action-repo-sync to populate a GitHub repository's description, homepage, and topics. Values from a metadata.json source file override what the codemeta template would otherwise produce.
metascope --template metadata
See an output sample from the metadata template run against this repository.
project
I needed this one for a legacy internal dashboard application. Includes ownership checks via authorName and githubAccount template data.
metascope --template project --author-name "Jane Doe" --github-account janedoe
See an output sample from the project template run against this repository.
Defining a custom template
Templates are pure functions that receive the full MetadataContext and an optional TemplateData object, and return whatever shape you like. They are applied after all sources have been extracted, so all available data is accessible.
Yes, you can just pipe output to jq and filter / transform as you please, but for complex templates with a lot of logic, TypeScript can be nicer to work with.
Use defineTemplate() for type inference and autocomplete.
Many helper functions for working with template data are also under the helpers namespace:
// In e.g. "metascope-template.ts":
import { defineTemplate, helpers } from 'metascope'
export default defineTemplate(({ codemetaJson, codeStats, github, gitStats }) => {
  const codemeta = helpers.firstOf(codemetaJson)
  const git = helpers.firstOf(gitStats)
  const gh = helpers.firstOf(github)
  const loc = helpers.firstOf(codeStats)
  return {
    commits: git?.data.commitCount,
    forks: gh?.data.forkCount,
    linesOfCode: loc?.data.total?.code,
    name: codemeta?.data.name,
    stars: gh?.data.stargazerCount,
    version: codemeta?.data.version,
  }
})
Passing template data
The second argument to a template function is a TemplateData object with optional authorName and githubAccount fields. This lets templates parameterize ownership checks instead of hardcoding author names:
import { defineTemplate, helpers } from 'metascope'
export default defineTemplate(({ codemetaJson }, { authorName, githubAccount }) => {
  const codemeta = helpers.firstOf(codemetaJson)
  const authors = codemeta?.data.author?.map((a) => a.name) ?? []
  const repo = codemeta?.data.codeRepository?.toLowerCase() ?? ''
  return {
    isMyProject: authors.includes(authorName),
    isOnMyGitHub: typeof githubAccount === 'string' && repo.includes(`/${githubAccount}/`),
    name: codemeta?.data.name,
  }
})
Values for the built-in templates are provided via the --author-name and --github-account CLI flags, or via the templateData option in the API. Templates that don't need this data can simply omit the second argument.
Using a custom template via the CLI
metascope --template ./metascope-template.ts
Template files are loaded via jiti, so TypeScript works out of the box without a build step.
Background
Metascope was built to support automated generation of project dashboards, badges, and documentation where a single source of truth for project metadata is useful. Rather than querying each API individually, metascope handles the discovery, authentication, and aggregation in one pass for a wide variety of project types.
Related projects
- codemeta
Standard shared metadata vocabulary (JSON-LD)
- codemetapy
Translate software metadata into the CodeMeta vocabulary (Python)
- codemeta-harvester
Aggregate software metadata into the CodeMeta vocabulary from source repositories and service endpoints (Python)
- bibliothecary
Manifest discovery and parsing for libraries.io (Ruby)
- diggity
Generates SBOMs for container images, filesystems, archives, and more (Go)
- SOMEF
Software Metadata Extraction Framework (Python)
- Upstream Ontologist
A common interface for finding metadata about upstream software projects (Rust)
- GrimoireLab
Platform for software development analytics and insights (Python)
- OSS Review Toolkit
A suite of CLI tools to automate software compliance checks (Kotlin)
- Git Truck
Repository visualization (TypeScript)
- Onefetch
Offline command-line Git information tool (Rust)
- Sokrates
Polyglot source code examination tool (Java)
Slop factor
Medium.
The architecture and non-boilerplate parts of the documentation were human-driven, but sizable chunks of the implementation were mostly Claude Code's doing and have been subject to only moderate post-facto human scrutiny.
Maintainers
Acknowledgments
Thank you to the CodeMeta Project Management Committee and contributors for their development and stewardship of the standard.
Jacob Peddicord's askalono project inspired the Dice-Sørensen scoring strategy used for classifying arbitrary license text.
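For readers unfamiliar with Dice-Sørensen scoring, the coefficient measures the overlap between two sets of tokens as 2 * |A ∩ B| / (|A| + |B|). The sketch below applies it to character bigrams of normalized text, which is an assumption made for this example; the tokenization and normalization used by Metascope or askalono may well differ, and `bigrams` and `diceCoefficient` are hypothetical names.

```typescript
// Count character bigrams of a normalized string (bigram -> occurrences).
function bigrams(text: string): Map<string, number> {
  const normalized = text.toLowerCase().replace(/\s+/g, ' ').trim()
  const counts = new Map<string, number>()
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2)
    counts.set(gram, (counts.get(gram) ?? 0) + 1)
  }
  return counts
}

// Dice-Sørensen coefficient: 0 means no shared bigrams, 1 means identical bigram multisets.
function diceCoefficient(a: string, b: string): number {
  const gramsA = bigrams(a)
  const gramsB = bigrams(b)
  let sizeA = 0
  let sizeB = 0
  let overlap = 0
  gramsA.forEach((count) => {
    sizeA += count
  })
  gramsB.forEach((count, gram) => {
    sizeB += count
    overlap += Math.min(count, gramsA.get(gram) ?? 0)
  })
  if (sizeA + sizeB === 0) return 0
  return (2 * overlap) / (sizeA + sizeB)
}

console.log(diceCoefficient('night', 'nacht')) // prints 0.25
```

A license classifier built on this idea would score an unknown text against each known license template and pick the highest score above some threshold.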
Contributing
Issues and pull requests are welcome.
License
MIT © Eric Mika
