company-dossier

v0.3.4

Published

3 days ago

Build a complete, sourced intelligence dossier on any company from public data — CLI, library and MCP server.

osint company-research competitive-intelligence dossier due-diligence business-intelligence company-data company-profile wayback dns-recon tech-stack mcp cli

company-dossier compiles a structured, nine-section dossier on any company or domain using only PUBLIC sources: a live website crawl, DNS reconnaissance, the Internet Archive's Wayback Machine, a web-technology fingerprint, USASpending.gov federal contracts, and social-profile discovery. Every derived claim is annotated with its source, and sections without public data are clearly marked as gaps.

No API keys. No private databases. No login. Free.

🔗 https://companydossier.lol

Install

npm install -g company-dossier
# or run without installing:
npx company-dossier acme.com

Quickstart (CLI)

npx company-dossier acme.com

This writes an Acme DOSSIER/ folder containing one markdown file per section plus a machine-readable dossier.json.

# choose an output directory, stay quiet
company-dossier acme.com --out ./research --quiet

# research by name (no domain)
company-dossier "Acme Corporation"

# print JSON to stdout (good for piping)
company-dossier acme.com --json > acme.json

# only build specific sections
company-dossier acme.com --sections overview,tech,risk

Run company-dossier --help for all options.
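The `--json` output can be post-processed by a small script. The following is a minimal sketch that assumes only the top-level `{ meta, data }` shape mentioned under Library usage. The helper name `summarizeDossier` and the key names in the sample object are hypothetical, and the real keys inside `data` may differ.

```javascript
// Hypothetical helper: report which sections of a parsed dossier carry data.
// Assumes only the top-level shape { meta, data }; nothing about key names.
function summarizeDossier(dossier) {
  const data =
    dossier && typeof dossier.data === 'object' && dossier.data !== null
      ? dossier.data
      : {};
  return Object.keys(data).map((section) => {
    const value = data[section];
    // Treat null/undefined, empty arrays, and empty objects as gaps.
    const empty =
      value == null ||
      (Array.isArray(value) && value.length === 0) ||
      (typeof value === 'object' && Object.keys(value).length === 0);
    return { section, hasData: !empty };
  });
}

// Made-up sample; a real run would JSON.parse the contents of acme.json.
const sample = {
  meta: { target: 'acme.com' },
  data: { overview: { name: 'Acme' }, tech: {} },
};
console.log(summarizeDossier(sample));
```

To run it against real output, load the file written by `company-dossier acme.com --json > acme.json` and pass the parsed object to the helper.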

The nine sections

Overview & identity — name, description, schema.org, keywords
People & org chart — contact emails, individual-pattern emails (gaps marked)
Hiring radar — careers pages and job URLs from the site/sitemap
Money trail — USASpending.gov federal contracts and obligations
Locations — structured addresses and phone numbers
Tech fingerprint — CMS, analytics, pixels, CDN, frameworks, email/DNS
News & timeline — Wayback history, growth, deleted pages, archived PDFs
Relationship web — social and external profiles
Risk flags — automated low-confidence technical signals (SPF/DMARC, churn)

Library usage

import { buildDossier, writeDossier } from 'company-dossier';

const result = await buildDossier('acme.com', {
  sections: ['overview', 'tech', 'risk'], // optional subset
});

// result.meta  — target, company name, sources & status
// result.json  — full structured data ({ meta, data })
// result.files — [{ path, content }] markdown + dossier.json

const folder = writeDossier(result, './research');
console.log('Written to', folder);

Individual collectors are also exported: collectWebsite, collectDns, collectWayback, extractTechStack, collectSearch.
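If you want to handle the generated files in memory instead of calling writeDossier, `result.files` can be processed directly. A minimal sketch, assuming only the documented `[{ path, content }]` shape: the helper name `partitionFiles` and the sample paths are hypothetical, and actual file names may differ.

```javascript
// Hypothetical helper: split result.files into markdown sections and dossier.json.
// Relies only on the documented { path, content } shape of each entry.
function partitionFiles(files) {
  const markdown = [];
  let json = null;
  for (const file of files) {
    if (file.path.endsWith('.md')) {
      markdown.push(file);
    } else if (file.path.endsWith('dossier.json')) {
      json = file;
    }
  }
  return { markdown, json };
}

// Made-up sample standing in for result.files.
const files = [
  { path: 'README.md', content: '# Acme' },
  { path: 'overview.md', content: '...' },
  { path: 'dossier.json', content: '{"meta":{},"data":{}}' },
];
const { markdown, json } = partitionFiles(files);
console.log(markdown.length, json ? JSON.parse(json.content) : null);
```

This keeps the markdown and the machine-readable JSON separate, which can be convenient when uploading them to different destinations.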

MCP server

company-dossier ships an MCP server over stdio exposing a single tool, build_dossier, that returns the markdown and JSON.

{
  "mcpServers": {
    "company-dossier": {
      "command": "npx",
      "args": ["-y", "company-dossier-mcp"]
    }
  }
}

Tool input:

{ "target": "acme.com", "sections": ["overview", "tech", "risk"] }
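For readers writing their own client, the tool input above travels inside a standard MCP `tools/call` JSON-RPC request. The sketch below only builds that envelope. It assumes `sections` is optional for the tool, as it is for the library, and the helper name `buildDossierCall` is hypothetical. A real client must complete the MCP `initialize` handshake before sending this message, and an MCP client SDK would normally handle both steps.

```javascript
// Hypothetical helper: wrap the build_dossier input in an MCP tools/call request.
// The envelope fields (jsonrpc, id, method, params.name, params.arguments)
// follow the MCP specification; the optionality of `sections` is an assumption.
function buildDossierCall(id, target, sections) {
  const args = { target };
  if (Array.isArray(sections) && sections.length > 0) {
    args.sections = sections;
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'build_dossier', arguments: args },
  };
}

// Over stdio, each MCP message is one line of JSON terminated by a newline.
const line = JSON.stringify(buildDossierCall(1, 'acme.com', ['overview', 'tech', 'risk'])) + '\n';
console.log(line);
```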

Remote MCP server

In addition to the stdio server, company-dossier ships a remote (HTTP) MCP server over the MCP Streamable HTTP transport, exposing the same single tool, build_dossier. This is what you deploy so hosted assistants (ChatGPT Apps SDK, Claude connectors) can reach it over the network.

Run it locally:

npx company-dossier-mcp-http
# listening on http://0.0.0.0:8787  (override with PORT=...)

Endpoints: POST, GET, and DELETE on /mcp for the MCP session, and GET /health (returns {"status":"ok"}). The server listens on process.env.PORT, falling back to 8787.
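A deployment script can poll the health endpoint before registering the server with an assistant. This is a minimal sketch that relies only on the documented GET /health response, {"status":"ok"}. The helper name `isHealthy` is hypothetical, and the `fetchImpl` parameter exists so the check can be exercised without a network; it defaults to the global `fetch` available in Node 18+.

```javascript
// Hypothetical helper: resolve to true when GET /health returns {"status":"ok"}.
// Any network error, non-2xx response, or unexpected body resolves to false.
async function isHealthy(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(new URL('/health', baseUrl));
    if (!res.ok) return false;
    const body = await res.json();
    return body.status === 'ok';
  } catch {
    return false;
  }
}

// Usage against a locally running server:
//   isHealthy('http://localhost:8787').then((ok) => console.log('healthy:', ok));
```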

Hosted endpoint

Deploy it (see deploy/README.md for one-command steps to Render, Fly.io, or any Docker host) and point a subdomain at it. The hosted MCP endpoint is then:

https://mcp.companydossier.lol/mcp

Claude connectors / Claude Desktop (custom connector): add a remote MCP server with URL https://mcp.companydossier.lol/mcp. The build_dossier tool becomes available.

ChatGPT (Apps SDK / connectors): add an MCP server pointing at https://mcp.companydossier.lol/mcp; ChatGPT discovers and calls the build_dossier tool.

Tool input (same as the stdio server):

{ "target": "acme.com", "sections": ["overview", "tech", "risk"] }

Docs-site output

company-dossier can turn a generated dossier into a themed, static Astro Starlight docs site — one page per section, an autogenerated sidebar, and a "case file" (ink-on-paper) theme.

# scaffold a Starlight site under "<Company> DOSSIER/site/"
company-dossier acme.com --out ./out --format site

# then build it (Node-only):
cd "./out/Acme DOSSIER/site" && npm install && npm run build   # → site/dist

Site-related flags:

| Flag | Default | Meaning |
|------|---------|---------|
| --format <folder\|site> | folder | site scaffolds the Starlight project. |
| --deploy <none\|gh-pages> | none | gh-pages installs, builds, and publishes site/dist via the gh-pages package (an optional dep of the generated site). |
| --subdomain <host> | - | Writes public/CNAME for a custom host. |
| --no-noindex | (noindex on) | Opt out of the unlisted/noindex policy below. |

Unlisted by default (policy)

Per-company dossier sites are UNLISTED + NOINDEX by default. Every generated page carries:

<meta name="robots" content="noindex,nofollow"> (injected via Starlight head),
a visible disclaimer — "Auto-generated from public sources — may be inaccurate; not affiliated with the company" — plus a takedown/contact line, and
a public/robots.txt with Disallow: / (and public/.nojekyll).

Pass --no-noindex to allow indexing (robots meta is dropped and robots.txt becomes Allow: /).
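The effect of the noindex switch can be summarized in a few lines. The sketch below is an illustration of the policy as described, not the package's implementation: the function name `indexingArtifacts` is hypothetical, and the `User-agent: *` line is an assumption, since only the Disallow and Allow rules are stated above.

```javascript
// Illustration only: what the policy above implies for each setting.
// noindex = true  -> robots meta tag present, robots.txt disallows everything.
// noindex = false -> robots meta tag dropped, robots.txt allows everything.
function indexingArtifacts(noindex = true) {
  return {
    headMeta: noindex
      ? '<meta name="robots" content="noindex,nofollow">'
      : null,
    // "User-agent: *" is assumed; the README only specifies the rule line.
    robotsTxt: `User-agent: *\n${noindex ? 'Disallow: /' : 'Allow: /'}\n`,
  };
}

console.log(indexingArtifacts());      // default: unlisted
console.log(indexingArtifacts(false)); // equivalent of --no-noindex
```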

The scaffold (<Company> DOSSIER/site/) contains package.json (astro + @astrojs/starlight, optional gh-pages), astro.config.mjs, src/content/config.ts (loose docs schema), src/content/docs/*.md (one per section + an index.md hero from the overview), src/styles/case-file.css, and public/ assets.

Library usage

import { buildDossier, writeDossier, generateSite } from 'company-dossier';

const result = await buildDossier('acme.com');
const folder = writeDossier(result, './out');           // "<Company> DOSSIER/"
const site = await generateSite(result, folder, {       // → folder + "/site/"
  noindex: true,        // default; set false to allow indexing
  subdomain: 'acme.example.com', // optional → public/CNAME
  title: 'Acme',        // optional override (defaults to company name)
});
console.log(site.siteDir, site.files.length);

Output

A <Company> DOSSIER/ folder with README.md, nine numbered markdown sections, and dossier.json.
With --format site, additionally a <Company> DOSSIER/site/ Astro Starlight project (build it with npm install && npm run build).
With --json, the structured dossier is printed to stdout instead.
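Scripts that post-process the output need to know where things land. A minimal sketch of the layout described above: the helper name `expectedPaths` is hypothetical, and the company name must be supplied by the caller because this README does not specify how it is derived from a domain.

```javascript
// Hypothetical helper: compute where the documented outputs are expected to be.
// Layout taken from this README: "<Company> DOSSIER/" with README.md and
// dossier.json, plus "site/" (and "site/dist" after a build) for --format site.
function expectedPaths(outDir, company, format = 'folder') {
  const base = outDir.replace(/\/+$/, ''); // drop trailing slashes
  const root = `${base}/${company} DOSSIER`;
  const paths = {
    root,
    readme: `${root}/README.md`,
    json: `${root}/dossier.json`,
  };
  if (format === 'site') {
    paths.site = `${root}/site`;
    paths.dist = `${root}/site/dist`;
  }
  return paths;
}

console.log(expectedPaths('./out', 'Acme', 'site'));
```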

Public sources only

This tool reads only publicly accessible data and labels every claim with its source. It does not authenticate, scrape behind logins, or access paid databases. Network-blocked or empty sources are reported as gaps, never fabricated. Risk flags are automated signals, not legal or financial advice.
