company-dossier
v0.3.4
Build a complete, sourced intelligence dossier on any company from public data — CLI, library and MCP server.
company-dossier compiles a structured, nine-section dossier on any company or
domain using only PUBLIC sources: a live website crawl, DNS reconnaissance, the
Internet Archive's Wayback Machine, a web-technology fingerprint, USASpending.gov
federal contracts, and social-profile discovery. Every derived claim is annotated
with its source, and sections without public data are clearly marked as gaps.
No API keys. No private databases. No login. Free.
🔗 https://companydossier.lol
Install
npm install -g company-dossier
# or run without installing:
npx company-dossier acme.com
Quickstart (CLI)
npx company-dossier acme.com
This writes an Acme DOSSIER/ folder containing one markdown file per section
plus a machine-readable dossier.json.
# choose an output directory, stay quiet
company-dossier acme.com --out ./research --quiet
# research by name (no domain)
company-dossier "Acme Corporation"
# print JSON to stdout (good for piping)
company-dossier acme.com --json > acme.json
# only build specific sections
company-dossier acme.com --sections overview,tech,risk
Run company-dossier --help for all options.
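For scripted runs, the flags shown above can be assembled programmatically before handing them to a process spawner. The following is a minimal sketch that uses only the flags documented in this section (--out, --quiet, --json, --sections); dossierArgs is a hypothetical helper and is not exported by the package.

```javascript
// Hypothetical helper (not part of company-dossier): builds an argv array
// for the CLI from the flags documented in the Quickstart.
function dossierArgs(target, { out, quiet = false, json = false, sections = [] } = {}) {
  if (!target) throw new Error('target (domain or company name) is required');
  const args = [target];
  if (out) args.push('--out', out);
  if (quiet) args.push('--quiet');
  if (json) args.push('--json');
  if (sections.length > 0) args.push('--sections', sections.join(','));
  return args;
}

// Example: the resulting array could be passed to
// child_process.spawn('company-dossier', cliArgs).
const cliArgs = dossierArgs('acme.com', {
  out: './research',
  sections: ['overview', 'tech', 'risk'],
});
console.log(cliArgs.join(' '));
```

Passing the arguments as an array rather than a single shell string avoids quoting problems with company names that contain spaces, such as "Acme Corporation".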
The nine sections
- Overview & identity — name, description, schema.org, keywords
- People & org chart — contact emails, individual-pattern emails (gaps marked)
- Hiring radar — careers pages and job URLs from the site/sitemap
- Money trail — USASpending.gov federal contracts and obligations
- Locations — structured addresses and phone numbers
- Tech fingerprint — CMS, analytics, pixels, CDN, frameworks, email/DNS
- News & timeline — Wayback history, growth, deleted pages, archived PDFs
- Relationship web — social and external profiles
- Risk flags — automated low-confidence technical signals (SPF/DMARC, churn)
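To make the SPF/DMARC signal in the Risk flags section more concrete, here is a rough sketch of how such a check could work on DNS TXT records. It is illustrative only and is not the package's actual logic: emailAuthFlags and its flag wording are assumptions. The input shape matches what Node's dns.promises.resolveTxt returns (one array of string chunks per record), with the DMARC records looked up at _dmarc.<domain>.

```javascript
// Illustrative only: not company-dossier's real risk logic.
// rootTxt:  TXT records for the domain itself (SPF lives here).
// dmarcTxt: TXT records for _dmarc.<domain>.
// Each record is an array of string chunks, as returned by
// Node's dns.promises.resolveTxt().
function emailAuthFlags(rootTxt, dmarcTxt) {
  const flatten = (records) => records.map((chunks) => chunks.join(''));
  const spf = flatten(rootTxt).find((r) => /^v=spf1(\s|$)/i.test(r));
  const dmarc = flatten(dmarcTxt).find((r) => /^v=DMARC1\s*(;|$)/i.test(r));

  const flags = [];
  if (!spf) {
    flags.push('no SPF record');
  } else if (/\s\+all(\s|$)/i.test(spf)) {
    flags.push('SPF permits any sender (+all)');
  }

  if (!dmarc) {
    flags.push('no DMARC record');
  } else {
    // Match the "p=" tag only, not "sp=" (the subdomain policy).
    const policy = /(?:^|;)\s*p\s*=\s*(\w+)/i.exec(dmarc);
    if (!policy || policy[1].toLowerCase() === 'none') {
      flags.push('DMARC policy is missing or p=none (monitor only)');
    }
  }
  return flags;
}

console.log(emailAuthFlags(
  [['v=spf1 include:_spf.example.com ~all']],
  [['v=DMARC1; p=none; rua=mailto:dmarc@example.com']],
));
```

Signals like these are weak on their own, which is consistent with the section describing them as low-confidence.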
Library usage
import { buildDossier, writeDossier } from 'company-dossier';
const result = await buildDossier('acme.com', {
sections: ['overview', 'tech', 'risk'], // optional subset
});
// result.meta — target, company name, sources & status
// result.json — full structured data ({ meta, data })
// result.files — [{ path, content }] markdown + dossier.json
const folder = writeDossier(result, './research');
console.log('Written to', folder);
Individual collectors are also exported: collectWebsite, collectDns,
collectWayback, extractTechStack, collectSearch.
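If you want to post-process a result in memory instead of writing it to disk, the documented result.files shape ([{ path, content }]) is enough to separate the markdown sections from dossier.json. A minimal sketch follows; splitFiles is a hypothetical helper, the sample file names are made up, and it assumes the content of dossier.json is a JSON string.

```javascript
// Hypothetical helper over the documented result.files shape.
// Assumption: dossier.json's content is a JSON string.
function splitFiles(files) {
  const markdown = files.filter((f) => f.path.endsWith('.md'));
  const jsonFile = files.find((f) => f.path.endsWith('dossier.json'));
  return {
    markdown,
    data: jsonFile ? JSON.parse(jsonFile.content) : null,
  };
}

// Stand-in for a real buildDossier() result; file names are invented.
const sampleResult = {
  files: [
    { path: 'overview.md', content: '# Overview' },
    { path: 'tech.md', content: '# Tech fingerprint' },
    { path: 'dossier.json', content: '{"meta":{"target":"acme.com"},"data":{}}' },
  ],
};

const { markdown, data } = splitFiles(sampleResult.files);
console.log(markdown.length, data.meta.target);
```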
MCP server
company-dossier ships an MCP server over
stdio exposing a single tool, build_dossier, that returns the markdown and JSON.
{
"mcpServers": {
"company-dossier": {
"command": "npx",
"args": ["-y", "company-dossier-mcp"]
}
}
}
Tool input:
{ "target": "acme.com", "sections": ["overview", "tech", "risk"] }
Remote MCP server
In addition to the stdio server, company-dossier ships a remote (HTTP) MCP
server over the MCP Streamable HTTP transport, exposing the same single tool,
build_dossier. This is what you deploy so hosted assistants (ChatGPT Apps SDK,
Claude connectors) can reach it over the network.
Run it locally:
npx company-dossier-mcp-http
# listening on http://0.0.0.0:8787 (override with PORT=...)
Endpoints: POST/GET/DELETE /mcp for the MCP session and GET /health
(returns {"status":"ok"}). It listens on process.env.PORT || 8787.
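For a sense of what travels over POST /mcp, MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below builds fetch options for calling build_dossier. It assumes the session has already been initialized (the Streamable HTTP transport expects an initialize exchange first, and a server may hand back an Mcp-Session-Id to echo on later requests). buildDossierCall is a hypothetical helper, and in practice an MCP client SDK would normally handle all of this.

```javascript
// Hypothetical helper: builds fetch() options for a JSON-RPC "tools/call"
// request to the build_dossier tool. Assumes the MCP session was already
// initialized; sessionId is optional and only sent when provided.
function buildDossierCall(id, target, sections, sessionId) {
  const headers = {
    'Content-Type': 'application/json',
    // Streamable HTTP servers may answer with plain JSON or an SSE stream.
    Accept: 'application/json, text/event-stream',
  };
  if (sessionId) headers['Mcp-Session-Id'] = sessionId;

  const toolArgs = { target };
  if (sections && sections.length > 0) toolArgs.sections = sections;

  return {
    method: 'POST',
    headers,
    body: JSON.stringify({
      jsonrpc: '2.0',
      id,
      method: 'tools/call',
      params: { name: 'build_dossier', arguments: toolArgs },
    }),
  };
}

// Example: fetch('http://localhost:8787/mcp', request)
const request = buildDossierCall(1, 'acme.com', ['overview', 'tech', 'risk']);
console.log(request.body);
```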
Hosted endpoint
Deploy it (see deploy/README.md for one-command steps to
Render, Fly.io, or any Docker host) and point a subdomain at it. The hosted MCP
endpoint is then:
https://mcp.companydossier.lol/mcp
Claude connectors / Claude Desktop (custom connector): add a remote MCP
server with URL https://mcp.companydossier.lol/mcp. The build_dossier tool
becomes available.
ChatGPT (Apps SDK / connectors): add an MCP server pointing at
https://mcp.companydossier.lol/mcp; ChatGPT discovers and calls the
build_dossier tool.
Tool input (same as the stdio server):
{ "target": "acme.com", "sections": ["overview", "tech", "risk"] }
Docs-site output
company-dossier can turn a generated dossier into a themed, static
Astro Starlight docs site — one page per
section, an autogenerated sidebar, and a "case file" (ink-on-paper) theme.
# scaffold a Starlight site under "<Company> DOSSIER/site/"
company-dossier acme.com --out ./out --format site
# then build it (Node-only):
cd "./out/Acme DOSSIER/site" && npm install && npm run build # → site/dist
Site-related flags:
| Flag | Default | Meaning |
|------|---------|---------|
| --format <folder\|site> | folder | site scaffolds the Starlight project. |
| --deploy <none\|gh-pages> | none | gh-pages installs, builds, and publishes site/dist via the gh-pages package (an optional dep of the generated site). |
| --subdomain <host> | — | Writes public/CNAME for a custom host. |
| --no-noindex | (noindex on) | Opt out of the unlisted/noindex policy below. |
Unlisted by default (policy)
Per-company dossier sites are UNLISTED + NOINDEX by default. Every generated site carries:
- <meta name="robots" content="noindex,nofollow"> on each page (injected via Starlight head),
- a visible disclaimer, "Auto-generated from public sources — may be inaccurate; not affiliated with the company", plus a takedown/contact line, and
- a public/robots.txt with Disallow: / (and public/.nojekyll).
Pass --no-noindex to allow indexing (robots meta is dropped and robots.txt
becomes Allow: /).
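The policy boils down to a single boolean that switches two artifacts. A minimal sketch of that switch, under stated assumptions: robotsArtifacts is a hypothetical function, not the generator's real code, and the User-agent: * line is an assumption added because robots.txt rules need a user-agent group to apply to.

```javascript
// Hypothetical sketch of the noindex switch; not the generator's real code.
// noindex = true  -> robots meta tag present, robots.txt disallows everything.
// noindex = false -> no robots meta tag, robots.txt allows everything.
function robotsArtifacts(noindex = true) {
  return {
    metaTag: noindex ? '<meta name="robots" content="noindex,nofollow">' : null,
    // "User-agent: *" is assumed; the README only specifies the rule line.
    robotsTxt: `User-agent: *\n${noindex ? 'Disallow: /' : 'Allow: /'}\n`,
  };
}

console.log(robotsArtifacts().robotsTxt);
console.log(robotsArtifacts(false).robotsTxt);
```

Keeping both artifacts behind one flag means a site cannot end up with a noindex meta tag but a permissive robots.txt, or the reverse.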
The scaffold (<Company> DOSSIER/site/) contains package.json
(astro + @astrojs/starlight, optional gh-pages), astro.config.mjs,
src/content/config.ts (loose docs schema), src/content/docs/*.md
(one per section + an index.md hero from the overview),
src/styles/case-file.css, and public/ assets.
Library usage
import { buildDossier, writeDossier, generateSite } from 'company-dossier';
const result = await buildDossier('acme.com');
const folder = writeDossier(result, './out'); // "<Company> DOSSIER/"
const site = await generateSite(result, folder, { // → folder + "/site/"
noindex: true, // default; set false to allow indexing
subdomain: 'acme.example.com', // optional → public/CNAME
title: 'Acme', // optional override (defaults to company name)
});
console.log(site.siteDir, site.files.length);
Output
- A <Company> DOSSIER/ folder with README.md, nine numbered markdown sections, and dossier.json.
- With --format site, additionally a <Company> DOSSIER/site/ Astro Starlight project (build it with npm install && npm run build).
- With --json, the structured dossier is printed to stdout instead.
Public sources only
This tool reads only publicly accessible data and labels every claim with its source. It does not authenticate, scrape behind logins, or access paid databases. Network-blocked or empty sources are reported as gaps, never fabricated. Risk flags are automated signals, not legal or financial advice.
License
MIT © EVERJUST
