@aloth/cred1

v2026.6.16

Published

7 days ago

CRED-1 domain credibility dataset — JavaScript library and CLI for checking news source reliability

0High
0Medium
0Low

credibility misinformation fact-check domain dataset browser-extension news media-literacy fake-news disinformation journalism reliability cred1 cli

CRED-1: Open Domain Credibility Dataset

CRED-1 is an open, reproducible domain-level credibility dataset combining multiple openly-licensed source lists with computed enrichment signals. It provides credibility scores for 2,672 domains known to publish mis/disinformation, conspiracy theories, or other unreliable content.

🎓 Presented at ACM WebSci 2026 (Braunschweig). Landing page: aloth.github.io/agentic-ai-information-integrity/cred-1. First production integration: Trackless Links for iOS and macOS, with free codes for readers and attendees: gutscheinhub.de/ratgeber/trackless-links-cred-1-acm-websci-2026.

Paper: A. Loth, M. Kappes, and M.-O. Pahl, "CRED-1: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation," Preprint, 2026. doi:10.2139/ssrn.6448466

Install

# CLI (global)
npm install -g @aloth/cred1

# Library (project dependency)
npm install @aloth/cred1

# Or try without installing
npx @aloth/cred1 check infowars.com

CLI Usage

# Single domain lookup
cred1 check infowars.com
# 🔴  infowars.com
#    Score:    0.073 / 1.000
#    Category: conspiracy
#    Level:    low
#    Sources:  2

# Domain not in dataset
cred1 check nytimes.com
# ⚪  nytimes.com
#    Not found in CRED-1 dataset — treat as unknown/neutral

# Batch processing (stdin)
echo -e "rt.com\ninfowars.com\nnytimes.com" | cred1 batch

# JSON output
cred1 check breitbart.com --json

# Search
cred1 search "news"
cred1 search "\.ru$"

# Statistics
cred1 stats
cred1 categories

Domain normalization is automatic — https://www.infowars.com/politics/ resolves to infowars.com.

MCP Server (Claude Desktop / Cursor / Windsurf)

CRED-1 ships an MCP server so AI assistants can check domain credibility directly.

Tools exposed

| Tool | Description | |------|-------------| | check_domain | Check a single domain (score, category, level, metadata) | | batch_check | Check up to 100 domains at once | | search_domains | Search domains by substring or regex pattern | | get_stats | Dataset statistics (total, per-category counts, version) | | get_categories | Category taxonomy with descriptions and score ranges |

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "cred1": {
      "command": "npx",
      "args": ["-y", "@aloth/cred1", "--mcp"]
    }
  }
}

Note: --mcp is handled by the CLI wrapper — or run the dedicated binary directly.

Alternatively, if the package is installed globally:

{
  "mcpServers": {
    "cred1": {
      "command": "cred1-mcp"
    }
  }
}

Cursor

Add to .cursor/mcp.json (project) or ~/.cursor/mcp.json (global):

{
  "mcpServers": {
    "cred1": {
      "command": "npx",
      "args": ["-y", "@aloth/cred1", "--mcp"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "cred1": {
      "command": "npx",
      "args": ["-y", "@aloth/cred1", "--mcp"]
    }
  }
}

OpenClaw / any MCP-compatible host

{
  "command": "cred1-mcp",
  "transport": "stdio"
}

Library Usage

import { checkDomain, searchDomains, getStats } from '@aloth/cred1';

// Single lookup
const result = checkDomain('infowars.com');
// { domain: 'infowars.com', score: 0.073, category: 'conspiracy', level: 'low', sources: 2, domainAge: 27.3, trancoRank: 15889 }

// Not found → null
const unknown = checkDomain('nytimes.com'); // null

// Search by pattern (substring or regex)
const russian = searchDomains('\\.ru$');

// Dataset statistics
const stats = getStats();
// { totalDomains: 2673, categories: { unreliable: 2001, fake: 233, ... }, version: '1.0.0' }

Traffic-Light Scoring

| Level | Score | Emoji | Meaning | |---------|-----------|-------|----------------------------------| | low | ≤ 0.20 | 🔴 | High credibility risk | | mixed | 0.21–0.50 | 🟡 | Unreliable or mixed signals | | ok | > 0.50 | 🟢 | Generally considered reliable | | neutral | not found | ⚪ | Unknown — absence ≠ trustworthy |

Key Features

2,672 domains with credibility scores (0.0–1.0)
Dual-mode — works as CLI tool and JavaScript library
Fully reproducible — Python pipeline rebuilds the dataset from scratch
Multi-signal scoring combining source labels, domain age, web popularity, fact-check frequency, and threat intelligence
Privacy-preserving — designed for on-device client-side deployment (no server calls needed)
Two openly-licensed sources — no proprietary data dependencies
Domain normalization — handles www., protocols, paths automatically

Dataset Schema

Compact Format (`cred1_compact.json`)

{
  "infowars.com": { "c": "c", "s": 0.073, "n": 2, "d": "1999-10-04", "r": 15889 }
}

| Field | Description | |-------|-------------| | c | Category code: f=fake, u=unreliable, m=mixed, c=conspiracy, s=satire, r=reliable | | s | Credibility score (0.0–1.0, lower = less credible) | | n | Number of independent source lists flagging this domain | | d | Domain registration date (optional) | | r | Tranco Top-1M rank (optional — lower rank = more popular) |

Full Format (`cred1_current.json`)

{
  "infowars.com": {
    "category": "fake",
    "credibility_score": 0.14,
    "domain_age_years": 26.4,
    "domain_registered": "1999-10-04T04:00:00Z",
    "iffy_factual": "VL",
    "iffy_bias": "FN",
    "iffy_score": 0.1,
    "factcheck_claims": 52,
    "safe_browsing_flagged": false,
    "score_age": 0.2,
    "score_cat": 0.05,
    "score_factcheck": 0.0,
    "score_iffy": 0.1,
    "score_safebrowsing": 0.05,
    "score_tranco": 0.1,
    "sources": 2,
    "tranco_rank": 4382
  }
}

See CODEBOOK.md for full field documentation.

Rebuilding the Dataset

cd pipeline/
python3 build_dataset.py              # Full pipeline
python3 build_dataset.py --step fetch # Download raw data only
python3 build_dataset.py --step merge # Parse + merge (requires prior fetch)
python3 enrich_dataset.py             # Add enrichment signals (API keys required)

Versioning

CRED-1 uses calendar versioning (CalVer) across all distribution channels:

| Channel | Format | Example | |---------|--------|---------| | GitHub Release | v2026-06-13 | Tag + Zenodo archive | | npm package | 2026.6.13 | Same date, dot-separated (valid semver) |

A new version is released weekly with rescored domains. The npm package updates automatically with each GitHub release — no separate version scheme needed.

To pin a specific dataset version:

npm install @aloth/[email protected]

Production Integrations

Trackless Links — Safari extension for iOS and macOS with real-time CRED-1 credibility warnings
HuggingFace — Dataset mirror for ML pipelines

Citation

@misc{loth2026cred1,
  author       = {Loth, Alexander and Kappes, Martin and Pahl, Marc-Oliver},
  title        = {{CRED-1}: An Open Multi-Signal Domain Credibility Dataset for Automated Pre-Bunking of Online Misinformation},
  year         = 2026,
  doi          = {10.2139/ssrn.6448466},
  url          = {https://github.com/aloth/cred-1}
}

License

Dataset: CC BY 4.0
Code & CLI: MIT

Author

Alexander Loth — alexloth.com · @xlth · ORCID