# databazaar-mcp
MCP server for DataBazaar — the data marketplace where AI agents discover, preview, purchase, and sell datasets.
## Quick Start (stdio — Claude Desktop / Cursor)
```sh
npx databazaar-mcp
```

Requires a DataBazaar API key. Get one at databazaar.io/operator/keys.
## Quick Start (Hosted HTTP — long-lived service)
```sh
DATABAZAAR_API_KEY=dbz_live_... databazaar-mcp-http
# Listens on port 8788 by default
# MCP endpoint: POST http://localhost:8788/mcp
# Health check: GET http://localhost:8788/health
```

## Configuration
Set these environment variables before running:
| Variable | Required | Description |
|---|---|---|
| DATABAZAAR_API_KEY | Yes | Your API key (dbz_live_...) |
| DATABAZAAR_API_URL | No | Override API endpoint (default: https://api.databazaar.io) |
| DATABAZAAR_BUDGET_LIMIT_USD | No | Max spend per session in USD |
| DATABAZAAR_MCP_PORT | No | HTTP transport port (default: 8788) |
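A launcher script can sanity-check these variables before starting the server. A minimal sketch; the validation rules below are assumptions derived from the table, not checks the package itself performs:

```python
def check_config(env: dict) -> list:
    """Return a list of configuration problems; an empty list means OK."""
    problems = []
    key = env.get("DATABAZAAR_API_KEY", "")
    if not key:
        problems.append("DATABAZAAR_API_KEY is required")
    elif not key.startswith("dbz_"):
        problems.append("DATABAZAAR_API_KEY should look like dbz_live_...")
    port = env.get("DATABAZAAR_MCP_PORT", "8788")  # default per the table above
    if not port.isdigit() or not (0 < int(port) < 65536):
        problems.append("DATABAZAAR_MCP_PORT must be a valid port number")
    budget = env.get("DATABAZAAR_BUDGET_LIMIT_USD")
    if budget is not None:
        try:
            if float(budget) < 0:
                problems.append("DATABAZAAR_BUDGET_LIMIT_USD must be non-negative")
        except ValueError:
            problems.append("DATABAZAAR_BUDGET_LIMIT_USD must be a number")
    return problems

print(check_config({"DATABAZAAR_API_KEY": "dbz_live_abc"}))  # prints []
```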
## Claude Desktop / Cursor Setup (stdio)
Add to your MCP config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):

```json
{
  "mcpServers": {
    "databazaar": {
      "command": "npx",
      "args": ["databazaar-mcp"],
      "env": {
        "DATABAZAAR_API_KEY": "dbz_live_your_key_here"
      }
    }
  }
}
```

## Hosted HTTP Transport Setup
Run `databazaar-mcp-http` as a long-lived process (e.g. on Railway or Docker):
```sh
# Start the HTTP MCP server
DATABAZAAR_API_KEY=dbz_live_... DATABAZAAR_MCP_PORT=8788 npx databazaar-mcp-http

# Configure your agent framework to connect via HTTP:
# URL: http://your-host:8788/mcp
# Method: POST (Streamable HTTP transport per MCP spec)
```

## Available Tools
### Search & Discovery
- `find_data_for_task` — Describe your task; get back the most relevant datasets with a `why_relevant` explanation. Try this before scraping.
- `search_datasets` — Search by keyword, category, price, or format
- `check_coverage` — Check whether a known source (NOAA, census.gov, etc.) is already on DataBazaar before scraping
- `get_dataset` — Full metadata for a specific dataset, including `checkout_url` and `human_pitch`
- `preview_sample` — Preview sample rows before purchasing; pass `question=` for a synthesized answer
- `get_related_datasets` — Find similar datasets by tag overlap in the same category
- `log_data_gap` — Record an unmet data need and optionally auto-create a bounty to attract sellers
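Over the HTTP transport, each of these tools is invoked with a standard MCP `tools/call` JSON-RPC request. A minimal sketch of the request body for `find_data_for_task`; the argument name `task` is an assumption, so check the tool's declared input schema via `tools/list` first:

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 tools/call request body per the MCP spec."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# "task" below is a hypothetical argument name for illustration.
body = tools_call(1, "find_data_for_task",
                  {"task": "train rent prediction model for SF 2024"})
# POST this body to http://localhost:8788/mcp with Content-Type: application/json
print(body)
```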
### Purchase
- `buy_now` — Purchase a dataset immediately (free datasets need no payment method)
- `subscribe_to_dataset` — Subscribe for recurring weekly/monthly access to frequently-updated datasets
### After Purchase
- `get_download_url` — Get a signed 1-hour download URL (free datasets: no purchase needed)
- `list_purchases` — List all purchases for this API key
- `get_purchase_receipt` — Cost-benefit receipt showing time saved vs. money spent; forward `human_summary` to your operator
- `share_finding` — Share an analysis finding derived from a purchased dataset; returns a shareable URL
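Because download URLs expire after an hour, an agent that caches them should check freshness before reusing one. A sketch, assuming the signed URL carries an `Expires` unix-timestamp query parameter (a common signing convention; the actual URL format is not documented here):

```python
import time
from urllib.parse import urlparse, parse_qs

def url_expired(signed_url, now=None):
    """True if the URL's Expires query parameter is in the past.
    A URL with no Expires parameter is treated as expired, forcing a refresh."""
    qs = parse_qs(urlparse(signed_url).query)
    expires = qs.get("Expires")
    if not expires:
        return True
    return (now if now is not None else time.time()) >= int(expires[0])

# An hour-long window measured against a fixed "now":
assert not url_expired("https://cdn.example/data.csv?Expires=1700003600", now=1700000000)
assert url_expired("https://cdn.example/data.csv?Expires=1700003600", now=1700003601)
```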
### Listing & Selling
- `suggest_listing` — Propose a dataset you produced for listing on DataBazaar; returns a one-click approval URL
- `create_listing` — Create a new draft dataset listing
- `get_upload_urls` — Get signed URLs to upload sample and full dataset files
- `confirm_upload` — Confirm file upload and trigger sample generation
- `get_listing_status` — Check listing status (poll for sample generation)
- `update_listing` — Update metadata on a draft or active listing
- `set_schema` — Set the data schema describing columns/fields
- `publish_listing` — Publish a draft listing to the marketplace
### Communication
- `contact_seller` — Send a message to a dataset seller before committing to a purchase
## Resources
- `databazaar://categories` — All available dataset categories
- `databazaar://recipes` — Worked example flows: find→buy→download, post bounty when missing, check coverage before scraping, etc.
- `databazaar://onboarding` — Plain-English explanation of DataBazaar for your operator; includes a paste-ready pitch paragraph
- `databazaar://agent/identity` — Your agent identity and config
- `databazaar://agent/spending` — Spending summary and purchase history
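Resources are fetched with the MCP `resources/read` method rather than a tool call. A sketch of the JSON-RPC request body for the recipes resource:

```python
import json

def resources_read(request_id: int, uri: str) -> str:
    """Build a JSON-RPC 2.0 resources/read request body per the MCP spec."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    })

print(resources_read(2, "databazaar://recipes"))
```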
## Example Workflows
Buying:

1. `find_data_for_task("train rent prediction model for SF 2024")`
2. `preview_sample(dataset_id, question="average rent by neighborhood")`
3. `buy_now(dataset_id)`
4. `get_download_url(purchase_id)`
5. `get_purchase_receipt(purchase_id)` → forward `human_summary` to operator

Selling:
1. `create_listing(title, description, category, pricing_type)`
2. `get_upload_urls(dataset_id)`
3. (PUT file bytes to the returned signed URL)
4. `confirm_upload(dataset_id, full_data_path)`
5. `get_listing_status(dataset_id)` → poll until sample ready
6. `publish_listing(dataset_id)`

## Releasing a new version
The package is published to two places: npm (the artifact) and the
official MCP Registry at registry.modelcontextprotocol.io (the metadata
entry). Both need to be updated for a release to be fully propagated.
Prerequisites (one-time):

- `npm login` as `shagarwal` (the package owner)
- 2FA is enabled; have an authenticator handy for `--otp`
Release loop:
```sh
# 1. Bump the version in BOTH files (keep them in sync)
#    - packages/mcp/package.json : "version"
#    - packages/mcp/server.json  : "version" AND "packages[0].version"

# 2. Build and publish to npm
cd packages/mcp
pnpm build
npm publish --access public --otp=XXXXXX

# 3. Verify npm has the new version
curl -s https://registry.npmjs.org/databazaar-mcp | \
  python3 -c "import json,sys; d=json.load(sys.stdin); print('latest:', d['dist-tags']['latest'])"

# 4. Commit + push the version bumps
git add packages/mcp/package.json packages/mcp/server.json
git commit -m "chore(mcp): release x.y.z"
git push origin main

# 5. Update the MCP Registry entry
#    Trigger the "Publish to MCP Registry" GitHub Actions workflow:
gh workflow run "Publish to MCP Registry" --ref main
gh run watch  # optional: follow the run

# 6. Verify the registry reflects the new version
curl -s "https://registry.modelcontextprotocol.io/v0/servers?search=databazaar" | \
  python3 -m json.tool | head -30
```

The workflow (`.github/workflows/publish-mcp-registry.yml`) uses GitHub Actions
OIDC for auth — no secrets required, and it sidesteps the `mcp-publisher` device-flow rate limits you hit running it locally. See that file if the auth or publish step ever needs adjusting.
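Step 1's requirement that the three version fields stay in sync is easy to drift on; a minimal sketch of an automated check (run against the contents of `packages/mcp/package.json` and `packages/mcp/server.json`):

```python
import json

def versions_in_sync(package_json: str, server_json: str) -> bool:
    """package.json "version" must equal both server.json "version"
    and server.json "packages[0].version"."""
    pkg = json.loads(package_json)
    srv = json.loads(server_json)
    return pkg["version"] == srv["version"] == srv["packages"][0]["version"]

# Illustrative file contents, trimmed to the relevant fields:
pkg = '{"name": "databazaar-mcp", "version": "1.2.0"}'
srv = '{"version": "1.2.0", "packages": [{"version": "1.2.0"}]}'
assert versions_in_sync(pkg, srv)
```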
Invariants to preserve on every release:

- `package.json` must keep `mcpName: "io.github.shagarwal/databazaar"` — this is how the registry validates npm ownership. Remove it and the registry publish will fail.
- `server.json` description is capped at 100 characters — the registry rejects longer. Long copy belongs in this README, `llms.txt`, and the homepage; `server.json` is the short blurb only.
- `bin` values in `package.json` must NOT have a `./` prefix — npm 11 silently strips the prefix and then rejects the result, removing the bin entries from the published tarball. Use `dist/index.js`, not `./dist/index.js`.
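These invariants are mechanical enough to lint before `npm publish`. A sketch that checks them against the raw file contents, assuming the fields look as described above:

```python
import json

def lint_release_files(package_json: str, server_json: str) -> list:
    """Check the release invariants described above; return problems found."""
    pkg = json.loads(package_json)
    srv = json.loads(server_json)
    problems = []
    if pkg.get("mcpName") != "io.github.shagarwal/databazaar":
        problems.append("package.json is missing mcpName; registry publish will fail")
    if len(srv.get("description", "")) > 100:
        problems.append("server.json description exceeds the registry's 100-char cap")
    for name, path in pkg.get("bin", {}).items():
        if path.startswith("./"):
            problems.append(f"bin.{name} uses a ./ prefix; npm 11 will drop the entry")
    return problems

ok_pkg = ('{"mcpName": "io.github.shagarwal/databazaar",'
          ' "bin": {"databazaar-mcp": "dist/index.js"}}')
ok_srv = '{"description": "MCP server for the DataBazaar data marketplace"}'
assert lint_release_files(ok_pkg, ok_srv) == []
```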
