@agentutility/mcp-mediakit
v0.1.8
MCP server for the @agentutility mediakit cluster — pay-per-call x402 tools, no API keys, USDC on Base.
One endpoint per format. Pay per call.
50 endpoints for PDF, image, video, audio, office, OCR, transcription, watermarking, and format conversion. Whatever the user hands the agent, the agent can handle.
Pricing: pay-per-call in USDC on Base. No subscriptions, no API keys. See per-tool prices below.
Install — Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "agentutility-mediakit": {
      "command": "npx",
      "args": ["-y", "@agentutility/mcp-mediakit"],
      "env": { "X402_PRIVATE_KEY": "0xYOUR_PRIVATE_KEY_HEX" }
    }
  }
}

Restart Claude Desktop. 50 tools appear in the tool palette.
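If the config file already lists other MCP servers, the entry needs to be merged in rather than pasted over the whole file. The following is a minimal Python sketch of that merge. The helper name `add_mediakit_server` is hypothetical, and the entry is copied from the snippet above.

```python
import json
from copy import deepcopy

# Server entry copied from the config snippet above.
MEDIAKIT_ENTRY = {
    "command": "npx",
    "args": ["-y", "@agentutility/mcp-mediakit"],
    "env": {"X402_PRIVATE_KEY": "0xYOUR_PRIVATE_KEY_HEX"},
}


def add_mediakit_server(config: dict, private_key: str) -> dict:
    """Return a copy of `config` with the mediakit server added (hypothetical helper)."""
    merged = deepcopy(config)
    entry = deepcopy(MEDIAKIT_ENTRY)
    entry["env"]["X402_PRIVATE_KEY"] = private_key
    # setdefault keeps any servers that are already configured.
    merged.setdefault("mcpServers", {})["agentutility-mediakit"] = entry
    return merged


if __name__ == "__main__":
    # An existing config with an unrelated, made-up server entry.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    print(json.dumps(add_mediakit_server(existing, "0xYOUR_PRIVATE_KEY_HEX"), indent=2))
```

In practice the `config` dict would come from `json.load` on the config file named above and be written back with `json.dump`. Keep a backup of the original file before overwriting it.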
Install — Cursor
Add to .cursor/mcp.json:
{
  "mcpServers": {
    "agentutility-mediakit": {
      "command": "npx",
      "args": ["-y", "@agentutility/mcp-mediakit"],
      "env": { "X402_PRIVATE_KEY": "0x..." }
    }
  }
}

Funding
Send any amount of USDC on Base mainnet to the address derived from your X402_PRIVATE_KEY. The MCP server uses it to pay for tool calls automatically.
USDC on Base contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
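To size the deposit, it can help to convert a planned workload into USDC. The sketch below uses per-call prices copied from the tool table in this README. The workload numbers are invented for illustration, and the helper names are hypothetical. The only outside fact it relies on is that USDC is denominated on-chain in integer units with 6 decimal places.

```python
from decimal import Decimal

# Per-call prices in USDC, copied from the tool table in this README.
PRICES = {
    "convert-pdf": Decimal("0.20"),
    "audio-transcribe": Decimal("0.01"),
    "json-yaml": Decimal("0.002"),
}

USDC_DECIMALS = 6  # USDC amounts on-chain are integers with 6 decimal places


def workload_cost(calls: dict) -> Decimal:
    """Total USDC needed for a planned number of calls per tool."""
    return sum((PRICES[tool] * count for tool, count in calls.items()), Decimal("0"))


def to_atomic(amount: Decimal) -> int:
    """Convert a USDC amount to on-chain atomic units."""
    return int(amount * 10**USDC_DECIMALS)


if __name__ == "__main__":
    # Illustrative workload, not a recommendation.
    plan = {"convert-pdf": 10, "audio-transcribe": 50, "json-yaml": 100}
    total = workload_cost(plan)
    print(f"{total} USDC = {to_atomic(total)} atomic units")
```

`Decimal` is used instead of `float` so that prices such as 0.002 add up exactly.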
Tools (50)
| Tool | Description |
|---|---|
| audio-loudnorm | (0.02 USDC/call) Audio loudness normalizer (EBU R128 LUFS). Podcast / Spotify / YouTube target presets (-23 / -16 / -14). Two-pass dynamic mode. Returns hosted MP3. |
| audio-transcribe | (0.01 USDC/call) Audio transcribe / speech-to-text / Whisper-large / multi-language ASR / OpenAI Whisper API compat. Server-side fetches the audio URL (max 25 MB), relays to Venice's audio/transcriptions endpoint with whisper-large-v3, and returns the transcript with detected language, duration, and per-segment timestamps when response_format='verbose_json' (default). Also supports raw text, SRT, and VTT outputs. |
| compress-pdf | (0.005 USDC/call) PDF compressor / PDF size reducer. CloudConvert optimize task. Profiles: web (default), print, archive, mrc (scanned), max. Reports % saved. |
| convert-html-to-markdown | (0.005 USDC/call) Convert HTML to Markdown. Strips nav, scripts, ads, and other boilerplate. Preserves headings, lists, tables, code blocks, links, and images. Accepts raw HTML or a URL. Returns clean Markdown ideal for LLM context windows or RAG ingestion. |
| convert-pdf | (0.20 USDC/call) Convert PDF to Markdown, HTML, JSON, or structured text via Datalab Marker. AI-powered, layout-aware, best-in-class for tables / equations / multi-column docs. For PDF→JPG/PNG see pdf-to-jpg; for PDF merge see pdf-merge; for PDF split see pdf-split; for PDF compression see compress-pdf. 30 pages max. |
| csv-to-ics | (0.01 USDC/call) CSV calendar to ICS / iCal file generator. RFC 5545 compliant. Auto-detects column mapping (summary/date/time/location). All-day + timed events. Up to 1000 rows. |
| csv-to-jsonl | (0.02 USDC/call) CSV to JSON / CSV to JSONL converter / data pipeline preprocessor. RFC 4180 parser. Type inference (booleans, integers, floats, ISO dates, null). Configurable delimiter, quote, header, rename, drop columns. |
| excel-to-csv | (0.005 USDC/call) Excel (.xlsx / .xls) → CSV / TSV / JSON converter. Multi-sheet handling. Returns each sheet by name. Adjacent to 'convert excel to google sheets' demand cluster. |
| excel-to-google-sheets | (0.005 USDC/call) Convert Excel to Google Sheets / XLSX to Google Sheets / spreadsheet import / Numbers to Google Sheets / Excel to gsheet. Outputs CSV that imports directly into Google Sheets via File → Import → Upload (or paste-into-cells). Multi-sheet handling, encoding control, quote style. Same handler as excel-to-csv / xlsx-to-csv under a Google-Sheets-named slug. |
| extract-tables | (0.10 USDC/call) Extract tables from PDF / table extractor / PDF to CSV / spreadsheet from PDF. Detects and extracts every table from a PDF document. Returns structured JSON or CSV per table. 30 pages max via Datalab Marker. |
| html-to-markdown | (0.005 USDC/call) HTML → Markdown converter. Accepts raw HTML or a URL. Strips nav/script/style/ad noise. Preserves headings, lists, tables, code blocks, links, images. |
| html-to-pdf | (0.08 USDC/call) URL to PDF / HTML to PDF / webpage screenshot to PDF. CloudConvert capture-website. Configurable page size, orientation, margins, wait conditions. Renders JS. |
| image-convert | (0.01 USDC/call) Universal image format converter (PNG, JPG, WEBP, AVIF, GIF, BMP, TIFF, ICO, HEIC, HEIF, PSD, SVG). Optional resize + quality. CloudConvert engine. |
| image-format-convert | (0.01 USDC/call) Image converter. Convert any image between PNG, JPG, WEBP, AVIF, GIF, BMP, TIFF, ICO, HEIC, HEIF, PSD, and SVG. Optional resize and quality. CloudConvert engine. Same backend as image-convert under a more search-friendly slug. |
| image-translate | (0.02 USDC/call) Image translator: vision-OCR + Venice translate. Demand-intel: 40 unmet signals for 'how to translate a picture'. |
| image-upscale | (0.02 USDC/call) Image upscale / 2x upscaler / 4x upscaler / super-resolution / sharpen image / enlarge image without loss. Upscales an image 2x or 4x via Venice's image/upscale endpoint (default model: venice-sd35). Returns a permanent fal-hosted URL. |
| json-yaml | (0.002 USDC/call) JSON ↔ YAML bidirectional converter. Auto-detects input format. Pure parse, no upstream API. |
| logo-detect | (0.03 USDC/call) Brand logo detection / brand recognition in images. Vision LLM. Returns brands with confidence, location, evidence (wordmark/logomark/lockup/color_scheme), element_type. Supports hint_brands. |
| merge-pdf | (0.01 USDC/call) PDF merger / combine PDFs / concatenate PDF files / join multiple PDFs into one. 2-50 input PDFs from URLs to single PDF. Preserves bookmarks. CloudConvert engine. |
| mp4-to-mp3 | (0.10 USDC/call) MP4 → MP3 audio extractor. Any video format (mov, webm, mkv, avi, m4v, flv) → MP3 via CloudConvert. Selectable bitrate (96/128/192 kbps). 60-min / 500MB max. |
| ocr | (0.20 USDC/call) OCR / optical character recognition / scanned document extractor / image-PDF to text. Run OCR on scanned PDFs and image-based documents. Datalab Marker engine — preserves layout, tables, math. Returns clean Markdown or plain text. 30 pages max. |
| office-to-pdf | (0.05 USDC/call) Office to PDF converter — DOCX/DOC, XLSX/XLS, PPTX/PPT, ODT/ODS/ODP, RTF, TXT, CSV, EPUB, MD, HTML, Apple Pages/Numbers/Keynote → PDF. CloudConvert engine. |
| pdf-compress | (0.005 USDC/call) PDF compressor / shrink PDF / PDF size reducer / smaller PDF for email. Three quality levels: ebook (lowest, web-quality), printer (medium), prepress (highest, archival). CloudConvert engine. |
| pdf-extract-tables | (0.10 USDC/call) PDF table extractor / table from PDF / scanned-table parsing / financial-table OCR / multi-page table consolidator / Datalab Marker tables. AI + OCR pipeline that finds every table in a PDF (digital or scanned) and returns row × column text matrices, page-by-page. Optional cell bounding boxes for downstream layout reconstruction. Optional page_range filter ('1-5', '3', '1,3,5'). Handles merged headers, multi-page financial statements, balance sheets, lab results, scanned reports. 30 pages max. Sibling of pdf-to-markdown using the same Datalab backend, but pre-parsed to tables only. |
| pdf-merge | (0.01 USDC/call) PDF merger / PDF combiner / PDF concatenator. 2-50 PDFs from URLs → single PDF. Preserves bookmarks. CloudConvert engine. |
| pdf-split | (0.04 USDC/call) PDF splitter / PDF page extractor. Two modes: page ranges (['1-3','5','7-end']) or one PDF per source page. CloudConvert engine. |
| pdf-to-jpg | (0.10 USDC/call) PDF to JPG / PNG / WEBP image converter. Renders every page at configurable DPI (36-600). Returns one image URL per page. CloudConvert backend. |
| pdf-to-markdown | (0.20 USDC/call) AI PDF extractor: PDF to Markdown / HTML / structured JSON via Datalab Marker. OCR + layout-aware. Best-in-class for tables, equations, multi-column docs. 30 pages max. |
| pdf-to-text | (0.20 USDC/call) PDF to text / extract text from PDF. AI-powered text extraction via Datalab Marker. OCR + layout-aware. Best-in-class for tables, equations, multi-column layouts. Outputs plain text or structured markdown. 30 pages max. |
| pdf-watermark | (0.02 USDC/call) PDF watermark / image watermark / video watermark — text or image overlay on PDFs, PNG/JPG/GIF, or MP4/MOV/WEBM. Configurable position, opacity, font, rotation, margin. CloudConvert engine. |
| pdf2md | (0.20 USDC/call) PDF to Markdown converter. AI PDF extractor. Datalab Marker — OCR + layout-aware. Best-in-class for tables, equations, multi-column. |
| receipt-ocr | (0.01 USDC/call) Receipt OCR. Reads any receipt photo and returns a structured JSON object with vendor, address, date, line items (qty / unit_price / total), subtotal, tax, tip, total, and payment method. Vision-LLM powered. Same backend as receipt-parser under a clearer slug for expense + accounting + reimbursement workflows. |
| receipt-parser | (0.01 USDC/call) Receipt → structured JSON (vendor, address, date, line items with qty/unit_price/total, subtotal, tax, tip, total, payment method). Vision LLM only. |
| speaker-diarize | (0.10 USDC/call) Speaker diarization / who-said-what transcription. Whisper v3 + speaker labels. Returns utterances grouped by speaker, plus per-speaker stats (count, seconds, words). 60 min max. |
| split-pdf | (0.04 USDC/call) PDF splitter / PDF page extractor / split PDF by range / PDF to multiple files. Split a PDF by page ranges into multiple PDFs. CloudConvert engine. |
| subtitles | (0.08 USDC/call) SRT / VTT subtitle generator from video or audio. Whisper v3. Word-wrapped, ready for VLC / Premiere / FFmpeg. |
| transcribe | (0.10 USDC/call) Video / audio transcription via Whisper v3. 90+ languages, translate-to-English mode, optional speaker diarization. 60-min max. |
| upscale-image | (0.10 USDC/call) AI image upscaler / super-resolution / image enlarger. ESRGAN. 1-8× scale factor. Best for photos and illustrations. fal.ai backend. |
| video-summarize | (0.10 USDC/call) Video summarizer / podcast summarizer / lecture notes generator. One call: Whisper v3 transcribes + Mistral summarizes. 5 styles: tldr, bullets, paragraph, executive, chapters. Returns summary + transcript. 60 min max. |
| video-thumbnail | (0.03 USDC/call) Video thumbnail / video frame extractor. First, middle, or last frame as JPG. fal.ai ffmpeg. Fast — no full transcode. |
| video-to-audio | (0.10 USDC/call) Video → audio extractor / video to audio converter. Extract MP3 audio track from any video URL (MP4, MOV, WEBM, MKV, AVI, M4V, FLV). Selectable bitrate (96/128/192 kbps). Useful for podcast extraction, audio archival, transcription pre-processing. 60-min / 500MB max. CloudConvert backend. |
| video-to-subtitles | (0.02 USDC/call) SRT / VTT subtitle generator from video or audio. Whisper v3 powered. Word-wrapped, ready for VLC / Premiere / FFmpeg. Auto-detect language + translate-to-English. |
| video-to-text | (0.10 USDC/call) Video transcription / audio transcription via Whisper v3 large. Auto-detects 90+ languages. Translate-to-English mode. Speaker diarization optional. 60 min max. |
| video-transcribe | (0.10 USDC/call) Transcribe video / video transcription / video to audio transcription / video-to-text. Whisper v3 large transcription for any video URL. Auto-detects 90+ languages. Translate-to-English mode. 60-min / 500MB max. Speaker diarization optional. Same backend as video-to-text under a clearer slug. |
| video-trim | (0.02 USDC/call) Video trimmer / video cutter / video clip tool. Pass start + end OR start + duration. HH:MM:SS, MM:SS, or seconds. CloudConvert + x264 re-encode. |
| watermark | (0.02 USDC/call) PDF / image / video watermarking — text or image overlay. CloudConvert engine. Configurable position, opacity, font, rotation, margin. Works on PDFs, PNG/JPG/GIF, MP4/MOV/WEBM. |
| watermark-pdf | (0.02 USDC/call) Add watermark to PDF. Text or image overlay on PDFs, PNG / JPG / GIF, or MP4 / MOV / WEBM with configurable position, opacity, font, rotation, and margin. Same backend as watermark / pdf-watermark under a clearer search slug. CloudConvert engine. |
| xlsx-to-csv | (0.005 USDC/call) Excel to CSV / XLSX to CSV / Numbers to CSV / spreadsheet to CSV. Convert any sheet of an .xlsx, .xlsm, .xls, or .ods workbook to CSV. Sheet selection, encoding, quote style. CloudConvert. |
| xml-to-word | (0.05 USDC/call) XML → Microsoft Word (DOCX) converter via CloudConvert. Demand-intel: 43 unmet signals. |
| youtube-transcript | (0.01 USDC/call) YouTube transcript / closed-caption fetcher / video subtitles puller / auto-generated CC reader. Pulls YouTube auto-generated or manual captions for any video and returns the full text plus per-segment {start, duration, text}. Optional language pick. Backed by Supadata's transcript pipeline (server-side; no caller key required). If no transcript is available for the video, returns a 404 with a clear error. |
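Each description in the table starts with its price in the form `(N USDC/call)`, so a price list can be extracted mechanically. The following is a small sketch, assuming rows keep exactly that `| name | (price USDC/call) ... |` shape. The function name is hypothetical, and the sample rows carry shortened descriptions.

```python
import re
from decimal import Decimal

# Matches rows such as: | json-yaml | (0.002 USDC/call) ... |
ROW_RE = re.compile(r"^\|\s*([a-z0-9-]+)\s*\|\s*\((\d+(?:\.\d+)?) USDC/call\)")

# Sample rows in the table's format; descriptions are shortened for the example.
SAMPLE = """| Tool | Description |
|---|---|
| json-yaml | (0.002 USDC/call) JSON / YAML bidirectional converter. |
| pdf-merge | (0.01 USDC/call) PDF merger / PDF combiner. |
| pdf2md | (0.20 USDC/call) PDF to Markdown converter. |
"""


def parse_prices(table_text: str) -> dict:
    """Map tool name -> per-call price for every row that matches ROW_RE."""
    prices = {}
    for line in table_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:  # header and separator rows do not match and are skipped
            prices[match.group(1)] = Decimal(match.group(2))
    return prices


if __name__ == "__main__":
    # Print tools from cheapest to most expensive.
    for name, price in sorted(parse_prices(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{name}: {price} USDC")
```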
How it works
- Agent calls a tool (e.g. audio-loudnorm).
- MCP server POSTs to https://x402.agentutility.ai/audio-loudnorm.
- The endpoint responds HTTP 402 with payment instructions.
- The MCP server signs an EIP-3009 USDC transfer authorization with X402_PRIVATE_KEY and retries.
- CDP facilitator settles on Base.
- The endpoint returns the actual response.
The agent never sees the payment flow — it just gets the result.
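The pay-and-retry flow can be sketched without any network access. The Python simulation below is an illustration under stated assumptions, not the package's implementation. The `X-PAYMENT` header name and the shape of the 402 body are assumptions, the endpoint is a local stand-in, and the signing step is a placeholder rather than a real EIP-3009 signature.

```python
import base64
import json
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]
PAYMENT_HEADER = "X-PAYMENT"  # assumed header name, not confirmed by this README


def fake_endpoint(headers: Dict[str, str], body: dict) -> Response:
    """Local stand-in for a paid tool endpoint (no network involved)."""
    if PAYMENT_HEADER not in headers:
        # No payment attached: reply 402 with payment instructions (assumed shape).
        return 402, {"accepts": [{"asset": "USDC", "network": "base", "amount": "0.02"}]}
    # A real endpoint would have the facilitator settle on Base before answering.
    return 200, {"result": "normalized audio URL would be here"}


def placeholder_sign(requirements: dict, private_key: str) -> str:
    """Placeholder for the EIP-3009 authorization; real signing needs an Ethereum library."""
    payload = {"requirements": requirements, "signature": "0xPLACEHOLDER"}
    return base64.b64encode(json.dumps(payload).encode()).decode()


def call_tool(endpoint: Callable[[Dict[str, str], dict], Response],
              body: dict, private_key: str) -> dict:
    """POST once; on HTTP 402, attach a payment and retry once; return the result."""
    status, data = endpoint({}, body)
    if status == 402:
        payment = placeholder_sign(data["accepts"][0], private_key)
        status, data = endpoint({PAYMENT_HEADER: payment}, body)
    if status != 200:
        raise RuntimeError(f"tool call failed with HTTP {status}")
    return data


if __name__ == "__main__":
    print(call_tool(fake_endpoint, {"url": "https://example.com/episode.mp3"}, "0xYOUR_PRIVATE_KEY_HEX"))
```

The caller of `call_tool` only ever receives the final result dict, which mirrors how the agent is shielded from the payment exchange.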
Links
- Cluster overview: https://agentutility.ai/mediakit/
- All MCP packages: https://mcp.agentutility.ai/
- Source: https://github.com/rooz21/x402/tree/main/packages/mcp-mediakit
Version: 0.1.8 · License: MIT
