movie-mcp
v1.0.0
AI-powered Movie Organizer MCP Server - Open source, no API keys required
🎬 Movie Organizer MCP Server (OSS, No API Key)
1. Overview
Build a Model Context Protocol (MCP) Server that powers an AI Movie Organizer Bot.
This server will:
- analyze messy media files
- identify movies / TV shows
- fetch metadata from the web (no API keys)
- organize + rename files
- learn over time using RAG
- reduce LLM (Claude) token usage drastically
💡 Fully:
- open-source
- self-hosted
- privacy-first
- no paid APIs
2. Core Capabilities
2.1 Media Identification
- parse messy filenames
- detect:
- title
- year
- resolution
- release group
- language
- season/episode
- support:
- movies
- TV series
- anime
- documentaries
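The detection step above can be sketched with a few regular expressions. This is only a minimal illustration, not the project's parser: the patterns, the `parse_filename` body, and the output keys are assumptions modeled on the preprocessing example in section 15.4, and a library such as guessit would cover far more cases (release group, language, anime numbering).

```python
import re

# Hypothetical patterns for illustration; real release names are messier.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")
RES_RE = re.compile(r"\b(480p|720p|1080p|2160p)\b", re.IGNORECASE)
EPISODE_RE = re.compile(r"\bS(\d{1,2})E(\d{1,3})\b", re.IGNORECASE)
EXT_RE = re.compile(r"\.(mkv|mp4|avi|mov)$", re.IGNORECASE)


def parse_filename(name: str) -> dict:
    """Extract title, year, quality and season/episode from a messy filename."""
    stem = EXT_RE.sub("", name)
    text = re.sub(r"[._]+", " ", stem)  # dots and underscores act as spaces
    year = YEAR_RE.search(text)
    res = RES_RE.search(text)
    ep = EPISODE_RE.search(text)
    # Assume the title is everything before the first recognized token.
    cut = min((m.start() for m in (year, res, ep) if m), default=len(text))
    result = {
        "title": text[:cut].strip(" -"),
        "year": int(year.group(1)) if year else None,
        "quality": res.group(1).lower() if res else None,
        "type": "series" if ep else "movie",
    }
    if ep:
        result["season"] = int(ep.group(1))
        result["episode"] = int(ep.group(2))
    return result
```

Titles that themselves contain a year or a resolution-like token would defeat the "first recognized token" heuristic, which is one reason to treat this as a fallback rather than the primary parser.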
2.2 Pattern-Based Search
- search using:
- raw filename
- cleaned title
- regex pattern
- wildcard
- folder names
- fuzzy matching support
- typo tolerance
- multilingual handling
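As a rough sketch of fuzzy matching with typo tolerance, the standard library's `difflib` can rank known titles against a query. The `normalize` and `fuzzy_search` helpers and the 0.6 cutoff are assumptions for illustration; accent stripping is only a small part of what multilingual handling would need.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def fuzzy_search(query: str, titles: list[str], cutoff: float = 0.6) -> list[tuple[str, float]]:
    """Return (title, similarity) pairs at or above cutoff, best first."""
    q = normalize(query)
    scored = [
        (title, difflib.SequenceMatcher(None, q, normalize(title)).ratio())
        for title in titles
    ]
    hits = [(title, round(score, 2)) for title, score in scored if score >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A call such as `fuzzy_search("Incepton", ["Inception", "Interstellar"])` ranks "Inception" first despite the typo.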
3. Filesystem Navigation (Shell-like)
The MCP server must support safe filesystem exploration.
Commands (tool-based, not raw shell)
- ls(path) → list files/folders
- cd(path) → change working directory
- pwd() → current directory
- tree(path) → recursive structure
- stat(path) → file metadata
- find(pattern) → search files
- du(path) → disk usage
Safety
- sandboxed root directory
- no system-level destructive commands
- no execution of arbitrary shell commands
- read-only mode by default
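One way to enforce a sandboxed root is to resolve every requested path and reject anything that lands outside it. The sketch below assumes hypothetical `resolve_in_sandbox` and `ls` helpers and a `SandboxError` exception; none of these names come from the project itself.

```python
from pathlib import Path


class SandboxError(Exception):
    """Raised when a requested path escapes the sandbox root."""


def resolve_in_sandbox(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and refuse anything outside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so escapes
    # through either route are caught by the containment check below.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise SandboxError(f"path escapes sandbox: {user_path}")
    return candidate


def ls(root: Path, user_path: str = ".") -> list[str]:
    """Read-only listing of a directory inside the sandbox."""
    target = resolve_in_sandbox(root, user_path)
    return sorted(entry.name for entry in target.iterdir())
```

An absolute `user_path` replaces the root when joined, so it is rejected by the same check unless it already points inside the sandbox.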
4. Web Search & Crawling (No API)
Supported sources
- IMDb (public pages)
- Wikipedia
- Letterboxd
- Rotten Tomatoes
- Google search (HTML parsing)
- public blogs / articles
Tools
- requests / httpx
- BeautifulSoup
- Playwright (JS fallback)
- trafilatura
- readability-lxml
- SearxNG (optional self-hosted search)
5. Web Search Strategy
- title + year
- title + language
- filename cleaned query
- series + S01E01
- fallback: folder name
- multiple query reformulation
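The reformulation order above could be generated from the parsed fields roughly as follows. `build_queries` and the field names (`title`, `year`, `language`, `season`, `episode`) are assumptions carried over from the parsing example, not a documented interface.

```python
from typing import Optional


def build_queries(parsed: dict, folder_name: Optional[str] = None) -> list[str]:
    """Produce search queries in priority order, without duplicates."""
    title = parsed.get("title") or ""
    queries: list[str] = []
    if title and parsed.get("year"):
        queries.append(f"{title} {parsed['year']}")
    if title and parsed.get("language"):
        queries.append(f"{title} {parsed['language']}")
    if title:
        queries.append(title)  # cleaned filename query
    if title and parsed.get("season") is not None and parsed.get("episode") is not None:
        queries.append(f"{title} S{parsed['season']:02d}E{parsed['episode']:02d}")
    if folder_name:
        queries.append(folder_name)  # fallback
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(queries))
```

The caller would try each query in turn and stop as soon as one yields a confident match.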
6. Metadata Extraction
Core fields
- title
- original title
- year
- runtime
- genres
- language
- country
People
- director
- cast
- writers
Ratings
- IMDb rating
- votes (if available)
Series fields
- season
- episode
- episode title
- air date
7. Matching Engine
Compare:
- filename vs web title
- year match
- language match
- cast overlap
- edition keywords
Detect:
- remakes
- director's cut
- extended version
- dubbed vs original
Output:
- best match
- confidence score (0–1)
- explanation
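A confidence score in the 0–1 range can be built as a weighted sum of the compared signals. The weights, the `score_candidate` and `best_match` names, and the record shapes below are illustrative assumptions; edition and dub detection are left out to keep the sketch short.

```python
import difflib

# Illustrative weights (assumption); they sum to 1 so the score stays in 0..1.
WEIGHTS = {"title": 0.6, "year": 0.3, "cast": 0.1}


def score_candidate(parsed: dict, candidate: dict) -> dict:
    """Score one web candidate against the parsed filename."""
    title_sim = difflib.SequenceMatcher(
        None, parsed["title"].casefold(), candidate["title"].casefold()
    ).ratio()
    year_match = 1.0 if parsed.get("year") is not None and parsed["year"] == candidate.get("year") else 0.0
    known = set(parsed.get("cast", []))
    cast_overlap = len(known & set(candidate.get("cast", []))) / len(known) if known else 0.0
    score = (
        WEIGHTS["title"] * title_sim
        + WEIGHTS["year"] * year_match
        + WEIGHTS["cast"] * cast_overlap
    )
    return {
        "title": candidate["title"],
        "year": candidate.get("year"),
        "score": round(score, 2),
        "signals": {"title": round(title_sim, 2), "year": year_match, "cast": round(cast_overlap, 2)},
    }


def best_match(parsed: dict, candidates: list[dict]):
    """Return the highest-scoring candidate, or None if there are none."""
    ranked = sorted((score_candidate(parsed, c) for c in candidates),
                    key=lambda r: r["score"], reverse=True)
    return ranked[0] if ranked else None
```

Because cast information is rarely present in a filename, a candidate with a perfect title and year tops out at 0.9 under these weights; the per-signal breakdown in `signals` is what an explanation would be built from.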
8. Rename Engine
Output format examples
Movies:
Inception (2010) [1080p BluRay x264]
Series:
Breaking Bad - S01E01 - Pilot
Features
- dry run preview
- batch rename
- undo support
- customizable templates
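The features above could fit together as in the following sketch: templates are plain format strings, a plan is computed first, and nothing touches the disk unless `dry_run` is switched off. The template strings mirror the two example formats; the function names and the undo-log shape are assumptions.

```python
from pathlib import Path

# Templates modeled on the example formats above; field names are assumptions.
MOVIE_TEMPLATE = "{title} ({year}) [{quality}]"
SERIES_TEMPLATE = "{title} - S{season:02d}E{episode:02d} - {episode_title}"


def render_name(meta: dict, template: str, suffix: str) -> str:
    """Fill a template with metadata and keep the original file extension."""
    return template.format(**meta) + suffix


def plan_renames(items: list[tuple[Path, dict]], template: str) -> list[tuple[Path, Path]]:
    """Compute (source, destination) pairs without touching the disk."""
    return [(src, src.with_name(render_name(meta, template, src.suffix)))
            for src, meta in items]


def apply_renames(plan: list[tuple[Path, Path]], dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Apply a plan; returns an undo log of (current, original) pairs."""
    undo_log = []
    for src, dst in plan:
        if dry_run:
            continue  # preview only
        if dst.exists():
            raise FileExistsError(dst)
        src.rename(dst)
        undo_log.append((dst, src))
    return undo_log


def undo(undo_log: list[tuple[Path, Path]]) -> None:
    """Revert renames in reverse order."""
    for current, original in reversed(undo_log):
        current.rename(original)
```

Defaulting `dry_run` to `True` matches the read-only-by-default stance of the filesystem tools: a caller has to opt in before files move.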
9. RAG (Learning System)
Stores:
- past matches
- rename decisions
- user corrections
- aliases
- release patterns
- failures
Use cases:
- improve matching accuracy
- avoid repeated mistakes
- learn naming preferences
Storage options:
- SQLite + FTS
- Chroma / FAISS / LanceDB
- JSON knowledge base
10. RAG Retrieval Usage
- find similar filenames
- recall past corrections
- reuse rename patterns
- improve ranking confidence
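A very small version of this store can be built on SQLite alone. Note the substitution: the sketch below retrieves similar filenames with lexical similarity from `difflib` rather than FTS or vector embeddings, purely to stay dependency-free. The `CorrectionStore` class, its schema, and the 0.5 cutoff are assumptions.

```python
import difflib
import sqlite3


class CorrectionStore:
    """Stores confirmed filename-to-title decisions and recalls similar ones."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS corrections ("
            "filename TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
        )

    def store(self, filename: str, title: str, year: int) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO corrections VALUES (?, ?, ?)",
            (filename, title, year),
        )
        self.db.commit()

    def search(self, filename: str, limit: int = 3, cutoff: float = 0.5) -> list[dict]:
        """Return past decisions whose filename resembles the new one."""
        rows = self.db.execute("SELECT filename, title, year FROM corrections").fetchall()
        scored = [
            (difflib.SequenceMatcher(None, filename.lower(), row[0].lower()).ratio(), row)
            for row in rows
        ]
        scored = [pair for pair in scored if pair[0] >= cutoff]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [
            {"filename": row[0], "title": row[1], "year": row[2], "similarity": round(sim, 2)}
            for sim, row in scored[:limit]
        ]
```

Scanning every row is fine for a personal library but would not scale; that is where the FTS or vector options listed under storage would come in.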
11. Web Crawling Safety
Must:
- rate limit
- cache responses
- retry with backoff
- timeout handling
Must NOT:
- require login
- bypass protections illegally
- spam websites
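Rate limiting and retry with backoff can wrap whatever function actually performs the HTTP request. In this sketch the `fetch` callable is supplied by the caller (for example a thin wrapper around httpx with its own timeout), and `PoliteFetcher`, its parameters, and the default delays are all assumptions.

```python
import time
from typing import Callable


class PoliteFetcher:
    """Wraps a fetch function with a minimum interval and exponential backoff."""

    def __init__(self, fetch: Callable[[str], str], min_interval: float = 1.0,
                 max_retries: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep
        self.clock = clock
        self._last_call = None

    def _wait_for_slot(self) -> None:
        """Sleep until at least min_interval has passed since the last request."""
        if self._last_call is not None:
            remaining = self.min_interval - (self.clock() - self._last_call)
            if remaining > 0:
                self.sleep(remaining)
        self._last_call = self.clock()

    def get(self, url: str) -> str:
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            try:
                return self.fetch(url)
            except OSError:  # covers ConnectionError and TimeoutError
                if attempt == self.max_retries:
                    raise
                # Exponential backoff: base_delay, 2x, 4x, ...
                self.sleep(self.base_delay * 2 ** attempt)
```

Passing `sleep` and `clock` in as parameters keeps the class testable without real waiting; response caching is a separate concern handled by the cache layer.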
12. Caching Layer
Cache:
- parsed filenames
- search results
- metadata summaries
- match decisions
Storage:
- SQLite
- Redis (optional)
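A SQLite-backed cache for these items can be a single key-value table with JSON values. The `SqliteCache` class, the `kind` prefixes, and the hashing of raw keys are assumptions chosen to keep keys short and uniform.

```python
import hashlib
import json
import sqlite3


class SqliteCache:
    """Key-value cache storing JSON-serializable values in SQLite."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    @staticmethod
    def make_key(kind: str, raw: str) -> str:
        """Namespace by kind (e.g. 'parse', 'search') and hash the raw input."""
        digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        return f"{kind}:{digest}"

    def get(self, kind: str, raw: str):
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (self.make_key(kind, raw),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def set(self, kind: str, raw: str, value) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)",
            (self.make_key(kind, raw), json.dumps(value)),
        )
        self.db.commit()
```

Expiry is deliberately omitted; a timestamp column would be the natural place to add it for web search results that go stale.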
13. MCP Tools
Expose tools:
- parse_filename
- search_media
- search_web
- crawl_imdb
- extract_metadata
- match_candidates
- get_best_match
- rename_preview
- apply_rename
- filesystem_ls
- filesystem_cd
- filesystem_find
- rag_store
- rag_search
- explain_match
14. Explainability
Each decision must include:
- why the match was chosen
- which signals matched
- confidence score
- source references
Example:
- title match strong
- year matched
- IMDb + Wikipedia agree
- confidence: 0.93
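An explanation like the example above could be carried as a small record and rendered on demand. The `MatchExplanation` dataclass and its fields are hypothetical; they simply mirror the four items every decision must include.

```python
from dataclasses import dataclass, field


@dataclass
class MatchExplanation:
    """Hypothetical record of why a match was chosen."""
    title: str
    confidence: float
    signals: list[str] = field(default_factory=list)   # what matched
    sources: list[str] = field(default_factory=list)   # where it was confirmed

    def to_lines(self) -> list[str]:
        """Render the explanation as short, human-readable lines."""
        lines = list(self.signals)
        if len(self.sources) > 1:
            lines.append(" + ".join(self.sources) + " agree")
        elif self.sources:
            lines.append(f"source: {self.sources[0]}")
        lines.append(f"confidence: {self.confidence:.2f}")
        return lines
```

Keeping the explanation structured (rather than a free-text string) lets the same record feed both the `explain_match` tool output and the RAG store.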
15. Token Optimization (Claude)
Goal
Reduce token usage by 70–90%.
15.1 Local-first Pipeline
- parse filename
- check cache
- query RAG
- web search (if needed)
- rank locally
- send minimal data to Claude
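The pipeline steps above can be sketched as one function that only reaches the LLM when local ranking is not confident enough. Everything here is an assumption for illustration: the `identify` name, the injected callables, the dict used as a cache, and the 0.9 threshold for skipping the LLM call.

```python
from typing import Callable


def identify(filename: str, *, parse: Callable, cache: dict, rag_search: Callable,
             web_search: Callable, rank: Callable, ask_llm: Callable,
             confident: float = 0.9) -> dict:
    """Local-first identification; the LLM is the last resort."""
    parsed = parse(filename)                      # 1. parse filename
    if filename in cache:                         # 2. check cache
        return cache[filename]
    candidates = rag_search(parsed)               # 3. query RAG
    if not candidates:
        candidates = web_search(parsed)           # 4. web search (if needed)
    ranked = rank(parsed, candidates)             # 5. rank locally
    if ranked and ranked[0]["score"] >= confident:
        decision = ranked[0]                      # confident enough, no LLM call
    else:
        payload = {                               # 6. send minimal data only
            "title": parsed.get("title"),
            "year": parsed.get("year"),
            "candidates": ranked[:3],
        }
        decision = ask_llm(payload)
    cache[filename] = decision
    return decision
```

Injecting each stage as a callable keeps the orchestration independent of how parsing, retrieval, and crawling are implemented.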
15.2 Compact Payloads
Send only:
{
  "title": "Inception",
  "year": 2010,
  "candidates": [
    {"title": "Inception", "year": 2010, "score": 0.95},
    {"title": "Inception: The Cobol Job", "year": 2010, "score": 0.52}
  ]
}
15.3 Never Send
❌ full HTML pages
❌ full directory dumps
❌ raw logs
❌ repeated metadata
15.4 Preprocessing
Convert:
Movie.Name.2010.1080p.x264.mkv
Into:
{
  "title": "Movie Name",
  "year": 2010,
  "quality": "1080p",
  "type": "movie"
}
15.5 Summarization
Return:
- title
- year
- rating
- 2–3 evidence lines
NOT full documents.
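Reducing a full metadata record to the fields listed above might look like this. The `summarize` helper, the input keys, and the 120-character cap on evidence lines are assumptions.

```python
def summarize(metadata: dict, evidence: list[str], max_evidence: int = 3) -> dict:
    """Keep only title, year, rating and a few short evidence lines."""
    return {
        "title": metadata.get("title"),
        "year": metadata.get("year"),
        "rating": metadata.get("rating"),
        # Truncate each line so one long paragraph cannot bloat the payload.
        "evidence": [line.strip()[:120] for line in evidence[:max_evidence]],
    }
```

Anything not in the returned dictionary, such as plot text or the full cast list, stays local and is never sent to the model.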
15.6 Cache Everything
Avoid re-calling the LLM for:
- same filename
- same query
- same decision
16. Architecture
Components
- MCP Server (core brain)
- Parser Engine
- Matching Engine
- Web Crawler
- RAG Store
- Cache Layer
- Filesystem Adapter
17. Suggested Stack (OSS Only)
Backend
- Python (FastAPI)
- Node.js optional
Parsing
- guessit
- custom regex
Crawling
- BeautifulSoup
- Playwright
- httpx
Storage
- SQLite
- Chroma / FAISS
Search
- SearxNG
18. Future Extensions
- subtitle auto-matching
- torrent/indexer integration
- Jellyfin metadata sync
- poster downloader
- duplicate detection
- quality upgrade suggestions
19. Key Design Principles
- local-first
- zero API cost
- explainable decisions
- deterministic before LLM
- token-efficient
- modular MCP tools
- safe filesystem access
20. Final Goal
A single intelligent MCP server that replaces:
- manual file organization
- messy naming
- repeated metadata lookup
- heavy LLM usage
And enables:
🤖 Fully automated AI-powered media organization
