@gherk/requirements-extractor
v1.2.0
An MCP (Model Context Protocol) server that extracts, classifies, and generates structured requirement documents from PDF files using a local Ollama LLM.
Features
- 3-Pass AI Pipeline — Identify → Classify → Generate requirements from any PDF
- Async Processing — Returns a job ID immediately; poll for incremental results
- 6 Categories — Backend, Frontend, Mobile, Infrastructure, DevOps, Non-Functional
- Structured Output — Generates individual markdown files and/or JSON per requirement
- Crash-Resilient — Files are written incrementally during generation (survives interruptions)
- Local LLM — Powered by Ollama (default: qwen3:8b), no external API keys required
Prerequisites
- Node.js ≥ 18
- Ollama running locally with a model pulled:
ollama pull qwen3:8b
Installation
npx -y @gherk/requirements-extractor
Or install globally:
npm install -g @gherk/requirements-extractor
MCP Configuration
Add to your MCP client configuration (e.g., mcp-servers.json):
{
  "name": "requirements-extractor",
  "command": "npx",
  "args": ["-y", "@gherk/requirements-extractor"],
  "enabled": true
}
Tools
extract_requirements
Starts an asynchronous extraction job. Returns a jobId immediately.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| pdfPath | string | ✅ | — | Absolute path to the PDF file |
| outputDir | string | ✅ | — | Absolute path to the output directory |
| projectPrefix | string | ❌ | "ACS" | Prefix for requirement IDs (e.g., ACS_BE_01) |
| outputFormat | "markdown" \| "json" \| "both" | ❌ | "markdown" | Output format |
| filterCategories | string[] | ❌ | all | Only generate for these categories |
| model | string | ❌ | "qwen3:8b" | Ollama model to use |
| ollamaUrl | string | ❌ | "http://localhost:11434" | Ollama API URL |
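
The parameter table above can be restated as a typed argument object. The following is a minimal sketch, not code from the package: `withDefaults` is a hypothetical helper, the file paths are placeholders, and the `name`/`arguments` pair assumes the usual MCP tool-call request shape.

```typescript
// Argument shape for extract_requirements, taken from the parameter table.
type OutputFormat = "markdown" | "json" | "both";

interface ExtractRequirementsArgs {
  pdfPath: string;             // absolute path to the PDF (required)
  outputDir: string;           // absolute path to the output directory (required)
  projectPrefix?: string;
  outputFormat?: OutputFormat;
  filterCategories?: string[]; // omitted means all categories
  model?: string;
  ollamaUrl?: string;
}

// Hypothetical helper: fills in the documented defaults so the caller can
// see exactly which settings the job would run with.
function withDefaults(args: ExtractRequirementsArgs) {
  return {
    projectPrefix: "ACS",
    outputFormat: "markdown" as OutputFormat,
    model: "qwen3:8b",
    ollamaUrl: "http://localhost:11434",
    ...args, // caller-supplied values override the defaults
  };
}

// Placeholder paths; replace them with real absolute paths.
const request = {
  name: "extract_requirements",
  arguments: withDefaults({
    pdfPath: "/docs/spec.pdf",
    outputDir: "/tmp/requirements",
    outputFormat: "both",
  }),
};
```

The server applies its own defaults when optional parameters are omitted, so spelling them out on the client is only useful for logging or debugging.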
Response:
{
  "jobId": "a1b2c3d4",
  "status": "running",
  "message": "Extraction started. Use get_extraction_progress to poll for results."
}
get_extraction_progress
Polls a running job for incremental results.
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| jobId | string | ✅ | — | Job ID from extract_requirements |
| since | number | ❌ | 0 | Return requirements generated after this index |
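
A client that polls repeatedly needs to track which requirements it has already received. The sketch below shows one way to do that. It assumes that `since` is a count-based index, so passing the number of requirements received so far returns only newer ones; that reading of the parameter is an assumption, and `applyProgress` and `PollState` are hypothetical names.

```typescript
// Subset of the get_extraction_progress response used in this sketch.
interface RequirementSummary {
  id: string;
  title: string;
  category: string;
}

interface ProgressResponse {
  status: string;
  done: number;
  finished: boolean;
  new: RequirementSummary[];
}

// Client-side bookkeeping between polls (hypothetical).
interface PollState {
  received: RequirementSummary[];
  nextSince: number; // value to send as `since` on the next poll
  finished: boolean;
}

// Folds one response into the state. Assumes `since` counts requirements,
// so the next value is simply how many have been received so far.
function applyProgress(state: PollState, res: ProgressResponse): PollState {
  const received = state.received.concat(res.new);
  return { received, nextSince: received.length, finished: res.finished };
}

// Two simulated polls: one requirement arrives, then one more and the job ends.
let state: PollState = { received: [], nextSince: 0, finished: false };
state = applyProgress(state, {
  status: "running",
  done: 1,
  finished: false,
  new: [{ id: "ACS_BE_01", title: "User Authentication", category: "backend" }],
});
state = applyProgress(state, {
  status: "completed",
  done: 2,
  finished: true,
  new: [{ id: "ACS_FE_01", title: "Dashboard", category: "frontend" }],
});
```

In a real client, each call to `applyProgress` would follow a `get_extraction_progress` request sent with `since: state.nextSince`, repeated until `state.finished` is true.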
Response:
{
  "jobId": "a1b2c3d4",
  "status": "running",
  "phase": "Pass 3: Generating markdown...",
  "totalIdentified": 42,
  "totalToGenerate": 38,
  "done": 12,
  "finished": false,
  "new": [
    {
      "id": "ACS_BE_01",
      "title": "User Authentication",
      "category": "backend",
      "category_label": "Backend",
      "type": "Funcional",
      "page_range": "5-7",
      "description": "JWT-based authentication system...",
      "acceptance_criteria": ["Given a valid token...", "..."]
    }
  ]
}
list_jobs
Lists all extraction jobs (running and completed). Takes no parameters.
Response:
[
  { "id": "a1b2c3d4", "status": "completed", "phase": "Done", "done": 38, "total": 38 },
  { "id": "e5f6g7h8", "status": "running", "phase": "Pass 2: Classifying...", "done": 0, "total": 0 }
]
list_requirements
Lists all generated requirement files in a directory, organized by category.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| requirementsDir | string | ✅ | Absolute path to the requirements directory |
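
The exact response format of this tool is not documented here. As an illustration of what "organized by category" could mean, the sketch below groups relative file paths by their category directory, assuming the layout shown under Output Structure. `groupByCategory` is a hypothetical client-side helper, not part of the package.

```typescript
// Groups paths such as "backend/ACS_BE_01_user_authentication.md" by their
// first path segment. Top-level files (README.md, requirements.json) and
// anything that is not a .md file directly inside a category directory are skipped.
function groupByCategory(relativePaths: string[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const path of relativePaths) {
    const parts = path.split("/");
    const category = parts[0];
    const file = parts[1];
    if (parts.length !== 2 || category === undefined || file === undefined) continue;
    if (!/\.md$/.test(file)) continue;
    if (!Object.prototype.hasOwnProperty.call(groups, category)) {
      groups[category] = [];
    }
    groups[category].push(file);
  }
  return groups;
}

const grouped = groupByCategory([
  "backend/ACS_BE_01_user_authentication.md",
  "backend/ACS_BE_02_payment_api.md",
  "frontend/ACS_FE_01_dashboard.md",
  "README.md",
  "requirements.json",
]);
```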
get_requirement
Returns the full content of a specific requirement file.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| requirementsDir | string | ✅ | Absolute path to the requirements directory |
| requirementId | string | ✅ | Requirement ID (e.g., BE_01) or filename (e.g., ACS_BE_01_auth.md) |
Pipeline Architecture
PDF → Parse (5-page chunks)
↓
Pass 1: Identify — LLM extracts raw requirements from each chunk
↓
Pass 2: Classify — LLM assigns category (backend/frontend/mobile/infra/devops/nf)
↓
Sequence — Assign IDs: {PREFIX}_{CATEGORY}_{SEQ} (e.g., ACS_BE_01)
↓
Pass 3: Generate — LLM generates structured markdown per requirement
↓
Write — Each file written immediately (crash-resilient)
↓
Summary — README.md + requirements.json in output directory
Categories
| ID | Prefix | Label | Description |
|----|--------|-------|-------------|
| backend | BE | Backend | APIs, services, business logic, integrations |
| frontend | FE | Frontend | Web interfaces, dashboards, forms, portals |
| mobile | MB | Mobile | iOS/Android apps, Bluetooth, GPS, push notifications |
| infra | IF | Infrastructure | Databases, schemas, storage, message queues, caching |
| devops | DO | DevOps | CI/CD, environments, containers, monitoring |
| non-functional | NF | Non-Functional | Security, performance, scalability, SLAs, GDPR |
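
Combining the ID format from the pipeline, {PREFIX}_{CATEGORY}_{SEQ}, with the prefixes in the table above, the sequencing step can be sketched as follows. This is an illustration, not the package's implementation: `assignIds` is a hypothetical name, and both the per-category counters and the two-digit zero padding are inferred from examples such as ACS_BE_01 and ACS_FE_01.

```typescript
// Category ID → ID prefix, copied from the Categories table.
const CATEGORY_PREFIX: Record<string, string> = {
  backend: "BE",
  frontend: "FE",
  mobile: "MB",
  infra: "IF",
  devops: "DO",
  "non-functional": "NF",
};

interface Classified {
  title: string;
  category: string;
}

// Assigns IDs in input order, keeping a separate counter per category
// (assumption inferred from the example IDs).
function assignIds(
  items: Classified[],
  projectPrefix: string = "ACS"
): (Classified & { id: string })[] {
  const counters: Record<string, number> = {};
  return items.map((item) => {
    const code = CATEGORY_PREFIX[item.category];
    if (!Object.prototype.hasOwnProperty.call(CATEGORY_PREFIX, item.category) || code === undefined) {
      throw new Error("Unknown category: " + item.category);
    }
    const seq = (counters[code] || 0) + 1;
    counters[code] = seq;
    const padded = seq < 10 ? "0" + seq : String(seq); // 1 → "01"
    return {
      title: item.title,
      category: item.category,
      id: projectPrefix + "_" + code + "_" + padded,
    };
  });
}

const numbered = assignIds([
  { title: "User Authentication", category: "backend" },
  { title: "Dashboard", category: "frontend" },
  { title: "Payment API", category: "backend" },
]);
// IDs in order: ACS_BE_01, ACS_FE_01, ACS_BE_02
```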
Output Structure
outputDir/
├── backend/
│ ├── ACS_BE_01_user_authentication.md
│ └── ACS_BE_02_payment_api.md
├── frontend/
│ └── ACS_FE_01_dashboard.md
├── mobile/
│ └── ACS_MB_01_push_notifications.md
├── infra/
├── devops/
├── non-functional/
│ └── ACS_NF_01_gdpr_compliance.md
├── README.md # Summary with requirement counts
└── requirements.json # All requirements as structured JSON
Gherk Integration
When used within the Gherk platform, the MCP is called through a multi-layer async pipeline:
Frontend (ImporterComponent)
│ POST /tickets/analyze-requirements (file upload)
▼
server-go (RequirementsHandler)
│ Returns { jobId } immediately
│ Spawns goroutine →
▼
agent-go (/ai/extract-requirements)
│ Calls MCP tool: extract_requirements
│ MCP returns { jobId } → agent-go polls get_extraction_progress
│ When done, reads requirements.json from outputDir
▼
agent-go → maps MCP output to AnalyzedRequirementV2:
│ • category → side (frontend/backend/both)
│ • Adds priority, roles, acceptanceCriteria
▼
server-go → broadcasts via WebSocket:
│ • requirements_progress (phase updates)
│ • requirements_complete (final results)
│ • requirements_failed (errors)
▼
Frontend (ImporterStore) — receives results, shows preview
Category-to-Side Mapping
| MCP Category | Gherk Side |
|-------------|------------|
| frontend | frontend |
| mobile | frontend |
| backend | backend |
| infra | both |
| devops | both |
| non-functional | both |
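
The table above amounts to a simple lookup. According to the Key Files section, the actual mapping lives in agent-go; the TypeScript sketch below only restates the table for illustration, and `toSide` is a hypothetical name.

```typescript
type GherkSide = "frontend" | "backend" | "both";

// MCP category → Gherk side, copied from the mapping table.
const CATEGORY_TO_SIDE: Record<string, GherkSide> = {
  frontend: "frontend",
  mobile: "frontend",
  backend: "backend",
  infra: "both",
  devops: "both",
  "non-functional": "both",
};

// Returns undefined for categories that are not in the table, so the
// caller decides how to handle unexpected input.
function toSide(category: string): GherkSide | undefined {
  return Object.prototype.hasOwnProperty.call(CATEGORY_TO_SIDE, category)
    ? CATEGORY_TO_SIDE[category]
    : undefined;
}

const mobileSide = toSide("mobile");   // "frontend"
const infraSide = toSide("infra");     // "both"
const unknownSide = toSide("unknown"); // undefined
```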
Key Files
| File | Description |
|------|-------------|
| agent-go/extract_requirements_handler.go | Calls MCP, reads requirements.json, maps to Gherk format |
| agent-go/internal/mcp/mcp-servers.json | MCP registration (timeoutSeconds: 0) |
| server-go/.../requirements.go | Async handler, WebSocket broadcasting |
| desktop-app/.../importer.store.ts | Frontend state (survives navigation) |
Development
# Clone and install
npm install
# Build
npm run build
# Watch mode
npm run dev
# Run locally
npm start
License
MIT
