@gherk/requirements-extractor

v1.2.0

Published

12 days ago

MCP server that extracts, classifies and generates structured requirements from PDF documents using Ollama LLM

An MCP (Model Context Protocol) server that extracts, classifies, and generates structured requirement documents from PDF files using a local Ollama LLM.

Features

3-Pass AI Pipeline — Identify → Classify → Generate requirements from any PDF
Async Processing — Returns a job ID immediately; poll for incremental results
6 Categories — Backend, Frontend, Mobile, Infrastructure, DevOps, Non-Functional
Structured Output — Generates individual markdown files and/or JSON per requirement
Crash-Resilient — Files are written incrementally during generation (survives interruptions)
Local LLM — Powered by Ollama (default: qwen3:8b), no external API keys required

Prerequisites

Node.js ≥ 18
Ollama running locally with a model pulled:
```
ollama pull qwen3:8b
```

Installation

npx -y @gherk/requirements-extractor

Or install globally:

npm install -g @gherk/requirements-extractor

MCP Configuration

Add to your MCP client configuration (e.g., mcp-servers.json):

{
  "name": "requirements-extractor",
  "command": "npx",
  "args": ["-y", "@gherk/requirements-extractor"],
  "enabled": true
}

Tools

`extract_requirements`

Starts an asynchronous extraction job. Returns a jobId immediately.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| pdfPath | string | ✅ | — | Absolute path to the PDF file |
| outputDir | string | ✅ | — | Absolute path to the output directory |
| projectPrefix | string | ❌ | "ACS" | Prefix for requirement IDs (e.g., ACS_BE_01) |
| outputFormat | "markdown" \| "json" \| "both" | ❌ | "markdown" | Output format |
| filterCategories | string[] | ❌ | all | Only generate for these categories |
| model | string | ❌ | "qwen3:8b" | Ollama model to use |
| ollamaUrl | string | ❌ | "http://localhost:11434" | Ollama API URL |

Response:

{
  "jobId": "a1b2c3d4",
  "status": "running",
  "message": "Extraction started. Use get_extraction_progress to poll for results."
}
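
The parameter table above can be mirrored in a small typed helper on the client side. This is only a sketch: `ExtractArgs` and `withDefaults` are hypothetical names that are not part of the package, the absolute-path check is a simplification, and the server presumably applies the same defaults itself when optional fields are omitted.

```typescript
// Hypothetical client-side types mirroring the extract_requirements parameter table.
type OutputFormat = "markdown" | "json" | "both";

interface ExtractArgs {
  pdfPath: string;             // required, absolute path to the PDF
  outputDir: string;           // required, absolute path to the output directory
  projectPrefix?: string;
  outputFormat?: OutputFormat;
  filterCategories?: string[]; // left unset means "all categories"
  model?: string;
  ollamaUrl?: string;
}

// Fills in the documented defaults and rejects relative paths early.
function withDefaults(args: ExtractArgs): ExtractArgs {
  for (const p of [args.pdfPath, args.outputDir]) {
    // Simplified check (assumption): POSIX "/..." or Windows "C:\..." counts as absolute.
    if (!/^(\/|[A-Za-z]:[\\/])/.test(p)) {
      throw new Error(`Expected an absolute path, got: ${p}`);
    }
  }
  return {
    projectPrefix: "ACS",
    outputFormat: "markdown",
    model: "qwen3:8b",
    ollamaUrl: "http://localhost:11434",
    ...args, // caller-supplied values override the defaults
  };
}

// Example paths are placeholders.
const example = withDefaults({ pdfPath: "/docs/spec.pdf", outputDir: "/tmp/reqs" });
```

The resulting object can be passed as the `arguments` of an `extract_requirements` tool call by whatever MCP client is in use.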

`get_extraction_progress`

Polls a job for progress: returns the current status and phase, plus any requirements generated after the `since` index.

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| jobId | string | ✅ | — | Job ID from extract_requirements |
| since | number | ❌ | 0 | Return requirements generated after this index |

Response:

{
  "jobId": "a1b2c3d4",
  "status": "running",
  "phase": "Pass 3: Generating markdown...",
  "totalIdentified": 42,
  "totalToGenerate": 38,
  "done": 12,
  "finished": false,
  "new": [
    {
      "id": "ACS_BE_01",
      "title": "User Authentication",
      "category": "backend",
      "category_label": "Backend",
      "type": "Funcional",
      "page_range": "5-7",
      "description": "JWT-based authentication system...",
      "acceptance_criteria": ["Given a valid token...", "..."]
    }
  ]
}
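
A client would typically call this tool in a loop, advancing `since` so each response carries only unseen requirements. The sketch below assumes that `since` can be set to the number of requirements already received and that `finished` becomes `true` once the job ends. `CallTool` and `pollUntilFinished` are hypothetical names, and the transport behind `callTool` is whatever MCP client you use.

```typescript
// Subset of the documented get_extraction_progress response used by this sketch.
interface ProgressResponse {
  status: string;
  phase: string;
  done: number;
  finished: boolean;
  new: { id: string; title: string }[];
}

// Hypothetical transport: any function that invokes an MCP tool and resolves to its parsed JSON result.
type CallTool = (
  name: string,
  args: { jobId: string; since: number }
) => Promise<ProgressResponse>;

async function pollUntilFinished(
  callTool: CallTool,
  jobId: string,
  delayMs = 2000
): Promise<{ id: string; title: string }[]> {
  const collected: { id: string; title: string }[] = [];
  for (;;) {
    // Assumption: `since` is the count of requirements received so far.
    const res = await callTool("get_extraction_progress", {
      jobId,
      since: collected.length,
    });
    collected.push(...res.new);
    // A real client should also stop on an error status; its exact value is not documented here.
    if (res.finished) return collected;
    // Wait before the next poll to avoid hammering the server.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Because files are written incrementally, a client that stops polling early can still pick up the generated files from `outputDir` later.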

`list_jobs`

Lists all extraction jobs (running and completed). Takes no parameters.

Response:

[
  { "id": "a1b2c3d4", "status": "completed", "phase": "Done", "done": 38, "total": 38 },
  { "id": "e5f6g7h8", "status": "running", "phase": "Pass 2: Classifying...", "done": 0, "total": 0 }
]

`list_requirements`

Lists all generated requirement files in a directory, organized by category.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| requirementsDir | string | ✅ | Absolute path to the requirements directory |

`get_requirement`

Returns the full content of a specific requirement file.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| requirementsDir | string | ✅ | Absolute path to the requirements directory |
| requirementId | string | ✅ | Requirement ID (e.g., BE_01) or filename (e.g., ACS_BE_01_auth.md) |
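
Since `requirementId` accepts either a short ID or a full filename, the lookup has to handle both forms. The following is one plausible way to resolve it, not the package's actual logic: `findRequirementFile` is a hypothetical helper, and the rule of matching the ID as a whole underscore-delimited segment is an assumption.

```typescript
// Hypothetical lookup: resolves a requirementId (short ID or filename) against a list of filenames.
function findRequirementFile(files: string[], requirementId: string): string | undefined {
  // 1. An exact filename match wins.
  if (files.indexOf(requirementId) !== -1) return requirementId;

  // 2. Otherwise treat the value as an ID and match it as whole underscore-delimited
  //    segments, so "BE_01" does not accidentally match "BE_010".
  return files.find((f) => {
    const stem = "_" + f.replace(/\.md$/, "") + "_";
    return stem.indexOf("_" + requirementId + "_") !== -1;
  });
}

const files = ["ACS_BE_010_reports.md", "ACS_BE_01_auth.md", "ACS_FE_01_dashboard.md"];
const byShortId = findRequirementFile(files, "BE_01");
const byFilename = findRequirementFile(files, "ACS_FE_01_dashboard.md");
```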

Pipeline Architecture

PDF → Parse (5-page chunks)
  ↓
Pass 1: Identify — LLM extracts raw requirements from each chunk
  ↓
Pass 2: Classify — LLM assigns category (backend/frontend/mobile/infra/devops/nf)
  ↓
Sequence — Assign IDs: {PREFIX}_{CATEGORY}_{SEQ} (e.g., ACS_BE_01)
  ↓
Pass 3: Generate — LLM generates structured markdown per requirement
  ↓
Write — Each file written immediately (crash-resilient)
  ↓
Summary — README.md + requirements.json in output directory
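
The Sequence step can be illustrated with a short sketch. The ID pattern `{PREFIX}_{CATEGORY}_{SEQ}` comes from the diagram above, and the codes BE, FE, MB and NF appear in this README's examples. Everything else is an assumption: the codes for infra and devops are not shown anywhere (`INF` and `DO` below are placeholders), the per-category, two-digit counter is inferred from the example IDs, and `assignIds` is a hypothetical name.

```typescript
// Category codes. BE/FE/MB/NF appear in the README examples;
// the infra and devops codes are placeholders, since the README does not show them.
const CATEGORY_CODES: Record<string, string> = {
  backend: "BE",
  frontend: "FE",
  mobile: "MB",
  "non-functional": "NF",
  infra: "INF", // assumption
  devops: "DO", // assumption
};

interface Classified {
  title: string;
  category: string;
}

// Assigns IDs using a separate, 1-based, zero-padded counter per category.
function assignIds(items: Classified[], prefix = "ACS"): (Classified & { id: string })[] {
  const counters: Record<string, number> = {};
  return items.map((item) => {
    const code = CATEGORY_CODES[item.category];
    if (code === undefined) throw new Error(`Unknown category: ${item.category}`);
    counters[code] = (counters[code] ?? 0) + 1;
    const seq = String(counters[code]).padStart(2, "0");
    return { ...item, id: `${prefix}_${code}_${seq}` };
  });
}

const sequenced = assignIds([
  { title: "User Authentication", category: "backend" },
  { title: "Dashboard", category: "frontend" },
  { title: "Payment API", category: "backend" },
]);
// IDs in order: ACS_BE_01, ACS_FE_01, ACS_BE_02
```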

Output Structure

outputDir/
├── backend/
│   ├── ACS_BE_01_user_authentication.md
│   └── ACS_BE_02_payment_api.md
├── frontend/
│   └── ACS_FE_01_dashboard.md
├── mobile/
│   └── ACS_MB_01_push_notifications.md
├── infra/
├── devops/
├── non-functional/
│   └── ACS_NF_01_gdpr_compliance.md
├── README.md          # Summary with requirement counts
└── requirements.json  # All requirements as structured JSON
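
The filenames in the tree above look like the requirement ID followed by a slug of the title. A minimal sketch of that naming scheme follows. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to `_`) is inferred from the examples rather than taken from the package source, and `slugify` and `requirementPath` are hypothetical names.

```typescript
// Lowercases the title and collapses every run of non-alphanumeric characters into "_".
// Assumed rule, inferred from filenames such as ACS_BE_01_user_authentication.md.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
}

// Builds the path of a requirement file inside its category directory.
function requirementPath(outputDir: string, categoryDir: string, id: string, title: string): string {
  return `${outputDir}/${categoryDir}/${id}_${slugify(title)}.md`;
}

const authPath = requirementPath("outputDir", "backend", "ACS_BE_01", "User Authentication");
// "outputDir/backend/ACS_BE_01_user_authentication.md"
```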

Gherk Integration

When used within the Gherk platform, the MCP server is invoked through a multi-layer asynchronous pipeline:

Frontend (ImporterComponent)
  │  POST /tickets/analyze-requirements (file upload)
  ▼
server-go (RequirementsHandler)
  │  Returns { jobId } immediately
  │  Spawns goroutine →
  ▼
agent-go (/ai/extract-requirements)
  │  Calls MCP tool: extract_requirements
  │  MCP returns { jobId } → agent-go polls get_extraction_progress
  │  When done, reads requirements.json from outputDir
  ▼
agent-go → maps MCP output to AnalyzedRequirementV2:
  │  • category → side (frontend/backend/both)
  │  • Adds priority, roles, acceptanceCriteria
  ▼
server-go → broadcasts via WebSocket:
  │  • requirements_progress (phase updates)
  │  • requirements_complete (final results)
  │  • requirements_failed (errors)
  ▼
Frontend (ImporterStore) — receives results, shows preview

Category-to-Side Mapping

| MCP Category | Gherk Side |
|-------------|------------|
| frontend | frontend |
| mobile | frontend |
| backend | backend |
| infra | both |
| devops | both |
| non-functional | both |
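
The mapping above is a plain lookup. The real mapping lives in the Go code of agent-go; the TypeScript below is only an illustration of the table. `toSide` is a hypothetical name, and falling back to `"both"` for unknown categories is an assumption rather than documented behaviour.

```typescript
type Side = "frontend" | "backend" | "both";

// Direct transcription of the Category-to-Side table.
const CATEGORY_TO_SIDE: Record<string, Side> = {
  frontend: "frontend",
  mobile: "frontend",
  backend: "backend",
  infra: "both",
  devops: "both",
  "non-functional": "both",
};

// Assumption: unknown categories are treated as affecting both sides.
function toSide(category: string): Side {
  return CATEGORY_TO_SIDE[category] ?? "both";
}

const mobileSide = toSide("mobile"); // "frontend"
```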

Key Files

| File | Description |
|------|-------------|
| agent-go/extract_requirements_handler.go | Calls MCP, reads requirements.json, maps to Gherk format |
| agent-go/internal/mcp/mcp-servers.json | MCP registration (timeoutSeconds: 0) |
| server-go/.../requirements.go | Async handler, WebSocket broadcasting |
| desktop-app/.../importer.store.ts | Frontend state (survives navigation) |

Development

# Install dependencies (run inside a cloned copy of the repository)
npm install

# Build
npm run build

# Watch mode
npm run dev

# Run locally
npm start

License

MIT
