atp-lptp-mcp-server
v0.9.0
Published
MCP server for LPTP (Logic Program Theorem Prover) — proof verification, tactic application, and definition querying
Maintainers
Readme
atp-lptp-mcp-server
An MCP (Model Context Protocol) server that exposes LPTP (Logic Program Theorem Prover) operations as tools — proof verification, tactic application, definition querying, ground representation compilation, and proof-file analysis. Any MCP-compatible client (Claude Code, Claude Desktop, etc.) can connect and use these tools to interactively construct and verify formal proofs of Prolog programs.
Prerequisites
- Node.js >= 18
- LPTP source checkout — the directory containing
src/lptp.pland thelib/hierarchy - At least one Prolog backend on PATH:
- SWI-Prolog (
swipl) — recommended, best resource-limit support - Scryer Prolog (
scryer-prolog) — ISO-compliant alternative - GNU Prolog (
bin/lptp-gnu-prolog) — pre-compiled binary with LPTP baked in
- SWI-Prolog (
- E Theorem Prover (
eprover) — optional, needed forapply_eprover - Vampire (
vampire) — optional, needed forapply_vampire
Installation
npm install atp-lptp-mcp-serverOr install from a local checkout:
cd etc/atp-lptp-mcp-server
npm install
npm run buildNote: The build step also copies the Prolog helper scripts (
src/prolog/) intobuild/prolog/.
Configuration
Claude Code
Add to your .mcp.json (project or global):
{
"mcpServers": {
"atp-lptp-mcp-server": {
"type": "stdio",
"command": "npx",
"args": ["atp-lptp-mcp-server"],
"env": {
"LPTP_ROOT_DIR": "/path/to/your/lptp/checkout"
}
}
}
}Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| LPTP_ROOT_DIR | Yes* | Path to the LPTP checkout root (directory containing src/lptp.pl). *If the server is installed inside the LPTP tree, it can auto-detect. |
| LPTP_PROLOG_BACKEND | No | Force a specific backend: swipl, scryer, or gprolog. Default: auto-detect in that order. |
Available Tools
Meta
| Tool | Description |
|------|-------------|
| get_lptp_grammar | Returns LPTP grammar summary for LLM reasoning. Returns all sections by default, or a specific named section. |
Proof Verification
| Tool | Description |
|------|-------------|
| verify_lemma | Validates a snippet of LPTP proof code (lemma/3, theorem/3, etc.) by running it through SWI-Prolog. Supports injecting external context or axioms. |
| check_proof | Runs LPTP's native check/1 on an existing .pr proof file on disk. Returns all verification errors or confirms the proof is valid. |
Tactics
| Tool | Description |
|------|-------------|
| apply_tactic | Applies any LPTP tactic (e.g. [ind], [auto(5)], completion) to a formula and returns the expanded derivation tree. |
| tactic_auto | Attempts automatic proof with configurable search depth and resource limits (inference_limit, time_limit_seconds). |
| tactic_case | Case splitting on disjunctions or pattern matches. |
| tactic_comp | Completion formula (fixed-point semantics of program clauses). |
| tactic_elim | Predicate definition elimination. |
| tactic_ex | Existential quantifier instantiation. |
| tactic_fact | Apply a known fact (lemma, theorem, corollary, or axiom). |
| tactic_ind | Structural induction on an inductively defined predicate. |
| tactic_indqf | Quantifier-free induction (induction without universally quantified hypothesis). |
| tactic_tot | Totality (termination) case splitting. |
| tactic_unfold | Unfold an abbreviation or definition. |
| tactic_debug | Show available formulas and assumptions in the current proof context. |
Query
| Tool | Description |
|------|-------------|
| print_definition | Print the completion definition of a predicate — IND(P) fixed-point axioms (success, failure, termination) generated from program clauses. |
| list_facts | List all known theorems, lemmas, and corollaries about a predicate. |
Compilation & Analysis
| Tool | Description |
|------|-------------|
| compile_gr | Compile a .pl Prolog program into its LPTP ground representation (.gr). Accepts LPTP alias paths (e.g. $(examples)/gen_99p/list_ops) or filesystem paths; omit the .pl extension. |
| count_pr_directives | Count LPTP directives in a .pr proof file using read_term/3 with strict functor+arity matching. Returns counts for all 13 directive types (preamble + body declarations) as a JSON object and a markdown table. All keys are always present (zero-filled). |
Directives counted by count_pr_directives
| Category | Directive | JSON key |
|----------|-----------|----------|
| Preamble | needs_gr/1 | needs_gr |
| Preamble | needs_thm/1 | needs_thm |
| Preamble | compile_gr/1 | compile_gr |
| Preamble | thm_file/1 | thm_file |
| Preamble | tex_file/1 | tex_file |
| Preamble | initialize/0 | initialize |
| Preamble | bye/1 | bye |
| Body | definition_pred/3 | def_pred |
| Body | definition_fun/5 | def_fun |
| Body | axiom/2 | axiom |
| Body | lemma/3 | lemma |
| Body | corollary/3 | corollary |
| Body | theorem/3 | theorem |
FOF Compilation & ATP Solvers
| Tool | Description |
|------|-------------|
| build_fof | Compiles a _reg_atp.pr proof file to TPTP First-Order Form (.fof) axioms and conjectures via src/indp.sh (IND(P) translation). Generates one .fof file per lemma/theorem/corollary. Requires the -all logic program variant (make build). Output is suitable for Vampire and E-Prover. |
| apply_eprover | Runs E Theorem Prover (--auto-schedule=8, --definitional-cnf, --delete-bad-limit) on a single .fof conjecture file. Returns PROVED / NOT PROVED and full prover output. Requires eprover on PATH. |
| apply_vampire | Runs Vampire (CASC mode: --mode casc, -m 16384, --cores 7) on a single .fof conjecture file. Returns PROVED / NOT PROVED and full prover output. Requires vampire on PATH. |
Proof Synthesis
| Tool | Description |
|------|-------------|
| format_proof | Validates a .pr proof file via LPTP's check/1, then pretty-prints each proof term via prt__write/1 at 76-char width. Fails fast with a diagnostic if the proof is invalid — never formats an invalid proof. Returns canonical LPTP-formatted proof text. |
| generate_groundness_proof | 3-phase groundness proof pipeline: (1) infer groundness invariants via abstract interpretation, (2) construct the LPTP proof, (3) certify via check/1. Returns HAL-04859406 benchmark metrics: Prog · Prop · Vars · Inf.ms · Deriv.ms · Cert.ms. |
| generate_regular_approximation_proof | Regular type approximation proof synthesis via cGA_LPTP / ABS_INT_MODE=prove_regular_types. Certifies the proof via check/1. Returns Prog · Prop · Types · Inf.(μs) · Checks(ms) · Deriv.(ms) · Cert.(ms) · LOC. At most 2 calls run concurrently. |
| generate_regular_approximation_proof_with_goal | Call-answer (magic-set) transformation from a top goal followed by regular type proof synthesis at increased precision (L1/L2 vs L0). Applies ABS_INT_MODE=prove_regular_types_with_goal via cGA_LPTP. Generates <stem>_ca.pl (magic-set-transformed program), <stem>_reg_ca.pl (bottom-up fixpoint), then synthesises and certifies <stem>_reg_ca_formatted.pr via check/1. Returns Prog · Goal · Prop · Types · Inf.(μs) · Checks(ms) · Deriv.(ms) · Cert.(ms) · LOC. At most 2 synthesis calls run concurrently. |
Build
| Tool | Description |
|------|-------------|
| build_lptp | Rebuild src/lptp.pl from source files (including bi_extended.pl) by running make test-library-with-swi-prolog. Use after editing src/bi_extended.pl to regenerate the concatenated LPTP source and verify all library proofs still pass. |
Prolog Backend Selection
The server auto-detects backends in this order:
- SWI-Prolog (
swipl) — full resource limits (time, inference, depth) - Scryer Prolog (
scryer-prolog) — inference limits only, Node.js timeout for time - GNU Prolog (
bin/lptp-gnu-prolog) — Node.js timeout only
Override with LPTP_PROLOG_BACKEND=swipl (or scryer, gprolog).
Testing
Tests are written with Jest + ts-jest and follow the RED → GREEN → REFACTOR cycle enforced by the /tdd-right-away skill.
Run all tests
cd etc/atp-lptp-mcp-server
npm testSet LPTP_ROOT_DIR to your LPTP checkout root for tests that exercise real proof files:
LPTP_ROOT_DIR=/path/to/lptp/checkout npm testTest suites
grammarSections.test.ts — Grammar unit tests (no dependencies)
Validates the GRAMMAR_SECTIONS map and helper functions (getGrammarSection, getFullGrammar).
All tests run without LPTP_ROOT_DIR.
| Test group | What it covers |
|---|---|
| GRAMMAR_SECTIONS | 10 sections exported, all keys present, each a non-empty string |
| getGrammarSection | Returns named section; undefined for unknown key |
| getFullGrammar | All 10 section headers present in concatenated output |
| Section content — terms | Variables (?name), compound terms, list syntax, operators |
| Section content — goals | true, fail, equations, negation, conjunction, disjunction |
| Section content — formulas | succeeds/fails/terminates, quantifiers, <=>, gr(term) |
| Section content — internal_repr | $(name), tag expressions, @(tag,…), S/F/T functions |
| Section content — derivation_steps | All 8 step types: assume, cases, exist, induction, contra, indirect |
| Section content — by_tags | All 13 tags: theorem, lemma, corollary, axiom, completion, sld, … |
| Section content — inference_rules | Identity, modus ponens (44.8 %), unification, inconsistency, totality |
| Section content — tactics | All tactic names: auto(n), case, comp, elim, ind, indqf, tot, unfold, debug |
| Section content — precedences | Full operator table (100–980, xfy, yfx, …) |
| Section content — definitions | definition_pred, definition_fun, axiom, existence, uniqueness |
| Integration | getFullGrammar backward-compatible with old terms section content |
mcp_lptp_server.test.ts — Executor integration tests
Requires LPTP_ROOT_DIR for proof-file tests; skips gracefully when files are absent.
| Test group | Key assertions |
|---|---|
| verifyLemma | Valid lemma → success: true, output contains 'parsed and verified successfully' |
| checkProof | Returns defined string output for an existing .pr file |
| applyTactic — auto(N) | success: true, derivation defined |
| applyTactic — ind | derivation contains 'induction(' |
| applyTactic — case | derivation contains 'by gap' |
| applyTactic — elim / fact / tot / unfold / ex / comp / indqf | success: true for each |
| countPrDirectives — 13 keys zero-filled | All 13 keys present; bye: 1 for :- bye(done). |
| countPrDirectives — strict arity | lemma/3 counted; lemma/2 ignored |
| countPrDirectives — markdown table | All 13 column headers present (ngr, lem, thm_b, …) |
| countPrDirectives — missing file | Rejects with "File not found" before invoking Prolog |
| AtpSemaphore | At most 2 concurrent ATP calls; 5 parallel applyEProver calls all return strings |
| buildFof | Returns string output for a valid stem path |
| applyEProver — no binary | Returns string output with install hint |
| applyEProver — real .fof | Output contains 'PROVED' or 'NOT PROVED' |
| applyVampire — no binary | Returns string output |
| applyVampire — real .fof | Output contains 'PROVED' or 'NOT PROVED' |
| formatProof — valid file | Non-empty string output |
| formatProof — invalid proof | success: false, output contains 'proof validation failed' |
| formatProof — missing file | success: false, output contains 'not found' |
| generateGroundnessProof — fields | All 9 result fields present for any input |
| generateGroundnessProof — apprev.pl | success: true, prop_count > 0, all timing fields ≥ 0, proof contains lemma(, valid: true |
| generateGroundnessProof — missing file | success: false, output contains 'not found' |
| generateRegularApproxProof — fields | All 13 RegularApproxProofResult keys present |
| generateRegularApproxProof — nat.pl | success: true, all metric fields populated, proof contains lemma(, valid: true |
| generateRegularApproxProof — missing file | success: false, output contains 'not found' |
proof_repairs.test.ts — TDD proof repair suite
Validates that all repaired .pr files pass check/1 with zero errors and zero warnings.
Requires LPTP_ROOT_DIR. Run a single group to target one repair category:
# Category B — duplicate directives (6 files, 60 s timeout each)
npx jest proof_repairs -t "Category B"
# Category B+C — duplicates + predicate mismatches (3 files)
npx jest proof_repairs -t "Category B\+C"
# Category C — untransposed source predicates (8 files)
npx jest proof_repairs -t "Category C"
# Category A — definition_fun iterative repair (10 files, 120 s timeout each)
npx jest proof_repairs -t "Category A"| Category | Files | Root cause |
|---|---|---|
| A — definition_fun repair | ancestry_bidirectional_lineage.pr, dna_codon_offset.pr, dna_terminal.pr, heaps_heap_priority.pr, heaps_min_path.pr, heaps_mirror_heap.pr, org_chart_reporting_level.pr, polynomial_arithmetic_poly_coefficient.pr, organizational_hierarchies_hierarchical_level.pr, supply_chain_material_priority.pr | definition_fun existence/uniqueness axiom mismatch |
| B — duplicate directives | heaps.pr, heaps_reheapify.pr, dna_complement_flip.pr, org_chart.pr, org_chart_org_re.pr, org_chart_subordinate_chain.pr | Duplicate definition_pred/definition_fun entries |
| B+C — duplicates + mismatches | heaps_heap_collection.pr, heaps_heapify.pr, org_chart_reorganization.pr | Both duplicate directives and untransposed predicates |
| C — predicate mismatch | dna_re.pr, dna_sequence.pr, dna_recombine.pr, org_chart_sub.pr, organizational_hierarchies_rank_by_seniority.pr, polynomial_arithmetic_polynomial_term.pr, supply_chain_optimize_flow.pr, tournament_brackets_arrange.pr | Source-domain predicates (integer, =<) not transposed |
TDD workflow with /tdd-right-away
Each repair cycle follows RED → GREEN → REFACTOR:
- RED — Run the failing test to confirm the error:
cd etc/atp-lptp-mcp-server LPTP_ROOT_DIR=../.. npx jest proof_repairs -t "<file stem>" --no-coverage - GREEN — Edit the
.prfile to remove duplicates or replace source predicates untilcheck/1returnso.k. - REFACTOR — Re-run the full suite to confirm no regressions:
LPTP_ROOT_DIR=../.. npm test
License
MIT — see LICENSE. The LPTP Prolog source (src/lptp.pl) has its own license (R.F. Staerk, 1996).
