pi-autocontext-erdos

v0.1.8

Published

9 hours ago

Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.

0High
0Medium
0Low

jayscambler

pi-package autocontext erdos lean formal-math proof-search

pi-autocontext-erdos

Pi package for AutoContext-assisted Erdős-style formal math problem loops with Lean verification.

Latest published package: [email protected]. Release evidence is recorded in docs/RELEASE_0.1.7.md. Main is preparing 0.1.8 with a bounded Erdős–Straus certificate-search benchmark.

This package is an orchestration layer. It does not replace Lean and does not claim to prove open problems. It helps structure a math exploration loop: problem intake, formalization, lemma planning, proof attempts, Lean-backed verification, and reporting.

Relationship to `pi-autocontext-lean-verify`

Use this package for the problem loop and research workflow:

pi-autocontext-erdos
  = problem intake + formalization + lemma ladder + attempt logs + reports

Use pi-autocontext-lean-verify as the verifier/proof-repair backend:

pi-autocontext-lean-verify
  = Lean oracle + fixed-template proof-body repair + guardrail checks

Install both when you want proof-repair automation:

pi install npm:[email protected]
pi install git:github.com/greyhaven-ai/pi-autocontext-erdos

From a local checkout:

pi install ./pi-autocontext-erdos

For local development, from this checkout:

npm install
npm run validate
npm run pack:dry-run
pi install ./pi-autocontext-erdos --local

Provided resources

Extension tool: autocontext_erdos
Skill: erdos-math-loop
Prompt template: erdos-math-loop.md
Domain notes: docs/DOMAIN.md
Validation notes: docs/VALIDATION.md
Lake/Mathlib verifier notes: docs/LAKE_MATHLIB_SUPPORT.md
Mathlib benchmark evidence: docs/MATHLIB_BENCHMARK.md
Final theorem audit guardrails: docs/THEOREM_AUDIT.md
Lemma graph orchestration: docs/LEMMA_GRAPH.md
Erdős–Straus parametric family benchmark: docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md
Erdős–Straus bounded certificate-search benchmark: docs/ERDOS_STRAUS_BOUNDED_SEARCH.md
Example runs:
- examples/erdos-ginzburg-ziv-n2
- examples/erdos-ginzburg-ziv-n3
- examples/schur-two-color-n5
- examples/schur-two-color-n5-lemma-graph
- examples/mathlib-divisibility-prime
- examples/erdos-straus-parametric-families
- examples/erdos-straus-bounded-search

Example validation

The checked examples include the n = 2 and n = 3 finite instances of the Erdős–Ginzburg–Ziv theorem over natural numbers, plus a full-pipeline finite Schur two-color n = 5 benchmark. The n = 2 EGZ example proves:

theorem egz_two_nat (a b c : Nat) :
    (a + b) % 2 = 0 ∨ (a + c) % 2 = 0 ∨ (b + c) % 2 = 0

The harder EGZ n = 3 example proves that among any five natural numbers, one of the ten triples has sum divisible by 3. The AutoContext repair smoke in docs/EGZ_N3_AUTOCONTEXT_REPAIR.md deliberately breaks that proof, repairs it back to a Lean-verified candidate with autocontext_erdos, and the compact repaired proof is now the canonical example.

The Schur n = 5 example proves the finite two-color statement that every coloring of 1..5 contains a monochromatic x + y = z triple. It exercises direct Lean verification, audit_theorems, fixed-template export, and pi-autocontext-lean-verify handoff; see docs/SCHUR_N5_FULL_PIPELINE.md.

The Mathlib divisibility-prime benchmark imports Mathlib in a Lake project and proves a scoped Euclid-lemma API theorem using Nat.Prime.dvd_mul; see docs/MATHLIB_BENCHMARK.md.

The Erdős–Straus parametric-family example verifies only the elementary even-denominator family as a rational identity over ℚ; it is explicitly not a proof of the full conjecture. It exercises Lake/Mathlib verification, lemma graph frontier tracking, audit_theorems, fixed-template export, and graph result recording; see docs/ERDOS_STRAUS_PARAMETRIC_FAMILIES.md.

The Erdős–Straus bounded-search example generates exact certificates for 2 ≤ n ≤ 100, then verifies the emitted finite theorem in Lean/Mathlib. The JavaScript search is only a candidate generator; Lean remains the correctness oracle. See docs/ERDOS_STRAUS_BOUNDED_SEARCH.md.

Validate the package and examples with:

npm run validate
npm run typecheck
npm run validate:extension
npm run validate:example
npm run validate:lake-example

AutoContext-backed tool

The package registers one consolidated Pi tool:

autocontext_erdos

Actions:

preflight: check Lean and uvx autoctx improve availability.
init_problem: create the run artifact layout.
verify_file: run Lean on a formalization and write attempt artifacts.
autocontext_improve: invoke autoctx improve with a Lean verifier command and save prompt/rubric/logs/best candidate/verdict.
export_fixed_template: freeze a named theorem into a Theorem.template.lean fixture with a {{PROOF}} hole for pi-autocontext-lean-verify proof-body repair.
audit_theorems: final-artifact audit that verifies the Lean file, runs #print axioms, checks forbidden constructs, compares imports to an optional baseline, and records environment/toolchain hashes.
lemma_graph: initialize, validate, summarize, advance, and record results for dependency graphs under lemmas/lemma_graph.json.
summarize_run: summarize files and attempt verdicts under a run root.

The AutoContext action uses direct Lean verification by default:

uvx --python 3.12 --from autocontext==0.5.1 autoctx improve ... --verify-cmd "<lean> {file}"

For Mathlib or local Lake-project imports, pass verificationMode="lake" and projectRoot; verification and AutoContext rechecks then use lake env lean <file>. init_problem can scaffold a Mathlib-aware Lake project with withMathlib=true. See docs/LAKE_MATHLIB_SUPPORT.md.

For whole-file repair, pass preserveTheorems to require exact normalized theorem-statement preservation before a Lean-verified candidate is accepted:

{
  "action": "autocontext_improve",
  "preserveTheorems": ["mod_three_eq_zero_or_one_or_two", "egz_three_nat"]
}

If Lean accepts a candidate but a preserved theorem statement changes, the verdict is theorem_statement_changed with Accepted: false.

For the compact EGZ n = 3 repair loop, current validation recommends timeoutSeconds=240; timeoutSeconds=120 is useful but budget-limited for this task.

When a theorem statement is ready to freeze, use export_fixed_template to hand off proof-body-only repair to pi-autocontext-lean-verify; see docs/LEAN_VERIFY_HANDOFF.md.

Before treating a final theorem as a serious result, run audit_theorems; see docs/THEOREM_AUDIT.md.

MVP workflow

A typical run creates a durable folder such as:

runs/<problem-slug>/
  problem.md
  source_notes.md
  formalization/
    Formalization.lean
    formalization_notes.md
  lemmas/
    lemma_plan.md
  attempts/
    0001/
      prompt.md
      candidate.lean
      lean.stderr.txt
      verdict.md
  reports/
    final_report.md

Every mathematical claim in the final report must be labeled as one of:

Lean verified;
formalization draft;
conjecture;
heuristic/proof sketch;
failed attempt.

Guardrail posture

Never count a proof unless Lean verifies the exact theorem or unchanged template. Do not use sorry, admit, new axioms, unsafe, theorem weakening, proof-by-importing-answer, or hidden assumptions as success criteria.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

pi-autocontext-erdos

Relationship to pi-autocontext-lean-verify

Provided resources

Example validation

AutoContext-backed tool

MVP workflow

Guardrail posture

Relationship to `pi-autocontext-lean-verify`