thefactory-tools
v0.1.8
Published
Tools runtime and utilities for TheFactory agents, including standardized tool interfaces, schema generation, code analysis via Tree-sitter, and supporting utilities.
Multi-Agent Development Workflow
This repository implements a multi-agent development workflow with tools for reading/writing files, managing stories and features, running tests, and exposing these as chat-callable tools.
- Source code lives under src/
- Stories and artifacts live under .stories/
- Documentation lives under docs/
- Local LLM benchmark prompt lives at docs/local-llm-benchmark/prompt.md, with verifier metadata under scripts/llm-benchmark/
See docs/FILE_ORGANISATION.md for a complete overview of the project structure and tool composition. For architecture and coding practices, see docs/CODE_STANDARD.md.
Getting Started
- Install dependencies:
npm install
- Build:
npm run build
- Test:
npm test
Testing
High-quality, comprehensive tests are vital to this project. Aim for close to 100% coverage where practical. Tests should validate inputs/outputs, cover edge cases, and never force meaningless code changes just to pass.
Three test buckets, each with its own command:
- npm test: unit tests (*.test.ts). Pure, fast, deterministic. No external services, no filesystem outside temp dirs, no real parsers. Costs nothing.
- npm run test:integration: integration tests (*.integration.test.ts). Multi-module, real-fs, real-parser regression tests with no external services or credentials (today: tree-sitter regression coverage under src/codeIntel/). Slower than unit but deterministic. Costs nothing.
- npm run test:live: live tests (*.live.test.ts). Exercise real external services and credentials (e.g. real Claude Code through the sandbox). Requires Docker + the sandbox images + cached OAuth credentials, and costs real Claude API calls. See "Sandbox smoke tests" below for the same prereqs.
- See docs/TESTING.md for full testing guidance, patterns, and standards.
- Refer to docs/CODE_STANDARD.md for coding standards that extend to test code as well.
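The suffix conventions above decide which bucket a test file belongs to. As a minimal sketch (the helper `classifyTestFile` and the example paths are hypothetical, and the repository itself most likely selects files through its test runner configuration rather than a function like this), the mapping could be expressed as:

```typescript
// Hypothetical helper: maps a test file path to one of the three buckets.
type TestBucket = "unit" | "integration" | "live";

function classifyTestFile(filePath: string): TestBucket | null {
  // Check the more specific suffixes first: every bucket ends in ".test.ts".
  if (filePath.endsWith(".live.test.ts")) return "live";
  if (filePath.endsWith(".integration.test.ts")) return "integration";
  if (filePath.endsWith(".test.ts")) return "unit";
  return null; // not a test file under these naming conventions
}

// Hypothetical file names, used only for illustration.
const examples = [
  "src/tools/fileTools.test.ts",
  "src/codeIntel/parser.integration.test.ts",
  "src/sandbox/claude.live.test.ts",
  "src/tools/fileTools.ts",
];
for (const file of examples) {
  console.log(file, "->", classifyTestFile(file));
}
```

The order of the checks matters: testing for `.test.ts` first would place integration and live tests in the unit bucket.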
Sandbox smoke tests
This is a separate sandbox-isolation + MCP-bridge integration suite that exercises real Docker containers, real Claude Code, and the full Phase 2 stack. It is slower than the unit suite (each smoke spins up containers), so it is kept out of npm test and behind its own command.
Prereqs:
- Docker Desktop (or Docker Engine) running.
- A Claude.ai subscription (Pro / Max) for the MCP smokes — the sandbox uses OAuth, not an API key.
One-time setup (caches OAuth credentials for ~90 days):
npm run sandbox:login-claude-code
Then run all sandbox smokes:
npm run test:smoke:sandbox
Or filter to a subset by substring:
npm run test:smoke:sandbox -- mcp # only smokes whose name matches "mcp"
npm run test:smoke:sandbox -- fake-cli # only the sandbox-primitive smoke
Each smoke auto-builds any images it needs on first use. For how the pieces fit together (sandbox, MCP bridge, action broker, CLI agent runner) see docs/architecture/SANDBOX_AND_CLI_AGENTS.md.
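The substring filter described above can be pictured with a small sketch. The smoke names and the `selectSmokes` function are hypothetical; the real runner's names and matching logic are not shown in this README and may differ.

```typescript
// Hypothetical smoke names; the real suite's names are not listed here.
const smokeNames = ["mcp-bridge", "mcp-action-broker", "fake-cli"];

// Keep every smoke whose name contains the filter; no filter selects all.
function selectSmokes(names: string[], filter?: string): string[] {
  if (!filter) return names;
  return names.filter((name) => name.includes(filter));
}

console.log(selectSmokes(smokeNames, "mcp")); // the two "mcp-..." names
console.log(selectSmokes(smokeNames, "fake-cli")); // only "fake-cli"
console.log(selectSmokes(smokeNames).length); // prints 3
```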
Contributing
- Review docs/CODE_STANDARD.md before contributing; keep your changes consistent with the established architecture and standards.
- Follow the established patterns for adding new toolsets (create a factory under src/tools, export functions, and compose them in src/tools/tools.ts).
- If exposing a tool to chat, add a descriptor to src/tools/chatTools.ts and update the dispatcher. Ensure data entering/leaving endpoints is validated and add tests accordingly.
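The pattern in the last two points (a factory, a chat descriptor, and a dispatcher that validates incoming data) might look roughly like the sketch below. Every name in it (`EchoTools`, `createEchoTools`, `echoDescriptor`, `dispatch`) is hypothetical and is not taken from src/tools; see docs/CODE_STANDARD.md for the conventions the repository actually uses.

```typescript
// All names below are hypothetical; they are not the package's actual API.
interface EchoTools {
  echo(text: string): string;
}

// Factory: builds the toolset (in the repository this would sit under src/tools).
function createEchoTools(prefix: string): EchoTools {
  return {
    echo: (text) => `${prefix}${text}`,
  };
}

// Chat descriptor: a name, a description, and a JSON-Schema-like parameter shape.
const echoDescriptor = {
  name: "echo",
  description: "Returns the given text with a prefix.",
  parameters: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
};

// Dispatcher: validates untrusted arguments before calling the tool.
function dispatch(tools: EchoTools, name: string, args: unknown): string {
  if (name !== echoDescriptor.name) {
    throw new Error(`Unknown tool: ${name}`);
  }
  if (typeof args !== "object" || args === null) {
    throw new Error("Arguments must be an object");
  }
  const text = (args as Record<string, unknown>).text;
  if (typeof text !== "string") {
    throw new Error("Argument 'text' must be a string");
  }
  return tools.echo(text);
}

const echoTools = createEchoTools("> ");
console.log(dispatch(echoTools, "echo", { text: "hello" })); // prints "> hello"
```

Treating `args` as `unknown` forces the dispatcher to check the shape before use, which is one way to meet the validation requirement and gives tests a clear failure path to cover.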
Local LLM benchmark verification
This repository includes a single-doc local LLM benchmark flow.
Operator workflow
- Reset the repository to the intended benchmark baseline.
- Give the model exactly one document:
docs/local-llm-benchmark/prompt.md
- Let the model complete all tasks from that single prompt, use runTests with options.configPath: "vitest.benchmark.config.ts" for benchmark fixture tests, and write result artifacts under .benchmark-results/.
- Run the aggregate verifier:
npm run llm-benchmark-verify
Benchmark layout
- Model-facing prompt:
docs/local-llm-benchmark/prompt.md
- Machine-only benchmark metadata:
scripts/llm-benchmark/config.json
- Aggregate verifier:
scripts/llm-benchmark/verify.mjs
What the aggregate verifier checks
The aggregate verifier is authoritative.
It checks:
- every required task result file exists under .benchmark-results/
- every task result file is valid JSON
- each report's taskId matches its configured task
- each report contains the configured required fields
- each report's result value is one of done, partial, or blocked
- each report is written to the configured result path
- benchmark harness files were not modified
- no unexpected repository files were changed outside task allowances
- each task's claimed changed files match observed repository changes for that task
- each task's task-specific benchmark validation passes under vitest.benchmark.config.ts
- each task's claimed runTests records include the required benchmark-config tool call when one is configured
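To make the report-level checks concrete, here is a minimal sketch covering only the JSON, taskId, required-field, and result checks. It is not the code in scripts/llm-benchmark/verify.mjs: the `TaskConfig` shape, the `requiredFields` name, and the `changedFiles` field are assumptions made for illustration, and only `taskId`, `result`, and the three allowed values come from the list above.

```typescript
// Hypothetical shape of one task entry in the benchmark metadata.
interface TaskConfig {
  taskId: string;
  requiredFields: string[];
}

const ALLOWED_RESULTS = ["done", "partial", "blocked"];

// Returns a list of problems; an empty list means these checks passed.
function checkReport(raw: string, task: TaskConfig): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["report is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["report is not a JSON object"];
  }
  const report = parsed as Record<string, unknown>;
  const problems: string[] = [];
  if (report.taskId !== task.taskId) {
    problems.push(`taskId does not match ${task.taskId}`);
  }
  for (const field of task.requiredFields) {
    if (!(field in report)) problems.push(`missing required field: ${field}`);
  }
  const result = report.result;
  if (typeof result !== "string" || !ALLOWED_RESULTS.includes(result)) {
    problems.push("result must be one of done, partial, or blocked");
  }
  return problems;
}

// Hypothetical task and reports, used only for illustration.
const exampleTask: TaskConfig = {
  taskId: "task-1",
  requiredFields: ["taskId", "result", "changedFiles"],
};
const goodReport = '{"taskId":"task-1","result":"done","changedFiles":[]}';
const badReport = '{"taskId":"task-2","result":"ok"}';
console.log(checkReport(goodReport, exampleTask)); // prints []
console.log(checkReport(badReport, exampleTask)); // three problems
```

Collecting all problems instead of stopping at the first one gives an operator the full picture of a failed report in a single run.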
Important note
For the verification result to be trustworthy:
- start from a clean, known benchmark baseline
- run the verifier before making unrelated changes
- trust the aggregate verifier output over the model's self-reported success
License
MIT (or project-specific license)
