@fastrag/graph
v0.1.0
A TypeScript SDK for enterprise knowledge-graph RAG, supporting Markdown and externally chunked knowledge bases.
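Overview
This package is still pre-1.0. It currently provides:
- stable SDK contracts
- Markdown and chunk ingest
- dry-run extraction
- provider-neutral LLM extractor templates
- schema-aware extractor quality controls
- an evidence grounding helper
- an extraction budget helper
- a graph construction report
- an official Neo4j graph store
- search with evidence-level access filtering
- a provider-neutral adapter for external retrieval
- read-only maintenance helpers
- deterministic retrieval evaluation
- LLM retry/cache utilities
- structured diagnostics
- ingest quality gate evaluation

It does not generate final natural-language answers. Search returns hydrated evidence context with citations. Answer generation, Neo4j vector retrieval, cross-encoder reranking, hosted observability products, chunk-level partial mutation, and a background job platform are outside the current SDK surface.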
Requirements
- Node.js >= 20
- pnpm >= 10
- TypeScript strict mode
- ES Modules
Installation
pnpm add @fastrag/graph

Public Subpaths
- @fastrag/graph
- @fastrag/graph/schema
- @fastrag/graph/extractors
- @fastrag/graph/document
- @fastrag/graph/retrieval
- @fastrag/graph/llm
- @fastrag/graph/diagnostics
- @fastrag/graph/evaluation
- @fastrag/graph/neo4j
Quick Start
import { createGraphRag } from '@fastrag/graph';
import { createLocalDocumentStore } from '@fastrag/graph/document';
import { createSchemaRegistry } from '@fastrag/graph/schema';

// graphStore and extractor are assumed to be created beforehand
// (see the Neo4j and extractor sections).
const rag = createGraphRag({
  tenantId: 'default',
  graphStore,
  schema: createSchemaRegistry(),
  extractor,
  documentStore: createLocalDocumentStore(),
});

await rag.ingestMarkdown({
  documentId: 'policy',
  source: 'policy.md',
  markdown: '# Policy\n\nRotate passwords.',
});

Single-tenant deployments use tenantId: 'default'. Multi-tenant deployments can override tenantId on each call.
Custom Extractor
import { buildLlmCacheKey, withRetry } from '@fastrag/graph/llm';
import type { KnowledgeExtractor } from '@fastrag/graph/extractors';

const extractor: KnowledgeExtractor = {
  async extract(input) {
    // The cache key covers everything that can change the extraction result.
    const cacheKey = buildLlmCacheKey({
      model: 'model-a',
      schemaVersion: input.schema.version,
      promptVersion: 'relations-v1',
      chunkChecksum: input.document.checksum,
      payload: input.chunks.map((chunk) => chunk.content),
    });
    // runSchemaGuidedExtraction is the caller's own extraction routine.
    return withRetry(
      async () => runSchemaGuidedExtraction(input, cacheKey),
      { retries: 2, minTimeoutMs: 250, factor: 2 },
    );
  },
};

The extractor returns entities, relations, and optional diagnostics. Before any graph write, the SDK validates extractor output against the SchemaRegistry and the evidence references of the current chunks. Invalid entities and relations are recorded in diagnostics and removed from dry-run, extract-only, and graph write output.
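The validation behavior described above can be pictured roughly as follows. This is a minimal sketch, not the SDK's implementation: `validateCandidates`, the `Candidate*` types, and the diagnostic shape are hypothetical names introduced for illustration, and the two `Set` fields stand in for whatever the SchemaRegistry actually exposes. Only the overall behavior (invalid entities and relations go to diagnostics and are dropped from the output) comes from the text.

```typescript
// Hypothetical shapes for illustration; the SDK's real types may differ.
interface CandidateEntity { id: string; label: string }
interface CandidateRelation {
  type: string;
  from: string;
  to: string;
  evidenceChunkIds: string[];
}
interface ValidationDiagnostic { kind: 'entity' | 'relation'; reason: string }

interface ValidationInput {
  entities: CandidateEntity[];
  relations: CandidateRelation[];
  allowedLabels: Set<string>;        // stand-in for the schema registry
  allowedRelationTypes: Set<string>; // stand-in for the schema registry
  chunkIds: Set<string>;             // chunks passed to this extract() call
}

function validateCandidates(input: ValidationInput) {
  const diagnostics: ValidationDiagnostic[] = [];

  // Entities survive only if their label is known to the schema.
  const entities = input.entities.filter((entity) => {
    if (input.allowedLabels.has(entity.label)) return true;
    diagnostics.push({ kind: 'entity', reason: `unknown label: ${entity.label}` });
    return false;
  });
  const entityIds = new Set(entities.map((entity) => entity.id));

  // Relations need a known type, valid endpoints, and in-scope evidence.
  const relations = input.relations.filter((relation) => {
    let reason: string | undefined;
    if (!input.allowedRelationTypes.has(relation.type)) {
      reason = `unknown relation type: ${relation.type}`;
    } else if (!entityIds.has(relation.from) || !entityIds.has(relation.to)) {
      reason = 'endpoint is not a valid entity';
    } else if (!relation.evidenceChunkIds.every((id) => input.chunkIds.has(id))) {
      reason = 'evidence references a chunk outside the current input';
    }
    if (reason === undefined) return true;
    diagnostics.push({ kind: 'relation', reason });
    return false;
  });

  return { entities, relations, diagnostics };
}
```

Returning the rejected items as diagnostics rather than throwing keeps one bad candidate from failing the whole extraction, which matches the drop-and-report behavior described above.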
For LLM-based extraction, the provider-neutral helper is the recommended starting point:
import { createLlmKnowledgeExtractor } from '@fastrag/graph/extractors';

const extractor = createLlmKnowledgeExtractor({
  model: 'your-model-name',
  budget: {
    maxEntities: 40,
    maxRelations: 80,
    maxEvidencePerRelation: 2,
    maxQuoteChars: 160,
  },
  // The provider call lives entirely in caller code.
  async generateJson(prompt) {
    const response = await callYourModelOutsideTheSdk({
      system: prompt.system,
      user: prompt.user,
      responseFormat: prompt.responseFormat,
      signal: prompt.signal,
    });
    return {
      json: response.json,
      tokenUsage: response.tokenUsage,
    };
  },
});

The SDK handles the prompt template, JSON parser, cache/retry wiring, optional extraction budget enforcement, and diagnostics. It does not handle credentials, provider clients, answer generation, embeddings, or agent-loop policy. Advanced callers can implement two-step extraction, an agent loop, JSON repair, or provider structured output inside generateJson(), as long as it ultimately returns a JSON object.
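As one illustration of the JSON repair option mentioned above, a caller might normalize raw model text before returning it from generateJson(). The helper below is a hypothetical sketch (`parseModelJson` is not part of the SDK) and handles only two common cases: a surrounding Markdown code fence and stray text around the object. It is not a general-purpose JSON repair routine.

```typescript
// Hypothetical helper: turn raw model text into a JSON object or throw.
function parseModelJson(raw: string): Record<string, unknown> {
  let text = raw.trim();

  // Drop a surrounding Markdown code fence if the model added one.
  const fence = '`'.repeat(3);
  if (text.length >= 2 * fence.length && text.startsWith(fence) && text.endsWith(fence)) {
    const firstNewline = text.indexOf('\n');
    text = firstNewline === -1
      ? ''
      : text.slice(firstNewline + 1, text.length - fence.length).trim();
  }

  // Keep only the outermost {...} span to drop leading or trailing chatter.
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model output');
  }

  const parsed: unknown = JSON.parse(text.slice(start, end + 1));
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Model output is not a JSON object');
  }
  return parsed as Record<string, unknown>;
}
```

Inside generateJson(), the result of such a helper could be returned as the `json` field. Throwing on unparseable output leaves the retry decision to the surrounding retry wiring.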
Neo4j Setup
import { neo4jGraphStore } from '@fastrag/graph/neo4j';

const graphStore = neo4jGraphStore({
  uri: process.env.NEO4J_URI!,
  user: process.env.NEO4J_USER!,
  password: process.env.NEO4J_PASSWORD!,
  // Pass database only when NEO4J_DATABASE is set.
  ...(process.env.NEO4J_DATABASE ? { database: process.env.NEO4J_DATABASE } : {}),
});

await graphStore.initialize();

Neo4j credentials must come from environment variables. Call initialize() before ingest or search so that labels, constraints, and indexes are created.
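Because the credentials come from the environment, it can help to fail early with a clear message instead of relying on non-null assertions. The following is a small sketch under that assumption; `readNeo4jConfig` and `Neo4jConnectionConfig` are hypothetical names, and the returned shape simply mirrors the option names used in the example above.

```typescript
// Hypothetical config shape mirroring the options shown in the example.
interface Neo4jConnectionConfig {
  uri: string;
  user: string;
  password: string;
  database?: string;
}

// Hypothetical helper: read connection settings from an env-like object.
function readNeo4jConfig(env: Record<string, string | undefined>): Neo4jConnectionConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === '') {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const database = env['NEO4J_DATABASE'];
  return {
    uri: required('NEO4J_URI'),
    user: required('NEO4J_USER'),
    password: required('NEO4J_PASSWORD'),
    // Only include the optional key when it is actually set.
    ...(database ? { database } : {}),
  };
}
```

With a helper like this, the store could be created as `neo4jGraphStore(readNeo4jConfig(process.env))`, assuming the option names match.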
Search and Access Control
const result = await rag.search({
  query: 'Which systems are affected by the password policy?',
  accessContext: {
    allowedDocumentIds: ['policy'],
    accessTags: ['dept:security'],
  },
  recipe: 'hybridGraph',
  limit: 10,
});

console.log(result.answerContext);
console.log(result.citations);

The SDK supports Neo4j fulltext retrieval, bounded graph traversal, and external retrievers plugged in through createGraphRag({ retrievers }). All candidate evidence passes through a final evidence-level access filter before hydration. Untagged evidence is public by default; tagged evidence must match accessContext.accessTags; allowedDocumentIds: [] denies all documents.
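The access rules above can be expressed as a small predicate. This is a sketch of the stated semantics only, not the SDK's filter: the types and function names are hypothetical, and two points are assumptions rather than documented behavior, namely that an omitted allowedDocumentIds means no document restriction and that tagged evidence is visible when at least one of its tags matches.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface AccessContext {
  allowedDocumentIds?: string[]; // undefined: no document restriction (assumed)
  accessTags?: string[];
}
interface EvidenceRef {
  documentId: string;
  chunkId: string;
  accessTags?: string[];
}

function isEvidenceVisible(evidence: EvidenceRef, ctx: AccessContext): boolean {
  // An explicit list restricts documents, so an empty list denies everything.
  if (
    ctx.allowedDocumentIds !== undefined &&
    !ctx.allowedDocumentIds.includes(evidence.documentId)
  ) {
    return false;
  }
  // Untagged evidence is public by default.
  const tags = evidence.accessTags ?? [];
  if (tags.length === 0) return true;
  // Tagged evidence needs a matching caller tag (any-match is an assumption).
  const callerTags = new Set(ctx.accessTags ?? []);
  return tags.some((tag) => callerTags.has(tag));
}

function filterEvidence(candidates: EvidenceRef[], ctx: AccessContext): EvidenceRef[] {
  return candidates.filter((evidence) => isEvidenceVisible(evidence, ctx));
}
```

Running the filter once over the merged candidate list, after all retrievers have contributed, means an external retriever cannot bypass the check.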
External Retrieval Interoperability
import { createExternalEvidenceCandidates, type Retriever } from '@fastrag/graph/retrieval';

const milvusRetriever: Retriever = {
  name: 'milvus',
  async retrieve(input) {
    // The vector search itself runs in caller code, outside the SDK.
    const hits = await searchMilvusOutsideTheSdk(input.query);
    return {
      candidates: createExternalEvidenceCandidates(
        hits.map((hit) => ({
          documentId: hit.documentId,
          versionId: hit.versionId,
          chunkId: hit.chunkId,
          source: 'milvus',
          sourceSystem: 'milvus',
          retrievalMode: 'vector',
          score: hit.score,
          scoreKind: 'similarity',
        })),
      ),
    };
  },
};

External vector, fulltext, hybrid, rerank, or business-specific retrieval systems must return an evidence identity that aligns with chunks the SDK has already ingested. The SDK still performs the final access filtering and DocumentStore hydration. It does not handle embedding generation, vector indexes, or provider SDK clients.
Maintenance Operations and Consistency
import { checkSearchResultConsistency } from '@fastrag/graph/diagnostics';
import { collectExternalIndexPayload } from '@fastrag/graph/document';

await rag.deleteDocument({ documentId: 'policy' });

const payload = await collectExternalIndexPayload({
  documentStore,
  tenantId: 'default',
  documentId: 'policy',
  source: 'policy.md',
});

const result = await rag.search({ query: 'policy approval' });
const issues = checkSearchResultConsistency(result);

deleteDocument() deletes graph data first, then document-store chunks. If graph deletion fails, the document store is left untouched. collectExternalIndexPayload() only reads chunks and returns a sync payload plus chunksRead stats; it does not call any external index. checkSearchResultConsistency() reports missing candidate identities and hydration misses, without including raw chunk content.
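To make the two issue kinds concrete, here is a rough sketch of what such a consistency check might look like. It is not the SDK's implementation: `findConsistencyIssues`, the issue shape, and the `documentId:chunkId` key format are hypothetical. The point it illustrates is that each issue carries identifiers only, never chunk text.

```typescript
// Hypothetical shapes for illustration; the SDK's result types may differ.
interface CandidateIdentity {
  documentId?: string;
  versionId?: string;
  chunkId?: string;
}
interface ConsistencyIssue {
  kind: 'missing-identity' | 'hydration-miss';
  candidateIndex: number;
  documentId?: string;
  chunkId?: string;
}

function findConsistencyIssues(
  candidates: CandidateIdentity[],
  hydratedChunkKeys: Set<string>, // assumed key format: documentId:chunkId
): ConsistencyIssue[] {
  const issues: ConsistencyIssue[] = [];
  candidates.forEach((candidate, candidateIndex) => {
    const { documentId, chunkId } = candidate;
    // A candidate without a full identity cannot be aligned with any chunk.
    if (!documentId || !chunkId) {
      issues.push({ kind: 'missing-identity', candidateIndex });
      return;
    }
    // A well-formed identity that was not hydrated points at a missing chunk.
    if (!hydratedChunkKeys.has(`${documentId}:${chunkId}`)) {
      issues.push({ kind: 'hydration-miss', candidateIndex, documentId, chunkId });
    }
  });
  return issues;
}
```

A hydration miss after a deleteDocument() call often means an external index still holds entries for the deleted document, which is the situation the sync payload is meant to help resolve.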
Accuracy, Diagnostics, and Quality Gates
import { addTokenUsage, evaluateIngestQuality, mergeIngestDiagnostics } from '@fastrag/graph';
import { createGraphConstructionReport } from '@fastrag/graph/diagnostics';
import { validateEvidenceGrounding } from '@fastrag/graph/extractors';

const quality = evaluateIngestQuality({
  relations,
  policy: { minRelationConfidence: 0.5, requireEvidenceForRelations: true },
});

const searchResult = await rag.search({ query: 'policy' });
console.log(searchResult.diagnostics.retrievers);
console.log(searchResult.diagnostics.hydrationMisses);

const ingestResult = await rag.ingestMarkdown({
  documentId: 'policy',
  source: 'policy.md',
  markdown: '# Policy',
});
console.log(ingestResult.diagnostics.stages);

const grounding = validateEvidenceGrounding({
  chunks: ingestResult.chunks,
  relations: ingestResult.relations,
  policy: { mode: 'diagnostic', quoteMatch: 'normalizedWhitespace', validateSpan: true },
});
const report = createGraphConstructionReport({ result: ingestResult });
console.log(grounding.stats);
console.log(report.counts);

evaluateIngestQuality() returns active, needs_review, or failed. It does not change IngestResult.status automatically; automatic runtime gating depends on a qualityPolicy contract that has not yet been approved.
Runtime extractor quality controls are deterministic and run locally: before writing, candidates are normalized, validated, and deduplicated, and invalid ones are dropped. IngestResult.status remains active | failed; changes in candidate quality are exposed through diagnostics.validationErrors and diagnostics.quality.
Search diagnostics summarize retriever counts, access filtering, fusion contribution, and hydration misses. Ingest diagnostics summarize normalization, extraction, quality controls, budget trimming, grounding stats, and write stages. createGraphConstructionReport() returns compact counts, label/type distributions, quality, budget, grounding, and token usage, without raw chunk content or quotes. Diagnostics output stays JSON-serializable and avoids raw chunk content, raw provider responses, and secrets.
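The quoteMatch: 'normalizedWhitespace' option used above suggests that quotes are compared after whitespace is collapsed. A plausible reading of that idea is sketched below; the function names and the stats shape are hypothetical, and the SDK's actual matching and span validation may be stricter.

```typescript
// Collapse every whitespace run to a single space and trim the ends.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}

interface GroundingCheck {
  chunkContent: string;
  quote: string;
}

// Returns true when the quote occurs in the chunk after normalization.
function isQuoteGrounded({ chunkContent, quote }: GroundingCheck): boolean {
  const normalizedQuote = normalizeWhitespace(quote);
  if (normalizedQuote.length === 0) return false; // empty quotes ground nothing
  return normalizeWhitespace(chunkContent).includes(normalizedQuote);
}

// Aggregate counts only, so the summary never carries chunk text or quotes.
function groundingStats(checks: GroundingCheck[]) {
  const grounded = checks.filter((check) => isQuoteGrounded(check)).length;
  return { total: checks.length, grounded, ungrounded: checks.length - grounded };
}
```

Whitespace normalization tolerates the line wrapping and spacing changes that LLMs commonly introduce when quoting, while still rejecting paraphrased or invented quotes.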
Evaluation
import { runSearchEvaluation } from '@fastrag/graph/evaluation';

const report = await runSearchEvaluation(rag, [
  {
    id: 'policy-answer',
    query: 'Who can approve the policy?',
    expectedEvidence: [{ documentId: 'policy', chunkId: 'approval' }],
    forbiddenEvidence: [{ documentId: 'restricted-policy' }],
    limit: 5,
  },
]);

// Fail the run if any forbidden evidence was returned.
if (report.summary.accessLeakageCount > 0) {
  throw new Error('Access leakage detected');
}
console.log(report.summary.diagnosticsSummary);

@fastrag/graph/evaluation is the deterministic, SDK-native evaluation core for assessing evidence identity, ranking quality, access leakage, hydration misses, and a compact diagnostics summary. It does not integrate Ragas, DeepEval, LangSmith, OpenAI Evals, or LLM-as-judge frameworks.
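Because the evaluation is identity-based, each case can be scored without any model call. The sketch below shows one way such per-case scoring could work; `scoreCase`, its types, and the choice of metrics are hypothetical, and it assumes that a matcher without a chunkId matches every chunk of that document, as the forbiddenEvidence entry in the example suggests.

```typescript
// Hypothetical shapes for illustration; not the SDK's report types.
interface EvidenceMatcher { documentId: string; chunkId?: string }
interface ReturnedEvidence { documentId: string; chunkId: string }

// A matcher without chunkId matches every chunk of that document (assumed).
function matchesEvidence(matcher: EvidenceMatcher, evidence: ReturnedEvidence): boolean {
  return (
    matcher.documentId === evidence.documentId &&
    (matcher.chunkId === undefined || matcher.chunkId === evidence.chunkId)
  );
}

function scoreCase(
  returned: ReturnedEvidence[],
  expected: EvidenceMatcher[],
  forbidden: EvidenceMatcher[],
) {
  // How many expected items appear anywhere in the returned list.
  const found = expected.filter((m) => returned.some((e) => matchesEvidence(m, e))).length;
  // How many returned items match a forbidden matcher.
  const leaked = returned.filter((e) => forbidden.some((m) => matchesEvidence(m, e))).length;
  // Index of the first returned item that matches any expected matcher.
  const firstHit = returned.findIndex((e) => expected.some((m) => matchesEvidence(m, e)));
  return {
    recall: expected.length === 0 ? 1 : found / expected.length,
    accessLeakageCount: leaked,
    firstHitRank: firstHit === -1 ? null : firstHit + 1, // 1-based rank
  };
}
```

Scoring on identities alone keeps results reproducible across runs, which is what makes a hard failure on accessLeakageCount practical in CI.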
