rag-chunk-reorder
v0.1.7
RAG chunk reordering, retrieval reranking, and context packing to mitigate lost-in-the-middle in LLM context windows. Drop-in reordering for LangChain, LlamaIndex, Vercel AI, and any RAG pipeline.
rag-chunk-reorder
Context-aware chunk reordering for RAG pipelines. Mitigates the lost-in-the-middle phenomenon by strategically placing high-relevance passages at the beginning and end of LLM context windows.
📖 Interactive Documentation & Live Demo — visual explanations, strategy comparisons, and a live reordering playground.
Quick links: Docs · GitHub · npm
Get started in 60 seconds:
import { Reorderer } from 'rag-chunk-reorder';
const reorderer = new Reorderer({ strategy: 'scoreSpread', topK: 8 });
const reordered = reorderer.reorderSync(chunks);
// High-relevance chunks land at start and end of context (fixes lost-in-the-middle)

Table of Contents
- Why?
- The Lost-in-the-Middle Problem
- How It Works
- Install
- Quick Start
- Presets
- Token Counters
- CLI
- Strategies
- Configuration
- Advanced Usage
- API Reference
- Evaluation Metrics
- Integrations
- Recipes
- Benchmarks
- Performance Considerations
- Decision Guide
- Best Practices
- Gotchas
- Requirements
- Release Cadence
- Changelog
- License
Why?
Large Language Models (LLMs) don't process all information equally. Research shows that:
- Position Bias: Models tend to prioritize information at the beginning (primacy) and end (recency) of their context
- U-Shaped Attention: Attention scores follow a U-curve — highest at extremes, lowest in the middle
- Lost-in-the-Middle: When chunks are ordered by relevance score alone, critical information can get buried in positions 3-8 where it's least likely to influence generation
This library solves this by reordering retrieved chunks to maximize the visibility of high-relevance content.
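The core idea can be sketched independently of the library (a simplified illustration of edge placement; the actual scoreSpread strategy adds interleaving, budgets, and more options):

```typescript
interface Chunk { id: string; score: number }

// Simplified edge placement: the top `startCount` chunks go to the front,
// the next `endCount` go to the back (reversed so the best of them sits
// last), and the lowest-scored chunks fill the middle.
function edgePlace(chunks: Chunk[], startCount = 2, endCount = 2): Chunk[] {
  const sorted = [...chunks].sort((a, b) => b.score - a.score);
  const start = sorted.slice(0, startCount);
  const end = sorted.slice(startCount, startCount + endCount);
  const middle = sorted.slice(startCount + endCount);
  return [...start, ...middle, ...end.reverse()];
}

// [A, B, C, D, E] by descending score → [A, B, E, D, C]: best chunks at both edges.
```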
The Lost-in-the-Middle Problem
When using retrieval-augmented generation (RAG), you typically:
- Retrieve chunks from a vector database using semantic similarity
- Rank them by relevance score
- Concatenate the top-K chunks into the LLM context
The problem: vector databases return results sorted by similarity score, but this ordering doesn't account for how LLMs process context.
Example
Retrieved (by similarity score):
┌─────────────────────────────────────────────────────────────┐
│ Position 0 │ "Introduction to Python" (score: 0.95)│
│ Position 1 │ "Variables and Data Types" (score: 0.92)│
│ Position 2 │ "Control Flow: If/Else" (score: 0.89)│
│ Position 3 │ "Functions and Parameters" (score: 0.87)│ ← BURIED!
│ Position 4 │ "Error Handling" (score: 0.85)│ ← BURIED!
│ Position 5 │ "Classes and Objects" (score: 0.83)│ ← BURIED!
│ Position 6 │ "File I/O Operations" (score: 0.80)│
│ Position 7 │ "Working with Modules" (score: 0.78)│
└─────────────────────────────────────────────────────────────┘
After Reordering (scoreSpread strategy):
┌─────────────────────────────────────────────────────────────┐
│ Position 0 │ "Introduction to Python" (score: 0.95)│ ← HIGH VISIBILITY
│ Position 1 │ "Variables and Data Types" (score: 0.92)│ ← HIGH VISIBILITY
│ Position 2 │ "File I/O Operations" (score: 0.80)│
│ Position 3 │ "Working with Modules" (score: 0.78)│
│ Position 4 │ "Control Flow: If/Else" (score: 0.89)│
│ Position 5 │ "Functions and Parameters" (score: 0.87)│
│ Position 6 │ "Error Handling" (score: 0.85)│
│ Position 7 │ "Classes and Objects" (score: 0.83)│ ← HIGH VISIBILITY
└─────────────────────────────────────────────────────────────┘

How It Works
The library provides multiple reordering strategies:
1. ScoreSpread (Default)
Places the highest-scoring chunks at the beginning and end, with lower-scoring chunks in the middle. This exploits the U-shaped attention curve.
Input chunks sorted by score: [A, B, C, D, E] (A = highest)
Output: [A, B, E, D, C]
↑↑ ↑↑
start          end

2. PreserveOrder (OP-RAG)
Maintains document structure by:
- Grouping chunks by source document
- Sorting within groups by section index
- Ordering groups by highest relevance score
Ideal for maintaining narrative flow and context.
3. Chronological
Sorts by timestamp for temporal data like:
- Event logs
- Chat transcripts
- Time-series records
4. Custom
Supply your own comparator function for domain-specific ordering.
5. Auto
Selects a strategy dynamically using query intent + metadata coverage:
- Temporal query + high timestamp coverage -> chronological
- Narrative query + source/section coverage -> preserveOrder
- Otherwise -> scoreSpread
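In spirit, this routing can be sketched as follows (a toy heuristic with made-up intent regexes; the library's detector is pluggable and its thresholds configurable, as shown later):

```typescript
type Strategy = 'scoreSpread' | 'preserveOrder' | 'chronological';
interface Meta { timestamp?: number; sourceId?: string; sectionIndex?: number }
interface Chunk { id: string; text: string; score: number; metadata?: Meta }

// Fraction of chunks whose metadata defines the given field.
function coverage(chunks: Chunk[], field: keyof Meta): number {
  if (chunks.length === 0) return 0;
  return chunks.filter((c) => c.metadata?.[field] !== undefined).length / chunks.length;
}

// Naive intent detection plus coverage thresholds mirroring the defaults.
function routeStrategy(chunks: Chunk[], query: string): Strategy {
  const temporal = /\b(when|timeline|before|after|latest)\b/i.test(query);
  const narrative = /\b(explain|describe|story|walk me through)\b/i.test(query);
  if (temporal && coverage(chunks, 'timestamp') >= 0.4) return 'chronological';
  if (narrative && coverage(chunks, 'sourceId') >= 0.4 && coverage(chunks, 'sectionIndex') >= 0.3) {
    return 'preserveOrder';
  }
  return 'scoreSpread';
}
```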
Install
npm install rag-chunk-reorder

Or using yarn:

yarn add rag-chunk-reorder

Or using pnpm:

pnpm add rag-chunk-reorder

Quick Start
One line for docs QA: const chunks = await reorderForDocsQA(retrievedChunks, query, { topK: 8 });
import { Reorderer } from 'rag-chunk-reorder';
const chunks = [
{ id: '1', text: 'Most relevant passage', score: 0.95 },
{ id: '2', text: 'Somewhat relevant', score: 0.72 },
{ id: '3', text: 'Also relevant', score: 0.85 },
{ id: '4', text: 'Less relevant', score: 0.6 },
{ id: '5', text: 'Moderately relevant', score: 0.78 },
];
const reorderer = new Reorderer();
const reordered = reorderer.reorderSync(chunks);
// Result: highest-scoring chunks at positions 0 and N-1 (primacy/recency bias)

Pipeline Helpers (Batteries-Included)
For common RAG patterns you can use opinionated helpers instead of wiring configs by hand:
import {
reorderForChatHistory,
reorderForDocsQA,
reorderForLogs,
} from 'rag-chunk-reorder';
// Chat-style assistants: scoreSpread + edge-aware packing
const chatChunks = await reorderForChatHistory(chunks, query);
// Docs QA / KB: auto strategy + fuzzy dedup
const docsChunks = await reorderForDocsQA(chunks, query, {
maxTokens: 4096,
tokenCounter: tokenCounterFactory('char4'),
});
// Logs / events: chronological (latest first)
const logChunks = await reorderForLogs(chunks);

Presets
Use opinionated presets to get started quickly:
import { Reorderer, reordererPresets, getPreset } from 'rag-chunk-reorder';
const reorderer = new Reorderer(reordererPresets.standard);
// Or clone a preset before customizing
const config = getPreset('diverse');
config.topK = 8;
const tuned = new Reorderer(config);

Token Counters
Built-in helpers for quick token budgeting:
import { Reorderer, tokenCounterFactory } from 'rag-chunk-reorder';
const reorderer = new Reorderer({
maxTokens: 2048,
tokenCounter: tokenCounterFactory('char4'), // ~1 token per 4 chars
});

For accurate token counts, install @dqbd/tiktoken or tiktoken. Optional tiktoken-based counter:
import { createTiktokenCounter, Reorderer } from 'rag-chunk-reorder';
const tokenCounter = await createTiktokenCounter({ model: 'gpt-4o-mini' });
const reorderer = new Reorderer({ maxTokens: 4096, tokenCounter });

Install an optional tokenizer:
npm install @dqbd/tiktoken

CLI
Reorder JSON or JSONL from the terminal:
rag-chunk-reorder --input chunks.json --output reordered.json --strategy scoreSpread --topK 8
rag-chunk-reorder --jsonl --input chunks.jsonl --query "when did it happen?" --strategy auto

Strategies
ScoreSpread (default)
Interleaves chunks by priority score — highest at the start and end, lowest in the middle. Exploits the U-shaped attention curve of LLMs.
const reorderer = new Reorderer({ strategy: 'scoreSpread' });

With explicit start/end counts:
const reorderer = new Reorderer({
strategy: 'scoreSpread',
startCount: 3, // top 3 chunks at the beginning
endCount: 2, // next 2 chunks at the end
});

PreserveOrder (OP-RAG)
Maintains original document order within each source. Groups chunks by sourceId, sorts by sectionIndex within each group, and orders groups by their highest relevance score.
const reorderer = new Reorderer({ strategy: 'preserveOrder' });
const chunks = [
{ id: '1', text: 'Intro', score: 0.9, metadata: { sourceId: 'doc-a', sectionIndex: 0 } },
{ id: '2', text: 'Details', score: 0.7, metadata: { sourceId: 'doc-a', sectionIndex: 2 } },
{ id: '3', text: 'Summary', score: 0.8, metadata: { sourceId: 'doc-b', sectionIndex: 0 } },
];

Use a custom source field:
const reorderer = new Reorderer({
strategy: 'preserveOrder',
preserveOrderSourceField: 'docId',
});

Chronological
Sorts by timestamp ascending. Ideal for event logs, chat transcripts, and time-series data.
const reorderer = new Reorderer({ strategy: 'chronological' });
const chunks = [
{ id: '1', text: 'Event A', score: 0.8, metadata: { timestamp: 1700000000 } },
{ id: '2', text: 'Event B', score: 0.9, metadata: { timestamp: 1700000100 } },
];

Descending order for “latest first”:
const reorderer = new Reorderer({
strategy: 'chronological',
chronologicalOrder: 'desc',
});

Auto
Automatically chooses scoreSpread, preserveOrder, or chronological based on:
- query intent (factoid vs narrative vs temporal)
- metadata coverage (timestamp, sourceId, sectionIndex)
const reorderer = new Reorderer({
strategy: 'auto',
autoStrategy: {
temporalTimestampCoverageThreshold: 0.4,
narrativeSourceCoverageThreshold: 0.4,
narrativeSectionCoverageThreshold: 0.3,
},
});

Custom intent detector:
const reorderer = new Reorderer({
strategy: 'auto',
autoStrategy: {
intentDetector: (query) => query?.includes('timeline') ? 'temporal' : undefined,
},
});

Custom
Supply your own comparator for domain-specific ordering.
const reorderer = new Reorderer({
strategy: 'custom',
customComparator: (a, b) => (b.metadata?.priority as number) - (a.metadata?.priority as number),
});

Configuration
const reorderer = new Reorderer({
// Strategy selection
strategy: 'scoreSpread', // 'scoreSpread' | 'preserveOrder' | 'chronological' | 'custom' | 'auto'
chronologicalOrder: 'asc', // 'asc' | 'desc' (chronological)
preserveOrderSourceField: 'sourceId', // custom field for preserveOrder
// Auto strategy controls (used when strategy = 'auto')
autoStrategy: {
temporalTimestampCoverageThreshold: 0.4,
narrativeSourceCoverageThreshold: 0.4,
narrativeSectionCoverageThreshold: 0.3,
},
// Scoring weights for priority computation
weights: {
similarity: 1.0, // weight for base relevance score
time: 0.0, // weight for normalized timestamp
section: 0.0, // weight for normalized sectionIndex
sourceReliability: 0.0, // weight for normalized source reliability
},
scoreNormalization: 'none', // 'none' | 'minMax' | 'zScore' | 'softmax'
scoreNormalizationTemperature: 1.0, // temperature for softmax
// ScoreSpread options (optional)
startCount: 3, // top chunks to place at start
endCount: 2, // next top chunks to place at end
// If omitted, scoreSpread uses automatic edge interleaving
// Filtering
minScore: 0.5, // drop chunks below this score
topK: 10, // max chunks to return
// Token budget
maxTokens: 4096, // max total tokens in output
tokenCounter: (text) => text.split(/\s+/).length, // required when maxTokens is set
// If chunk.tokenCount or metadata.tokenCount is present, it is used instead of tokenCounter
maxChars: 20000, // fallback character budget when tokenCounter is unavailable
charCounter: (text) => text.length, // optional character counter for maxChars
minTopK: 4, // guarantee at least this many chunks even if budget is tight
scoreClamp: [0, 1], // optional clamp for chunk scores
// Deduplication
deduplicate: true, // remove duplicate chunks
deduplicateThreshold: 0.85, // fuzzy similarity threshold (0-1, default: 1.0 = exact)
deduplicateKeep: 'highestScore', // 'highestScore' | 'first' | 'last'
deduplicateLengthBucketSize: 120, // optional length bucket prefilter
deduplicateMaxCandidates: 400, // optional cap on fuzzy comparisons
// Diversity reranking (MMR)
diversity: {
enabled: true,
lambda: 0.7, // relevance-vs-diversity tradeoff
sourceDiversityWeight: 0.15, // penalize repeating same source
sourceField: 'sourceId',
maxCandidates: 200, // optional cap to bound MMR cost on large sets
},
// Grouping
groupBy: 'sourceId', // group chunks by this metadata field
// Reranker (async only)
reranker: myReranker, // external cross-encoder reranker
onRerankerError: (err) => console.warn('Reranker failed:', err),
rerankerTimeoutMs: 2000, // optional timeout for reranker
rerankerAbortSignal: myAbortSignal, // optional AbortSignal
rerankerConcurrency: 4, // optional concurrent reranker calls
rerankerBatchSize: 16, // optional batch size for reranker
// Budget packing policy for maxTokens/topK
packing: 'auto', // 'auto' | 'prefix' | 'edgeAware'
// Validation mode
validationMode: 'strict', // 'strict' | 'coerce'
// Diagnostics hook
onDiagnostics: (stats) => console.log('reorder diagnostics', stats),
onTraceStep: (step, ms, details) => console.log(step, ms, details),
validateRerankerOutputLength: true, // ensure reranker output length matches input
validateRerankerOutputOrder: true, // ensure reranker preserves input order (unique ids)
validateRerankerOutputOrderByIndex: false, // strict index order check using id+text
// Debug
includePriorityScore: true, // include computed priorityScore in output metadata
includeExplain: true, // include per-chunk placement explanation
// Streaming (iterables)
streamingWindowSize: 128, // reorderStream window size for iterables
});

Configuration Options Reference
| Option | Type | Default | Description |
| ---------------------- | ----------------------------------------------------------------------- | --------------- | ------------------------------------------------------------ |
| strategy | 'scoreSpread' \| 'preserveOrder' \| 'chronological' \| 'custom' \| 'auto' | 'scoreSpread' | Reordering algorithm to use |
| chronologicalOrder | 'asc' \| 'desc' | 'asc' | Direction for chronological ordering |
| preserveOrderSourceField | string | 'sourceId' | Metadata field used by preserveOrder grouping |
| startCount | number | undefined | Number of top chunks to place at the beginning (scoreSpread) |
| endCount | number | undefined | Number of next top chunks to place at the end (scoreSpread) |
| minScore | number | undefined | Minimum relevance score threshold |
| topK | number | undefined | Maximum number of chunks to return |
| maxTokens | number | undefined | Maximum token budget for output |
| maxChars | number | undefined | Maximum character budget (fallback without tokenCounter) |
| charCounter | (text) => number | undefined | Optional character counter for maxChars |
| minTopK | number | undefined | Minimum chunks returned even if budgets are tight |
| tokenCounter | (text) => number | undefined | Token counter required for maxTokens |
| scoreClamp | [number, number] | undefined | Clamp chunk scores to a safe range |
| scoreNormalization | 'none' \| 'minMax' \| 'zScore' \| 'softmax' | 'none' | Normalize similarity scores before weighting |
| scoreNormalizationTemperature | number | 1.0 | Softmax temperature |
| packing | 'auto' \| 'prefix' \| 'edgeAware' | 'auto' | How token/topK budgets are packed |
| deduplicate | boolean | false | Whether to remove duplicates |
| deduplicateThreshold | number | 1.0 | Similarity threshold for fuzzy dedup (0-1) |
| deduplicateLengthBucketSize | number | undefined | Optional length bucket prefilter (chars) |
| deduplicateMaxCandidates | number | undefined | Optional cap on fuzzy comparisons |
| diversity.enabled | boolean | false | Enable MMR/source diversity reranking |
| diversity.lambda | number (0-1) | 0.7 | Relevance-vs-diversity tradeoff |
| diversity.sourceField | string | 'sourceId' | Metadata field used for source diversity |
| diversity.maxCandidates | number | undefined | Optional cap on candidates to bound MMR cost |
| groupBy | string | undefined | Metadata field to group chunks by |
| weights.similarity | number | 1.0 | Weight for base relevance score |
| weights.time | number | 0.0 | Weight for timestamp in priority calculation |
| weights.section | number | 0.0 | Weight for sectionIndex in priority calculation |
| weights.sourceReliability | number | 0.0 | Weight for source reliability in priority calculation |
| autoStrategy.temporalTimestampCoverageThreshold | number (0-1) | 0.4 | Timestamp coverage needed for temporal auto-routing |
| autoStrategy.narrativeSourceCoverageThreshold | number (0-1) | 0.4 | Source coverage needed for narrative auto-routing |
| autoStrategy.narrativeSectionCoverageThreshold | number (0-1) | 0.3 | Section coverage needed for narrative auto-routing |
| autoStrategy.intentDetector | (query) => QueryIntent | undefined | Custom intent detector (optional) |
| validationMode | 'strict' \| 'coerce' | 'strict' | Input validation behavior |
| onDiagnostics | (stats) => void | undefined | Structured pipeline stats per reorder call |
| onTraceStep | (step, ms, details?) => void | undefined | Per-step timing hook |
| validateRerankerOutputLength | boolean | true | Verify reranker output length matches input |
| validateRerankerOutputOrder | boolean | true | Verify reranker output order matches input (unique ids) |
| validateRerankerOutputOrderByIndex | boolean | false | Strict index order check using id+text |
| includePriorityScore | boolean | false | Include computed priority in output metadata |
| includeExplain | boolean | false | Attach per-chunk explanation metadata |
| streamingWindowSize | number | 128 | Window size when reorderStream receives an iterable |
| rerankerTimeoutMs | number | undefined | Reranker timeout in milliseconds |
| rerankerAbortSignal | AbortSignal | undefined | Abort signal forwarded to reranker |
| rerankerConcurrency | number | undefined | Max concurrent reranker calls |
| rerankerBatchSize | number | undefined | Batch size per reranker call |
Diagnostics payloads include counts for dedupStrategyUsed, packingStrategyUsed, tokenCountUsed,
cachedTokenCountUsed, charCountUsed, and budgetUnit to aid production monitoring.
When maxChars is used, budgetUnit is chars and tokenCountUsed is 0.
For non-ASCII text, consider providing charCounter. Example:
const reorderer = new Reorderer({
maxChars: 4000,
charCounter: (text) => Array.from(text).length,
});

For full grapheme support (emoji/combined characters), use a grapheme splitter and supply that count.
Advanced Usage
Per-Call Overrides
Override config on individual calls without mutating the instance:
const reorderer = new Reorderer({ strategy: 'scoreSpread' });
// Use chronological for this one call
const result = reorderer.reorderSync(chunks, { strategy: 'chronological' });
// Instance config is unchanged
reorderer.getConfig().strategy; // 'scoreSpread'

Explain Mode
Attach per-chunk placement reasons for debugging and audits:
const reorderer = new Reorderer({ includeExplain: true });
const output = reorderer.reorderSync(chunks);
console.log(output[0].metadata?.reorderExplain);

Diagnostics Shortcut
const reorderer = new Reorderer({ strategy: 'scoreSpread' });
const { chunks: output, diagnostics } = await reorderer.reorderWithDiagnostics(chunks, query);

Auto Strategy Selection
const reorderer = new Reorderer({
strategy: 'auto',
autoStrategy: {
temporalTimestampCoverageThreshold: 0.4,
narrativeSourceCoverageThreshold: 0.4,
narrativeSectionCoverageThreshold: 0.3,
},
});
// Chooses chronological when query is temporal and timestamps are well-covered
const result = await reorderer.reorder(chunks, 'When did this happen?');

Async Reorder with Reranker
import { Reorderer, Reranker } from 'rag-chunk-reorder';
const reranker: Reranker = {
async rerank(chunks, query) {
// Call your cross-encoder model here
const reranked = await myCrossEncoder.rerank(chunks, query);
    return {
      chunks, // original chunk order
      scores: reranked.map((c) => c.score), // assumes the reranker returns scores in input order
    };
},
};
const reorderer = new Reorderer({
reranker,
onRerankerError: (err) => console.warn('Reranker failed, using original scores:', err),
});
const result = await reorderer.reorder(chunks, 'What is the capital of France?');

If the reranker throws, the library falls back to original scores and calls onRerankerError.
validateRerankerOutputOrder only enforces order when input ids are unique. With identical duplicates
(id + text collisions), order checks are best-effort. Prefer unique ids or include a stable
metadata.chunkId when using rerankers that may reorder inputs. For strict index checks, use
validateRerankerOutputOrderByIndex (requires unique metadata.chunkId for duplicates).
For cross-encoder APIs that benefit from batching, set rerankerBatchSize and rerankerConcurrency to control
how many parallel calls are made.
Diversity-Aware Reranking (MMR + Source Diversity)
const reorderer = new Reorderer({
strategy: 'scoreSpread',
diversity: {
enabled: true,
lambda: 0.7,
sourceDiversityWeight: 0.15,
sourceField: 'sourceId',
maxCandidates: 200,
},
});

This reranks chunks before final strategy application to reduce near-duplicate context waste.
Use maxCandidates to bound MMR cost on large retrieval sets.
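The MMR step can be illustrated with a minimal standalone version (a simplified sketch using word-overlap similarity as a stand-in for embeddings; the library's implementation and similarity function may differ):

```typescript
interface Scored { id: string; text: string; score: number }

// Jaccard similarity over word sets: a cheap stand-in for embeddings.
function wordSim(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : inter / union;
}

// Maximal Marginal Relevance: greedily pick the chunk that balances
// relevance (weighted by lambda) against similarity to chunks already selected.
function mmr(chunks: Scored[], lambda = 0.7, k = chunks.length): Scored[] {
  const selected: Scored[] = [];
  const pool = [...chunks];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestVal = -Infinity;
    pool.forEach((c, i) => {
      const maxSim = selected.reduce((m, s) => Math.max(m, wordSim(c.text, s.text)), 0);
      const val = lambda * c.score - (1 - lambda) * maxSim;
      if (val > bestVal) { bestVal = val; bestIdx = i; }
    });
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

With a small lambda, a near-duplicate of an already-selected chunk loses to a fresh, lower-scored one.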
Streaming
for await (const chunk of reorderer.reorderStream(chunks, 'my query')) {
process.stdout.write(chunk.text);
}

Note: When you pass a Chunk[] array, the result is materialized before yielding. For iterables/async iterables, reorderStream processes chunks in windows (configurable via streamingWindowSize) to bound memory. Windowed streaming is approximate and may differ from a global reorder.
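Windowed streaming reduces to a simple pattern (an illustrative sketch, not the library's internals): buffer up to the window size, reorder the buffer, yield, repeat.

```typescript
// Windowed streaming: buffer `windowSize` items, reorder the window,
// yield its items, then start a new window. Approximate versus a
// global reorder because items never cross window boundaries.
async function* reorderWindows<T>(
  source: AsyncIterable<T> | Iterable<T>,
  windowSize: number,
  reorder: (window: T[]) => T[],
): AsyncIterable<T> {
  let window: T[] = [];
  for await (const item of source) {
    window.push(item);
    if (window.length >= windowSize) {
      yield* reorder(window);
      window = [];
    }
  }
  if (window.length > 0) yield* reorder(window); // flush the tail
}
```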
Deduplication
Remove exact or near-duplicate chunks before reordering:
import { deduplicateChunks, trigramSimilarity } from 'rag-chunk-reorder';
// Exact dedup
const unique = deduplicateChunks(chunks);
// Fuzzy dedup (trigram Jaccard similarity)
const fuzzyUnique = deduplicateChunks(chunks, {
threshold: 0.85,
keep: 'highestScore',
lengthBucketSize: 120,
maxCandidates: 400,
});
// Check similarity between two texts
const sim = trigramSimilarity('hello world foo', 'hello world bar'); // ~0.6

Fuzzy dedup uses O(n²) pairwise comparison. For arrays > 500 chunks, consider exact dedup, or enable lengthBucketSize/maxCandidates to bound comparisons.
deduplicateChunks validates inputs in strict mode by default. If you need the previous permissive behavior,
use deduplicateChunksUnsafe or pass { validationMode: 'coerce' }.
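For intuition, trigram Jaccard similarity can be reproduced in a few lines (an illustrative reimplementation; the library's handling of whitespace and short strings may differ):

```typescript
// Character trigrams of a string.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= text.length; i++) grams.add(text.slice(i, i + 3));
  return grams;
}

// Jaccard similarity of trigram sets: |A ∩ B| / |A ∪ B|.
function trigramJaccard(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  const inter = [...ta].filter((g) => tb.has(g)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 0 : inter / union;
}

// trigramJaccard('hello world foo', 'hello world bar') → 0.625 (the ~0.6 above)
```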
Serialization
import { serializeChunks, deserializeChunks } from 'rag-chunk-reorder';
const json = serializeChunks(chunks);
const restored = deserializeChunks(json);
// Round-trip: deserialize(serialize(chunks)) ≡ chunks

Answer-Level Evaluation Harness
import { evaluateAnswerSet } from 'rag-chunk-reorder';
const summary = evaluateAnswerSet([
{
prediction: 'Paris',
references: ['Paris'],
contexts: ['Paris is the capital of France.'],
},
{
prediction: 'Lyon',
references: ['Paris'],
contexts: ['Paris is the capital of France.'],
},
]);
console.log(summary.exactMatch, summary.f1, summary.faithfulness);

Framework Adapters
import {
reorderLangChainDocuments,
reorderLlamaIndexNodes,
reorderHaystackDocuments,
} from 'rag-chunk-reorder';

See runnable examples in examples/langchain.ts, examples/llamaindex.ts, and examples/haystack.ts.
API Reference
Reorderer
class Reorderer {
constructor(config?: ReorderConfig);
reorderSync(chunks: Chunk[], overrides?: Partial<ReorderConfig>): Chunk[];
reorderSync(chunks: Chunk[], query: string, overrides?: Partial<ReorderConfig>): Chunk[];
reorderSyncWithDiagnostics(chunks: Chunk[], overrides?: Partial<ReorderConfig>): ReorderResult;
reorderSyncWithDiagnostics(chunks: Chunk[], query: string, overrides?: Partial<ReorderConfig>): ReorderResult;
reorder(chunks: Chunk[], query?: string, overrides?: Partial<ReorderConfig>): Promise<Chunk[]>;
reorderWithDiagnostics(chunks: Chunk[], query?: string, overrides?: Partial<ReorderConfig>): Promise<ReorderResult>;
reorderStream(
chunks: Chunk[] | Iterable<Chunk> | AsyncIterable<Chunk>,
query?: string,
overrides?: Partial<ReorderConfig>,
): AsyncIterable<Chunk>;
getConfig(): ReorderConfig;
}

Standalone Functions
| Function | Description |
| ----------------------------------------------- | ---------------------------- |
| scoreChunks(chunks, weights) | Compute priority scores |
| scoreChunksWithOptions(chunks, weights, options?) | Priority scores with normalization |
| validateChunks(chunks) | Validate chunk array |
| prepareChunks(chunks, mode?) | Validate or coerce chunks |
| validateConfig(config) | Validate configuration |
| mergeConfig(config) | Merge with defaults |
| reordererPresets | Preset configs |
| getPreset(name) | Retrieve a preset copy |
| tokenCounterFactory(name) | Built-in token counters |
| createTiktokenCounter(options?) | Optional tiktoken counter |
| deduplicateChunks(chunks, options?) | Remove duplicates |
| deduplicateChunksUnsafe(chunks, options?) | Remove duplicates (coerce) |
| trigramSimilarity(a, b) | Trigram Jaccard similarity |
| serializeChunks(chunks) | Serialize to JSON |
| deserializeChunks(json) | Deserialize from JSON |
| detectQueryIntent(query, autoConfig) | Query intent detection |
| metadataCoverage(chunks) | Metadata coverage ratios |
| resolveAutoStrategy(chunks, query, autoConfig)| Auto strategy resolver |
| rerankWithDiversity(chunks, diversityConfig) | MMR/source diversity rerank |
| keyPointRecall(keyPoints, texts, options?) | Key-point recall metric |
| keyPointPrecision(keyPoints, texts, options?) | Key-point precision metric |
| spanRecall(spans, texts, options?) | Relevant span recall metric |
| positionEffectiveness(chunks) | Position effectiveness score |
| ndcg(scores) | Normalized DCG |
| exactMatch(prediction, references) | Answer EM metric |
| isAnswerable(text) | Answerable/unanswerable classifier |
| answerabilityMatch(prediction, references) | Answerability match score |
| tokenF1(prediction, references) | Answer-level token F1 |
| citationCoverage(prediction, contexts, options?) | Answer token coverage in contexts |
| faithfulness(prediction, contexts, options?) | Context-support faithfulness |
| retrievalRecallAtK(retrieved, relevantIds, k?) | Retrieval recall@k |
| evaluateAnswerSet(cases) | Aggregate answer evaluation |
| reorderLangChainDocuments(docs, options?) | LangChain adapter |
| reorderLangChainPairs(pairs, options?) | LangChain scored-pairs adapter |
| reorderLlamaIndexNodes(nodes, options?) | LlamaIndex adapter |
| reorderHaystackDocuments(docs, options?) | Haystack-shaped adapter |
Types
interface Chunk {
id: string;
text: string;
score: number;
tokenCount?: number;
metadata?: ChunkMetadata;
}
interface ChunkMetadata {
timestamp?: number;
page?: number;
sectionIndex?: number;
chunkId?: string;
sourceId?: string | number | boolean;
sourceReliability?: number;
tokenCount?: number;
reorderExplain?: ReorderExplain;
[key: string]: unknown;
}
interface ReorderResult {
chunks: Chunk[];
diagnostics: ReorderDiagnostics;
}
interface ReorderExplain {
strategy: 'scoreSpread' | 'preserveOrder' | 'chronological' | 'custom';
placement: string;
position: number;
priorityRank?: number;
priorityScore?: number;
}
interface Reranker {
rerank(
chunks: Chunk[],
query: string,
options?: { signal?: AbortSignal },
): Promise<RerankerResult>;
}
type RerankerResult = Chunk[] | { chunks: Chunk[]; scores: number[] };
type Strategy = 'scoreSpread' | 'preserveOrder' | 'chronological' | 'custom' | 'auto';
type QueryIntent = 'factoid' | 'narrative' | 'temporal';

Evaluation Metrics
import {
keyPointRecall,
keyPointPrecision,
spanRecall,
positionEffectiveness,
ndcg,
exactMatch,
isAnswerable,
answerabilityMatch,
tokenF1,
citationCoverage,
faithfulness,
retrievalRecallAtK,
evaluateAnswerSet,
} from 'rag-chunk-reorder';
const keyPoints = ['capital of France', 'Paris', 'Eiffel Tower'];
const texts = reordered.map((c) => c.text);
// What fraction of key points appear in the chunks?
const recall = keyPointRecall(keyPoints, texts);
const spanRecallScore = spanRecall(['Paris is the capital'], texts);
// What fraction of chunks contain at least one key point?
const precision = keyPointPrecision(keyPoints, texts);
// Case-insensitive matching
const ciRecall = keyPointRecall(keyPoints, texts, { caseInsensitive: true });
// Are high-scoring chunks at the start and end? (U-shaped weighting)
const posEff = positionEffectiveness(scoredChunks);
// Ranking quality (nDCG) — scores must be non-negative
const rankQuality = ndcg(reordered.map((c) => c.score));
// Answer-level metrics
const em = exactMatch('Paris', ['Paris', 'City of Paris']);
const answerable = isAnswerable('Paris');
const answerability = answerabilityMatch('Paris', ['Paris']);
const f1 = tokenF1('Paris is capital of France', ['The capital of France is Paris']);
const grounded = faithfulness('Paris is capital of France', texts);
const coverage = citationCoverage('Paris is capital of France', texts);
const recallAtK = retrievalRecallAtK(reordered, ['doc-1', 'doc-3'], 5);
const answerSummary = evaluateAnswerSet([
{ prediction: 'Paris', references: ['Paris'], contexts: texts },
]);

Quick eval on a small fixture: run evaluateAnswerSet(cases) or ndcg(scores) to get measurable metrics; the package is eval-friendly for before/after comparisons.
For dataset-level before/after evaluations, see examples/eval-cli.js with a JSONL dataset
like examples/data/sample-eval.jsonl. Use --report eval-report.md to generate a markdown
summary suitable for README updates.
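Token-level F1 is the standard SQuAD-style answer metric; a minimal version looks like this (illustrative; the library may normalize punctuation and articles differently):

```typescript
// SQuAD-style token F1: harmonic mean of token precision and recall,
// taking the best score across all references.
function tokenF1(prediction: string, references: string[]): number {
  const tokenize = (s: string) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const pred = tokenize(prediction);
  let best = 0;
  for (const ref of references) {
    const refTokens = tokenize(ref);
    const counts = new Map<string, number>();
    for (const t of refTokens) counts.set(t, (counts.get(t) ?? 0) + 1);
    let overlap = 0; // multiset intersection size
    for (const t of pred) {
      const c = counts.get(t) ?? 0;
      if (c > 0) { overlap += 1; counts.set(t, c - 1); }
    }
    if (overlap === 0) continue;
    const precision = overlap / pred.length;
    const recall = overlap / refTokens.length;
    best = Math.max(best, (2 * precision * recall) / (precision + recall));
  }
  return best;
}
```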
Pipeline Order
The reordering pipeline processes chunks in this order:
- Validate/coerce — check required fields (id, text, score), optionally clamp/clean
- minScore filter — drop chunks below threshold
- Deduplicate — remove exact/fuzzy duplicates
- Score — compute priorityScore from weights + metadata
- Diversity rerank (optional) — apply MMR/source diversity
- Auto strategy resolve (optional) — pick strategy from query + metadata coverage
- Group (optional) — partition by metadata field
- Strategy — apply reordering algorithm
- Strip internals — remove priorityScore/originalIndex
- Budget — apply maxTokens (or maxChars fallback) with packing policy
- topK/minTopK — enforce topK limit and minTopK minimum
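The final budgeting step, under the prefix packing policy, amounts to keeping chunks in order until the token budget runs out while still honoring a minimum count (a sketch of prefix packing; edgeAware packing is more involved):

```typescript
interface Budgetable { id: string; text: string; tokenCount?: number }

// Keep chunks in order until maxTokens is exhausted, but always keep
// at least minTopK chunks even if that overshoots the budget.
function prefixPack<T extends Budgetable>(
  chunks: T[],
  maxTokens: number,
  tokenCounter: (text: string) => number,
  minTopK = 0,
): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const chunk of chunks) {
    // Prefer a cached tokenCount when the chunk carries one.
    const cost = chunk.tokenCount ?? tokenCounter(chunk.text);
    if (used + cost > maxTokens && kept.length >= minTopK) break;
    kept.push(chunk);
    used += cost;
  }
  return kept;
}
```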
Performance Considerations
| Operation     | Time Complexity | Notes                 |
| ------------- | --------------- | --------------------- |
| ScoreSpread   | O(n log n)      | Sorting dominates     |
| PreserveOrder | O(n log n)      | Group + sort          |
| Chronological | O(n log n)      | Sort by timestamp     |
| Custom        | O(n log n)      | Depends on comparator |
| MMR Diversity | O(n²)           | Pairwise similarity   |
| Fuzzy Dedup   | O(n²)           | Pairwise comparison   |
| Exact Dedup   | O(n)            | Hash-based            |
Recommendations
- For large datasets (>500 chunks): Use exact dedup or pre-filter
- Diversity rerank at scale: Set diversity.maxCandidates (e.g., 200–500) to bound MMR cost and latency
- Token budget: Set maxTokens to avoid context overflow
- Streaming: Use reorderStream() with streamingWindowSize (default 128); for very long streams, increasing it can improve ordering quality at the cost of memory
Decision Guide
| Use Case | Recommended Strategy | Notes |
| -------- | -------------------- | ----- |
| General RAG | scoreSpread | Best default, exploits primacy/recency |
| Multi-document narratives | preserveOrder | Requires sourceId + sectionIndex coverage |
| Logs / timelines | chronological | Set chronologicalOrder: 'desc' for latest-first |
| Mixed workloads | auto | Needs good metadata coverage for routing |
| Small topK with redundancy | scoreSpread + diversity.enabled | Reduces near-duplicate waste |
Best Practices
Choose the right strategy:
- scoreSpread: General-purpose, works well with most RAG pipelines
- preserveOrder: When document structure matters (technical docs, articles)
- chronological: For temporal data (logs, chat, events)
- auto: Let query intent + metadata coverage route strategy dynamically
Tune startCount/endCount: Set explicit values (for example 3/2) when you want deterministic head/tail placement; leave unset for automatic interleaving
Use rerankers: For critical applications, add a cross-encoder reranker to improve relevance scores
Enable diversity when topK is small: diversity.enabled: true helps avoid near-duplicate context waste
Monitor answer-level metrics: Track exactMatch, tokenF1, answerability, citationCoverage, and faithfulness in addition to ranking metrics
Set minScore: Filter low-quality chunks early to reduce processing overhead
Token budget: Always set maxTokens to prevent context overflow
Gotchas
- Auto strategy needs metadata coverage: without `timestamp`/`sourceId`/`sectionIndex`, `auto` will fall back to `scoreSpread`.
- Windowed streaming is approximate: `reorderStream()` over iterables reorders per window; results can differ from full-batch ordering.
- Score normalization matters: if your retrieval scores are not comparable, set `scoreNormalization` to stabilize weighting.
- Fuzzy dedup can be expensive: for large sets, prefer exact dedup or use `lengthBucketSize`/`maxCandidates`.
- `preserveOrder` needs stable section indices: missing or non-finite `sectionIndex` values fall back to input order.
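The first gotcha — metadata-driven fallback — can be sketched as a coverage check. The 80% coverage threshold and the routing order here are illustrative assumptions, not the library's documented behavior:

```typescript
type Meta = { timestamp?: number; sourceId?: string; sectionIndex?: number };

// Route to a strategy only when enough chunks carry the metadata it needs.
function routeStrategy(chunks: { metadata?: Meta }[]): string {
  const coverage = (field: keyof Meta) =>
    chunks.filter((c) => c.metadata?.[field] != null).length / chunks.length;
  if (coverage('timestamp') >= 0.8) return 'chronological';
  if (coverage('sourceId') >= 0.8 && coverage('sectionIndex') >= 0.8) return 'preserveOrder';
  return 'scoreSpread'; // fallback when metadata coverage is insufficient
}
```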
Integrations
Ecosystem: Use with LangChain · LlamaIndex · Vercel AI (reorderVercelAIResults) · Haystack · Pinecone · Qdrant.
Works with: Pinecone, Qdrant, OpenAI, Cohere, Vercel AI, and any RAG or vector pipeline.
First-class adapters are available for framework-shaped objects:
- `reorderLangChainDocuments()` and `reorderLangChainPairs()`
- `reorderLlamaIndexNodes()`
- `reorderHaystackDocuments()`
- `reorderVercelAIResults()` for Vercel AI SDK-style results
- `reorderLangGraphState()` for LangGraph state chunks
- `reorderVectorStoreResults()` for generic vector DB rows
Adapter Quickstarts
LangChain:
```ts
import { reorderLangChainDocuments } from 'rag-chunk-reorder';

const reordered = await reorderLangChainDocuments(docs, {
  query,
  config: { strategy: 'scoreSpread', topK: 8 },
});
```

LlamaIndex:
```ts
import { reorderLlamaIndexNodes } from 'rag-chunk-reorder';

const reordered = await reorderLlamaIndexNodes(nodes, {
  query,
  config: { strategy: 'auto', topK: 8 },
});
```

Haystack:
```ts
import { reorderHaystackDocuments } from 'rag-chunk-reorder';

const reordered = await reorderHaystackDocuments(docs, {
  query,
  config: { strategy: 'chronological', chronologicalOrder: 'desc' },
});
```

See runnable integration examples:

- `examples/langchain.ts`
- `examples/llamaindex.ts`
- `examples/haystack.ts`
- `examples/evaluation.ts`
- `examples/eval-cli.js`
- `examples/langchain-pinecone.ts`
- `examples/llamaindex-qdrant.ts`
- `examples/openai-responses.ts`
Recipes
Drop-in recipes for popular stacks live in `examples/`:
LangChain + Pinecone
```ts
import { Pinecone } from '@pinecone-database/pinecone';
import { reorderLangChainDocuments } from 'rag-chunk-reorder';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('docs');
const results = await index.query({ vector: embedding, topK: 20, includeMetadata: true });

const documents = results.matches.map((m) => ({
  id: m.id,
  pageContent: m.metadata?.text ?? '',
  metadata: { score: m.score, sourceId: m.metadata?.docId },
}));

const reordered = await reorderLangChainDocuments(documents, {
  query,
  config: { strategy: 'scoreSpread', topK: 8, startCount: 2, endCount: 2 },
});
```

See `examples/langchain-pinecone.ts`.
LlamaIndex + Qdrant
```ts
import { QdrantClient } from '@qdrant/js-client-rest';
import { reorderLlamaIndexNodes } from 'rag-chunk-reorder';

const qdrant = new QdrantClient({ url: process.env.QDRANT_URL! });
const hits = await qdrant.search('docs', { vector: embedding, limit: 20 });

const nodes = hits.map((hit) => ({
  id_: hit.id?.toString(),
  text: hit.payload?.text ?? '',
  score: hit.score,
  metadata: { timestamp: hit.payload?.timestamp, sourceId: hit.payload?.docId },
}));

const reordered = await reorderLlamaIndexNodes(nodes, {
  query,
  config: { strategy: 'auto', topK: 8 },
});
```

See `examples/llamaindex-qdrant.ts`.
OpenAI Responses API
```ts
import OpenAI from 'openai';
import { Reorderer } from 'rag-chunk-reorder';

const client = new OpenAI();
const reorderer = new Reorderer({ strategy: 'scoreSpread', topK: 2, startCount: 1, endCount: 1 });
const context = reorderer.reorderSync(chunks).map((c) => c.text).join('\n\n');

const response = await client.responses.create({
  model: 'gpt-4.1-mini',
  input: `Answer using the context below:\n\n${context}\n\nQ: ${query}`,
});
```

See `examples/openai-responses.ts`.
Vercel AI + Vector Store
```ts
import { reorderVercelAIResults } from 'rag-chunk-reorder';

// results: { id?: string; content: string; score?: number; metadata?: any }[]
const reordered = await reorderVercelAIResults(results, {
  query,
  config: { strategy: 'auto', topK: 8 },
});
const context = reordered.map((r) => r.content).join('\n\n');
```

LangGraph State Chunks
```ts
import { reorderLangGraphState } from 'rag-chunk-reorder';

// stateChunks: { id?: string; content: string; score?: number; metadata?: any }[]
const reordered = await reorderLangGraphState(stateChunks, {
  query,
  config: { strategy: 'auto', topK: 8 },
});
const context = reordered.map((c) => c.content).join('\n\n');
```

Benchmarks
Reproducible scripts and datasets are included in `bench/`.
Run:

```sh
npm run bench
```

Sample results (`topK` = 4, `bench/data/sample.jsonl`):

| Metric | Baseline (prefix) | Reordered (scoreSpread) |
| ------ | ----------------- | ----------------------- |
| keyPointRecall | 75.0% | 75.0% |
| positionEffectiveness | 88.9% | 91.0% |
| nDCG | 100.0% | 99.7% |
Position effectiveness improves because high-priority chunks are pushed into the primacy/recency zones, at the cost of a negligible nDCG drop.
Requirements
- Node.js >= 18.0.0
- TypeScript >= 5.0 (for type definitions)
Release Cadence
Weekly release PRs are prepared automatically, with changelog updates in CHANGELOG.md. Tags drive npm publishing with provenance.
Changelog
See CHANGELOG.md for version history and upgrade notes.
Documentation
For interactive examples, visual explanations of the lost-in-the-middle problem, and a live reordering demo, visit the documentation site.
License
MIT
