make-abstract
v2.7.0
A CLI tool for making abstracts using AI and Zotero
A command-line tool for automatically generating academic abstracts from PDFs in your Zotero library and processing PDF/TXT files directly using AI. Supports both traditional abstract generation and structured screening questionnaires.
Features
- Multiple Input Support: Process Zotero items OR PDF/TXT files directly
- Multiple AI Providers: Supports OpenAI, Google Gemini, Anthropic Claude, Groq, DeepSeek, Cerebras, Mistral, and xAI
- Flexible Output Modes:
  - Text Mode: Generate traditional abstracts
  - JSON Mode: Structured screening with questionnaires
  - Scan Mode: Direct PDF transcription using AI vision
  - Categorize Mode: AI-powered Zotero collection management and item categorization
- Multiple Output Destinations: Update Zotero abstracts, create notes, print to console, or save to files
- Batch Processing: Handle multiple items/PDFs at once
- Custom Prompts & Screening Questions: Use your own AI prompts or screening questionnaires
- Automatic Tag Management: Boolean screening results automatically update Zotero tags
- Collection Management: AI-powered categorization and automatic Zotero collection creation
- File Accumulation: Process multiple items into single output files
- Selective Replacement: Replace existing abstracts only when they match a configured pattern
Installation
npm install -g make-abstract
Usage
Basic Usage
Zotero Items
# Single item by key
make-abstract ABCD1234
# Multiple items by keys
make-abstract ABCD1234 EFGH5678 IJKL9012
# Using Zotero select links (automatically extracts keys)
make-abstract "zotero://select/groups/12345/items/ABCD1234"
# Mix of keys and select links
make-abstract ABCD1234 "zotero://select/groups/12345/items/EFGH5678"
PDF/TXT Files (Direct Processing)
# Single PDF file
make-abstract paper1.pdf --dest print
# Single TXT file
make-abstract document.txt --dest print
# Multiple files (mix of PDF and TXT)
make-abstract paper1.pdf document.txt paper2.pdf --dest file --out combined_results
# Mix of files and Zotero items
make-abstract ABCD1234 paper.pdf document.txt EFGH5678
Note: PDF/TXT files can only use --dest print or --dest file (not abstract or note).
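The way inputs are told apart (item keys, select links, file paths) can be sketched as follows. This is an illustration and not the tool's actual code: classify_input is a hypothetical helper, and the patterns assume that item keys are 8 uppercase letters or digits and that select links end in /items/<KEY>, as in the examples above.

```python
import re

# Assumption: Zotero item keys are 8 uppercase letters/digits, as in the examples.
KEY_RE = re.compile(r"^[A-Z0-9]{8}$")
# Assumption: select links end in /items/<KEY>.
LINK_RE = re.compile(r"^zotero://select/.*/items/([A-Z0-9]{8})$")


def classify_input(arg: str) -> tuple[str, str]:
    """Return ("zotero", key) or ("file", path) for one command-line input."""
    link = LINK_RE.match(arg)
    if link:
        # Select link: keep only the item key at the end.
        return ("zotero", link.group(1))
    if arg.lower().endswith((".pdf", ".txt")):
        return ("file", arg)
    if KEY_RE.match(arg):
        return ("zotero", arg)
    raise ValueError(f"Unrecognized input: {arg!r}")


inputs = ["ABCD1234", "paper.pdf", "zotero://select/groups/12345/items/EFGH5678"]
classified = [classify_input(a) for a in inputs]
```

Checking the file extension before the bare-key pattern keeps a file path from being mistaken for a key.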
Output Modes
Text Mode (Default)
Generates traditional academic abstracts:
# Generate abstract for Zotero item
make-abstract ABCD1234
# Generate abstract for PDF with custom prompt
make-abstract paper.pdf --prompt custom-prompt.txt --dest print
# Generate abstract for TXT file
make-abstract document.txt --dest print
JSON Mode
Structured screening using questionnaires:
# Screen Zotero items with questionnaire
make-abstract ABCD1234 EFGH5678 --mode json --prompt screening-questions.txt --dest file --out results
# Screen PDF files
make-abstract paper1.pdf paper2.pdf --mode json --prompt questions.txt --dest print
# Screen TXT files
make-abstract document1.txt document2.txt --mode json --prompt questions.txt --dest print
JSON Mode Requirements:
- Must use --prompt with screening questions file
- Automatically updates Zotero tags for boolean questions
- Outputs structured JSON with metadata
Scan Mode
Direct PDF transcription using AI vision capabilities:
# Scan PDF for text transcription (basic)
make-abstract paper.pdf --mode scan --dest print
# Scan PDF with custom prompt
make-abstract paper.pdf --mode scan --prompt transcription-prompt.txt --dest file --out scanned_content
# Batch scan multiple PDFs
make-abstract paper1.pdf paper2.pdf paper3.pdf --mode scan --dest file --out all_scans
# Save scan results to automatically named files (result-{filename}.txt by default)
make-abstract paper.pdf --mode scan --save-scan --dest print
# Save scan with custom prompt to result files
make-abstract document1.pdf document2.pdf --mode scan --save-scan --prompt extraction-prompt.txt --dest print
# Customize the prefix for saved files
make-abstract paper.pdf --mode scan --save-scan --save-prefix "extracted-" --dest print
# Results in files like: extracted-paper.txt instead of result-paper.txt
Categorize Mode
Automatically categorize Zotero items into collections using AI:
# Categorize items using a collection structure JSON file
make-abstract categorize collections.json "zotero://select/groups/12345/items/ABCD1234" "zotero://select/groups/12345/items/EFGH5678"
# Track costs during categorization
make-abstract categorize collections.json "zotero://select/groups/12345/items/ABCD1234" --cost categorization-costs
Categorize Mode Features:
- Collection Management: Automatically creates missing collections based on hierarchical structure
- Smart Matching: Only creates collections that don't already exist (matches by hierarchy + name)
- AI-Powered Categorization: Uses AI to analyze item content and suggest appropriate collections
- Batch Processing: Categorize multiple items at once
- Group Confirmation: Confirms target Zotero group before making changes
- Cost Tracking: Optional cost analysis for AI categorization calls
- Hierarchical Support: Supports nested collection structures with parent-child relationships
Collection JSON Structure:
[
{
"name": "Research Methods",
"description": "Studies focusing on research methodologies and approaches",
"children": [
{
"name": "Quantitative Methods",
"description": "Studies using quantitative research approaches, statistical analysis, surveys, experiments"
},
{
"name": "Qualitative Methods",
"description": "Studies using qualitative research approaches, interviews, case studies, ethnography"
}
]
},
{
"name": "Subject Areas",
"description": "Classification by academic discipline or field of study",
"children": [
{
"name": "Education",
"description": "Educational research, pedagogy, learning theories, curriculum development"
},
{
"name": "Psychology",
"description": "Psychological studies, cognitive science, behavioral research",
"disabled": true
}
]
}
]
Collection Properties:
- name (required): The collection name
- description (optional): Description of what items belong in this collection
- children (optional): Array of child collections for hierarchical structure
- disabled (optional): When true, the collection will be created in Zotero but excluded from AI categorization prompts
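To make the effect of these properties concrete, here is a small sketch that walks a collection structure and lists every collection path, with and without disabled entries. It is not the tool's implementation: collection_paths is a hypothetical helper, the " / " path separator is arbitrary, and the sketch assumes that children of a disabled collection are still visited.

```python
import json


def collection_paths(nodes, parent=(), include_disabled=True):
    """Yield (path, description) for each collection in the tree, depth first."""
    for node in nodes:
        path = parent + (node["name"],)
        if include_disabled or not node.get("disabled", False):
            yield " / ".join(path), node.get("description", "")
        # Assumption: children are visited even when the parent is disabled.
        yield from collection_paths(node.get("children", []), path, include_disabled)


structure = json.loads("""
[
  {"name": "Subject Areas",
   "description": "Classification by academic discipline or field of study",
   "children": [
     {"name": "Education", "description": "Educational research"},
     {"name": "Psychology", "description": "Psychological studies", "disabled": true}
   ]}
]
""")

# Every collection that would exist in Zotero.
all_paths = [p for p, _ in collection_paths(structure)]
# Only the collections that would be offered to the AI for categorization.
prompt_paths = [p for p, _ in collection_paths(structure, include_disabled=False)]
```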
Categorize Mode Process:
- Group Verification: Queries Zotero API for the selected group's name and asks for user confirmation
- Collection Sync: Fetches existing collections and creates only missing ones based on the JSON structure
- AI Analysis: For each Zotero item, prompts AI to analyze content and suggest appropriate collections
- Collection Assignment: Automatically adds items to the AI-suggested collections via Zotero API
- Progress Tracking: Shows real-time progress and cost information during processing
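The collection sync step (create only what is missing, matching by hierarchy and name) can be pictured as a set difference over collection paths. The sketch below is an assumption about how such a comparison could work, not the tool's code; missing_collections is a hypothetical name and paths are represented as tuples of names from the root down.

```python
def missing_collections(wanted, existing):
    """Return the wanted paths that do not exist yet, parents before children.

    Each path is a tuple of collection names from the root to the collection.
    """
    have = set(existing)
    missing = []
    # Shorter paths first, so a parent is always created before its children.
    for path in sorted(wanted, key=len):
        if path not in have:
            missing.append(path)
            have.add(path)
    return missing


wanted = [
    ("Research Methods",),
    ("Research Methods", "Quantitative Methods"),
    ("Research Methods", "Qualitative Methods"),
]
existing = [("Research Methods",), ("Research Methods", "Qualitative Methods")]
to_create = missing_collections(wanted, existing)
```

Comparing full paths rather than bare names means two collections with the same name under different parents are treated as distinct.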
Requirements:
- All Zotero URLs must belong to the same group
- JSON file must follow the Collection structure format
- Requires valid Zotero API configuration
- Only works with Zotero items (not PDF/TXT files)
Scan Mode Features:
- Uses AI vision to directly read PDF content (no OCR extraction first)
- Complete document transcription: Extracts ALL readable text including headers, body text, captions, footnotes, and annotations
- Simple table conversion: Converts tables to clean key: value format instead of complex table structures
- Plain text output: No markdown, no table formatting, just clean readable text
- Smart text correction: Applies OCR error correction for merged words, character substitutions, and spelling errors
- Advanced selection detection: Correctly interprets crossed-out text as SELECTED, plus checkmarks, circles, and other markings
- Clean organization: Natural document flow with key: value pairs for structured data and plain paragraphs for text
- Works with image-based PDFs, scanned documents, and text PDFs
- Only available for PDF files (not TXT files or Zotero items)
- Can use custom prompts for specific transcription needs
- --save-scan option automatically saves results to {prefix}{original_filename}.txt files (prefix customizable with --save-prefix)
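The naming rule for --save-scan output can be expressed in a few lines. This is only an illustration of the {prefix}{original_filename}.txt pattern; saved_scan_name is a hypothetical helper, and dropping the directory part of the input path is an assumption.

```python
from pathlib import Path


def saved_scan_name(pdf_path: str, prefix: str = "result-") -> str:
    """Build the output name: prefix + original file name without extension + .txt."""
    return f"{prefix}{Path(pdf_path).stem}.txt"


default_name = saved_scan_name("paper.pdf")
custom_name = saved_scan_name("paper.pdf", prefix="extracted-")
```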
Output Destinations
Use the --dest option to control where the output goes:
# Update the Zotero item's abstract field (default, Zotero only)
make-abstract ABCD1234 --dest abstract
# Create a note attachment in Zotero (Zotero only)
make-abstract ABCD1234 --dest note
# Print to console
make-abstract ABCD1234 --dest print
# Save to file with optional custom filename
make-abstract ABCD1234 --dest file --out my_abstract
Cost Tracking
Track token usage and costs for all API calls with the --cost option. Costs are estimated using January 2025 pricing:
# Track costs and save to JSON file
make-abstract ABCD1234 --cost usage-report
# Works with all modes and destinations
make-abstract paper.pdf --mode scan --cost scan-costs.json
make-abstract *.pdf --mode json --prompt questions.txt --cost batch-analysis
# Combine with any other options
make-abstract ABCD1234 EFGH5678 --mode json --prompt screening.txt --dest file --out results --cost project-costs
Cost Tracking Features:
- Complete Model Coverage: 200+ models across 8 major AI providers
- Current 2025 Pricing: Updated with latest rates, including promotional pricing
- Comprehensive Providers: OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, xAI, Cerebras, Mistral
- Detailed Tracking: Records all API calls with token usage (input/output/total)
- Rich Reports: JSON exports with timestamps, model info, and cost breakdowns
Comprehensive Model Coverage (per million tokens):
OpenAI (50+ models):
- Latest: GPT-4.1 ($2.00/$8.00), GPT-4.1-mini ($0.40/$1.60), o3 ($20.00/$80.00)
- Flagship: GPT-4o ($2.50/$10.00), GPT-4o-mini ($0.15/$0.60), ChatGPT-4o-latest ($5.00/$15.00)
- Reasoning: o1 ($15.00/$60.00), o1-pro ($60.00/$240.00), o1-mini ($3.00/$12.00)
- Legacy: GPT-4 ($30.00/$60.00), GPT-3.5-turbo ($0.50/$1.50), all turbo variants
Google Gemini (40+ models):
- Thinking Models: Gemini 2.5 Pro ($1.25/$10.00), Gemini 2.5 Flash ($0.30/$2.50)
- Audio Models: Native Audio Dialog ($0.50/$2.00), TTS ($0.50-$1.00 input)
- Stable: Gemini 1.5 Pro ($1.25/$5.00), Gemini 1.5 Flash ($0.075/$0.30)
- Efficient: Gemini 2.0 Flash ($0.10/$0.40), Flash-Lite ($0.075/$0.30)
- Free: All Gemma open models and embeddings
Anthropic (20+ models):
- Latest: Claude 4 Opus ($20.00/$100.00), Claude 4 Sonnet ($5.00/$25.00)
- Extended Thinking: Claude 4 Opus Extended ($30.00/$150.00)
- Current: Claude 3.5 Sonnet ($3.00/$15.00), Claude 3.5 Haiku ($1.00/$5.00)
Other Providers:
- DeepSeek: V3 ($0.14/$0.28), R1 Reasoner ($0.55/$2.19) - promotional pricing
- Groq: Llama 4 Scout ($0.11/$0.34), Llama 4 Maverick ($0.20/$0.60)
- xAI: Grok-3 ($2.00/$10.00), Grok-3-mini ($0.50/$2.50)
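Since all prices above are quoted per million tokens (input/output), the cost of a single call follows from simple arithmetic. The sketch below shows that calculation; call_cost is a hypothetical function, and only the field names inputCost, outputCost, and totalCost are borrowed from the cost report format.

```python
def call_cost(prompt_tokens, completion_tokens, input_price, output_price):
    """Compute USD costs for one call; prices are USD per million tokens."""
    input_cost = prompt_tokens / 1_000_000 * input_price
    output_cost = completion_tokens / 1_000_000 * output_price
    return {
        "inputCost": input_cost,
        "outputCost": output_cost,
        "totalCost": input_cost + output_cost,
    }


# GPT-4o-mini at $0.15 (input) / $0.60 (output) per million tokens:
# 2M input tokens and 1M output tokens come to roughly $0.90 in total.
cost = call_cost(2_000_000, 1_000_000, 0.15, 0.60)
```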
Cost Report Structure
{
"totalCost": {
"inputCost": 0.00125,
"outputCost": 0.00240,
"totalCost": 0.00365,
"totalTokens": 1450,
"totalPromptTokens": 500,
"totalCompletionTokens": 950
},
"apiCalls": [
{
"id": "api_1704067200_abc123def",
"timestamp": "2024-01-01T10:00:00.000Z",
"provider": "openai",
"model": "gpt-4o-mini",
"operation": "generateText",
"input": "Create a concise academic abstract...",
"usage": {
"promptTokens": 250,
"completionTokens": 150,
"totalTokens": 400
},
"costs": {
"inputCost": 0.000375,
"outputCost": 0.0009,
"totalCost": 0.001275
}
}
],
"metadata": {
"generatedAt": "2024-01-01T10:05:00.000Z",
"provider": "openai",
"currency": "USD"
}
}
File Output Behavior
Text Mode Files (.txt)
- Single item: Creates new file each run
- Multiple items: All items appended to same file with separators
- Custom filename: --out filename (adds .txt automatically)
- Default filename: abstract_{key}_{date}.txt
JSON Mode Files (.json)
- Single item: Creates JSON array with one object
- Multiple items: All items in single JSON array
- Custom filename: --out filename (adds .json automatically)
- Default filename: screening_{key}_{date}.json
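The filename rules for both modes can be summarized in one function. This is a sketch under stated assumptions, not the tool's code: output_filename is a hypothetical helper, the {date} part is assumed to be an ISO date (YYYY-MM-DD), and an --out value that already carries the right extension is assumed to be left unchanged.

```python
from datetime import date

EXTENSIONS = {"text": ".txt", "json": ".json"}
DEFAULT_STEMS = {"text": "abstract", "json": "screening"}


def output_filename(mode, key, out=None, today=None):
    """Resolve the output file name for --dest file in text or json mode."""
    ext = EXTENSIONS[mode]
    if out is not None:
        # Assumption: the extension is appended only if it is not already present.
        return out if out.endswith(ext) else out + ext
    # Assumption: {date} is rendered as YYYY-MM-DD.
    stamp = (today or date.today()).isoformat()
    return f"{DEFAULT_STEMS[mode]}_{key}_{stamp}{ext}"


custom = output_filename("text", "ABCD1234", out="my_abstract")
default = output_filename("json", "ABCD1234", today=date(2024, 1, 15))
```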
Screening Questions
Create screening questionnaires for systematic reviews or data extraction:
Question Format
Each line: key|type|question
Supported Types:
- boolean - True/false questions (creates Zotero tags)
- string - Text responses
- array - List of items
- number - Numeric values
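The key|type|question format is simple enough to parse line by line. The sketch below is illustrative and not taken from the tool's source: Question and parse_questions are hypothetical names, and skipping blank lines is an assumption.

```python
from dataclasses import dataclass

VALID_TYPES = {"boolean", "string", "array", "number"}


@dataclass
class Question:
    key: str
    type: str
    text: str


def parse_questions(source: str) -> list[Question]:
    """Parse screening questions, one key|type|question entry per line."""
    questions = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # Assumption: blank lines are ignored.
        # Split on the first two pipes only, so the question text may contain "|".
        parts = line.split("|", 2)
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected key|type|question")
        key, qtype, text = (p.strip() for p in parts)
        if qtype not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown type {qtype!r}")
        questions.append(Question(key, qtype, text))
    return questions


sample = "hasRCT|boolean|Is this a randomized controlled trial?\nsampleSize|number|What is the sample size?"
parsed = parse_questions(sample)
```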
Example Screening File (screening-questions.txt):
hasRCT|boolean|Is this a randomized controlled trial?
sampleSize|number|What is the sample size?
methodology|string|What research methodology was used?
outcomes|array|What outcomes were measured?
hasBlinding|boolean|Was blinding used in the study?
Automatic Tag Management
Boolean questions automatically create/update Zotero tags:
- Format: _a:{key}={value} (e.g., _a:hasRCT=true)
- Removes old tags with same key before adding new ones
- Only boolean questions create tags
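The three tag rules above can be sketched as a pure function over a list of tag strings. This is an assumption about the behavior described, not the tool's implementation; update_screening_tags is a hypothetical name, and the default prefix matches the documented --tagprefix default.

```python
def update_screening_tags(tags, result, boolean_keys, prefix="_a:"):
    """Return a new tag list with screening tags refreshed for boolean answers."""
    # Drop any earlier tag for the same key, whatever its value was.
    kept = [
        t for t in tags
        if not any(t.startswith(f"{prefix}{k}=") for k in boolean_keys)
    ]
    # Add one tag per boolean answer, e.g. _a:hasRCT=true.
    new = [
        f"{prefix}{k}={str(result[k]).lower()}"
        for k in boolean_keys
        if isinstance(result.get(k), bool)
    ]
    return kept + new


old_tags = ["_a:hasRCT=false", "review"]
answers = {"hasRCT": True, "sampleSize": 150}
new_tags = update_screening_tags(old_tags, answers, ["hasRCT"])
```

Non-boolean answers such as sampleSize never produce a tag, which mirrors the last rule in the list.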
Complete Document Processing with Scan Mode
Scan mode provides comprehensive document transcription with special attention to structured content. It extracts ALL text while intelligently processing forms, tables, and other structured elements:
Simple Processing Features:
- Complete text extraction: Captures all headers, body text, captions, footnotes, and annotations
- Key-value table conversion: Converts tables to simple key: value format instead of complex structures
- Plain text output: No markdown formatting, no table structures, just clean readable text
- Form data extraction: Converts form fields to key: value pairs
- Multiple input types: Handles text boxes, checkboxes, radio buttons, dropdown selections
- Smart selection detection:
- Crossed-out text = SELECTED (important: crossed out means chosen, not ignored)
- Checkmarks (✓, ✗, ✔) = SELECTED
- Circled or highlighted options = SELECTED
- X marks in boxes = SELECTED
- Identifies final answers when multiple options are marked
- Converts handwriting to readable text
- Shows selections as key: selected_option format
- OCR error correction: Fixes merged words, character substitutions, and spelling errors
- Clean organization: Logical reading order with simple key: value pairs and plain paragraphs
Example Document Processing:
# Complete document transcription with enhanced table/form processing
make-abstract research-paper.pdf --mode scan --dest print
# Process mixed content (text + forms + tables) with comprehensive extraction
make-abstract complex-document.pdf --mode scan --dest file --out complete-transcription
# Use specialized prompt for specific document types
make-abstract technical-report.pdf --mode scan --prompt custom-extraction-prompt.txt --dest file --out extracted-data
The scan mode provides complete document transcription with simple key: value output for structured data:
Student Name: SYNCLAIRE CHELIMO
Age: 17
Gender: FEMALE (crossed out = selected)
Has Disabilities: NO (checkmark = selected)
Preferred Subject: Mathematics (circled = selected)
Grade/Form: 1234567
School Type: PUBLIC
This is an example of how regular paragraph text would appear in the output. All text content is preserved and presented in clean, readable format. The scan mode correctly identifies crossed-out text, checkmarks, and other markings as selections rather than corrections.
Custom Prompts
Use your own AI prompt from a text file:
# Text mode with custom prompt
make-abstract ABCD1234 --prompt my-prompt.txt
# JSON mode with screening questions (required)
make-abstract ABCD1234 --mode json --prompt screening-questions.txt
Example Custom Prompt (example-prompt.txt):
Create a detailed academic abstract for the following research paper. The abstract should be structured with the following elements:
1. Background/Context: Briefly explain the research problem or gap
2. Objective: State the main research question or hypothesis
3. Methods: Describe the methodology or approach used
4. Results: Summarize the key findings
5. Conclusions: Highlight the main implications and contributions
The abstract should be approximately 200-250 words and written in a formal academic tone.
IMPORTANT: Respond with ONLY the abstract text. Do not include section headers or formatting.
Example Form Extraction Prompt (form-extraction-prompt.txt):
You are an expert form processing system. Analyze this document and extract all form data with high accuracy.
FORM ANALYSIS GUIDELINES:
1. FIELD IDENTIFICATION:
- Detect all form fields, labels, and input areas
- Recognize various input types: text boxes, checkboxes, radio buttons, dropdowns
- Identify required vs optional fields
2. RESPONSE EXTRACTION:
- Text fields: Extract all typed or handwritten content
- Single-choice questions: Identify the ONE selected option (ignore crossed-out marks)
- Multiple-choice: List ALL selected options
- Yes/No questions: Provide clear "YES" or "NO" based on checkmarks
- Numerical fields: Extract exact numbers (grades, scores, IDs, phone numbers)
3. INTELLIGENT INTERPRETATION:
- Crossed-out text = IGNORE (these are mistakes/corrections)
- Multiple marks on single-choice = Choose the clearest/final mark
- Handwriting: Convert to most likely intended text
- Empty fields: Mark as "Not filled" or "N/A"
4. OUTPUT FORMAT - Present as a clean table:
| Question/Field | Response | Notes |
|----------------|----------|-------|
| [Field name] | [Answer] | [Any clarifications] |
5. QUALITY CHECKS:
- Verify logical consistency between related fields
- Flag any unclear or ambiguous responses in Notes column
- Ensure all visible form fields are captured
CRITICAL: Focus on ACCURACY. Extract only final, intended responses. Ignore stray marks and corrections.
Complete Examples
# Traditional abstract generation
make-abstract ABCD1234 EFGH5678 --dest file --prompt academic-prompt.txt
# Systematic review screening
make-abstract ABCD1234 EFGH5678 --mode json --prompt screening-questions.txt --dest file --out review_screening
# Mixed processing (files and Zotero items)
make-abstract ABCD1234 paper1.pdf document.txt paper2.pdf --dest print
# Create notes with screening results
make-abstract ABCD1234 --mode json --prompt questions.txt --dest note
# Batch file processing to single file
make-abstract *.pdf *.txt --mode json --prompt screening.txt --dest file --out all_papers
# PDF transcription with AI vision
make-abstract scanned-document.pdf --mode scan --dest file --out transcribed_text
# Custom transcription for specific document types
make-abstract table-heavy-document.pdf --mode scan --prompt table-extraction-prompt.txt --dest print
# Form processing with specialized prompt
make-abstract survey-form.pdf --mode scan --prompt form-extraction-prompt.txt --dest file --out form-data
# Comprehensive OCR text correction
make-abstract ocr-text-with-errors.txt --dest print
# Cost tracking examples
make-abstract ABCD1234 --cost usage-report.json
make-abstract paper1.pdf paper2.pdf --mode scan --cost scan-analysis --dest file --out transcripts
# Save scan results to automatically named files
make-abstract report.pdf research-paper.pdf --mode scan --save-scan --dest print
# Customize file prefixes for saved files
make-abstract paper.pdf --mode scan --save-scan --save-prefix "scan-" --dest print
make-abstract ABCD1234 --mode json --prompt screening.txt --dest note --save-json --save-prefix "screening-"
# Process TXT files for abstract generation
make-abstract research-notes.txt manuscript-draft.txt --dest file --out text-abstracts
# Mix of PDF and TXT files with JSON screening
make-abstract paper.pdf notes.txt appendix.pdf --mode json --prompt screening.txt --dest file --out mixed-analysis
# Process multiple forms with intelligent extraction
make-abstract student-form1.pdf student-form2.pdf --mode scan --save-scan --dest print
# Batch form processing with custom prompt
make-abstract survey-*.pdf --mode scan --prompt form-extraction-prompt.txt --dest file --out all-forms
JSON Output Structure
JSON mode produces structured output with metadata:
[
{
"zotero_id": "ABCD1234",
"date_generated": "2024-01-15T10:30:00.000Z",
"model": "gpt-4o-mini",
"prompt": "hasRCT|boolean|Is this a randomized controlled trial?...",
"result": {
"hasRCT": true,
"sampleSize": 150,
"methodology": "Double-blind randomized controlled trial",
"outcomes": ["primary efficacy", "safety", "quality of life"],
"hasBlinding": true
}
}
]
For PDF/TXT files, zotero_id will be null.
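Because the output is a plain JSON array, downstream filtering needs only a few lines. The sketch below reads records shaped like the structure above and lists the Zotero items whose answer to a boolean question is true; items_where is a hypothetical helper and the second record is made-up sample data standing in for a PDF input.

```python
import json

# Two records shaped like the output above; the second stands in for a PDF input.
raw = """
[
  {"zotero_id": "ABCD1234", "model": "gpt-4o-mini",
   "result": {"hasRCT": true, "sampleSize": 150}},
  {"zotero_id": null, "model": "gpt-4o-mini",
   "result": {"hasRCT": false, "sampleSize": 40}}
]
"""


def items_where(records, key):
    """Return the zotero_id of every record whose boolean answer for key is true."""
    return [r["zotero_id"] for r in records if r["result"].get(key) is True]


records = json.loads(raw)
rct_items = items_where(records, "hasRCT")
```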
Initial Setup
On first run, the tool will guide you through configuration:
- Choose your AI provider (OpenAI, Gemini, Anthropic, Groq, DeepSeek, Cerebras, Mistral, or xAI)
- Enter your API key for the chosen provider
- Enter your Zotero API key (if processing Zotero items)
- Enter your Zotero group ID (if processing Zotero items)
- Optionally configure the AI model
Note: For PDF/TXT-only usage, only AI configuration is required.
AI Provider and Model Selection
Listing and Selecting Models
Use the models command to view available models for each provider and select one:
# Interactive provider and model selection
make-abstract models
# List models for a specific provider
make-abstract models openai
make-abstract models anthropic
make-abstract models groq
# Show help for models command
make-abstract models --help
The models command will:
- Show all available models for the selected provider
- Let you select a model interactively
- Update your configuration with the selected model
- Optionally set the provider as your active AI provider
Supported Providers and Models (Updated June 2025)
OpenAI
- GPT-4 series: gpt-4o, gpt-4o-mini, gpt-4o-audio-preview, gpt-4-turbo, gpt-4
- GPT-4.1 series: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- GPT-4.5 preview: gpt-4.5-preview
- Reasoning models: o1, o1-mini, o1-preview, o1-pro, o3, o3-mini, o4-mini
Anthropic Claude
- Claude 4: claude-4-opus, claude-4-sonnet
- Claude 3.7: claude-3-7-sonnet
- Claude 3.5: claude-3-5-sonnet-v2, claude-3-5-sonnet, claude-3-5-haiku
- Claude 3: claude-3-opus, claude-3-sonnet, claude-3-haiku
Google Gemini
- Gemini 2.5: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite-preview-06-17
- Gemini 2.0: gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.0-flash-preview-image-generation
- Gemini 1.5: gemini-1.5-flash, gemini-1.5-flash-8b, gemini-1.5-pro
Groq
- Llama models: llama-3.3-70b-versatile, llama-3.1-70b-versatile, llama-3.1-8b-instant
- Mixtral: mixtral-8x7b-32768
- Gemma: gemma-7b-it, gemma2-9b-it
DeepSeek
- Latest: deepseek-v3, deepseek-r1, deepseek-r1-lite-preview
- Specialized: deepseek-prover-v2
- Legacy: deepseek-v2.5, deepseek-v2, deepseek-coder
Cerebras
- Llama models: llama-3.3-70b, llama-3.1-70b, llama-3.1-8b, llama-3-70b, llama-3-8b
Mistral
- Latest: mistral-large-2411, mistral-small-3.1-2503, codestral-2501
- Specialized: mistral-ocr-2505, mistral-nemo
- Legacy: mixtral-8x7b-instruct, mixtral-8x22b-instruct
xAI Grok
- Grok 3: grok-3-beta, grok-3-mini-beta
- Grok 2: grok-2-beta, grok-2-mini-beta
- Grok 1: grok-beta
Configuration
The tool stores configuration in your user directory. Available settings:
- aiProvider: AI service to use ('openai', 'gemini', 'anthropic', 'groq', 'deepseek', 'cerebras', 'mistral', 'xai')
- openaiApiKey: Your OpenAI API key
- geminiApiKey: Your Google Gemini API key
- anthropicApiKey: Your Anthropic API key
- groqApiKey: Your Groq API key
- deepseekApiKey: Your DeepSeek API key
- cerebrasApiKey: Your Cerebras API key
- mistralApiKey: Your Mistral API key
- xaiApiKey: Your xAI API key
- zoteroApiKey: Your Zotero API key
- zoteroGroupId: Your Zotero group ID
- openaiModel: OpenAI model to use (default: "gpt-4o")
- geminiModel: Gemini model to use (default: "gemini-2.5-flash")
- anthropicModel: Anthropic model to use
- groqModel: Groq model to use
- deepseekModel: DeepSeek model to use
- cerebrasModel: Cerebras model to use
- mistralModel: Mistral model to use
- xaiModel: xAI model to use
- temperature: Generation temperature (default: 0.7, range: 0-1)
- replacePattern: Regex pattern to match abstracts that should be replaced
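The replacePattern setting acts as a gate on overwriting existing abstracts. The sketch below shows one way such a gate could behave. It is an assumption, not the tool's logic: should_replace is a hypothetical function, it uses Python's re module rather than whatever regex engine the tool relies on, and the handling of empty abstracts and of a missing pattern is guessed.

```python
import re


def should_replace(existing_abstract, pattern):
    """Decide whether an existing abstract may be overwritten."""
    text = existing_abstract or ""
    if not text.strip():
        return True  # Assumption: empty abstracts are always filled in.
    if pattern is None:
        return False  # Assumption: without a pattern, existing abstracts are kept.
    return re.search(pattern, text) is not None


overwrite_generated = should_replace("AI-generated: A study of...", r"^AI-generated:")
keep_manual = should_replace("A hand-written abstract.", r"^AI-generated:")
```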
Configuration management:
# Set a configuration value
make-abstract config set <key> <value>
# Examples of setting replace patterns:
make-abstract config set replacePattern "/^AI-generated:/" # Matches abstracts starting with "AI-generated:"
make-abstract config set replacePattern "[AUTO]" # Matches abstracts containing exactly "[AUTO]"
make-abstract config set replacePattern ".*" # Matches any abstract (allows replacement of all)
# View current configuration (API keys will be masked)
make-abstract config list
# Delete configuration
make-abstract config delete <key>
Command Reference
make-abstract [options] <inputs...>
Arguments:
inputs Zotero item keys/select links OR PDF/TXT file paths
Options:
--dest <destination> Output destination: abstract, note, print, file (default: "abstract")
--prompt <file> Path to text file with custom prompt or screening questions
--out <filename> Output filename when using --dest file (optional)
--mode <mode> Output mode: text, json, scan (default: "text")
--cost <filename> Save token usage and cost analysis to JSON file
--tagprefix <prefix> Prefix for screening tags (default: "_a:")
--save-json Save JSON result for each Zotero item (json mode only)
--save-scan Save scan result to file named result-{ORIGINAL FILE NAME}.txt (scan mode only)
--save-prefix <prefix> Prefix for saved files from --save-json and --save-scan options (default: "result-")
--keysfile <file> Path to text file containing Zotero keys (one per line)
-h, --help Display help for command
Commands:
models [provider] List and select models for AI providers
config Manage configuration
OCR Text Correction
The tool includes intelligent OCR text processing that comprehensively corrects errors to make text as legible as possible:
What it fixes:
- Merged words: "researchersconducted" → "researchers conducted"
- Character substitutions: "efective" → "effective", "recieve" → "receive"
- Common OCR errors: "rn" → "m", "cl" → "d", "vv" → "w"
- Spelling mistakes: "introasting" → "interesting"
- Missing spaces and punctuation where context is clear
- Grammar errors caused by OCR corruption
Examples of Comprehensive Correction:
Input: "The researchersconducted anexperiment tomeasure theefectiveness ofthe treatrnent."
Output: "The researchers conducted an experiment to measure the effectiveness of the treatment."
Input: "Participantswere randornlyassigned tocontrol andtreatrnent groupsfor cornparison."
Output: "Participants were randomly assigned to control and treatment groups for comparison."
Input: "The finclings suggest that this approach is very efective for introasting results."
Output: "The findings suggest that this approach is very effective for interesting results."
Applies to all modes:
- Text mode: Abstracts generated from fully corrected text
- JSON mode: Screening questions answered using corrected text
- Scan mode: PDF transcription with comprehensive error correction
File Processing Notes
- PDF/TXT files are processed directly without Zotero integration
- Text extraction uses pdf-parse library for PDF files in regular text mode
- TXT files are read directly as plain text
- Scan mode uses AI vision for direct PDF processing with OCR enhancement (PDF only)
- Only --dest print and --dest file are supported for PDF/TXT files
- No tag management for PDF/TXT files (tags are Zotero-specific)
Error Handling
The tool continues processing if individual items fail:
- Invalid Zotero keys are skipped with error messages
- PDF/TXT files that can't be read are skipped
- Failed AI requests are logged but don't stop batch processing
- Configuration errors halt execution with helpful messages
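The continue-on-failure behavior for individual items is a common batch pattern and can be sketched generically. Nothing here is taken from the tool's source: process_batch and process_one are hypothetical names, and the message format is invented for illustration.

```python
def process_batch(inputs, process_one):
    """Run process_one on every input, collecting failures instead of stopping."""
    results, failures = [], []
    for item in inputs:
        try:
            results.append((item, process_one(item)))
        except Exception as exc:
            # Log the problem and move on to the next item.
            failures.append((item, str(exc)))
            print(f"Skipping {item}: {exc}")
    return results, failures


def fake_process(item):
    """Stand-in for the real work; fails for one input to show the behavior."""
    if item == "missing.pdf":
        raise FileNotFoundError("file cannot be read")
    return f"abstract for {item}"


done, failed = process_batch(["paper.pdf", "missing.pdf", "notes.txt"], fake_process)
```

Configuration errors are the exception to this pattern: they would be raised before the loop starts, so the whole run stops.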
Development
- Clone the repository
- Install dependencies: npm install
- Run in development mode: npm run dev
- Build the project: npm run build
License
MIT
