@markwharton/flatten
v1.5.0
Simple, intelligent codebase flattening for LLMs. Creates a structured YAML representation of your repository and a single text file containing all your code - perfect for providing context to AI tools.
Installation
# Install globally
npm install -g @markwharton/flatten
# Or use directly with npx
npx @markwharton/flatten
Usage
Basic Flattening
# Basic usage - creates timestamped output files
flatten
# Process specific directories
flatten src/ lib/
# Include specific files (even if normally ignored)
flatten src/ README.md package.json
# Exclude patterns
flatten --exclude "*.png,*.jpg,infra/sql"
# Structure only (no file contents)
flatten --no-contents
# Allow non-git directories
flatten --no-git
# Override file limit
flatten --max-files 5000
# Custom output file (single file, format from extension)
flatten --output context.txt
flatten -o structure.yaml
# Show version
flatten --version
# Show help
flatten --help
Cleanup Old Files
# Show what would be removed (dry run)
flatten cleanup --dry-run
# Remove all flatten output files
flatten cleanup
# Remove files older than 7 days
flatten cleanup --days 7
# Get help on cleanup options
flatten cleanup --help
Output Files
By default, flatten creates two files with smart, timestamped naming:
- flatten-YYYYMMDD-HHMMSS-<git-hash>-<status>.yaml: YAML structure of your repository
- flatten-YYYYMMDD-HHMMSS-<git-hash>-<status>.txt: Flattened content of all files
Where:
- YYYYMMDD-HHMMSS is the timestamp when flatten was run
- <git-hash> is the short git commit hash (7 characters)
- <status> is either 'clean' or 'dirty' based on git status
For non-git directories (with --no-git):
- flatten-YYYYMMDD-HHMMSS.yaml: YAML structure
- flatten-YYYYMMDD-HHMMSS.txt: Flattened content
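The naming scheme above can be sketched in a few lines of TypeScript. This is an illustration only, not flatten's actual source: `GitInfo` and `buildOutputBase` are hypothetical names, and the use of local time (rather than UTC) for the timestamp is an assumption.

```typescript
// Hypothetical helper, not part of flatten's published API.
interface GitInfo {
  hash: string;   // full or short commit hash
  dirty: boolean; // true when there are uncommitted changes
}

// Left-pad a number with zeros to two digits.
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Build the filename without its extension.
// Assumption: the timestamp uses local time; the README does not say.
function buildOutputBase(now: Date, git?: GitInfo): string {
  const stamp =
    `${now.getFullYear()}${pad2(now.getMonth() + 1)}${pad2(now.getDate())}` +
    `-${pad2(now.getHours())}${pad2(now.getMinutes())}${pad2(now.getSeconds())}`;
  if (!git) {
    // Non-git directories (--no-git) get the timestamp only.
    return `flatten-${stamp}`;
  }
  const status = git.dirty ? "dirty" : "clean";
  return `flatten-${stamp}-${git.hash.slice(0, 7)}-${status}`;
}

// Local time: 10 June 2025, 12:19:00 (months are zero-based in Date).
const exampleRun = new Date(2025, 5, 10, 12, 19, 0);
console.log(buildOutputBase(exampleRun, { hash: "abc1234def", dirty: false }) + ".yaml");
// prints "flatten-20250610-121900-abc1234-clean.yaml"
```

Because the timestamp leads the name and is zero-padded, a plain alphabetical sort of the filenames is also a chronological sort.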
Custom Output
Use --output (or -o) to specify a custom filename and create a single file:
# Create a single text file with flattened content
flatten --output context.txt
# Create a single YAML file with structure only
flatten -o structure.yaml
The format is inferred from the extension:
- .yaml or .yml → YAML structure
- Any other extension → Flattened text content
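As a rough sketch, the extension rule amounts to a single check. `inferFormat` and `OutputFormat` are hypothetical names, and the case-insensitive comparison is an assumption; the README does not say how flatten treats `.YAML`.

```typescript
// Hypothetical illustration of the extension rule; not flatten's actual code.
type OutputFormat = "yaml" | "text";

function inferFormat(outputPath: string): OutputFormat {
  // Assumption: the extension is compared case-insensitively.
  const lower = outputPath.toLowerCase();
  return lower.endsWith(".yaml") || lower.endsWith(".yml") ? "yaml" : "text";
}

console.log(inferFormat("structure.yaml"));         // prints "yaml"
console.log(inferFormat("output/dir/context.txt")); // prints "text"
```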
Parent directories are created automatically if needed:
flatten --output output/dir/context.txt
Example Output
In a git repository:
flatten-20250610-121900-abc1234-clean.yaml # Repository structure
flatten-20250610-121900-abc1234-clean.txt # Flattened content
After making changes:
flatten-20250610-143200-abc1234-dirty.yaml # Shows uncommitted changes exist
flatten-20250610-143200-abc1234-dirty.txt # Includes uncommitted changes
With exclusions:
$ flatten --exclude "*.png,infra/sql"
✓ Created flatten-20250610-151230-abc1234-clean.yaml
✓ Created flatten-20250610-151230-abc1234-clean.txt (487.2 KB)
✓ Excluded 23 files matching patterns
✓ Omitted 3 binary files from content
Largest files:
src/runtime/services/database/mssql-database.ts (31.4 KB)
tools/generators/schema-generator.ts (28.7 KB)
docs/architecture/component-model.md (24.3 KB)
src/runtime/components/extractors/html-extractor.ts (19.8 KB)
tests/integration/full-pipeline.test.ts (17.2 KB)
Ignoring Flatten Output Files
To prevent flatten output files from being committed to your repository, add these patterns to your .gitignore:
# Flatten output files
flatten-*.txt
flatten-*.yaml
This will ignore all flatten-generated files while keeping your repository clean.
Managing Output Files
The new timestamped naming makes it easy to manage multiple flatten outputs:
# List all flatten outputs (sorted by timestamp)
$ ls -l flatten-*.{yaml,txt}
-rw-r--r-- 1 user staff 869 Jun 8 12:29 flatten-20250608-122900-9e228ad-clean.yaml
-rw-r--r-- 1 user staff 52459 Jun 8 12:29 flatten-20250608-122900-9e228ad-clean.txt
-rw-r--r-- 1 user staff 318 Jun 8 14:54 flatten-20250608-145400-74706ff-dirty.yaml
-rw-r--r-- 1 user staff 15592 Jun 8 14:54 flatten-20250608-145400-74706ff-dirty.txt
-rw-r--r-- 1 user staff 869 Jun 10 12:19 flatten-20250610-121900-74706ff-clean.yaml
-rw-r--r-- 1 user staff 53297 Jun 10 12:19 flatten-20250610-121900-74706ff-clean.txt
# Clean up old files
$ flatten cleanup --dry-run
Would remove 4 files:
- flatten-20250608-122900-9e228ad-clean.yaml (2 days old)
- flatten-20250608-122900-9e228ad-clean.txt (2 days old)
- flatten-20250608-145400-74706ff-dirty.yaml (2 days old)
- flatten-20250608-145400-74706ff-dirty.txt (2 days old)
$ flatten cleanup --days 1
✓ Removed 4 files older than 1 days
What Gets Included?
Flatten includes all files except:
- Git ignored files (respects .gitignore)
- Lock files that don't add value for LLMs: package-lock.json, yarn.lock, pnpm-lock.yaml, composer.lock, Gemfile.lock, poetry.lock, Cargo.lock
- Minified files: *.min.js, *.min.css
- Flatten output files: flatten-*.txt, flatten-*.yaml
- Files matching your exclude patterns (when using --exclude)
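The built-in exclusions in the list above could be checked roughly as follows. This is a sketch under assumptions, not flatten's implementation: `isExcludedByDefault` is a hypothetical name, matching is done on the file's base name at any depth, and `.gitignore` and `--exclude` handling are deliberately left out.

```typescript
// Hypothetical illustration; names and matching depth are assumptions.
const LOCK_FILES = new Set([
  "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
  "composer.lock", "Gemfile.lock", "poetry.lock", "Cargo.lock",
]);

function isExcludedByDefault(relPath: string): boolean {
  // Work on the last path segment only.
  const name = relPath.split("/").pop() ?? relPath;
  if (LOCK_FILES.has(name)) return true;                                  // lock files
  if (name.endsWith(".min.js") || name.endsWith(".min.css")) return true; // minified files
  if (/^flatten-.*\.(txt|yaml)$/.test(name)) return true;                 // flatten's own output
  return false;
}

console.log(isExcludedByDefault("package-lock.json"));  // prints true
console.log(isExcludedByDefault("dist/app.min.js"));    // prints true
console.log(isExcludedByDefault("src/index.js"));       // prints false
```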
Binary File Handling
Binary files (images, PDFs, executables, etc.) are intelligently detected using a two-stage approach:
- Stage 1: Fast classification by file extension (99% of files)
- Stage 2: Content analysis for unknown file types (null byte detection, printability ratio)
- Included in the YAML structure (so you can see what files exist)
- Omitted from the text content (replaced with [Binary file omitted])
- Performance optimized: Large files (>1MB) are classified as binary without content reading
This keeps your output focused on readable code while maintaining awareness of all project files.
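A minimal sketch of the two-stage approach might look like the following. Everything here is illustrative: the extension lists are a small sample, the 0.85 printability threshold is a guess, and applying the 1MB shortcut only to unknown extensions is one reading of the description. None of the names come from flatten's source.

```typescript
// Hypothetical two-stage binary check; lists and thresholds are assumptions.
const BINARY_EXTENSIONS = new Set(["png", "jpg", "jpeg", "gif", "pdf", "exe", "zip"]);
const TEXT_EXTENSIONS = new Set(["js", "ts", "json", "md", "txt", "yaml", "yml", "css", "html"]);
const LARGE_FILE_BYTES = 1024 * 1024; // ">1MB" read as 1 MiB

function extensionOf(path: string): string {
  const name = path.split("/").pop() ?? path;
  const dot = name.lastIndexOf(".");
  // dot > 0 so that dotfiles such as ".gitignore" count as having no extension.
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Stage 2: a null byte means binary; otherwise require a high share of printable bytes.
function looksBinary(sample: Uint8Array, minPrintableRatio = 0.85): boolean {
  if (sample.length === 0) return false;
  let printable = 0;
  for (let i = 0; i < sample.length; i++) {
    const byte = sample[i];
    if (byte === 0) return true;
    // Tab, LF, CR, and everything from space upward except DEL.
    // Bytes >= 128 are counted as printable so UTF-8 text is not misclassified.
    if (byte === 9 || byte === 10 || byte === 13 || (byte >= 32 && byte !== 127)) printable++;
  }
  return printable / sample.length < minPrintableRatio;
}

function isBinary(path: string, sizeBytes: number, readSample: () => Uint8Array): boolean {
  const ext = extensionOf(path);
  if (BINARY_EXTENSIONS.has(ext)) return true;    // stage 1: known binary
  if (TEXT_EXTENSIONS.has(ext)) return false;     // stage 1: known text
  if (sizeBytes > LARGE_FILE_BYTES) return true;  // large unknown file: skip reading
  return looksBinary(readSample());               // stage 2: inspect content
}
```

Passing `readSample` as a callback keeps the file read lazy, so the first three checks never touch the disk.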
Performance Features
- Fast binary detection: 99% of files classified instantly by extension
- Smart content analysis: Only unknown file types get content inspection
- Large file optimization: Files >1MB assumed binary without reading
- Efficient directory scanning: Respects ignore patterns early in the process
Exclude Patterns
The --exclude option accepts comma-separated patterns following gitignore-style rules:
# Exclude by extension
flatten --exclude "*.png,*.jpg"
# Exclude specific directories
flatten --exclude "dist,build"
# Exclude directory at any level
flatten --exclude "node_modules"
# Exclude from root only
flatten --exclude "/temp"
# Complex patterns
flatten --exclude "*.log,temp/,/cache,docs/*.pdf"
Smart Git Integration
When run inside a git repository, flatten:
- Automatically switches to the repository root
- Preserves your intended paths (relative to where you run it)
- Includes the commit hash in output filenames
- Detects uncommitted changes (dirty state)
# Example: Running from a subdirectory
cd ~/projects/MyApp/frontend
flatten src/
# Output:
# Switches to git root: ~/projects/MyApp
# Processes: frontend/src/
# Creates: flatten-20250610-152000-abc1234-clean.yaml and .txt
Safety Features
- File count limit: Default 1000 files (override with --max-files)
- Git repository boundary: Won't accidentally flatten your entire home directory
- Clear error messages: Tells you exactly what to do when limits are hit
- File size insights: Shows largest files when output exceeds 250KB
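The file-count limit and the size report from the list above could be sketched like this. All names are hypothetical, and several details are assumptions: 250KB is read as 250 × 1024 bytes, the report shows the top five files, and the suggested `--max-files` value rounds up to the next thousand.

```typescript
// Hypothetical sketch of the two checks; not taken from flatten's source.
interface FileEntry {
  path: string;
  bytes: number;
}

const DEFAULT_MAX_FILES = 1000;
const SIZE_REPORT_THRESHOLD = 250 * 1024; // assumption: "250KB" means 250 KiB

// Returns an error message when the limit is exceeded, otherwise null.
function checkFileLimit(count: number, maxFiles: number = DEFAULT_MAX_FILES): string | null {
  if (count <= maxFiles) return null;
  // Assumption: suggest the next multiple of 1000 that fits the file count.
  const suggested = Math.ceil(count / 1000) * 1000;
  return (
    `Error: Found ${count.toLocaleString("en-US")} files (max: ${maxFiles})\n` +
    `Try: flatten --max-files ${suggested}`
  );
}

// Returns the largest files, but only when the output is big enough to warrant a report.
function largestFiles(files: FileEntry[], totalBytes: number, top: number = 5): FileEntry[] {
  if (totalBytes <= SIZE_REPORT_THRESHOLD) return [];
  // Copy before sorting so the caller's array is left untouched.
  return [...files].sort((a, b) => b.bytes - a.bytes).slice(0, top);
}
```

Returning a message instead of throwing keeps the check easy to test; a CLI wrapper would print it and exit with a non-zero status.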
Use Cases
- LLM Context: Provide complete codebase context to ChatGPT, Claude, or other AI assistants
- Code Reviews: Share entire project structure and content easily
- Documentation: Generate file inventories and snapshots
- Archiving: Create readable snapshots tied to specific commits
- Version Tracking: Timestamped outputs make it easy to track changes over time
Example Session
$ cd my-awesome-project
$ flatten
✓ Created flatten-20250610-152000-abc1234-clean.yaml
✓ Created flatten-20250610-152000-abc1234-clean.txt (2.3 MB)
$ # Make some changes...
$ echo "TODO: fix this" >> src/index.js
$ flatten
✓ Created flatten-20250610-153045-abc1234-dirty.yaml
✓ Created flatten-20250610-153045-abc1234-dirty.txt (2.3 MB)
$ # Exclude large SQL files
$ flatten --exclude "*.sql"
✓ Created flatten-20250610-154500-abc1234-dirty.yaml
✓ Created flatten-20250610-154500-abc1234-dirty.txt (1.8 MB)
✓ Excluded 15 files matching patterns
✓ Omitted 2 binary files from content
Largest files:
src/services/data-processor.ts (89.2 KB)
docs/images/architecture.png (45.3 KB) [binary]
src/utils/validators.ts (38.1 KB)
tests/integration/full-suite.test.ts (35.2 KB)
src/models/complex-model.ts (31.7 KB)
$ # Check what we've created over time
$ ls -lt flatten-*.txt | head -5
-rw-r--r-- 1 user staff 1887232 Jun 10 15:45 flatten-20250610-154500-abc1234-dirty.txt
-rw-r--r-- 1 user staff 2412032 Jun 10 15:30 flatten-20250610-153045-abc1234-dirty.txt
-rw-r--r-- 1 user staff 2412001 Jun 10 15:20 flatten-20250610-152000-abc1234-clean.txt
$ # Clean up files older than 1 day
$ flatten cleanup --days 1
✓ Removed 6 files older than 1 days
$ # Too many files?
$ cd ~/large-monorepo
$ flatten
Error: Found 1,234 files (max: 1000)
Try: flatten --max-files 2000
Output Format Examples
YAML Structure (flatten-20250610-152000-abc1234-clean.yaml)
codebase:
  - path: .
    type: directory
    contents:
      - path: src
        type: directory
        contents:
          - path: src/index.js
            type: file
          - path: src/utils.js
            type: file
      - path: README.md
        type: file
      - path: package.json
        type: file
Flattened Content (flatten-20250610-152000-abc1234-clean.txt)
<src/index.js>
import { helper } from './utils';
console.log('Hello world');
console.log(helper());
</src/index.js>
<src/utils.js>
export function helper() {
  return 'helping!';
}
</src/utils.js>
<README.md>
# My Project
This is my awesome project.
</README.md>
<package.json>
{
  "name": "my-project",
  "version": "1.0.0"
}
</package.json>
Philosophy
This tool follows a minimalist philosophy:
- Simple - No configuration files or complex options
- Smart - Intelligent defaults that just work
- Safe - Built-in protections against common mistakes
- Useful - Solves real problems for developers working with LLMs
Contributing
See CONTRIBUTING.md for guidelines.
License
MIT - see LICENSE for details.
Acknowledgments
This project was inspired by the need to easily provide codebase context to LLMs. The original version was a simple bash script that evolved into this more robust npm package.
