f2md

v1.1.1

Published

8 days ago

Convert PDF and DOCX files to Markdown using AI

saikrishnaambeti

pdf docx markdown converter ai cli

f2md

Convert PDF, DOCX, and image files to Markdown using AI. This CLI tool extracts text and images and preserves table structure while converting documents to clean, well-formatted Markdown. It also supports OCR text extraction from images.

Features

PDF Support - Full text extraction, image extraction, and page screenshots for layout understanding
DOCX Support - Text and image extraction with structure preservation
Image OCR - Extract text from images (PNG, JPG, JPEG, GIF, WEBP) using AI-powered OCR
AI-Powered Conversion - Uses Google's Gemini AI to intelligently convert content to Markdown
Interactive CLI - Friendly prompts using clack.js
Easy Setup - Built-in configuration wizard for API keys

Installation

Using npx (no installation required)

npx f2md document.pdf

Using bunx

bunx f2md document.pdf

Using pnpm dlx

pnpm dlx f2md document.pdf

Global installation

npm install -g f2md
# or
bun install -g f2md

Setup

Before using the tool, you need to configure your Google AI API key.

Run the setup wizard

f2md setup
# or with npx
npx f2md setup

The setup wizard will:

Show you where to get a Google AI API key (https://aistudio.google.com/apikey)
Prompt you to enter your API key
Ask where to save it (local project or global for all projects)

Manual setup

Alternatively, set the environment variable:

export GOOGLE_GENERATIVE_AI_API_KEY="your-api-key-here"

Or create a .env file in your project:

GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
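
If you call f2md from your own script, it can help to fail early when the key is missing. The sketch below is a minimal check, not part of f2md: `requireApiKey` is a hypothetical helper, and only the variable name comes from this README.

```typescript
// Name of the environment variable this README tells you to set.
const API_KEY_VAR = "GOOGLE_GENERATIVE_AI_API_KEY";

// Hypothetical helper (not an f2md export): returns the trimmed key,
// or throws with a hint when the variable is missing or blank.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env[API_KEY_VAR]?.trim();
  if (!key) {
    throw new Error(
      `${API_KEY_VAR} is not set. Run "f2md setup" or export the variable first.`
    );
  }
  return key;
}
```

In a Node.js or Bun script you would typically pass `process.env` as the argument.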

Usage

Interactive Mode

f2md

The tool will prompt you for:

Input file path (PDF, DOCX, or image)
Output file path

CLI Mode

# Convert with auto-generated output name
f2md document.pdf

# Convert with custom output path
f2md document.pdf output.md

# Extract text from an image (OCR)
f2md screenshot.png

# Extract text from image with custom output
f2md image.jpg output.md
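
This README does not say how the auto-generated output name is chosen. One plausible rule, shown here purely as an assumption, is to swap the input file's extension for `.md`. `defaultOutputPath` is a hypothetical helper, not an f2md function.

```typescript
// Assumed naming rule (not confirmed by f2md): replace the input file's
// extension with ".md", or append ".md" when there is no extension.
function defaultOutputPath(inputPath: string): string {
  const slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
  const dot = inputPath.lastIndexOf(".");
  // The dot counts as an extension separator only if it sits inside the
  // file name and is not its first character (so ".hidden" has no extension).
  const stem = dot > slash + 1 ? inputPath.slice(0, dot) : inputPath;
  return `${stem}.md`;
}
```

Under this assumed rule, `defaultOutputPath("document.pdf")` returns `"document.md"`.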

Supported File Types

PDF (.pdf)
Word Documents (.docx)
Images (.png, .jpg, .jpeg, .gif, .webp) - OCR text extraction
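
To check a path against the list above before invoking the tool, something like the following sketch would do. `detectKind` is a hypothetical helper, and the case-insensitive extension match is an assumption; f2md's own detection may differ.

```typescript
type FileKind = "pdf" | "docx" | "image";

// Image extensions taken from the supported file types listed in this README.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);

// Hypothetical helper: classify a path by its extension,
// returning null for anything outside the supported list.
function detectKind(filePath: string): FileKind | null {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = filePath.slice(dot).toLowerCase(); // assumption: case-insensitive
  if (ext === ".pdf") return "pdf";
  if (ext === ".docx") return "docx";
  return IMAGE_EXTENSIONS.has(ext) ? "image" : null;
}
```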

Options

f2md --help     # Show help
f2md --version  # Show version
f2md setup      # Configure API key
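
Taken together with the usage examples above, the command line has five shapes: no arguments (interactive), `--help`, `--version`, `setup`, and an input path with an optional output path. The sketch below models that dispatch. It illustrates the documented behavior only; the `Command` type and `parseArgs` are hypothetical, not f2md's actual parser.

```typescript
// Hypothetical model of the documented invocations.
type Command =
  | { kind: "interactive" }
  | { kind: "help" }
  | { kind: "version" }
  | { kind: "setup" }
  | { kind: "convert"; input: string; output: string | undefined };

// args is the argument list without the runtime and script path,
// for example process.argv.slice(2) in Node.js or Bun.
function parseArgs(args: string[]): Command {
  const first = args[0];
  const second = args[1];
  if (first === undefined) return { kind: "interactive" };
  if (first === "--help") return { kind: "help" };
  if (first === "--version") return { kind: "version" };
  if (first === "setup") return { kind: "setup" };
  // Anything else is treated as an input path with an optional output path.
  return { kind: "convert", input: first, output: second };
}
```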

How It Works

For PDF and DOCX files:

Extraction - Reads the input file and extracts text, images, and layout information
Processing - For PDFs, captures page screenshots to understand visual layout
AI Conversion - Sends extracted content to Google's Gemini AI model
Markdown Generation - Receives AI-generated Markdown with proper formatting
Cleanup - Removes unused images and saves the final output
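
The Cleanup step is described only as removing unused images. One way to picture it, assuming "unused" means "not referenced by any image link in the generated Markdown", is the sketch below. That definition, the file-name-only comparison, and the `findUnusedImages` helper are all assumptions, not f2md's implementation.

```typescript
// Return the part of a path after the last "/".
const baseName = (path: string): string => path.slice(path.lastIndexOf("/") + 1);

// Hypothetical helper: given the generated Markdown and the extracted image
// files, list the files that no Markdown image link points to.
function findUnusedImages(markdown: string, imageFiles: string[]): string[] {
  // Matches ![alt](target) and ![alt](target "title"), capturing the target.
  const imagePattern = /!\[[^\]]*\]\(\s*<?([^)\s>]+)>?[^)]*\)/g;
  const referenced = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = imagePattern.exec(markdown)) !== null) {
    const target = match[1];
    if (target) referenced.add(baseName(target)); // compare by file name only
  }
  return imageFiles.filter((file) => !referenced.has(baseName(file)));
}
```

A caller could then delete the returned files and report how many were removed.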

For Image files:

Image Processing - Reads the image file and encodes it for AI processing
OCR Analysis - Sends the image to Google's Gemini AI with specialized prompts for text extraction
Text Extraction - AI extracts all visible text while preserving structure (headings, lists, tables)
Markdown Generation - Converts extracted content to well-formatted Markdown
Output - Saves the final Markdown file

Development

Prerequisites

Bun installed

Setup

# Clone the repository
git clone <repo-url>
cd f2md

# Install dependencies
bun install

# Run in development mode
bun run dev

Build

bun run build

Project Structure

src/
  cli.ts      - CLI entry point with clack prompts
  convert.ts  - Core conversion logic
  index.ts    - Public API exports
dist/         - Built output (generated)

API Usage

You can also use f2md as a library in your Node.js or Bun projects:

import { convert } from "f2md";

const result = await convert("input.pdf", "output.md", {
  onProgress: (message) => console.log(message),
  respectPages: false,
});

console.log(`Saved to: ${result.outputPath}`);
console.log(`Images saved: ${result.imagesSaved}`);
console.log(`Images cleaned: ${result.imagesDeleted}`);
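
To convert several files, you can wrap `convert` in a small loop that records failures instead of stopping at the first one. The sketch below takes the conversion function as a parameter so it runs without the package installed. The `ConvertResult` and `ConvertOptions` shapes are inferred from the example above (the numeric field types are an assumption), and `convertAll` is a hypothetical helper.

```typescript
// Shapes inferred from the README example; the real types may differ.
interface ConvertResult {
  outputPath: string;
  imagesSaved: number; // assumed to be a count
  imagesDeleted: number; // assumed to be a count
}
interface ConvertOptions {
  onProgress?: (message: string) => void;
  respectPages?: boolean;
}
type ConvertFn = (
  input: string,
  output: string,
  options?: ConvertOptions
) => Promise<ConvertResult>;

interface BatchOutcome {
  input: string;
  ok: boolean;
  outputPath?: string;
  error?: string;
}

// Hypothetical helper: convert [input, output] pairs one at a time,
// collecting a success or failure record for each.
async function convertAll(
  convert: ConvertFn,
  jobs: Array<[string, string]>,
  options?: ConvertOptions
): Promise<BatchOutcome[]> {
  const outcomes: BatchOutcome[] = [];
  for (const [input, output] of jobs) {
    try {
      const result = await convert(input, output, options);
      outcomes.push({ input, ok: true, outputPath: result.outputPath });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ input, ok: false, error: message });
    }
  }
  return outcomes;
}
```

With the real package you would pass the imported function, for example `convertAll(convert, [["a.pdf", "a.md"]])`. The loop runs sequentially, which keeps progress output readable and avoids sending many requests to the AI service at once.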

License

MIT
