expo-pdf-text-extract

v1.0.0

Native PDF text extraction for React Native and Expo. Extract text content from PDF files using platform-native APIs (PDFKit on iOS, PDFBox on Android). Works with Expo development builds.

expo-pdf-text-extract

Native PDF text extraction for React Native and Expo. Extract text content from PDF files using platform-native APIs - no OCR needed for digital PDFs.

Features

  • Native Performance - Uses PDFKit (iOS) and PDFBox (Android) for fast, reliable extraction
  • No OCR Required - Extracts embedded text directly from digital PDFs
  • Expo Compatible - Works with Expo development builds (SDK 49+)
  • TypeScript Support - Full type definitions included
  • Simple API - Just one function to extract text
  • Page-level Control - Extract from specific pages or get page count
  • Multiple Path Formats - Supports file://, content://, and absolute paths

When to Use This

| Scenario | This Package | Alternative |
|----------|-------------|-------------|
| Digital PDFs (from email, downloads) | Yes | - |
| Scanned PDFs (images of paper) | No | Use OCR library |
| Need text content only | Yes | - |
| Need to render/view PDF | No | Use react-native-pdf |
| Expo Go | No | Requires dev build |

Requirements

  • Expo SDK: 49.0.0 or higher
  • React Native: 0.72.0 or higher
  • iOS: 15.1 or higher
  • Android: API 21 (Lollipop) or higher

Important: This package requires an Expo development build. It will not work in Expo Go.

Installation

Using Expo

npx expo install expo-pdf-text-extract

Using npm/yarn

npm install expo-pdf-text-extract
# or
yarn add expo-pdf-text-extract

Create Development Build

Since this is a native module, you need to create a development build:

# For iOS
npx expo run:ios

# For Android
npx expo run:android

# Or create a development build
eas build --profile development --platform all

Quick Start

import { extractText, isAvailable } from 'expo-pdf-text-extract';

// Check if native module is available
if (isAvailable()) {
  // Extract text from a PDF file
  const text = await extractText('/path/to/document.pdf');
  console.log(text);
}

API Reference

isAvailable()

Check if the native PDF extractor is available.

function isAvailable(): boolean

Returns false when:

  • Running in Expo Go
  • Native module failed to load
  • Platform not supported

Example:

import { isAvailable } from 'expo-pdf-text-extract';

if (isAvailable()) {
  // Show PDF upload option
} else {
  // Show message: "PDF extraction requires a development build"
}

extractText(filePath)

Extract all text from a PDF file.

function extractText(filePath: string): Promise<string>

Parameters:

  • filePath - Path to the PDF file. Supports:
    • file:///path/to/file.pdf - File URI
    • /absolute/path/to/file.pdf - Absolute path
    • content://... - Content URI (Android document picker)

Returns: Promise resolving to extracted text

Throws:

  • Error if native module not available
  • Error if file not found
  • Error if PDF is invalid or corrupted

Example:

import { extractText } from 'expo-pdf-text-extract';
import * as DocumentPicker from 'expo-document-picker';

// Pick a PDF file
const result = await DocumentPicker.getDocumentAsync({
  type: 'application/pdf',
});

if (!result.canceled) {
  const text = await extractText(result.assets[0].uri);
  console.log('Extracted text:', text);
}
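
The three path forms listed under Parameters can be told apart by their prefix. The sketch below is a hypothetical helper, not part of expo-pdf-text-extract: the name `classifyPdfPath` and the `PdfPathKind` type are assumptions made for illustration. It only inspects the string and does not check that the file exists.

```typescript
// Hypothetical helper (not part of expo-pdf-text-extract): names the path
// form so unsupported inputs can be rejected before calling extractText().
type PdfPathKind = 'file-uri' | 'content-uri' | 'absolute-path' | 'unsupported';

function classifyPdfPath(filePath: string): PdfPathKind {
  if (filePath.startsWith('file://')) return 'file-uri';
  if (filePath.startsWith('content://')) return 'content-uri';
  if (filePath.startsWith('/')) return 'absolute-path';
  // Relative paths and other URI schemes are not among the documented forms.
  return 'unsupported';
}
```

A caller might show a validation message when the result is `'unsupported'` instead of waiting for the native call to throw.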

getPageCount(filePath)

Get the number of pages in a PDF.

function getPageCount(filePath: string): Promise<number>

Example:

import { getPageCount } from 'expo-pdf-text-extract';

const pages = await getPageCount('/path/to/document.pdf');
console.log(`PDF has ${pages} pages`);

extractTextFromPage(filePath, pageNumber)

Extract text from a specific page.

function extractTextFromPage(filePath: string, pageNumber: number): Promise<string>

Parameters:

  • filePath - Path to the PDF file
  • pageNumber - Page number (1-indexed, first page is 1)

Example:

import { extractTextFromPage, getPageCount } from 'expo-pdf-text-extract';

// Extract text from first page only
const firstPageText = await extractTextFromPage('/path/to/document.pdf', 1);

// Extract text from each page separately
const pageCount = await getPageCount('/path/to/document.pdf');
for (let i = 1; i <= pageCount; i++) {
  const pageText = await extractTextFromPage('/path/to/document.pdf', i);
  console.log(`Page ${i}:`, pageText);
}
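
Because pageNumber is 1-indexed, range arithmetic is an easy place for off-by-one mistakes. A minimal sketch, assuming a hypothetical helper `pageNumbersInRange` (not provided by the package), that turns a requested range into valid 1-indexed page numbers clamped to the document's page count:

```typescript
// Hypothetical helper: returns the 1-indexed page numbers from `first` to
// `last` (inclusive), clamped to the range [1, pageCount].
function pageNumbersInRange(pageCount: number, first: number, last: number): number[] {
  const start = Math.max(1, Math.floor(first));
  const end = Math.min(pageCount, Math.floor(last));
  const pages: number[] = [];
  for (let page = start; page <= end; page++) {
    pages.push(page);
  }
  // Empty when the requested range lies entirely outside the document.
  return pages;
}
```

Each returned number could then be passed to extractTextFromPage(filePath, page), with pageCount obtained from getPageCount(filePath).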

extractTextWithInfo(filePath)

Extract text with additional metadata.

function extractTextWithInfo(filePath: string): Promise<{
  text: string;
  pageCount: number;
  success: boolean;
  error?: string;
}>

Example:

import { extractTextWithInfo } from 'expo-pdf-text-extract';

const result = await extractTextWithInfo('/path/to/document.pdf');

if (result.success) {
  console.log(`Extracted ${result.text.length} characters from ${result.pageCount} pages`);
} else {
  console.error('Extraction failed:', result.error);
}

Usage with Document Picker

import { Alert } from 'react-native';
import { extractText, isAvailable } from 'expo-pdf-text-extract';
import * as DocumentPicker from 'expo-document-picker';

async function handlePdfUpload() {
  // Check if extraction is available
  if (!isAvailable()) {
    Alert.alert(
      'Not Available',
      'PDF extraction requires a development build. Please rebuild the app.'
    );
    return;
  }

  // Pick PDF file
  const result = await DocumentPicker.getDocumentAsync({
    type: 'application/pdf',
    copyToCacheDirectory: true,
  });

  if (result.canceled) {
    return;
  }

  try {
    // Extract text
    const text = await extractText(result.assets[0].uri);

    // Use the extracted text
    console.log('Extracted text:', text.substring(0, 500));

    // Parse the text, search for patterns, etc.
    const hasKeyword = text.includes('invoice');

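  } catch (error) {
    // `error` is `unknown` in strict TypeScript, so narrow it before use
    const message = error instanceof Error ? error.message : String(error);
    Alert.alert('Error', `Failed to extract text: ${message}`);
  }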
}

Error Handling

import { extractText, isAvailable } from 'expo-pdf-text-extract';

async function safeExtract(filePath: string): Promise<string | null> {
  // Check availability first
  if (!isAvailable()) {
    console.warn('PDF extraction not available');
    return null;
  }

  try {
    return await extractText(filePath);
  } catch (error) {
    // `error` is `unknown` in strict TypeScript, so narrow it before use
    const message = error instanceof Error ? error.message : String(error);
    if (message.includes('not found')) {
      console.error('File not found:', filePath);
    } else if (message.includes('PDF_LOAD_ERROR')) {
      console.error('Invalid or corrupted PDF');
    } else {
      console.error('Extraction failed:', message);
    }
    return null;
  }
}

Platform Differences

iOS (PDFKit)

  • Uses Apple's native PDFKit framework
  • Built into iOS, no additional dependencies
  • Excellent support for standard PDF formats
  • Minimum iOS version: 15.1

Android (PDFBox)

  • Uses Apache PDFBox (Android port)
  • Text is sorted by position on page for better readability
  • Handles compressed PDF streams (FlateDecode, etc.)
  • Minimum API level: 21

Troubleshooting

"PDF extraction is not available"

This error occurs when the native module is not available, most commonly because the app is running in Expo Go. Create a development build instead:

# Create a development build
npx expo run:ios
# or
npx expo run:android

Empty text returned

If extractText() returns an empty string, the likely causes are:

  1. Scanned PDF - The PDF contains images, not text. Use OCR instead.
  2. Protected PDF - The PDF has copy protection. Text extraction may be blocked.
  3. Corrupted PDF - Try opening the PDF in another app to verify it's valid.
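
These cases can be partly distinguished in code using the result shape documented for extractTextWithInfo(). The sketch below is an illustration under assumptions: `describeEmptyResult` and the `ExtractionInfo` type name are hypothetical, and the messages are heuristics, since an empty result alone cannot tell a scanned PDF from a protected one.

```typescript
// Mirrors the result shape documented for extractTextWithInfo().
interface ExtractionInfo {
  text: string;
  pageCount: number;
  success: boolean;
  error?: string;
}

// Hypothetical helper: returns a user-facing hint, or null when text
// was actually extracted.
function describeEmptyResult(info: ExtractionInfo): string | null {
  if (!info.success) {
    return `Extraction failed: ${info.error ?? 'unknown error'}`;
  }
  if (info.text.trim().length > 0) {
    return null; // There is text, nothing to diagnose
  }
  if (info.pageCount === 0) {
    return 'The PDF reports no pages; it may be corrupted.';
  }
  return 'No embedded text found; the PDF may be scanned or copy-protected.';
}
```

In an app, the argument would be the value resolved by extractTextWithInfo(filePath).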

Slow extraction on large PDFs

For PDFs with many pages, consider:

  1. Extract page by page using extractTextFromPage()
  2. Show progress indicator to users
  3. Process in background using a worker
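
The first two suggestions can be combined. A minimal sketch, assuming a hypothetical helper `extractPagesWithProgress` that is not part of the package: the page-count and page-extraction functions are passed in as arguments (in an app these would be getPageCount and extractTextFromPage from expo-pdf-text-extract), and `onProgress` is an illustrative callback for driving a progress indicator.

```typescript
type PageCounter = (filePath: string) => Promise<number>;
type PageExtractor = (filePath: string, pageNumber: number) => Promise<string>;

// Hypothetical helper: extracts one page at a time and reports progress
// after each page. Pages are 1-indexed, matching extractTextFromPage().
async function extractPagesWithProgress(
  filePath: string,
  countPages: PageCounter,
  extractPage: PageExtractor,
  onProgress: (pagesDone: number, totalPages: number) => void,
): Promise<string[]> {
  const totalPages = await countPages(filePath);
  const pages: string[] = [];
  for (let page = 1; page <= totalPages; page++) {
    pages.push(await extractPage(filePath, page));
    onProgress(page, totalPages);
  }
  return pages;
}
```

Passing the extractor functions in as parameters keeps the helper independent of the native module, so it can be exercised with stubs in unit tests. A call in an app might look like `extractPagesWithProgress(uri, getPageCount, extractTextFromPage, (done, total) => setProgress(done / total))`, where `setProgress` is a hypothetical state setter.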

Performance

| PDF Size | Pages | Extraction Time (approx) |
|----------|-------|--------------------------|
| Small | 1-5 | < 100ms |
| Medium | 10-50 | 100-500ms |
| Large | 100+ | 500ms-2s |

Times measured on iPhone 13 and Pixel 6
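
Figures like these depend on the device and the document, so it can be worth measuring on your own files. A rough sketch, assuming a hypothetical `timeExtraction` helper (not part of the package) that wraps any extractor function, such as extractText, and reports wall-clock time using Date.now():

```typescript
type TextExtractor = (filePath: string) => Promise<string>;

// Hypothetical helper: runs the extractor once and measures the elapsed
// wall-clock time in milliseconds.
async function timeExtraction(
  filePath: string,
  extract: TextExtractor,
): Promise<{ text: string; elapsedMs: number }> {
  const startedAt = Date.now();
  const text = await extract(filePath);
  return { text, elapsedMs: Date.now() - startedAt };
}
```

A single run is noisy; averaging several runs on a release build would give a fairer comparison with the table above.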

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see LICENSE for details.
