@rendomnet/file-tool-kit
v1.0.8
A cross-platform file utility toolkit for React Native and Web
file-tool-kit
A cross-platform TypeScript library for unified, high-level file manipulation and text extraction across Web (browser) and React Native environments.
Features
- Extract text from PDF, DOCX, XLSX, CSV, PPTX, TXT, and JSON (PDF extraction is not supported in React Native)
- Consistent API for Web and React Native
- Easy to extend for new environments and file types
- Robust file type detection (magic numbers, extensions)
- Clear error for unsupported formats (e.g., legacy PPT/DOC/XLS)
Supported File Types
- PDF (.pdf) (Web only)
- Word (.docx, .doc [only for detection, not extraction])
- Excel (.xlsx, .xls [only for detection, not extraction])
- PowerPoint (.pptx)
- CSV (.csv)
- Plain text (.txt)
- JSON (.json)
Note: Legacy Office formats (.doc, .xls, .ppt) are detected but not supported for extraction. You will get a clear error if you try to extract from them.
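To illustrate how detection by magic numbers plus extension can work, here is a minimal sketch. It is not the library's implementation: the names `detectContainer`, `detectFileType`, and `SIGNATURES` are hypothetical, and the library's own `FILE_SIGNATURES` table may be organized differently. The byte signatures themselves are the standard ones for PDF, ZIP-based Office files, and legacy OLE2 Office files.

```typescript
// Hypothetical names for illustration; not the library's actual API.
type ContainerKind = 'pdf' | 'zip-ooxml' | 'ole2-legacy' | 'unknown';

const SIGNATURES: ReadonlyArray<{ kind: ContainerKind; bytes: number[] }> = [
  { kind: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { kind: 'zip-ooxml', bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04": DOCX/XLSX/PPTX are ZIP archives
  { kind: 'ole2-legacy', bytes: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1] }, // legacy DOC/XLS/PPT
];

// Compare the leading bytes of the file against each known signature.
function detectContainer(data: Uint8Array): ContainerKind {
  for (const { kind, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return kind;
    }
  }
  return 'unknown';
}

// Lower-cased extension without the dot, or '' when there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf('.');
  return dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
}

// The container signature alone cannot tell DOCX from XLSX (both are ZIP),
// so this sketch uses the extension to pick the concrete type.
function detectFileType(data: Uint8Array, fileName: string): string {
  const kind = detectContainer(data);
  const ext = extensionOf(fileName);
  if (kind === 'pdf') return 'pdf';
  if (kind === 'zip-ooxml' && ['docx', 'xlsx', 'pptx'].indexOf(ext) !== -1) return ext;
  if (kind === 'ole2-legacy' && ['doc', 'xls', 'ppt'].indexOf(ext) !== -1) return ext;
  return ext || 'unknown';
}
```

A result of doc, xls, or ppt from a check like this corresponds to the "detected but not extractable" case described in the note above.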
Usage Example
Web (Browser)
// Import the web utility and the PDF.js worker URL
import { FilesUtilWeb } from 'file-tool-kit/web';
// @ts-ignore
import workerSrc from 'pdfjs-dist/build/pdf.worker.mjs?url';
const filesUtil = new FilesUtilWeb(workerSrc);
// Extract text from a remote file URL
async function extractTextFromUrl(url: string) {
  try {
    const text = await filesUtil.urlToText(url);
    console.log('Extracted text:', text);
  } catch (err) {
    console.error('Extraction error:', err);
  }
}
// Or get the serialized file object from a URL
async function getSerializedFromUrl(url: string) {
  try {
    const serialized = await filesUtil.urlToSerializedData(url);
    // You can now use serialized with other methods
  } catch (err) {
    console.error('Serialization error:', err);
  }
}
// Example: Extract text from a file input
async function handleFileInput(event: Event) {
  const input = event.target as HTMLInputElement;
  if (!input.files || input.files.length === 0) return;
  const file = input.files[0];
  try {
    // Serialize the file (to base64, etc.)
    const serialized = await filesUtil.serializeFile(file);
    // Extract text
    const text = await filesUtil.serializedToText(serialized);
    console.log('Extracted text:', text);
  } catch (err) {
    console.error('Extraction error:', err);
  }
}
React Native
import { FilesUtilRN } from 'file-tool-kit/rn';
const filesUtil = new FilesUtilRN();
// Extract text from a remote file URL (requires network permissions)
async function extractTextFromUrl(url: string) {
  try {
    const text = await filesUtil.urlToText(url);
    console.log('Extracted text:', text);
  } catch (err) {
    console.error('Extraction error:', err);
  }
}
// Or get the serialized file object from a URL
async function getSerializedFromUrl(url: string) {
  try {
    const serialized = await filesUtil.urlToSerializedData(url);
    // You can now use serialized with other methods
  } catch (err) {
    console.error('Serialization error:', err);
  }
}
// Example: Extract text from a file (using a file picker and base64)
async function extractTextFromFile(base64: string, fileType: string, fileName: string) {
  try {
    // Create a serialized file object
    const serialized = {
      cls: 'File',
      name: fileName,
      type: fileType,
      lastModified: Date.now(),
      value: base64,
    };
    // Extract text
    const text = await filesUtil.serializedToText(serialized);
    console.log('Extracted text:', text);
  } catch (err) {
    console.error('Extraction error:', err);
  }
}
Note: In React Native, you may need to polyfill fetch or use a library like react-native-fetch-blob for binary downloads, depending on your environment.
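The serialized object in the React Native example can be built with a small typed helper. The sketch below is an assumption-laden illustration: the `SerializedFileLike` interface is inferred only from the fields shown in the example above (the library's exported type may have more or different fields), `toSerializedFile` is a hypothetical name, and stripping a data-URL prefix assumes the `value` field expects the bare base64 payload.

```typescript
// Shape inferred from the example above; NOT the library's exported type.
interface SerializedFileLike {
  cls: 'File';
  name: string;
  type: string;
  lastModified: number;
  value: string; // base64-encoded file contents
}

// Hypothetical helper: builds the object passed to serializedToText.
function toSerializedFile(
  base64: string,
  mimeType: string,
  fileName: string,
  lastModified: number = Date.now(),
): SerializedFileLike {
  // If the input is a data URL ("data:...;base64,AAAA"), keep only the payload.
  // Assumption: the library wants bare base64 in `value`.
  const comma = base64.indexOf(',');
  const value =
    base64.startsWith('data:') && comma !== -1 ? base64.slice(comma + 1) : base64;
  return { cls: 'File', name: fileName, type: mimeType, lastModified, value };
}
```

With such a helper, the extraction call would read `await filesUtil.serializedToText(toSerializedFile(base64, fileType, fileName))`.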
React Native Support & PDF Extraction
- All formats except PDF are supported in React Native (DOCX, XLSX, PPTX, CSV, TXT, JSON).
- PDF extraction is not supported in React Native.
- For PDF extraction in React Native, use a server-side/cloud solution:
  - Upload the PDF to your backend.
  - Extract text using a Node.js or Python library (e.g., pdf-parse, PyPDF2).
  - Return the extracted text to your app.
- The library will throw a clear error if you attempt PDF extraction in React Native.
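The upload-and-extract flow above could look roughly like this on the client side. Everything specific here is an assumption: the endpoint URL, the JSON request body (`fileName`, `contentBase64`), and the `{ text: string }` response shape are hypothetical and depend entirely on how your backend is written. The fetch function is passed in as a parameter so the sketch does not rely on any particular polyfill.

```typescript
// Minimal structural type for a fetch-like function (assumption: JSON in, JSON out).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface PdfExtractionRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical request format; adapt it to your own backend contract.
function buildPdfExtractionRequest(
  endpoint: string,
  base64Pdf: string,
  fileName: string,
): PdfExtractionRequest {
  return {
    url: endpoint,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileName, contentBase64: base64Pdf }),
  };
}

// Sends the PDF to the backend and returns the text it extracted.
async function extractPdfTextViaBackend(
  fetchImpl: FetchLike,
  endpoint: string,
  base64Pdf: string,
  fileName: string,
): Promise<string> {
  const { url, ...init } = buildPdfExtractionRequest(endpoint, base64Pdf, fileName);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`PDF extraction failed with HTTP ${response.status}`);
  }
  const payload = (await response.json()) as { text?: unknown };
  if (typeof payload.text !== 'string') {
    throw new Error('Unexpected response shape from the PDF extraction endpoint');
  }
  return payload.text;
}
```

In an app you would typically pass the global `fetch` as `fetchImpl`; on the server, the handler would decode `contentBase64` and run it through a library such as pdf-parse or PyPDF2.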
Internal Development (Path Aliases)
For contributors, path aliases are used for clean imports:
import { FILE_SIGNATURES } from '@shared/files.shared';
import { SerializedFile } from '@types/files.types';
Aliases are configured in tsconfig.json and supported in Vite and ts-node via plugins.
API
- urlToText(url: string): Promise<string> (Web only)
- serializedToText(serializedData: SerializedData): Promise<string>
- urlToSerializedData(url: string): Promise<SerializedData>
- ...
Installation
pnpm add file-tool-kit
License
MIT
PDF.js Worker Setup (Web)
When using PDF extraction in the browser, you must provide the path to the PDF.js worker script. For modern bundlers like Vite or webpack, use the ?url import to get the correct worker URL:
// Vite/webpack example:
// @ts-ignore
import workerSrc from 'pdfjs-dist/build/pdf.worker.mjs?url';
import { FilesUtilWeb } from 'file-tool-kit/web';
const filesUtil = new FilesUtilWeb(workerSrc); // workerSrc can be a string or { url: string }
- The library accepts either a string URL or an object with a url property (as some bundlers may provide).
- If you are not using a bundler, you can provide a direct path to the worker script as a string.
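Accepting both a string and a `{ url: string }` object comes down to a small normalization step. The sketch below shows one way to do it; `resolveWorkerUrl` and `WorkerSrc` are hypothetical names, and the library's internal handling may differ.

```typescript
// Hypothetical type mirroring the two accepted worker source forms.
type WorkerSrc = string | { url: string };

// Reduce either form to a plain URL string, failing early on empty input.
function resolveWorkerUrl(workerSrc: WorkerSrc): string {
  const url = typeof workerSrc === 'string' ? workerSrc : workerSrc.url;
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error('PDF.js worker URL is missing');
  }
  return url;
}
```

Failing early on an empty value makes a misconfigured bundler import easier to spot than a later worker-loading error.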
Running Integration (E2E) Tests
To start the web example server:
pnpm run server:web
To run the browser-based integration test (Puppeteer):
pnpm run test:e2e:web
This will launch a headless browser, run extraction for all supported file types, and assert expected output (e.g., 'lorem ipsum' in DOCX and PPTX).
To run both the web server and the E2E test in a single command:
pnpm run test:e2e:web:full
This will start the dev server, wait for it to be ready, run the E2E test, and shut down the server automatically.
Troubleshooting Path Aliases
- Vite: Path aliases are supported via the vite-tsconfig-paths plugin (see examples/web/vite.config.ts).
- ts-node: Path aliases are supported via tsconfig-paths/register (see E2E test scripts).
- If you see errors about unresolved imports like @shared/files.shared, ensure you are using the correct scripts and have all dev dependencies installed.
Error Handling for Unsupported Formats
If you try to extract from an unsupported file type (e.g., legacy .ppt, .doc, .xls), you will get a clear error message:
Error: Unsupported file type: ppt. Supported types are: pdf, doc, docx, xls, xlsx, txt, json, csv, pptx
No external PPTX parser is used; all extraction is handled internally using JSZip and XML parsing.
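If you want to handle unsupported formats separately from other failures, a guard along these lines may help. It is only a sketch: `isLegacyOfficeFile` and `isUnsupportedFileTypeError` are hypothetical helpers, and matching on the message text assumes the wording shown above stays stable, which the library does not promise.

```typescript
// Legacy Office extensions that are detected but not extractable (per this README).
const LEGACY_EXTENSIONS = ['doc', 'xls', 'ppt'];

// Pre-flight check by file name, so the app can warn before attempting extraction.
function isLegacyOfficeFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return false;
  return LEGACY_EXTENSIONS.indexOf(fileName.slice(dot + 1).toLowerCase()) !== -1;
}

// Assumption: the thrown Error's message contains "Unsupported file type:".
function isUnsupportedFileTypeError(err: unknown): boolean {
  return err instanceof Error && err.message.indexOf('Unsupported file type:') !== -1;
}
```

In a catch block, `isUnsupportedFileTypeError(err)` can then be used to show a "please convert this file to a newer format" message instead of a generic failure.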
