@cityssm/file-to-text
v0.0.4
Node File to Text
Converts most common file types into clean text or Markdown.
Installation
Include all available parsers.
npm install @cityssm/file-to-text
Base installation. Parsers can be installed as needed.
npm install @cityssm/file-to-text --no-optional
Available Parsers
⭐ Only plain text parsing is available out-of-the-box. ⭐
All parsers are considered optional dependencies so you can choose which parsers to include.
Office Files
Support for Office files (e.g. docx, pdf, pptx) relies on officeparser.
Image Files
Support for image files (e.g. jpg, png, gif) relies on tesseract.js.
Audio Files
Support for audio file transcriptions (e.g. wav, mp3) relies on @cityssm/whisper-speech-to-text.
See the package prerequisites, which include Python, FFmpeg, and OpenAI Whisper.
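When installing with --no-optional, it can help to know which parser package a given file would need. The sketch below is a hypothetical helper, not part of @cityssm/file-to-text. It assumes the choice of parser follows the file extension and covers only the extensions named above; the package may support more types or select parsers differently.

```javascript
// Hypothetical helper (not part of @cityssm/file-to-text).
// Maps the extensions named in this README to the optional package
// the README says each file type relies on.
const PARSER_BY_EXTENSION = new Map([
  ['docx', 'officeparser'],
  ['pdf', 'officeparser'],
  ['pptx', 'officeparser'],
  ['jpg', 'tesseract.js'],
  ['png', 'tesseract.js'],
  ['gif', 'tesseract.js'],
  ['wav', '@cityssm/whisper-speech-to-text'],
  ['mp3', '@cityssm/whisper-speech-to-text']
])

// Returns the optional package name for a file, or null when the
// extension is not in the list above (for example, plain text).
function requiredParserFor(fileName) {
  const dotIndex = fileName.lastIndexOf('.')
  if (dotIndex === -1) return null
  const extension = fileName.slice(dotIndex + 1).toLowerCase()
  return PARSER_BY_EXTENSION.get(extension) ?? null
}

console.log(requiredParserFor('path/to/voicemail.wav'))
// prints "@cityssm/whisper-speech-to-text"
console.log(requiredParserFor('path/to/file.txt'))
// prints "null"
```

A null result here only means that none of the optional packages listed above is implied by the extension.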
Usage
Options are coming!
import fileToText from '@cityssm/file-to-text'
const text1 = await fileToText('path/to/file.txt')
// Use OpenAI Whisper to convert speech to text locally.
const text2 = await fileToText('path/to/voicemail.wav')
// Use the Tesseract OCR engine to convert images to text.
const text3 = await fileToText('path/to/scannedDocument.png')
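When converting several files, one failure need not stop the rest. The following is a minimal sketch of a hypothetical wrapper, convertAll, which is not part of the package. It takes the conversion function as a parameter and assumes that function returns a promise that rejects when a file cannot be converted, for example when the matching optional parser is not installed; the README does not confirm that behavior.

```javascript
// Hypothetical wrapper (not part of @cityssm/file-to-text).
// `convert` stands in for the package's default export, fileToText.
async function convertAll(filePaths, convert) {
  const results = []
  for (const filePath of filePaths) {
    try {
      // Convert files one at a time, in the order given.
      results.push({ filePath, text: await convert(filePath) })
    } catch (error) {
      // Record the failure and continue with the remaining files.
      results.push({ filePath, error: String(error) })
    }
  }
  return results
}
```

With the import shown above, it could be called as convertAll(['path/to/file.txt', 'path/to/voicemail.wav'], fileToText). Each result carries either a text or an error property.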