content-grabber-alvamind

v1.0.1

🗂️ content-grabber-alvamind

A Node.js library to extract text content from various file types. 💪

✨ Features

  • Versatile File Support: Extracts text from .txt, .pdf, .docx, .csv, and .xlsx files. 📄
  • Local & Remote Files: Works with both local file paths and URLs. 🌐
  • Intelligent Content Type Handling: Automatically detects content types from headers and file extensions. 🤔
  • PDF Text Extraction: Extracts text from PDF files, with optional OCR support. 🧐
  • Configurable OCR: Control OCR behavior (scale, languages). ⚙️
  • Customizable Logging: Supports a custom logger for info, error, and debug messages. 🪵
  • Error Handling: Provides descriptive error messages. ⚠️
  • Easy to Use: Simple API for quick integration into your projects. 🚀
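The content-type detection mentioned in the feature list can be pictured with the minimal sketch below. It is not the library's internal code: the `detectKind` and `isRemote` helpers, the MIME-type table, and the rule that a recognized header takes precedence over the file extension are all assumptions made for illustration.

```typescript
// Hypothetical sketch, not content-grabber-alvamind's internal code.
// The kinds mirror the formats listed under Features.
type FileKind = "txt" | "pdf" | "docx" | "csv" | "xlsx";

// Assumed MIME-type table for the supported formats.
const MIME_TO_KIND: Record<string, FileKind> = {
  "text/plain": "txt",
  "application/pdf": "pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
  "text/csv": "csv",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
};

const EXT_TO_KIND: Record<string, FileKind> = {
  txt: "txt",
  pdf: "pdf",
  docx: "docx",
  csv: "csv",
  xlsx: "xlsx",
};

// Treat http(s) inputs as remote URLs; everything else as a local path.
function isRemote(source: string): boolean {
  return /^https?:\/\//i.test(source);
}

// Assumed precedence: a recognized Content-Type header wins,
// otherwise fall back to the file extension.
function detectKind(source: string, contentType?: string): FileKind | undefined {
  if (contentType) {
    // Drop parameters such as "; charset=utf-8" before the lookup.
    const mime = contentType.split(";")[0].trim().toLowerCase();
    const fromHeader = MIME_TO_KIND[mime];
    if (fromHeader) return fromHeader;
  }
  // Ignore query strings and fragments on URLs.
  const path = isRemote(source) ? source.split(/[?#]/)[0] : source;
  // Look only at the last path segment so dotted directory names are ignored.
  const name = path.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return undefined;
  return EXT_TO_KIND[name.slice(dot + 1).toLowerCase()];
}
```

Checking the header first is one reasonable choice for remote files, since a download URL often carries no extension; local paths have no headers, so they always fall through to the extension check.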

🎯 Benefits

  • Simplify Data Extraction: Quickly grab text from different file types. ⏱️
  • Save Time: No need to handle file formats manually. ⏳
  • Improve Productivity: Focus on processing text rather than parsing files. 📈
  • Reliable: Robust extraction with descriptive error messages. ✅

📦 Installation

npm install content-grabber-alvamind

🛠️ Usage

Basic Example

import { fetchFileContent } from 'content-grabber-alvamind';

async function main() {
  try {
    const fileUrl = 'path/to/your/document.pdf'; // Replace with your file URL/path
    const extractedContent = await fetchFileContent(fileUrl);
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

PDF with OCR

import { fetchFileContent } from 'content-grabber-alvamind';

async function main() {
  try {
    const fileUrl = 'path/to/your/scanned_document.pdf';
    const extractedContent = await fetchFileContent(fileUrl, {
      pdfOptions: {
        ocrEnabled: true, // Enable OCR for scanned PDFs
        languages: ['eng', 'spa'], // Specify OCR languages
        scale: 2.5 // Increase scale for better OCR quality
      }
    });
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

Custom Logger Example

import { fetchFileContent, FileContentExtractionOptions } from 'content-grabber-alvamind';

class CustomLogger {
  info(message: string, ...args: any[]): void {
    console.log(`[CUSTOM INFO] ${message}`, ...args);
  }

  error(message: string, ...args: any[]): void {
    console.error(`[CUSTOM ERROR] ${message}`, ...args);
  }

  debug(message: string, ...args: any[]): void {
    console.debug(`[CUSTOM DEBUG] ${message}`, ...args);
  }
}

async function main() {
  try {
    const fileUrl = 'path/to/your/document.txt';
    const options: FileContentExtractionOptions = {
      logger: new CustomLogger()
    };
    const extractedContent = await fetchFileContent(fileUrl, options);
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
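Because the `logger` option only needs `info`, `error`, and `debug` methods, a logger does not have to print anything. The sketch below collects entries in memory, which can be handy when asserting on log output in your own tests. It assumes the library calls only those three methods; the `Logger` interface and `MemoryLogger` class are hypothetical names, not exports of content-grabber-alvamind.

```typescript
// Hypothetical logger shape mirroring the README: info, error and debug methods.
interface Logger {
  info(message: string, ...args: unknown[]): void;
  error(message: string, ...args: unknown[]): void;
  debug(message: string, ...args: unknown[]): void;
}

type Level = "info" | "error" | "debug";

// Records every call in memory instead of writing to the console.
class MemoryLogger implements Logger {
  readonly entries: { level: Level; message: string; args: unknown[] }[] = [];

  info(message: string, ...args: unknown[]): void {
    this.entries.push({ level: "info", message, args });
  }

  error(message: string, ...args: unknown[]): void {
    this.entries.push({ level: "error", message, args });
  }

  debug(message: string, ...args: unknown[]): void {
    this.entries.push({ level: "debug", message, args });
  }

  // Messages recorded at one level, in the order they were logged.
  messages(level: Level): string[] {
    return this.entries.filter((e) => e.level === level).map((e) => e.message);
  }
}
```

Under the same assumption, it would be passed the same way as the `CustomLogger` example: `fetchFileContent(fileUrl, { logger: new MemoryLogger() })`.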

DOCX Extraction

import { fetchFileContent } from 'content-grabber-alvamind';

async function main() {
  try {
    const fileUrl = 'path/to/your/document.docx'; // Replace with your file URL/path
    const extractedContent = await fetchFileContent(fileUrl);
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

CSV Extraction

import { fetchFileContent } from 'content-grabber-alvamind';

async function main() {
  try {
    const fileUrl = 'path/to/your/data.csv'; // Replace with your file URL/path
    const extractedContent = await fetchFileContent(fileUrl);
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

Excel Extraction

import { fetchFileContent } from 'content-grabber-alvamind';

async function main() {
  try {
    const fileUrl = 'path/to/your/data.xlsx'; // Replace with your file URL/path
    const extractedContent = await fetchFileContent(fileUrl);
    console.log(extractedContent);
  } catch (error) {
    console.error('Error:', error);
  }
}

main();

API

fetchFileContent(fileUrl: string, options?: FileContentExtractionOptions): Promise<string>

  • fileUrl (string): The URL or local file path of the document to extract text from.
  • options (object, optional): An object containing optional configurations:
    • pdfOptions (object, optional): Configuration for PDF extraction:
      • ocrEnabled (boolean, optional): Enable OCR extraction. Default true.
      • scale (number, optional): Scale factor for OCR image. Default 2.0.
      • languages (string[], optional): Array of OCR languages (e.g., ['eng', 'spa']). Default ['eng'].
      • minTextLength (number, optional): Minimum length of normally extracted text; shorter text triggers OCR. Default 50.
    • logger (object, optional): A custom logger that implements info, error, and debug methods.
  • Returns: A Promise that resolves with the extracted text content, or rejects with an error if extraction fails.
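The PDF defaults listed in the API reference, together with one plausible reading of `minTextLength`, can be summarized in a small sketch. `PdfOptions`, `resolvePdfOptions`, and `shouldRunOcr` are hypothetical helpers, and the rule that OCR runs only when it is enabled and the trimmed text layer is shorter than `minTextLength` is an interpretation of the option description, not confirmed library behavior.

```typescript
// Hypothetical local mirror of the documented pdfOptions;
// the library's exported type may differ.
interface PdfOptions {
  ocrEnabled?: boolean;
  scale?: number;
  languages?: string[];
  minTextLength?: number;
}

// Defaults as listed in the API reference.
const PDF_DEFAULTS: Required<PdfOptions> = {
  ocrEnabled: true,
  scale: 2.0,
  languages: ["eng"],
  minTextLength: 50,
};

// Fill in any option the caller left out (or set to undefined).
function resolvePdfOptions(options: PdfOptions = {}): Required<PdfOptions> {
  return {
    ocrEnabled: options.ocrEnabled ?? PDF_DEFAULTS.ocrEnabled,
    scale: options.scale ?? PDF_DEFAULTS.scale,
    // Copy the default array so callers cannot mutate the shared default.
    languages: options.languages ?? [...PDF_DEFAULTS.languages],
    minTextLength: options.minTextLength ?? PDF_DEFAULTS.minTextLength,
  };
}

// Assumed reading of minTextLength: fall back to OCR only when OCR is
// enabled and the normally extracted text (trimmed) is shorter than it.
function shouldRunOcr(extractedText: string, options: PdfOptions = {}): boolean {
  const { ocrEnabled, minTextLength } = resolvePdfOptions(options);
  return ocrEnabled && extractedText.trim().length < minTextLength;
}
```

Under this reading, a scanned PDF with an empty text layer would go to OCR with the default `['eng']` languages, while a PDF with plenty of embedded text would skip OCR entirely.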

🛣️ Roadmap

  • [ ] Support for more file types (e.g., .odt, .rtf).
  • [ ] Improved OCR accuracy and performance.
  • [ ] Configurable text extraction strategies.
  • [ ] Add unit tests.
  • [ ] More advanced logging options.

🀝 Contributing

Contributions are welcome! Feel free to submit issues, feature requests, and pull requests on GitHub. πŸ™

Here’s how you can help:

  • Report bugs. 🐛
  • Suggest new features. 💡
  • Improve documentation. ✍️
  • Submit code changes. 💻

💖 Support the Project

If you find this project useful, consider supporting its development! You can contribute through:

  • GitHub Sponsors: ⭐️ [Link to GitHub Sponsors]
  • Donations: 💰 [Link to Donation Platform]

Your support keeps this project going! 🙌

📄 License

MIT