llm-search
v1.0.5
[DEPRECATED] A Node.js module for searching and scraping web content, designed for LLMs but useful for any project where webscraping is needed!
[DEPRECATED] llm-search 🔍
⚠️ DEPRECATION NOTICE: This package has been deprecated in favor of llm-kit. Please install llm-kit instead:
npm install llm-kit
The new package provides all the same functionality with improved features and ongoing maintenance. Please update your dependencies accordingly.
A Node.js module for searching and scraping web content, designed for LLMs but useful for everyone!
Features
- Search multiple engines (Google, DuckDuckGo)
- Wikipedia search and content extraction
- HackerNews scraping
- Webpage content extraction
- Document parsing (PDF, DOCX, CSV)
- Image OCR/text extraction support
- No API keys required at all
- Automatic fallbacks
- TypeScript & Node support
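This README does not describe how the automatic fallbacks work internally. As a rough illustration of the general pattern only, the sketch below tries a list of search providers in order and returns the first non-empty result. The names `searchWithFallback`, `Provider`, and `SearchResult` are hypothetical and not part of the llm-search API, and the stub providers stand in for real engines; the package's actual behavior may differ.

```typescript
// Hypothetical types and helper for illustration; not part of llm-search's API.
type SearchResult = { title: string; url: string };
type Provider = (query: string) => Promise<SearchResult[]>;

async function searchWithFallback(
  query: string,
  providers: Provider[],
): Promise<SearchResult[]> {
  let thrown = 0;
  for (const provider of providers) {
    try {
      const results = await provider(query);
      // Treat an empty result set as a miss and move on to the next provider.
      if (results.length > 0) return results;
    } catch {
      // Count the failure and fall through to the next provider.
      thrown += 1;
    }
  }
  throw new Error(
    `All ${providers.length} providers failed or returned nothing (${thrown} threw)`,
  );
}

// Stub providers standing in for real search engines.
const failing: Provider = async () => {
  throw new Error("rate limited");
};
const working: Provider = async (q) => [
  { title: `Result for ${q}`, url: "https://example.com" },
];

searchWithFallback("typescript tutorial", [failing, working]).then((r) =>
  console.log(r[0]?.title),
);
```

Treating an empty result list as a miss is a design choice in this sketch: it lets a provider that responds but finds nothing hand over to the next one.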
Installation
npm install llm-search
# Optional: Install OCR language data for non-English languages
npm install tesseract.js-data
Quick Start
import { search, parse } from "llm-search";
// Web Search
const results = await search("typescript tutorial");
console.log(results);
// Parse Documents
const pdfResult = await parse("document.pdf");
console.log(pdfResult.text);
const csvResult = await parse("data.csv", {
  csv: { columns: true },
});
console.log(csvResult.data);
// OCR Images
const imageResult = await parse("image.png", {
  language: "eng",
});
console.log(imageResult.text);
Supported File Types
Documents
- PDF files (.pdf)
- Word documents (.docx)
- CSV files (.csv)
Images (OCR)
- PNG (.png)
- JPEG (.jpg, .jpeg)
- BMP (.bmp)
- GIF (.gif)
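To check up front whether a file falls into one of the categories listed above, a small helper can classify paths by extension. This is a minimal sketch: `classifyFile` and `FileKind` are hypothetical names introduced here, not exports of llm-search, and the extension sets simply mirror the lists in this section.

```typescript
import { extname } from "node:path";

// Hypothetical helper for illustration; not part of llm-search's API.
type FileKind = "document" | "image" | "unsupported";

// Extension sets copied from the "Supported File Types" lists.
const DOCUMENT_EXTENSIONS = new Set([".pdf", ".docx", ".csv"]);
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".bmp", ".gif"]);

function classifyFile(filePath: string): FileKind {
  // Compare case-insensitively so "REPORT.PDF" matches ".pdf".
  const ext = extname(filePath).toLowerCase();
  if (DOCUMENT_EXTENSIONS.has(ext)) return "document";
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  return "unsupported";
}

console.log(classifyFile("report.PDF")); // prints "document"
console.log(classifyFile("scan.jpeg")); // prints "image"
console.log(classifyFile("notes.txt")); // prints "unsupported"
```

A check like this can skip unsupported files before they are handed to a parser, rather than relying on a thrown error.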
Documentation
See the docs directory for detailed documentation:
- Search - Web search capabilities
- Wikipedia - Wikipedia integration
- HackerNews - HackerNews API
- Webpage - Web content extraction
- Parser - Document and image parsing
Example Usage
Web Search
import { search } from "llm-search";
const results = await search("typescript tutorial");
console.log(results);
Document Parsing
import { parse } from "llm-search";
// Parse PDF
const pdfResult = await parse("document.pdf");
console.log(pdfResult.text);
// Parse CSV with options
const csvResult = await parse("data.csv", {
  csv: {
    delimiter: ";",
    columns: true,
  },
});
console.log(csvResult.data);
// OCR Image
const imageResult = await parse("image.png", {
  language: "eng", // supports multiple languages
});
console.log(imageResult.text);
Error Handling
try {
  const result = await parse("document.pdf");
  console.log(result.text);
} catch (error) {
  if (error.code === "PDF_PARSE_ERROR") {
    console.error("PDF parsing failed:", error.message);
  }
  // Handle other errors
}
Dependencies
This package uses these great libraries:
- @mozilla/readability - Web content extraction
- csv-parse - CSV parsing
- duck-duck-scrape - DuckDuckGo search
- fast-xml-parser - XML parsing
- google-sr - Google search
- jsdom - DOM emulation for web scraping
- mammoth - DOCX parsing
- pdf-parse - PDF parsing
- puppeteer - Headless browser automation
- tesseract.js - OCR
- wikipedia - Wikipedia API
License
MIT
Contributing 
Contributions are VERY welcome! Please read the contributing guidelines first.

