@sebastianappelberg/data-scraper
v0.0.8
A simple library for scraping web data and extracting structured information using LLMs.
data-scraper
Usage
Scraper
The Scraper class allows you to navigate web pages and extract HTML content.
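// When working inside the repository itself, import from './src/index' instead.
import { Scraper } from '@sebastianappelberg/data-scraper';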
async function runScrape() {
  const scraper = new Scraper({ headless: true, debug: false });
  try {
    const htmlContent = await scraper.scrape({
      url: 'https://example.com',
      navigation: async (page) => {
        // Optional: Perform actions like clicking buttons or filling forms
        // await page.click('#myButton');
      },
      getContent: () => {
        // Optional: Use document.querySelector or document.querySelectorAll to get specific content
        return document.querySelector('#main-content')?.innerHTML || '';
      },
    });
    console.log(htmlContent);
    const markdownContent = scraper.convertToMarkdown(htmlContent);
    console.log(markdownContent);
  } finally {
    // Always release the scraper, even if scraping fails
    await scraper.close();
  }
}
runScrape();
Data Extraction
The extractData function uses an LLM to parse text and extract structured data from it.
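Because the result is generated by a model, it can be worth validating before the rest of a program relies on it. The following is a minimal sketch, not part of the library: `Product` and `parseProduct` are hypothetical names, and it assumes the extracted value is either a JSON string or an already-parsed object with `name`, `price`, and `brand` fields (this README does not document the exact return type).

```typescript
// Hypothetical shape for a product record; not defined by the library.
interface Product {
  name: string;
  price: number;
  brand: string;
}

// Accepts either a JSON string or an already-parsed value (assumption: the
// return type of extractData is not documented in this README).
function parseProduct(raw: unknown): Product {
  const value: unknown = typeof raw === 'string' ? JSON.parse(raw) : raw;
  if (typeof value !== 'object' || value === null) {
    throw new Error('Expected a JSON object');
  }
  const record = value as Record<string, unknown>;
  const { name, brand } = record;
  if (typeof name !== 'string' || typeof brand !== 'string') {
    throw new Error('Missing "name" or "brand"');
  }
  // The price may come back as "$1200" or 1200; keep digits and dots only.
  const digits = String(record.price ?? '').replace(/[^0-9.]/g, '');
  const price = digits === '' ? NaN : Number(digits);
  if (!Number.isFinite(price)) {
    throw new Error('Missing or invalid "price"');
  }
  return { name, price, brand };
}
```

With a helper like this, the value returned by extractData could be passed through `parseProduct` before use, so malformed output fails loudly instead of propagating.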
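// When working inside the repository itself, import from './src/index' instead.
import { extractData } from '@sebastianappelberg/data-scraper';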
async function runExtraction() {
  const text = `Product: Laptop, Price: $1200, Brand: ExampleTech`;
  const jsonData = await extractData(text, {
    dataStructure: 'JSON',
    prompt: 'Extract product name, price, and brand.',
  });
  console.log(jsonData);
}
runExtraction();
Installation
npm install @sebastianappelberg/data-scraper
License
This project is licensed under the MIT License. See the LICENSE file for details.
