fetchfox
v0.0.40
AI-based web scraping library
Getting started
Install the package and Playwright:
npm i fetchfox
npx playwright install-deps
npx playwright install
Then use it. Here is the callback style:
import { fox } from 'fetchfox';
const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .plan();
const results = await workflow
  .run(null, (delta) => { console.log(delta.item) });
for (const item of results.items) {
  console.log('Item:', item);
}
If you prefer, you can use the streaming style:
import { fox } from 'fetchfox';
const stream = fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .stream();
for await (const delta of stream) {
  console.log(delta.item);
}
Following URLs
You'll often want to scrape across multiple levels of a site, such as a list page and the detail pages it links to. To do this, extract a field named url: FetchFox will follow that URL on the next step.
For example, you can get each Pokemon's HP and attack from its detail page, the second level of the Pokedex:
const workflow = await fox
  .init('https://pokemondb.net/pokedex/national')
  .extract({
    url: 'URL of pokemon profile',
    name: 'Pokemon name',
    number: 'Pokemon number'
  })
  .extract({
    hp: 'Pokemon HP',
    attack: 'Pokemon attack power',
  })
  .limit(3)
  .plan();
const results = await workflow
  .run(null, (delta) => { console.log(delta.item) });
for (const item of results.items) {
  console.log('Item:', item);
}
This scraper will start at https://pokemondb.net/pokedex/national, and then go to detail pages like https://pokemondb.net/pokedex/pikachu to get the HP and attack values.
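To make the two-level flow concrete, here is a rough conceptual model in plain JavaScript. It is not FetchFox's implementation and does not call the library. The stub data, the followUrls helper, and the assumption that fields from both steps end up on the same item are all illustrative.

```javascript
// Conceptual model only; this is NOT how FetchFox is implemented.
const START_URL = 'https://pokemondb.net/pokedex/national';

// Hypothetical output of the first extract step, with links as they
// might appear on the list page (relative to the start URL).
const stepOneItems = [
  { url: '/pokedex/bulbasaur', name: 'Bulbasaur', number: '0001' },
  { url: '/pokedex/pikachu', name: 'Pikachu', number: '0025' },
];

// Hypothetical stand-in for "visit the detail page and extract hp and attack".
const stubDetailPages = {
  'https://pokemondb.net/pokedex/bulbasaur': { hp: 45, attack: 49 },
  'https://pokemondb.net/pokedex/pikachu': { hp: 35, attack: 55 },
};

// For each item, resolve its url against the page it came from,
// "visit" that page, and merge the newly extracted fields into the item.
function followUrls(items, baseUrl, extractDetail) {
  return items.map((item) => {
    const absoluteUrl = new URL(item.url, baseUrl).href;
    return { ...item, url: absoluteUrl, ...extractDetail(absoluteUrl) };
  });
}

const merged = followUrls(
  stepOneItems,
  START_URL,
  (url) => stubDetailPages[url] || {}
);
console.log(merged);
```

The point of the sketch is the shape of the data: the url field from the first step decides which page the second step reads.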
Enter your API key
You'll need to provide an API key for the AI provider you are using, such as OpenAI. There are a few ways to do this.
The easiest option is to set the OPENAI_API_KEY environment variable. FetchFox picks it up automatically and uses that key for all AI calls. To use this option, run your code like this:
OPENAI_API_KEY=sk-your-key node index.js
Alternatively, you can pass in your API key in code, like this:
import { fox } from 'fetchfox';
const results = await fox
  .config({ ai: ['openai:gpt-4o-mini', { apiKey: 'sk-your-key' }] })
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();
This will use OpenAI's gpt-4o-mini model, and the API key you specify. You can also use OpenRouter to access AI models from other providers:
const results = await fox
  .config({ ai: ['openrouter:google/gemini-flash-1.5', { apiKey: 'your-openrouter-key' }] })
  .init('https://pokemondb.net/pokedex/national')
  .extract({ name: 'Pokemon name', number: 'Pokemon number' })
  .limit(3)
  .run();
Choose the AI model that best suits your needs.
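If you pass the key in code, you may prefer to read it from the environment instead of hard-coding it in source. Here is a minimal sketch, assuming the same ai option shape shown in the examples above. The aiConfigFromEnv helper is hypothetical and not part of FetchFox.

```javascript
// Hypothetical helper; not part of FetchFox.
// Builds the object passed to fox.config() from an environment-like object.
function aiConfigFromEnv(env, model = 'openai:gpt-4o-mini') {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    // Fail early with a clear message instead of at the first AI call.
    throw new Error('OPENAI_API_KEY is not set');
  }
  // Same shape as the `ai` option in the examples above: [modelString, options].
  return { ai: [model, { apiKey }] };
}

// Demonstration with a placeholder key; in real code you would pass process.env.
console.log(aiConfigFromEnv({ OPENAI_API_KEY: 'sk-your-key' }));

// Possible usage with FetchFox (left commented so the sketch runs on its own):
// const results = await fox
//   .config(aiConfigFromEnv(process.env))
//   .init('https://pokemondb.net/pokedex/national')
//   .extract({ name: 'Pokemon name', number: 'Pokemon number' })
//   .limit(3)
//   .run();
```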
The following providers are supported:
- OpenAI: Model strings are openai:..., for example openai:gpt-4o
- Google: Model strings are google:..., for example google:gemini-1.5-flash
- OpenRouter: Model strings are openrouter:..., for example openrouter:anthropic/claude-3.5-haiku
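The model strings above follow a provider:model pattern. As an illustration, the sketch below splits such a string and checks the provider against the three listed here. The parseModelString helper is hypothetical and not part of FetchFox, and FetchFox may validate model strings differently.

```javascript
// Hypothetical helper; not part of FetchFox.
// Providers taken from the list above.
const SUPPORTED_PROVIDERS = ['openai', 'google', 'openrouter'];

function parseModelString(modelString) {
  // Split on the first colon only, so any later colons stay in the model name.
  const idx = modelString.indexOf(':');
  if (idx === -1) {
    throw new Error(`Expected "provider:model", got "${modelString}"`);
  }
  const provider = modelString.slice(0, idx);
  const model = modelString.slice(idx + 1);
  if (!SUPPORTED_PROVIDERS.includes(provider)) {
    throw new Error(`Unsupported provider "${provider}"`);
  }
  if (!model) {
    throw new Error(`Missing model name in "${modelString}"`);
  }
  return { provider, model };
}

console.log(parseModelString('openai:gpt-4o'));
console.log(parseModelString('openrouter:anthropic/claude-3.5-haiku'));
```

A check like this can catch a mistyped provider prefix before any scraping starts.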
By default, FetchFox uses OpenAI's gpt-4o-mini model. We've found this model to provide a good tradeoff between cost, runtime, and accuracy. We have a public benchmarks dashboard where you can review performance data on recent commits.
