jina-crawler
v0.0.7
An LLM-friendly crawler from Jina.
Jina Crawler
An LLM-friendly crawler powered by Jina AI. This tool helps you crawl websites and process the content in a way that's optimized for Large Language Models.
Features
- Web crawling with configurable depth
- Integration with Jina AI for content processing
- Easy-to-use CLI
- TypeScript support
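To illustrate what "configurable depth" usually means for a crawler, here is a minimal sketch of a depth-limited breadth-first traversal. It is not taken from jina-crawler's source: the `crawlToDepth` function, the `LinkGraph` type, and the convention that the start URL sits at depth 0 are all assumptions, and a toy in-memory link graph stands in for real page fetching.

```typescript
// Illustrative only: a toy link graph replaces real HTTP requests.
type LinkGraph = Record<string, string[]>;

// Hypothetical helper, not part of jina-crawler's API.
// Assumption: the start URL is depth 0 and each followed link adds 1.
function crawlToDepth(graph: LinkGraph, start: string, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  let frontier: string[] = [start];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      // Pages with no recorded links contribute nothing.
      for (const link of graph[url] ?? []) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next;
  }

  // Pages are returned in the order they were discovered.
  return Array.from(visited);
}

const site: LinkGraph = {
  "/": ["/docs", "/blog"],
  "/docs": ["/docs/api"],
  "/docs/api": ["/docs/api/v1"],
};

console.log(crawlToDepth(site, "/", 2));
// → ["/", "/docs", "/blog", "/docs/api"]
```

With a depth of 2, `/docs/api/v1` is left out because it is three links away from the start page.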
Installation
npm install jina-crawler
# or
pnpm add jina-crawler
# or
yarn add jina-crawler
Options
- --baseUrl, -u: Target URL to crawl (required)
- --name, -n: Project name (required)
- --maxDepth: Maximum depth to crawl (default: 2)
- --token: Your Jina AI token (can also be set via the JINA_READER_TOKEN environment variable)
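When the crawler is launched from a script, it can help to describe the options as a typed object and turn them into command-line arguments. The sketch below does that. The `CrawlOptions` interface and `buildCrawlArgs` function are hypothetical names introduced for illustration and are not exported by jina-crawler; only the flag names come from the options listed here.

```typescript
// Hypothetical types and helper; only the flag names are documented.
interface CrawlOptions {
  baseUrl: string;   // --baseUrl, -u (required)
  name: string;      // --name, -n (required)
  maxDepth?: number; // --maxDepth (documented default: 2)
  token?: string;    // --token (or JINA_READER_TOKEN in the environment)
}

function buildCrawlArgs(options: CrawlOptions): string[] {
  const args: string[] = ["--baseUrl", options.baseUrl, "--name", options.name];

  // Optional flags are emitted only when a value was supplied,
  // so the CLI's own defaults apply otherwise.
  if (options.maxDepth !== undefined) {
    args.push("--maxDepth", String(options.maxDepth));
  }
  if (options.token !== undefined) {
    args.push("--token", options.token);
  }
  return args;
}

const exampleArgs = buildCrawlArgs({
  baseUrl: "https://example.com",
  name: "dev-crawl",
  maxDepth: 3,
});

console.log(["npx", "jina-crawler", ...exampleArgs].join(" "));
// → npx jina-crawler --baseUrl https://example.com --name dev-crawl --maxDepth 3
```

The resulting array can be passed to a process launcher such as Node's `child_process.spawn`, which avoids shell quoting issues in URLs.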
Usage
You can use Jina Crawler in two ways:
1. Quick Start with npx (No Installation Required)
npx jina-crawler --baseUrl <url> --name <project-name> [options]
2. Project Installation (Recommended for Team Collaboration)
First, install the package as a dependency:
npm install jina-crawler
# or
pnpm add jina-crawler
# or
yarn add jina-crawler
Then add the following to your package.json:
{
  "scripts": {
    "crawl:example": "jina-crawler --baseUrl https://example.com --name dev-crawl"
  }
}
Now you can run the crawler using:
npm run crawl:example
Authentication
To use Jina Crawler, you'll need a Jina AI token. You can get one by visiting https://jina.ai/reader/.
You can provide the token in two ways:
- Via the --token command line option
- Via the JINA_READER_TOKEN environment variable
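The two token sources can be combined in a wrapper script along the following lines. This is a sketch under assumptions: `resolveToken` is a hypothetical helper, not part of jina-crawler, and the README does not say which source wins when both are set, so the choice here (an explicit --token value takes precedence over JINA_READER_TOKEN) is a guess at a common convention.

```typescript
// Hypothetical helper; jina-crawler's actual lookup logic may differ.
// Assumption: a non-empty --token value overrides JINA_READER_TOKEN.
function resolveToken(
  cliToken: string | undefined,
  env: Record<string, string | undefined>,
): string {
  // Fall back to the environment when the flag is missing or empty.
  const token = cliToken || env["JINA_READER_TOKEN"];
  if (!token) {
    throw new Error("No Jina AI token: pass --token or set JINA_READER_TOKEN");
  }
  return token;
}

// The flag is absent, so the environment value is used.
console.log(resolveToken(undefined, { JINA_READER_TOKEN: "env-token" }));
// → env-token

// Both are present, so the flag wins under the stated assumption.
console.log(resolveToken("cli-token", { JINA_READER_TOKEN: "env-token" }));
// → cli-token
```

In a Node script the second argument would typically be `process.env`. Keeping the token in the environment rather than in a package.json script avoids committing it to version control.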
Development
# Install dependencies
pnpm install
# Run type checking
pnpm typecheck
# Build the project
pnpm build
# Run linting
pnpm lint
License
MIT License 2024 zcf0508
