@xcrap/factory
v0.0.4
🕷️ Xcrap Factory: Instantiate clients, parsing models, and extractors from configuration objects
Xcrap Factory is a set of utilities for dynamically creating instances of clients, extractors, and parsing models, making it easier to configure and extend scraping and parsing pipelines.
📦 Installation
Install the package with your preferred package manager. For example, with npm:
```bash
npm i @xcrap/factory
```
🛠️ Features
- createClient: Instantiates clients from a registry of allowed classes.
- createExtractor: Creates extractor functions from configurable text and a registry of allowed extractors.
- createParsingModel: Builds validated and nested parsing models with customizable extractors and types.
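The registry approach described for createClient can be illustrated with a small self-contained sketch. It does not import @xcrap/factory: instantiateFromRegistry, Registry, ClientOptions (with its timeoutMs field), and the two stand-in client classes are hypothetical, and the real createClient may validate its input differently. The idea shown is that a configuration refers to a class only by a string key, and only keys present in the registry can be instantiated.

```typescript
// Hypothetical sketch of allowlist-based instantiation (not the @xcrap/factory source).
type Constructor<T, O> = new (options: O) => T

// Maps a key that may appear in a configuration to an allowed class.
type Registry<T, O> = Record<string, Constructor<T, O>>

// Hypothetical options type for the stand-in clients.
interface ClientOptions {
    timeoutMs?: number
}

interface Client {
    readonly name: string
}

// Stand-in clients; a real registry would hold classes such as AxiosClient.
class FakeAxiosClient implements Client {
    readonly name = "axios"
    readonly options: ClientOptions
    constructor(options: ClientOptions) {
        this.options = options
    }
}

class FakeGotClient implements Client {
    readonly name = "got-scraping"
    readonly options: ClientOptions
    constructor(options: ClientOptions) {
        this.options = options
    }
}

function instantiateFromRegistry<T, O>(registry: Registry<T, O>, type: string, options: O): T {
    // hasOwnProperty guards against inherited keys such as "toString".
    const Ctor = Object.prototype.hasOwnProperty.call(registry, type) ? registry[type] : undefined
    if (Ctor === undefined) {
        const allowed = Object.keys(registry).join(", ")
        throw new Error(`Type "${type}" is not allowed. Allowed types: ${allowed}`)
    }
    return new Ctor(options)
}

const clients: Registry<Client, ClientOptions> = {
    "axios": FakeAxiosClient,
    "got-scraping": FakeGotClient
}

const client = instantiateFromRegistry(clients, "axios", { timeoutMs: 5000 })
console.log(client.name) // → axios
```

Rejecting unknown keys up front keeps configuration files from naming arbitrary classes, which is the main reason to use a registry instead of dynamic imports.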
🚀 Usage
1. Creating a Client
```typescript
import { GotScrapingClient } from "@xcrap/got-scraping-client"
import { AxiosClient } from "@xcrap/axios-client"
import { createClient } from "@xcrap/factory"

// Only the classes registered here can be instantiated by name
const config = {
    allowedClients: {
        "got-scraping": GotScrapingClient,
        "axios": AxiosClient
    }
}

const client = createClient({
    config: config,
    type: "axios", // Client type: any key of allowedClients
    options: { /* Client constructor options */ }
})
```
2. Creating an Extractor
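An extractor text is either a bare registry key such as "innerText" or a key plus an argument such as "attribute:href". As an illustration only, the following self-contained sketch resolves such strings against an allowlist. It does not import @xcrap/parser or @xcrap/factory; resolveExtractorText, RegistryEntry, FakeElement, and the explicit "kind" tag that distinguishes plain extractors from extractor factories are hypothetical, so the real createExtractor may behave differently.

```typescript
// Hypothetical sketch: resolving "name" or "name<separator>argument" against an allowlist.
interface FakeElement {
    text: string
    attributes: Record<string, string>
}

type Extractor = (element: FakeElement) => string | undefined

// Plain extractors are used as-is; factories need an argument (like extractAttribute(name)).
type RegistryEntry =
    | { kind: "extractor"; fn: Extractor }
    | { kind: "factory"; fn: (argument: string) => Extractor }

const allowedExtractors: Record<string, RegistryEntry> = {
    innerText: { kind: "extractor", fn: (el: FakeElement) => el.text },
    attribute: { kind: "factory", fn: (name: string) => (el: FakeElement) => el.attributes[name] }
}

function resolveExtractorText(text: string, registry: Record<string, RegistryEntry>, separator = ":"): Extractor {
    // Split on the first separator only, so the argument may itself contain it.
    const index = text.indexOf(separator)
    const name = index === -1 ? text : text.slice(0, index)
    const argument = index === -1 ? undefined : text.slice(index + separator.length)

    const entry = Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined
    if (entry === undefined) {
        throw new Error(`Extractor "${name}" is not allowed`)
    }
    if (entry.kind === "factory") {
        if (argument === undefined) throw new Error(`Extractor "${name}" requires an argument`)
        return entry.fn(argument)
    }
    if (argument !== undefined) throw new Error(`Extractor "${name}" takes no argument`)
    return entry.fn
}

const link: FakeElement = { text: "Docs", attributes: { href: "/docs" } }
console.log(resolveExtractorText("innerText", allowedExtractors)(link)) // → Docs
console.log(resolveExtractorText("attribute:href", allowedExtractors)(link)) // → /docs
```

Splitting on the first separator only is a design choice of this sketch; whether createExtractor does the same is not stated in this README.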
```typescript
import { extractInnerText, extractSrc, extractHref, extractAttribute } from "@xcrap/parser"
import { createExtractor } from "@xcrap/factory"

const config = {
    allowedExtractors: {
        innerText: extractInnerText,
        src: extractSrc,
        href: extractHref,
        attribute: extractAttribute // extractAttribute(name: string) returns an extractor
    },
    argumentSeparator: ":" // Optional, e.g. "attribute:value" (extractor name, separator, argument)
}

const extractor = createExtractor({
    extractorText: "innerText", // or "src", "href", "attribute:ATTRIBUTE_NAME", ...
    config: config
})
```
3. Creating a Parsing Model
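Because a field can carry a nested model instead of an extractor, checking a model definition is naturally recursive. The following self-contained sketch only validates a plain definition with the same shape as the example in this section; it does not build parsing models, which is what createParsingModel is for. ModelConfig, FieldConfig, and validateModelConfig are hypothetical, the rule that each field needs either an extractor or a nested model is an assumption drawn from the example, and the ":" argument separator is assumed.

```typescript
// Hypothetical sketch: recursively validating a nested model definition.
interface FieldConfig {
    query: string
    extractor?: string
    nested?: ModelConfig
}

interface ModelConfig {
    type: string
    model: Record<string, FieldConfig>
}

function validateModelConfig(
    config: ModelConfig,
    allowedModels: ReadonlySet<string>,
    allowedExtractors: ReadonlySet<string>,
    path = "root"
): string[] {
    const errors: string[] = []
    if (!allowedModels.has(config.type)) {
        errors.push(`${path}: model type "${config.type}" is not allowed`)
    }
    for (const [key, field] of Object.entries(config.model)) {
        const fieldPath = `${path}.${key}`
        if (field.nested !== undefined) {
            // Nested models are checked with the same rules, one level deeper.
            errors.push(...validateModelConfig(field.nested, allowedModels, allowedExtractors, fieldPath))
        } else if (field.extractor === undefined) {
            errors.push(`${fieldPath}: needs either "extractor" or "nested"`)
        } else {
            // Assumed separator ":"; only the part before it names the extractor.
            const [name = ""] = field.extractor.split(":")
            if (!allowedExtractors.has(name)) {
                errors.push(`${fieldPath}: extractor "${name}" is not allowed`)
            }
        }
    }
    return errors
}

const definition: ModelConfig = {
    type: "html",
    model: {
        title: { query: "title", extractor: "innerText" },
        bodyData: {
            query: "body",
            nested: { type: "html", model: { heading: { query: "h1", extractor: "innerText" } } }
        }
    }
}

console.log(validateModelConfig(definition, new Set(["html", "json"]), new Set(["innerText", "attribute"])))
// → []
```

Returning a list of errors with their paths, rather than throwing on the first one, makes it easier to fix a large configuration file in one pass.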
```typescript
import { HtmlParsingModel, JsonParsingModel, extractInnerText } from "@xcrap/parser"
import { createParsingModel } from "@xcrap/factory"

const config = {
    allowedExtractors: {
        innerText: extractInnerText
    },
    extractorArgumentSeparator: ":", // Optional
    allowedModels: {
        html: HtmlParsingModel,
        json: JsonParsingModel
    }
}

const parsingModel = createParsingModel({
    config: config,
    model: {
        type: "html", // Model type: any key of allowedModels (html, json, ...)
        model: {
            title: {
                query: "title",
                extractor: "innerText"
            },
            bodyData: { // Nested model
                query: "body",
                nested: {
                    type: "html",
                    model: {
                        heading: {
                            query: "h1",
                            extractor: "innerText"
                        }
                    }
                }
            }
        }
    }
})
```
🧪 Testing
Automated tests are located in the `__tests__` directory. To run them:
```bash
npm run test
```
🤝 Contributing
Want to contribute? Follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-new).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-new).
- Open a Pull Request.
📝 License
This project is licensed under the MIT License.
