@stnickza/pricepatrol-parser
v1.0.3
Structured data parsing library for Price Patrol ecosystem
@pricepatrol/parser
A comprehensive structured data parsing library for the Price Patrol ecosystem. This library provides robust extraction and processing capabilities for structured data from web pages, including JSON-LD, meta tags, data layers, and microdata.
Features
- 🚀 Universal Compatibility: Works in both browser and Node.js environments
- 🎯 Multiple Data Sources: Supports JSON-LD, meta tags, data layers, and microdata
- 🔧 Flexible Processing: Advanced path evaluation and data transformations
- 📊 Confidence Scoring: Built-in confidence metrics for extraction reliability
- 🧪 Well Tested: Comprehensive test suite with 100% coverage
- 📦 Tree Shakable: Separate exports for browser and processing functionality
Installation
npm install @pricepatrol/parser
Quick Start
Browser Environment
import { createBrowserExtractor, StructuredDataProcessor } from '@pricepatrol/parser';
// Extract data from current page
const extractor = createBrowserExtractor();
const structuredData = extractor.extractAll();
// Process with custom selectors
const processor = new StructuredDataProcessor();
const selectors = {
  productName: { jsonLd: '0.name' },
  price: { jsonLd: '0.offers.price', transformations: [{ type: 'parseNumber' }] },
  brand: { metaTags: 'product:brand' }
};
const result = processor.processRecipe(structuredData, selectors);
console.log(result.productName?.value); // Extracted product name
Node.js Environment
import { StructuredDataProcessor } from '@pricepatrol/parser/processing';
const processor = new StructuredDataProcessor();
// Process pre-extracted structured data
const structuredData = {
  jsonLd: [{ "@type": "Product", "name": "Example Product", "offers": { "price": "29.99" } }],
  metaTags: { "og:title": "Example Product Page" },
  dataLayers: { dataLayer: [{ product: { name: "Example" } }] },
  url: "https://example.com/product",
  pageTitle: "Example Product",
  timestamp: new Date().toISOString(),
  extractorVersion: "1.0.0"
};
const result = processor.processSelector(structuredData, {
  jsonLd: '0.name'
});
console.log(result?.value); // "Example Product"
API Reference
Core Classes
StructuredDataProcessor
Main processing class for extracting data using selectors.
const processor = new StructuredDataProcessor();
// Process single field
const field = processor.processSelector(data, selector);
// Process multiple fields
const results = processor.processRecipe(data, selectors);
// Validate data structure
const isValid = StructuredDataProcessor.validateStructuredData(data);
BrowserDataExtractor
Browser-specific extraction from DOM elements.
const extractor = new BrowserDataExtractor(document, window);
// Extract all structured data
const data = extractor.extractAll();
// Check capabilities
const capabilities = extractor.getCapabilities();
// Extract with custom CSS selectors
const customData = extractor.extractCustomData({
  title: 'h1.product-title',
  price: '.price-amount'
});
Selector Format
Selectors define how to extract data from different structured data sources:
interface FieldSelector {
  jsonLd?: string; // JSON-LD path (e.g., "0.offers.price")
  metaTags?: string; // Meta tag key (e.g., "og:price:amount")
  dataLayers?: string; // Data layer path (e.g., "dataLayer.0.product.name")
  microdata?: string; // Microdata property name
  regex?: string; // Post-processing regex
  transformations?: FieldTransformation[]; // Data transformations
}
Data Transformations
Apply transformations to extracted values:
const selector = {
  jsonLd: '0.offers.price',
  // String steps come first so that parseNumber receives a clean string
  transformations: [
    { type: 'trim' },
    { type: 'regex', pattern: '([0-9.]+)', flags: 'g' },
    { type: 'parseNumber' }
  ]
};
Available transformations:
- regex: Apply regular expression
- replace: String replacement
- trim: Remove whitespace
- lowercase/uppercase: Case conversion
- parseNumber: Convert to number
- parseBoolean: Convert to boolean
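
To make the ordering of transformations concrete, here is a minimal sketch of how a pipeline over the types listed above could be applied, one step at a time. It is an illustration under assumptions, not the library's implementation: the `Transformation` type, the `search`/`replacement` fields of `replace`, the accepted boolean strings, and the choice to return `null` on failure are all hypothetical.

```typescript
// Hypothetical shapes; the library's real FieldTransformation type may differ.
type Transformation =
  | { type: 'regex'; pattern: string; flags?: string }
  | { type: 'replace'; search: string; replacement: string }
  | { type: 'trim' | 'lowercase' | 'uppercase' | 'parseNumber' | 'parseBoolean' };

type Value = string | number | boolean | null;

function applyTransformation(value: Value, step: Transformation): Value {
  if (value === null) return null;
  switch (step.type) {
    case 'regex': {
      if (typeof value !== 'string') return value;
      const match = new RegExp(step.pattern, step.flags).exec(value);
      if (match === null) return null;
      // Prefer the first capture group, fall back to the whole match.
      return match[1] ?? match[0] ?? null;
    }
    case 'replace':
      // Replace every literal occurrence of `search`.
      return typeof value === 'string'
        ? value.split(step.search).join(step.replacement)
        : value;
    case 'trim':
      return typeof value === 'string' ? value.trim() : value;
    case 'lowercase':
      return typeof value === 'string' ? value.toLowerCase() : value;
    case 'uppercase':
      return typeof value === 'string' ? value.toUpperCase() : value;
    case 'parseNumber': {
      if (typeof value === 'number') return value;
      if (typeof value !== 'string' || value.trim() === '') return null;
      const parsed = Number(value);
      return Number.isFinite(parsed) ? parsed : null;
    }
    case 'parseBoolean': {
      if (typeof value === 'boolean') return value;
      const text = String(value).trim().toLowerCase();
      if (text === 'true' || text === '1') return true;
      if (text === 'false' || text === '0') return false;
      return null;
    }
    default:
      return value;
  }
}

// Steps run left to right; each step receives the previous step's output.
function applyTransformations(value: Value, steps: Transformation[]): Value {
  return steps.reduce<Value>((current, step) => applyTransformation(current, step), value);
}

const price = applyTransformations('  R 1,299.50 ', [
  { type: 'trim' },
  { type: 'replace', search: ',', replacement: '' },
  { type: 'regex', pattern: '([0-9.]+)' },
  { type: 'parseNumber' }
]);
console.log(price); // 1299.5
```

Because each step consumes the previous result, string-oriented steps such as trim and regex are most useful before parseNumber or parseBoolean.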
Path Evaluation
The library supports complex path evaluation for nested data:
import { evaluateStructuredDataPath } from '@pricepatrol/parser';
const data = {
  products: [
    { name: "Product 1", offers: [{ price: "10.99" }] }
  ]
};
// Extract nested array data
const price = evaluateStructuredDataPath(data, 'products[0].offers[0].price');
console.log(price); // "10.99"
Modules
Universal Processing (@pricepatrol/parser/processing)
Core data processing functionality that works in any JavaScript environment:
- StructuredDataProcessor
- evaluateStructuredDataPath
- Type definitions
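
Path strings in this README use both dot segments (`0.offers.price`) and bracket indices (`products[0].offers[0].price`). As a rough illustration of how such a path could be resolved, the sketch below walks an object one segment at a time. `evaluatePath` is a hypothetical stand-in, not the library's `evaluateStructuredDataPath`, and returning `undefined` for a missing segment is an assumption.

```typescript
// Hypothetical helper for illustration; not the library's evaluateStructuredDataPath.
function evaluatePath(data: unknown, path: string): unknown {
  // Normalise "a[0].b" to "a.0.b", then split into non-empty segments.
  const segments = path
    .replace(/\[(\d+)\]/g, '.$1')
    .split('.')
    .filter((segment) => segment.length > 0);

  let current: unknown = data;
  for (const segment of segments) {
    // Primitives and null have nothing to descend into.
    if (current === null || typeof current !== 'object') return undefined;
    if (Array.isArray(current)) {
      // Arrays only accept numeric segments.
      if (!/^\d+$/.test(segment)) return undefined;
      current = current[Number(segment)];
    } else {
      const record = current as Record<string, unknown>;
      if (!Object.prototype.hasOwnProperty.call(record, segment)) return undefined;
      current = record[segment];
    }
  }
  return current;
}

const catalog = {
  products: [{ name: 'Product 1', offers: [{ price: '10.99' }] }]
};
console.log(evaluatePath(catalog, 'products[0].offers[0].price')); // "10.99"
console.log(evaluatePath(catalog, 'products.0.name')); // "Product 1"
```

Normalising brackets to dots first keeps the traversal loop simple, at the cost of not supporting keys that themselves contain dots.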
Browser Extraction (@pricepatrol/parser/browser)
Browser-specific DOM extraction functionality:
- BrowserDataExtractor
- createBrowserExtractor
- extractJsonLdData
- extractMetaTags
- extractDataLayers
- extractMicrodata
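
For readers curious what JSON-LD extraction involves, the following sketch parses the raw text of JSON-LD script blocks into a flat list of nodes. It is not the library's `extractJsonLdData`; the function name `parseJsonLdBlocks`, the decision to skip malformed blocks silently, and the flattening of `@graph` arrays are all assumptions made for illustration.

```typescript
type JsonLdNode = Record<string, unknown>;

function isNode(value: unknown): value is JsonLdNode {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical helper: turns raw <script type="application/ld+json"> contents
// into a flat array of JSON-LD nodes.
function parseJsonLdBlocks(rawBlocks: string[]): JsonLdNode[] {
  const nodes: JsonLdNode[] = [];
  for (const raw of rawBlocks) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // Skip blocks that are not valid JSON.
    }
    // A block may hold a single node or an array of nodes.
    const candidates: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const candidate of candidates) {
      if (!isNode(candidate)) continue;
      const graph = candidate['@graph'];
      if (Array.isArray(graph)) {
        // Flatten "@graph" containers into their member nodes.
        nodes.push(...graph.filter(isNode));
      } else {
        nodes.push(candidate);
      }
    }
  }
  return nodes;
}

const blocks = [
  '{"@type":"Product","name":"Example Product","offers":{"price":"29.99"}}',
  'not valid json'
];
console.log(parseJsonLdBlocks(blocks).length); // 1
```

In a browser, the raw strings could be collected from `document.querySelectorAll('script[type="application/ld+json"]')` by reading each element's `textContent`.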
Confidence Scoring
The library provides confidence scores for extracted data based on the source:
- JSON-LD: 0.9 (highest confidence)
- Meta Tags: 0.8
- Data Layers: 0.7
- Microdata: 0.6
- Custom Data Layers: 0.5
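
The scores above can be read as a fixed lookup from source to confidence. The sketch below shows one way a field with several candidate values could be resolved by preferring the highest-scoring source. Whether the library picks the highest score or simply the first matching source is not stated in this README, so `pickMostConfident` and its types are hypothetical.

```typescript
type Source = 'jsonLd' | 'metaTags' | 'dataLayers' | 'microdata' | 'customDataLayers';

// Scores taken from the list above.
const SOURCE_CONFIDENCE: Record<Source, number> = {
  jsonLd: 0.9,
  metaTags: 0.8,
  dataLayers: 0.7,
  microdata: 0.6,
  customDataLayers: 0.5
};

interface Candidate {
  source: Source;
  value: unknown;
}

interface ScoredField {
  value: unknown;
  source: Source;
  confidence: number;
}

// Hypothetical resolution rule: ignore empty values, keep the best-scored one.
function pickMostConfident(candidates: Candidate[]): ScoredField | undefined {
  let best: ScoredField | undefined;
  for (const { source, value } of candidates) {
    if (value === undefined || value === null || value === '') continue;
    const confidence = SOURCE_CONFIDENCE[source];
    if (best === undefined || confidence > best.confidence) {
      best = { value, source, confidence };
    }
  }
  return best;
}

const field = pickMostConfident([
  { source: 'metaTags', value: 'Example Product Page' },
  { source: 'jsonLd', value: 'Example Product' }
]);
console.log(field?.source, field?.confidence); // jsonLd 0.9
```
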
const result = processor.processSelector(data, selector);
console.log(result?.confidence); // 0.9 for JSON-LD source
Browser Compatibility
- Modern Browsers: Chrome 80+, Firefox 75+, Safari 13+, Edge 80+
- Node.js: 18.0.0+
- JSDOM: Supported for server-side testing
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/new-feature
- Run tests: npm test
- Commit changes: git commit -am 'Add new feature'
- Push to branch: git push origin feature/new-feature
- Submit a pull request
Testing
# Run all tests
npm test
# Run tests in watch mode
npm run test:watch
# Generate coverage report
npm run test:coverage
License
MIT License - see LICENSE file for details.
Changelog
1.0.0
- Initial release
- Core structured data processing
- Browser DOM extraction
- Comprehensive test suite
- TypeScript support
