asimov-brightdata-module
v0.0.0
ASIMOV module for data import powered by the Bright Data web data platform.
ASIMOV Bright Data Module
ASIMOV module for data import powered by the Bright Data web data platform.
✨ Features
- Imports structured data from Airbnb, Amazon, Crunchbase, eBay, Facebook, Google, Indeed, Instagram, LinkedIn, Walmart, X (aka Twitter), Yahoo, and YouTube.
- Collects the raw JSON data via the Bright Data API (requires an API key).
- Constructs a semantic knowledge graph based on the KNOW ontology.
- Supports plain JSON output as well as RDF output in the form of JSON-LD.
- Distributed as a standalone static binary with zero runtime dependencies.
🛠️ Prerequisites
- Rust 1.85+ (2024 edition) if building from source code
⬇️ Installation
Installation from PyPI
pip install -U asimov-brightdata-module
Installation from RubyGems
gem install asimov-brightdata-module
Installation from NPM
npm install -g asimov-brightdata-module
Installation from Source Code
cargo install asimov-brightdata-module
👉 Examples
export BRIGHTDATA_API_KEY="..."
Fetching X Profiles
asimov-brightdata-fetcher https://x.com/bright_init # JSON
asimov-brightdata-importer https://x.com/bright_init # JSON-LD
Fetching LinkedIn Profiles
asimov-brightdata-fetcher https://www.linkedin.com/in/orlenchner/
asimov-brightdata-fetcher https://www.linkedin.com/company/bright-data/
Fetching Crunchbase Profiles
asimov-brightdata-fetcher https://www.crunchbase.com/organization/brightdata
Fetching Amazon Products
asimov-brightdata-fetcher https://www.amazon.com/Master-Algorithm-Ultimate-Learning-Machine/dp/0465094279
⚙ Configuration
Environment Variables
BRIGHTDATA_API_KEY: (required) the Bright Data API key to use
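Because the API key is required, a wrapper script may want to fail early when it is missing. The following is a minimal sketch of such a guard; `require_api_key` is a hypothetical helper name and is not part of the module.

```shell
# Hypothetical helper (not shipped with the module): returns non-zero
# and prints a message to stderr when BRIGHTDATA_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi
}
```

It could be chained in front of a fetch, for example `require_api_key && asimov-brightdata-fetcher https://x.com/bright_init`.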
📚 Reference
Installed Binaries
- asimov-brightdata-cataloger: discovers entities via the Bright Data API (not implemented yet)
- asimov-brightdata-fetcher: collects JSON data from the Bright Data API
- asimov-brightdata-importer: collects and transforms JSON into JSON-LD (not implemented yet)
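To fetch several URLs in one go, the fetcher can be wrapped in a small loop. This sketch assumes the fetcher takes a single URL argument and writes its JSON to standard output, as the examples above suggest; `fetch_all` and the `FETCHER` override variable are hypothetical names introduced here for illustration.

```shell
# FETCHER is a hypothetical override hook (handy for dry runs);
# it defaults to the real binary name.
FETCHER="${FETCHER:-asimov-brightdata-fetcher}"

# fetch_all OUTDIR URL...
# Writes the output for the Nth URL to OUTDIR/N.json.
# Assumes the fetcher prints JSON to stdout.
fetch_all() {
  outdir="$1"
  shift
  mkdir -p "$outdir" || return 1
  n=0
  for url in "$@"; do
    n=$((n + 1))
    if ! "$FETCHER" "$url" > "$outdir/$n.json"; then
      # Report the failure and drop the partial file, then carry on.
      echo "failed: $url" >&2
      rm -f "$outdir/$n.json"
    fi
  done
}
```

A call such as `fetch_all out https://x.com/bright_init https://www.linkedin.com/company/bright-data/` would then leave `out/1.json` and `out/2.json` behind, provided both fetches succeed.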
Supported Datasets
| Dataset | URL Prefix | JSON | RDF |
| :------ | :--------- | :--: | :--: |
| Airbnb | https://www.airbnb.com/rooms/ | ✅ | 🚧 |
| Amazon | https://www.amazon.com/ | ✅ | 🚧 |
| | https://www.amazon.com/sp?seller= | ✅ | 🚧 |
| Crunchbase | https://www.crunchbase.com/organization/ | ✅ | 🚧 |
| eBay | https://www.ebay.com/itm/ | ✅ | 🚧 |
| Facebook | https://www.facebook.com/events/ | ✅ | 🚧 |
| | https://www.facebook.com/groups/ | ✅ | 🚧 |
| | https://www.facebook.com/marketplace/item/ | ✅ | 🚧 |
| | https://www.facebook.com/share/p/ | ✅ | 🚧 |
| Google | https://www.google.com/shopping/product/ | ✅ | 🚧 |
| Indeed | https://www.indeed.com/cmp/ | ✅ | 🚧 |
| Instagram | https://www.instagram.com/ | ✅ | 🚧 |
| | https://www.instagram.com/p/ | ✅ | 🚧 |
| | https://www.instagram.com/reel/ | ✅ | 🚧 |
| LinkedIn | https://www.linkedin.com/company/ | ✅ | 🚧 |
| | https://www.linkedin.com/in/ | ✅ | 🚧 |
| | https://www.linkedin.com/jobs/ | ✅ | 🚧 |
| | https://www.linkedin.com/posts/ | ✅ | 🚧 |
| | https://www.linkedin.com/pulse/ | ✅ | 🚧 |
| Walmart | https://www.walmart.com/global/seller/ | ✅ | 🚧 |
| | https://www.walmart.com/ip/ | ✅ | 🚧 |
| X (Twitter) | https://x.com/ | ✅ | ✅ |
| Yahoo | https://finance.yahoo.com/quote/ | ✅ | 🚧 |
| YouTube | https://www.youtube.com/@ | ✅ | 🚧 |
| | https://www.youtube.com/watch?v= | ✅ | 🚧 |
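The URL prefixes in the table above can be used to check, before calling the API, whether a URL falls under a supported dataset. The sketch below mirrors the table with a shell `case` statement; `dataset_for_url` is a hypothetical helper, and the module's own matching logic may differ.

```shell
# Hypothetical helper: prints the dataset name for a URL based on the
# prefixes listed in the table, or returns 1 when no prefix matches.
dataset_for_url() {
  case "$1" in
    https://www.airbnb.com/rooms/*)              echo "Airbnb" ;;
    https://www.amazon.com/*)                    echo "Amazon" ;;  # also covers /sp?seller=
    https://www.crunchbase.com/organization/*)   echo "Crunchbase" ;;
    https://www.ebay.com/itm/*)                  echo "eBay" ;;
    https://www.facebook.com/events/*)           echo "Facebook" ;;
    https://www.facebook.com/groups/*)           echo "Facebook" ;;
    https://www.facebook.com/marketplace/item/*) echo "Facebook" ;;
    https://www.facebook.com/share/p/*)          echo "Facebook" ;;
    https://www.google.com/shopping/product/*)   echo "Google" ;;
    https://www.indeed.com/cmp/*)                echo "Indeed" ;;
    https://www.instagram.com/*)                 echo "Instagram" ;;  # also covers /p/ and /reel/
    https://www.linkedin.com/company/*)          echo "LinkedIn" ;;
    https://www.linkedin.com/in/*)               echo "LinkedIn" ;;
    https://www.linkedin.com/jobs/*)             echo "LinkedIn" ;;
    https://www.linkedin.com/posts/*)            echo "LinkedIn" ;;
    https://www.linkedin.com/pulse/*)            echo "LinkedIn" ;;
    https://www.walmart.com/global/seller/*)     echo "Walmart" ;;
    https://www.walmart.com/ip/*)                echo "Walmart" ;;
    https://x.com/*)                             echo "X (Twitter)" ;;
    https://finance.yahoo.com/quote/*)           echo "Yahoo" ;;
    https://www.youtube.com/@*)                  echo "YouTube" ;;
    https://www.youtube.com/watch\?v=*)          echo "YouTube" ;;  # \? matches a literal '?'
    *) return 1 ;;
  esac
}
```

For instance, `dataset_for_url https://www.linkedin.com/in/orlenchner/` prints `LinkedIn`, while an unlisted URL makes the function return a non-zero status.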
👨‍💻 Development
git clone https://github.com/asimov-modules/asimov-brightdata-module.git