@sschepis/brand-ingestor
v1.0.0
brand-ingestor
LLM-powered brand intelligence library. Point it at a URL, pass an LLM adapter, get back a comprehensive brand profile — company info, brand identity, full product catalog, taxonomy, metadata schema, and site content.
Designed to give an AI marketing system everything it needs to create ads and run campaigns for a brand.
Install
npm install @sschepis/brand-ingestor
npx playwright install chromium

Usage
import { ingestBrand, type LLMProvider, type BrandProfile } from '@sschepis/brand-ingestor';
const profile: BrandProfile = await ingestBrand('https://philosophy.com', {
  llmProvider: myLLMAdapter, // your LLMProvider implementation (see LLM Adapter)
  maxPages: 20, // optional, default 50
  concurrency: 2, // optional, default 2
});

LLM Adapter
You provide an object implementing LLMProvider:
import type { ZodType } from 'zod';

interface LLMProvider {
  generateObject: <T>(prompt: string, schema: ZodType<T>) => Promise<T>;
}

The library sends a prompt and a Zod schema. Your adapter must return a parsed object matching that schema. How you call your LLM is up to you.
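For unit tests it can help to replace the real model with a stub that honors the same contract. The following is only a sketch under stated assumptions: `SchemaLike` is a hypothetical stand-in for Zod's `ZodType` that models nothing but `parse`, and `createStubProvider` is an illustrative helper name, not something the library exports.

```typescript
// Hypothetical stand-in for Zod's ZodType: only parse() is modelled here,
// so the sketch runs without installing zod.
interface SchemaLike<T> {
  parse(data: unknown): T;
}

// Same shape as LLMProvider, but typed against SchemaLike (assumption).
interface StubLLMProvider {
  generateObject: <T>(prompt: string, schema: SchemaLike<T>) => Promise<T>;
}

// Hands back canned responses in order and validates each one against the
// schema, mirroring the contract: return a parsed object or throw.
function createStubProvider(cannedResponses: unknown[]): StubLLMProvider {
  let next = 0;
  return {
    async generateObject<T>(_prompt: string, schema: SchemaLike<T>): Promise<T> {
      if (next >= cannedResponses.length) {
        throw new Error('Stub provider ran out of canned responses');
      }
      const raw = cannedResponses[next];
      next += 1;
      return schema.parse(raw);
    },
  };
}
```

Because a real Zod schema also exposes `parse`, a stub along these lines could plausibly be passed wherever an `LLMProvider` is expected, though that compatibility is not verified here.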
Example with Vercel AI SDK + LM Studio:
import { generateObject } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

// LM Studio serves an OpenAI-compatible API locally, so the API key is a placeholder.
const lm = createOpenAI({ baseURL: 'http://localhost:1234/v1', apiKey: 'not-needed' });
const model = lm('openai/gpt-oss-20b');

const llmProvider: LLMProvider = {
  generateObject: async (prompt, schema) => {
    // mode: 'json' requests JSON output rather than tool calling.
    const { object } = await generateObject({ model, schema, prompt, mode: 'json' });
    return object;
  },
};

What It Returns
ingestBrand() returns a BrandProfile with these sections:
company — Corporate & Business Info
| Field | Description |
|-------|------------|
| name | Brand name |
| legalName | Legal/registered name |
| description | What the company does |
| tagline | Primary slogan |
| foundedYear | Year founded |
| headquarters | HQ location |
| parentCompany | Parent company if any |
| industry | Line of business |
| contactEmail | Public contact email |
| contactPhone | Public phone number |
| contactAddress | Physical address |
| socialProfiles[] | Platform, URL, and handle for each social account |
brand — Brand Identity
| Field | Description |
|-------|------------|
| logos[] | Logo URLs with context (favicon, header, etc.) |
| colors[] | Brand colors as hex codes with usage context |
| fonts[] | Font families used on the site |
| taglines[] | Slogans/taglines found in site content |
| voiceTone | Brand voice description (e.g. "warm and aspirational") |
| brandValues[] | Core values (e.g. "self-care", "simplicity") |
| targetDemographic | Target audience description |
| brandPersonality | Brand personality description |
taxonomy — Product Taxonomy
| Field | Description |
|-------|------------|
| collections[] | All collections/categories with title, handle, description, product count |
| productTypes[] | All unique product types |
| tags[] | All unique tags across products |
| vendors[] | All unique vendors |
| priceRange | Min, max, average price and currency |
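
To make the `priceRange` field concrete, here is one way such a summary could be derived from variant prices. It is a sketch, not the library's implementation: the `ProductLike` and `VariantLike` shapes are hypothetical, prices are assumed to arrive as numbers or decimal strings, and averaging over variants (rather than products) is an arbitrary choice.

```typescript
// Hypothetical minimal shapes; the library's real types may differ.
interface VariantLike {
  price: string | number;
}
interface ProductLike {
  variants: VariantLike[];
}
interface PriceRange {
  min: number;
  max: number;
  average: number;
  currency: string;
}

function computePriceRange(products: ProductLike[], currency: string): PriceRange | null {
  const prices: number[] = [];
  for (const product of products) {
    for (const variant of product.variants) {
      const value = parseFloat(String(variant.price));
      // Skip blank or malformed prices instead of counting them as zero.
      if (isFinite(value)) prices.push(value);
    }
  }
  if (prices.length === 0) return null;
  const total = prices.reduce((sum, value) => sum + value, 0);
  return {
    min: Math.min(...prices),
    max: Math.max(...prices),
    // Round to cents to keep floating-point noise out of the average.
    average: Math.round((total / prices.length) * 100) / 100,
    currency,
  };
}
```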
products[] — Full Product Catalog
Each product includes:
| Field | Description |
|-------|------------|
| id, handle, url | Identifiers |
| name, description, descriptionHtml | Content |
| productType, vendor, tags[] | Classification |
| variants[] | Each variant: id, title, SKU, price, compareAtPrice, available, weight, option values |
| options[] | Option definitions (e.g. Size: ["4oz", "8oz", "16oz"]) |
| images[] | Image URL, alt text, dimensions, position |
| publishedAt, createdAt, updatedAt | Timestamps |
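
As an illustration of how a downstream marketing system might consume the catalog, the sketch below picks out in-stock variants whose `compareAtPrice` exceeds `price`. The shapes are assumptions for the example: prices are treated as decimal strings and `compareAtPrice` as nullable, which may not match the library's actual types.

```typescript
// Hypothetical shapes based on the field names in the table above.
interface VariantLike {
  title: string;
  price: string;
  compareAtPrice: string | null;
  available: boolean;
}
interface ProductLike {
  name: string;
  url: string;
  variants: VariantLike[];
}
interface SaleItem {
  product: string;
  variant: string;
  url: string;
  discountPercent: number;
}

function findDiscountedVariants(products: ProductLike[]): SaleItem[] {
  const items: SaleItem[] = [];
  for (const product of products) {
    for (const variant of product.variants) {
      const price = parseFloat(variant.price);
      const compareAt = variant.compareAtPrice === null ? NaN : parseFloat(variant.compareAtPrice);
      // NaN comparisons are false, so malformed prices are skipped here too.
      if (!variant.available || !(compareAt > price)) continue;
      items.push({
        product: product.name,
        variant: variant.title,
        url: product.url,
        discountPercent: Math.round(((compareAt - price) / compareAt) * 100),
      });
    }
  }
  // Largest discounts first.
  return items.sort((a, b) => b.discountPercent - a.discountPercent);
}
```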
metadata — Product Metadata Schema
Computed from the product data to summarize how the catalog is structured:
| Field | Description |
|-------|------------|
| optionDefinitions[] | Every option name, all values seen across products, how many products use it |
| tagCloud[] | Every tag with frequency count |
| fieldPopulation[] | For each product field: how many products have it populated (fill rate %) |
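
The aggregates above can be pictured with a small sketch. It is not the library's code: the `ProductLike` shape, the function names, and the rule for what counts as "populated" (non-empty string, non-empty array, or any other non-null value) are all assumptions made for illustration.

```typescript
// Hypothetical product shape covering a few of the documented fields.
interface ProductLike {
  tags: string[];
  vendor?: string | null;
  productType?: string | null;
  description?: string | null;
}
interface TagCount {
  tag: string;
  count: number;
}
interface FieldFill {
  field: string;
  populated: number;
  fillRate: number; // whole-number percentage
}

function isPopulated(value: unknown): boolean {
  if (value === null || value === undefined) return false;
  if (typeof value === 'string') return value.trim().length > 0;
  if (Array.isArray(value)) return value.length > 0;
  return true;
}

// Counts how many products carry each tag, most frequent first.
function buildTagCloud(products: ProductLike[]): TagCount[] {
  const counts = new Map<string, number>();
  for (const product of products) {
    // A tag listed twice on one product still counts once.
    for (const tag of Array.from(new Set(product.tags))) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .map(([tag, count]) => ({ tag, count }))
    .sort((a, b) => b.count - a.count || a.tag.localeCompare(b.tag));
}

// For each requested field, reports how many products have it filled in.
function computeFieldPopulation(
  products: ProductLike[],
  fields: (keyof ProductLike)[],
): FieldFill[] {
  return fields.map((field) => {
    const populated = products.filter((p) => isPopulated(p[field])).length;
    const fillRate = products.length === 0 ? 0 : Math.round((populated / products.length) * 100);
    return { field, populated, fillRate };
  });
}
```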
content — Site Content
| Field | Description |
|-------|------------|
| pages[] | Static pages (about, FAQ, policies) with title, handle, and text content |
| metaDescription | Site meta description |
| metaKeywords[] | Meta keywords |
| ogImage | Open Graph image URL |
| favicon | Favicon URL |
Top-level metadata
| Field | Description |
|-------|------------|
| platform | Detected platform ("shopify" or "generic") |
| sourceUrl | The URL that was ingested |
| ingestedAt | ISO timestamp |
How It Works
Platform detection: Probes /products.json to detect Shopify. More platform detectors can be added.
Shopify path: Fetches products, collections, and pages from Shopify's public JSON endpoints (no auth needed) and paginates automatically. Renders the homepage with Playwright for brand identity extraction.
Generic path: Crawls the site with Playwright (which renders JavaScript), then uses the LLM to categorize URLs, extract products from page text, and derive corporate info.
Both paths: Scrape HTML for logos, colors, fonts, social links, meta tags, and JSON-LD structured data, then use the LLM to derive brand voice, values, personality, and target demographic.
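
The detection and pagination steps described above might look roughly like the following. This is a hedged sketch rather than the library's code: `JsonFetcher`, `detectPlatform`, and `fetchAllProducts` are hypothetical names, the fetcher is injected so the logic can be exercised with a fake, and the `limit` and `page` query parameters reflect how Shopify's public `/products.json` endpoint is commonly paged.

```typescript
// Minimal fetch-like signature (assumption) so a fake can be injected in tests.
type JsonFetcher = (url: string) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

type Platform = 'shopify' | 'generic';

// Probe /products.json: a JSON body with a `products` array suggests Shopify.
async function detectPlatform(siteUrl: string, fetchJson: JsonFetcher): Promise<Platform> {
  try {
    const res = await fetchJson(new URL('/products.json?limit=1', siteUrl).toString());
    if (!res.ok) return 'generic';
    const body = (await res.json()) as { products?: unknown } | null;
    return body !== null && Array.isArray(body.products) ? 'shopify' : 'generic';
  } catch {
    // Network errors or non-JSON responses fall back to the generic path.
    return 'generic';
  }
}

// Request page 1, 2, 3... until a page is empty, short, or maxPages is hit.
async function fetchAllProducts(
  siteUrl: string,
  fetchJson: JsonFetcher,
  pageSize = 250,
  maxPages = 100,
): Promise<unknown[]> {
  const all: unknown[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = new URL(`/products.json?limit=${pageSize}&page=${page}`, siteUrl).toString();
    const res = await fetchJson(url);
    if (!res.ok) break;
    const body = (await res.json()) as { products?: unknown[] } | null;
    const products = body === null ? undefined : body.products;
    const batch = Array.isArray(products) ? products : [];
    if (batch.length === 0) break;
    all.push(...batch);
    if (batch.length < pageSize) break; // a short page is the last page
  }
  return all;
}
```

In real use the fetcher could be a thin wrapper such as `(url) => fetch(url)`, since a `Response` already provides `ok` and `json()`.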
CLI (for testing)
npx tsx src/cli.ts https://philosophy.com

Env vars: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL, MAX_PAGES.
Writes output to brand-profile-{hostname}.json.
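
For orientation, the environment variables and output filename could be handled along these lines. This is an illustrative sketch, not the contents of `src/cli.ts`: `readCliConfig` and `outputFileName` are hypothetical helpers, and falling back to 50 pages simply borrows the `maxPages` default from Usage, which the CLI may or may not share.

```typescript
interface CliConfig {
  baseUrl: string | undefined;
  apiKey: string | undefined;
  model: string | undefined;
  maxPages: number;
}

// Reads the four variables listed above from an env-like object.
// The fallback of 50 for MAX_PAGES is an assumption (see lead-in).
function readCliConfig(env: Record<string, string | undefined>): CliConfig {
  const parsed = parseInt(env['MAX_PAGES'] ?? '', 10);
  return {
    baseUrl: env['LLM_BASE_URL'],
    apiKey: env['LLM_API_KEY'],
    model: env['LLM_MODEL'],
    maxPages: Number.isInteger(parsed) && parsed > 0 ? parsed : 50,
  };
}

// Builds brand-profile-{hostname}.json from the ingested URL.
// Whether the real CLI normalizes the hostname (e.g. strips "www.") is unknown.
function outputFileName(siteUrl: string): string {
  return `brand-profile-${new URL(siteUrl).hostname}.json`;
}
```

In a Node script these would typically be called as `readCliConfig(process.env)` and `outputFileName(process.argv[2])`.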
