pulse-flows
v1.0.1
Pulse Flows - Workflow automation service
Pulse Flows
Intelligent content discovery and curation system for the Pulse ecosystem. Crawls → Processes → Labels → Prioritizes local content for trending feeds.
🎯 System Overview
Configs → Crawl → AI Extract → Label → Learn Patterns → Trending Selection
- Configs: Firebase sources
- Crawl: Multiple sources
- AI Extract: Gemini/OpenAI
- Label: Manual + automatic
- Learn Patterns: Cross-area intelligence
- Trending Selection: Local-first algorithm
🚀 Key Features
- Smart Content Crawling: Multi-source with intelligent refresh rates
- AI Processing: Gemini (default) + OpenAI fallback for content extraction
- Content Labeling System: Manual + automatic pattern recognition
- Cross-Area Intelligence: Learn once, apply everywhere (60+ cities)
- Local-First Trending: Prioritizes hyperlocal > local > national content
- Pattern Recognition: City-agnostic patterns prevent cross-contamination
- Embedding Search: Find similar content across areas
- Production Ready: Cloud Run Jobs, cron automation, comprehensive monitoring
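Local-first trending, as outlined in the Trending Selection Process under How It Works, filters content by area, boosts it by locality label, and keeps the top 15 items with diverse categories. The TypeScript sketch below illustrates that idea. The 1.5x / 1.3x / 0.7x multipliers come from this README. The `ContentItem` shape, the `baseScore` field, the label identifiers, the 1.0 multiplier for unlabeled content, and the per-category cap of 3 are assumptions for illustration, not the service's actual code.

```typescript
// Locality labels; the identifiers are hypothetical, the multipliers are from this README.
type Locality = "HYPERLOCAL" | "LOCAL" | "NATIONAL" | "UNKNOWN";

interface ContentItem {
  id: string;
  area: string; // e.g. "tampa"
  category: string; // hypothetical field used for diversity
  label: Locality;
  baseScore: number; // hypothetical pre-computed relevance score
}

const LOCALITY_BOOST: Record<Locality, number> = {
  HYPERLOCAL: 1.5,
  LOCAL: 1.3,
  NATIONAL: 0.7,
  UNKNOWN: 1.0, // assumption: unlabeled content is neither boosted nor penalized
};

/**
 * Keep only items for `area`, multiply each base score by its locality boost,
 * then take the highest-scoring items, allowing at most `maxPerCategory`
 * items from any one category so the feed stays diverse.
 */
function selectTrending(
  items: ContentItem[],
  area: string,
  limit = 15,
  maxPerCategory = 3, // assumed diversity cap
): ContentItem[] {
  const ranked = items
    .filter((item) => item.area === area)
    .map((item) => ({ item, score: item.baseScore * LOCALITY_BOOST[item.label] }))
    .sort((a, b) => b.score - a.score);

  const perCategory = new Map<string, number>();
  const selected: ContentItem[] = [];
  for (const { item } of ranked) {
    if (selected.length >= limit) break;
    const count = perCategory.get(item.category) ?? 0;
    if (count >= maxPerCategory) continue; // category is full; try the next item
    perCategory.set(item.category, count + 1);
    selected.push(item);
  }
  return selected;
}
```

A hard per-category cap is one simple way to guarantee diversity, but it can return fewer than 15 items when an area only has content in a few categories.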
🏗️ Architecture
Core Components
- Express.js API Server - RESTful API endpoints
- Crawler Service - Multi-source content extraction
- AI Service - Content processing with Gemini (default) and OpenAI (fallback)
- Content Service - Storage and lifecycle management
- Config Service - Dynamic configuration management
- Cleanup Service - Automated stale content removal
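The Crawler Service crawls all sources in parallel, and the optional MAX_CONCURRENT_CRAWLS and CRAWL_DELAY_MS settings suggest bounded concurrency with a pause between requests. A minimal sketch of that pattern follows, assuming a hypothetical `crawl(source)` function (the real Crawler Service and Crawl4AI interfaces are not documented here). The defaults of 3 and 1000 ms mirror the example `.env` values.

```typescript
// Hypothetical per-source crawl function; not the actual Crawler Service API.
type CrawlFn<T> = (source: string) => Promise<T>;

interface CrawlOutcome<T> {
  source: string;
  result?: T;
  error?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Crawl every source with at most `maxConcurrent` crawls in flight.
 * Each worker pauses `delayMs` between its own requests.
 * Results keep input order; per-source failures are recorded, not thrown.
 */
async function crawlAll<T>(
  sources: string[],
  crawl: CrawlFn<T>,
  maxConcurrent = 3, // MAX_CONCURRENT_CRAWLS
  delayMs = 1000, // CRAWL_DELAY_MS
): Promise<CrawlOutcome<T>[]> {
  const results: CrawlOutcome<T>[] = new Array(sources.length);
  let next = 0; // shared cursor; safe because workers never await between read and increment

  async function worker(): Promise<void> {
    while (next < sources.length) {
      const index = next++;
      const source = sources[index];
      try {
        results[index] = { source, result: await crawl(source) };
      } catch (err) {
        results[index] = { source, error: err instanceof Error ? err.message : String(err) };
      }
      if (next < sources.length) await sleep(delayMs);
    }
  }

  const workerCount = Math.min(maxConcurrent, sources.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Recording errors per source, rather than rejecting, keeps one failing site from aborting the whole crawl run.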
Dependencies
- pulse-ai-utils - AI processing and Firestore utilities
- pulse-type-registry - Shared TypeScript types and schemas
- Crawl4AI - Primary web scraping service
- OpenAI - Content analysis and extraction
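Content extraction uses Gemini by default with OpenAI as a fallback. One way to express that ordering is a provider chain that tries each extractor in turn and returns the first success. The `Extractor` interface and `extractWithFallback` helper are hypothetical and are not the pulse-ai-utils API.

```typescript
// Hypothetical extractor shape; the real AI Service interface may differ.
interface Extractor {
  name: string;
  extract(text: string): Promise<Record<string, unknown>>;
}

/**
 * Try each provider in order (e.g. [gemini, openai]) and return the first
 * successful extraction. If every provider fails, throw one error that
 * lists each provider's failure message.
 */
async function extractWithFallback(
  text: string,
  providers: Extractor[],
): Promise<{ provider: string; data: Record<string, unknown> }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, data: await provider.extract(text) };
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

Keeping the order in a list means adding or reordering providers does not change the calling code.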
📡 API Endpoints
Core Operations
- POST /api/flows/crawl - Crawl all sources (parallel)
- POST /api/flows/cleanup - Clean stale content
- POST /api/flows/trending/:area - Generate trending for area
Content Labeling
- GET /api/flows/admin/content-labeling - Admin labeling interface
- POST /api/flows/admin/labels - Save content labels
- POST /api/flows/admin/patterns/learn - Learn from labeled content
- POST /api/flows/admin/patterns/apply - Apply patterns to unlabeled content
Area Management
- POST /api/flows/area-configs/generate - Generate configs for 31 US metros
- POST /api/flows/filters/update - Update area-specific filters
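To script the core and labeling endpoints from TypeScript, a thin client could look like the sketch below. The `PulseFlowsClient` class and `FetchLike` type are hypothetical; requests are sent without a body (matching the curl examples), and responses are assumed to be JSON of an undocumented shape. The fetch function is injected so it can be stubbed; on Node.js 20+ the built-in `fetch` can be passed directly.

```typescript
// Minimal fetch signature, so a stub or the global fetch can be injected.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class PulseFlowsClient {
  private readonly fetchImpl: FetchLike;
  private readonly baseUrl: string;

  constructor(fetchImpl: FetchLike, baseUrl = "http://localhost:8080") {
    this.fetchImpl = fetchImpl;
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  }

  // POST to /api/flows<path> and parse the (assumed JSON) response.
  private async post(path: string): Promise<unknown> {
    const fetchImpl = this.fetchImpl; // call unbound so the global fetch works too
    const res = await fetchImpl(`${this.baseUrl}/api/flows${path}`, { method: "POST" });
    if (!res.ok) throw new Error(`POST ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  crawl(): Promise<unknown> { return this.post("/crawl"); }
  cleanup(): Promise<unknown> { return this.post("/cleanup"); }
  trending(area: string): Promise<unknown> { return this.post(`/trending/${encodeURIComponent(area)}`); }
  learnPatterns(): Promise<unknown> { return this.post("/admin/patterns/learn"); }
  applyPatterns(): Promise<unknown> { return this.post("/admin/patterns/apply"); }
}
```

Usage on Node.js 20+: `const client = new PulseFlowsClient(fetch); await client.crawl(); await client.trending("tampa");`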
🚀 Quick Start
Prerequisites
- Node.js 20+
- Firebase service account credentials
- OpenAI API key
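Firebase service account credentials can be supplied in one of three ways (see Environment Variables). Initialization is handled by pulse-ai-utils, so the sketch below only illustrates how such a lookup might work. The function name, the JSON > path > default-file precedence, and the validation checks are assumptions, not the library's documented behavior.

```typescript
import { existsSync } from "node:fs";

// Which credential source was chosen (hypothetical result type).
type CredentialSource =
  | { kind: "json"; projectId: string }
  | { kind: "path"; path: string }
  | { kind: "default-file"; path: string };

const DEFAULT_CREDENTIAL_PATH = "./configs/firebase-service-account.json";

/**
 * Pick a Firebase credential source, checking (assumed order):
 * 1. FIREBASE_SERVICE_ACCOUNT_JSON, 2. FIREBASE_SERVICE_ACCOUNT_PATH,
 * 3. the default file under ./configs. Throws if none is usable.
 */
function resolveFirebaseCredentials(
  env: Record<string, string | undefined> = process.env,
  fileExists: (p: string) => boolean = existsSync,
): CredentialSource {
  const json = env.FIREBASE_SERVICE_ACCOUNT_JSON;
  if (json) {
    const parsed = JSON.parse(json) as { type?: string; project_id?: string };
    if (parsed.type !== "service_account" || !parsed.project_id) {
      throw new Error("FIREBASE_SERVICE_ACCOUNT_JSON is not a service account key");
    }
    return { kind: "json", projectId: parsed.project_id };
  }

  const path = env.FIREBASE_SERVICE_ACCOUNT_PATH;
  if (path) {
    if (!fileExists(path)) throw new Error(`No credentials file at ${path}`);
    return { kind: "path", path };
  }

  if (fileExists(DEFAULT_CREDENTIAL_PATH)) {
    return { kind: "default-file", path: DEFAULT_CREDENTIAL_PATH };
  }
  throw new Error("No Firebase credentials found (JSON, path, or default file)");
}
```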
Installation
# Clone the repository
git clone https://github.com/anandroid/pulse-flows.git
cd pulse-flows
# Install dependencies
npm install
# Copy environment template
cp .env.example .env
# Edit .env with your API keys and credentials
nano .env
Environment Variables
# Firebase/Firestore (choose one authentication method):
# Method 1: JSON string in environment variable (recommended for production)
FIREBASE_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"api-project-269146618053",...}'
# Method 2: Path to service account file
FIREBASE_SERVICE_ACCOUNT_PATH=/path/to/firebase-service-account.json
# Method 3: Place the file at ./configs/firebase-service-account.json (default)
# Note: Firebase initialization is handled by pulse-ai-utils
GOOGLE_CLOUD_PROJECT=api-project-269146618053
# OpenAI (for content processing)
OPENAI_API_KEY=sk-proj-your-openai-key
# Optional: Flow configuration
CRAWL_DELAY_MS=1000
MAX_CONCURRENT_CRAWLS=3
CONTENT_BATCH_SIZE=50
CLEANUP_BATCH_SIZE=100
Development
# Start development server
npm run dev
# Build TypeScript
npm run build
# Run tests
npm run test
# Lint code
npm run lint
Production
# Build for production
npm run build
# Start production server
npm start
🔧 Usage Examples
1. Standard Daily Operations
# Crawl all sources
curl -X POST http://localhost:8080/api/flows/crawl
# Generate trending for Tampa
curl -X POST http://localhost:8080/api/flows/trending/tampa
# Clean up old content
curl -X POST http://localhost:8080/api/flows/cleanup
2. Content Labeling Workflow
# Open admin interface
open http://localhost:8080/api/flows/admin/content-labeling
# Learn patterns from labeled content
curl -X POST http://localhost:8080/api/flows/admin/patterns/learn
# Apply patterns to new content
curl -X POST http://localhost:8080/api/flows/admin/patterns/apply
3. Production Cron Jobs
# Deploy all cron jobs
npm run deploy:cron
# Job schedule:
# - Crawl: Every 4 hours
# - Trending: After each crawl
# - Cleanup: Daily at 2 AM
# - Pattern Learning: Daily at 4 AM
📊 How It Works
Content Discovery Flow
1. Crawl Sources → 2. AI Extract → 3. Store Content → 4. Label → 5. Generate Trending
↓ ↓ ↓ ↓ ↓
Web Pages Gemini AI Firestore Admin UI Local First
Google Search Structured + Supabase + Patterns Selection
RSS Feeds Extraction + Embeddings + Auto-labelExample: Local Story Recognition
Tampa: "Tampa cop celebrates birthday at shelter" → Label: LOCAL
↓
Pattern: "<CITY> cop celebrates birthday at shelter"
↓
Austin: "Austin cop celebrates birthday at shelter" → Auto-label: LOCAL
Miami: "Miami officer birthday party at rescue" → Auto-label: LOCAL (85% match)
Trending Selection Process
All Content → Filter by Area → Apply Labels → Score Content → Select Top 15
- All Content: 10,000+ items
- Filter by Area: Tampa: 500, Austin: 450, Miami: 600
- Apply Labels: Local: 300, National: 150, Unknown: 50
- Score Content: Hyperlocal 1.5x, Local 1.3x, National 0.7x
- Select Top 15: Diverse categories guaranteed
🎯 Real-World Example Flow
Scenario: Tampa Cop Birthday Story
Day 1 - Tampa:
1. Crawl: "Tampa cop celebrates 50th birthday at animal shelter"
2. Store: Save to content collection with area=tampa
3. Label: Admin labels as LOCAL (specific to Tampa)
4. Learn: System extracts pattern "<CITY> cop celebrates birthday at shelter"
Day 2 - Austin:
1. Crawl: "Austin police officer marks birthday at local shelter"
2. Match: System recognizes pattern (85% similarity)
3. Auto-label: Suggests LOCAL with high confidence
4. Trending: Prioritized in Austin feed (1.3x local boost)
Result:
- Tampa users see Tampa cop story
- Austin users see Austin cop story
- No cross-contamination between cities
- One manual label helped 60+ cities
🐳 Docker Support
# Build image
docker build -t pulse-flows .
# Run container
docker run -p 8080:8080 \
-e OPENAI_API_KEY=your-key \
-e GOOGLE_APPLICATION_CREDENTIALS=/app/configs/firebase-service-account.json \
-v /path/to/firebase-creds:/app/configs \
pulse-flows
🧪 Testing
# Run all tests
npm test
# Test specific endpoints
npm run test:integration
# Health check
curl http://localhost:8080/health
📚 Documentation
- Content Labeling System - Complete labeling guide
- Labeling Guide - How to label content effectively
- API Examples - Comprehensive API usage
- Claude Code Config - AI assistant configuration
🔗 Related Projects
- pulse-ui - Frontend application
- pulse-apis - Backend API services
- pulse-type-registry - Shared types and schemas
- pulse-ai-utils - AI and utility functions
📄 License
Private repository - All rights reserved
🤝 Contributing
This is a private repository. For access or contributions, contact the repository owner.
🏆 Key Innovations
- City-Agnostic Pattern Recognition: One label works across 60+ cities
- Local-First Algorithm: Hyperlocal (1.5x) > Local (1.3x) > National (0.7x)
- Smart City Detection: Prevents Austin news in Tampa feeds
- Embedding + Pattern Hybrid: 70% semantic + 30% pattern matching
- Production Scale: Cloud Run Jobs handle 2-hour crawls across all US metros
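City-agnostic patterns and the 70% semantic + 30% pattern hybrid can be illustrated together. In this sketch, known city names are replaced with `<CITY>`, the pattern score is a Jaccard overlap of words, and the semantic score is cosine similarity between embeddings. Jaccard overlap, the `Item` shape, the helper names, and the 0.85 auto-label threshold (borrowed from the 85% match in the examples) are assumptions, not the service's documented implementation. Unlike the README's example patterns, this sketch only abstracts the city name; it does not drop words such as "50th" or "animal".

```typescript
interface Item {
  title: string;
  embedding: number[]; // hypothetical precomputed embedding
}

/** Replace known city names with <CITY> (case-insensitive, whole words only). */
function toCityAgnostic(title: string, cities: string[]): string {
  let result = title;
  for (const city of cities) {
    const escaped = city.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    result = result.replace(new RegExp(`\\b${escaped}\\b`, "gi"), "<CITY>");
  }
  return result;
}

/** Jaccard overlap of lowercase word sets, in [0, 1]. */
function patternSimilarity(a: string, b: string): number {
  const words = (s: string) => new Set(s.toLowerCase().split(/\s+/).filter(Boolean));
  const setA = words(a);
  const setB = words(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  const shared = Array.from(setA).filter((w) => setB.has(w)).length;
  return shared / (setA.size + setB.size - shared);
}

/** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** 70% semantic + 30% city-agnostic pattern match, per the README's weighting. */
function hybridScore(a: Item, b: Item, cities: string[]): number {
  const semantic = cosine(a.embedding, b.embedding);
  const pattern = patternSimilarity(toCityAgnostic(a.title, cities), toCityAgnostic(b.title, cities));
  return 0.7 * semantic + 0.3 * pattern;
}

/** Suggest the label of the best-matching labeled item, if it clears the threshold. */
function suggestLabel(
  candidate: Item,
  labeled: Array<Item & { label: string }>,
  cities: string[],
  threshold = 0.85, // assumption
): { label: string; score: number } | null {
  let best: { label: string; score: number } | null = null;
  for (const example of labeled) {
    const score = hybridScore(candidate, example, cities);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { label: example.label, score };
    }
  }
  return best;
}
```

Only the label transfers between areas; the matched Tampa story itself is not copied into the Austin content.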
Pulse Flows - Making local content truly local, everywhere.
🧩 Built with Claude Code
