@san-francisco/sf-docs-embeddings
v0.3.1
San Francisco documentation embeddings - example RAG data package
Overview
This package demonstrates how to ship pre-computed embeddings as PGPM migrations. It creates a sample collection with San Francisco city documentation and example embeddings.
What This Package Does
- Creates an embedding model configuration for text-embedding-3-small
- Creates the sf-docs collection with semantic chunking config
- Seeds example documents and chunks
- (In production) Would include actual vector embeddings
Usage
# Install the RAG schema first
pgpm deploy @sf-ai/rag-core
# Then install this data package
pgpm deploy @san-francisco/sf-docs-embeddings
Data Structure
After installation, you'll have:
- Collection: sf-docs - San Francisco city documentation
- Model: text-embedding-3-small (OpenAI, 1536 dimensions)
- Documents: Example SF city services content
- Chunks: Semantically chunked document segments
Creating Your Own Data Package
To create a similar data package for your own embeddings:
- Generate embeddings using your preferred model
- Export using rag.export_collection_json()
- Convert the JSON to SQL INSERT statements
- Package as a PGPM module
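The conversion step in the list above (JSON to SQL INSERT statements) might look like the Python sketch below. The JSON shape, with a top-level "chunks" list whose items carry "id", "content", and "embedding", is an assumption, as are the rag.chunks table and its column names; inspect the real output of rag.export_collection_json() and the @sf-ai/rag-core schema and adjust accordingly. The vector is written as a bracketed text literal, which assumes a pgvector-style column.

```python
import json


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def vector_literal(values) -> str:
    """Format floats as a bracketed vector text literal (assumed pgvector-style)."""
    return "'[" + ",".join(repr(float(v)) for v in values) + "]'"


def chunks_to_inserts(export_json: str, table: str = "rag.chunks") -> list:
    """Build one INSERT statement per chunk in the (assumed) export format."""
    data = json.loads(export_json)
    statements = []
    for chunk in data["chunks"]:
        statements.append(
            f"INSERT INTO {table} (id, content, embedding) VALUES ("
            f"{sql_literal(chunk['id'])}, "
            f"{sql_literal(chunk['content'])}, "
            f"{vector_literal(chunk['embedding'])});"
        )
    return statements


# Made-up sample data in the assumed shape, not real export output.
example = json.dumps(
    {"chunks": [{"id": "c1", "content": "It's a permit office.", "embedding": [0.5, -1.0]}]}
)
for statement in chunks_to_inserts(example):
    print(statement)
```

The generated statements can then be saved as the deploy script of your PGPM module. The quoting here relies on PostgreSQL's default standard-conforming strings and is meant for data you exported yourself; for untrusted input, prefer parameterized queries.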
Example workflow:
-- Export your collection
SELECT rag.export_collection_json('your-collection-id');
-- Or export as CSV for processing
SELECT * FROM rag.export_embeddings_csv('your-collection-id');
Dependencies
@sf-ai/rag-core
