# @asktext/core
TypeScript-first embedding and retrieval engine for voice-enabled Q&A on articles.
## What it does
- Text processing: Splits HTML/Markdown into semantic chunks with configurable overlap
- Embeddings: Generates OpenAI embeddings for each chunk
- Storage: Saves chunks + embeddings to your database (Prisma JSON, pgvector, or custom)
- Retrieval: Semantic search to find relevant passages for user questions
## Installation

```bash
npm install @asktext/core openai @prisma/client
```

## Quick Start
### 1. Database Schema

Add to your `schema.prisma`:

```prisma
model ArticleChunk {
  id         String @id @default(cuid())
  postId     String
  chunkIndex Int
  content    String @db.Text
  startChar  Int
  endChar    Int
  embedding  String @db.Text // JSON-encoded float[]

  @@index([postId, chunkIndex])
}
```

Run `npx prisma db push`.
### 2. Embed Articles
```ts
import { PrismaClient } from '@prisma/client';
import { OpenAIEmbedder, embedAndStore } from '@asktext/core';

const prisma = new PrismaClient();
const store = embedAndStore.createPrismaJsonStore(prisma);
const embedder = new OpenAIEmbedder({
  apiKey: process.env.OPENAI_API_KEY!
});

// Call this when publishing/updating articles
export async function saveEmbeddings(postId: string, htmlContent: string) {
  await embedAndStore({
    articleId: postId,
    htmlOrMarkdown: htmlContent,
    embedder,
    store
  });
}
```
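A common pattern is to call `saveEmbeddings` from whatever hook runs when an article is created or updated. A minimal sketch, assuming a hypothetical `onArticleSaved` hook and article shape (neither is part of this package):

```ts
// Hypothetical hook: re-index an article whenever its content changes.
// The hook name and article shape are app-specific, not part of @asktext/core.
export async function onArticleSaved(article: { id: string; html: string }) {
  try {
    await saveEmbeddings(article.id, article.html);
  } catch (err) {
    // Don't let a failed embedding run block publishing; log and retry later.
    console.error(`Embedding failed for article ${article.id}`, err);
  }
}
```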
### 3. Retrieve Passages

```ts
import { retrievePassages } from '@asktext/core';

const passages = await retrievePassages({
  query: "How does binary search work?",
  store,
  embedder,
  filter: { postId: "article-123" },
  limit: 5
});
```
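The returned passages can then be folded into the prompt for your answering model. A rough sketch, assuming each passage exposes a `content` string (check the exported `ChunkWithScore` type for the exact shape):

```ts
// Assemble the top passages into a context block for an LLM prompt.
// The `content` field is an assumption -- see ChunkWithScore for the real shape.
const context = passages
  .map((p, i) => `[${i + 1}] ${p.content}`)
  .join('\n\n');

const prompt = [
  'Answer the question using only the passages below.',
  context,
  'Question: How does binary search work?'
].join('\n\n');
```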
## Configuration

### Text Splitting
```ts
import { TextSplitter } from '@asktext/core';

const splitter = new TextSplitter({
  chunkSize: 1500,                        // characters per chunk
  chunkOverlap: 200,                      // overlap between chunks
  separators: ['\n\n', '\n', '. ', ' ']   // split priorities
});
```
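If you want to inspect chunk boundaries before embedding anything, you can run the splitter directly. This is a sketch only; the `split` method name and its `string[]` return type are assumptions about the `TextSplitter` API, so check the exported typings:

```ts
// Hypothetical usage -- `split` and its string[] return type are assumptions.
const chunks = splitter.split('# Binary Search\n\nBinary search repeatedly halves the search range...');

for (const chunk of chunks) {
  console.log(chunk.length); // each chunk should stay within the configured chunkSize
}
```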
### Custom Vector Store

Implement the `VectorStore` interface for your database:
```ts
interface VectorStore {
  saveChunks(chunks: ChunkWithEmbedding[]): Promise<void>;
  searchSimilar(embedding: number[], limit: number, filter?: any): Promise<ChunkWithScore[]>;
  deleteByArticleId(articleId: string): Promise<void>;
}
```
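As a reference point, here is a minimal in-memory store with brute-force cosine similarity, the kind of thing you might use in tests. It assumes `VectorStore`, `ChunkWithEmbedding`, and `ChunkWithScore` are exported and that chunks carry `articleId` and `embedding` fields, so adjust to the real types:

```ts
import type { VectorStore, ChunkWithEmbedding, ChunkWithScore } from '@asktext/core';

// Minimal in-memory store for tests. Field names (articleId, embedding) are
// assumptions -- check the exported chunk types for the real shape.
class InMemoryStore implements VectorStore {
  private chunks: ChunkWithEmbedding[] = [];

  async saveChunks(chunks: ChunkWithEmbedding[]): Promise<void> {
    this.chunks.push(...chunks);
  }

  async searchSimilar(embedding: number[], limit: number, filter?: any): Promise<ChunkWithScore[]> {
    // `filter` is ignored in this sketch.
    return this.chunks
      .map((chunk) => ({ ...chunk, score: cosineSimilarity(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  async deleteByArticleId(articleId: string): Promise<void> {
    this.chunks = this.chunks.filter((chunk) => chunk.articleId !== articleId);
  }
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}
```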
## Environment Variables

```
OPENAI_API_KEY=sk-...          # Required for embeddings
DATABASE_URL=postgresql://...  # For Prisma store
```

## Advanced Usage
### Batch Processing
```ts
const articles = await getArticlesToProcess();

for (const article of articles) {
  await saveEmbeddings(article.id, article.content);
  console.log(`Processed: ${article.title}`);
}
```
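If you have many articles, you can process a few at a time so embedding calls overlap without overwhelming API rate limits. A sketch using plain `Promise.all` batching:

```ts
// Embed articles in small batches: calls within a batch run concurrently,
// batches run one after another to keep request rates modest.
const BATCH_SIZE = 5;

for (let i = 0; i < articles.length; i += BATCH_SIZE) {
  const batch = articles.slice(i, i + BATCH_SIZE);
  await Promise.all(batch.map((article) => saveEmbeddings(article.id, article.content)));
  console.log(`Processed ${Math.min(i + BATCH_SIZE, articles.length)} of ${articles.length} articles`);
}
```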
### Custom Embedder

```ts
import type { Embedder } from '@asktext/core';

class CustomEmbedder implements Embedder {
  async embed(texts: string[]): Promise<number[][]> {
    // Your embedding logic: return one vector per input text,
    // in the same order as `texts`.
    throw new Error('Not implemented');
  }
}
```
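For tests, a deterministic embedder avoids network calls and API costs entirely. This toy example is not part of the package; it just shows the `Embedder` interface in use:

```ts
import type { Embedder } from '@asktext/core';

// Toy deterministic embedder: buckets character codes into a fixed-size vector.
// Not semantically meaningful -- useful only for exercising the pipeline in tests.
class FakeEmbedder implements Embedder {
  constructor(private dimensions = 64) {}

  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector = new Array<number>(this.dimensions).fill(0);
      for (let i = 0; i < text.length; i++) {
        vector[text.charCodeAt(i) % this.dimensions] += 1;
      }
      return vector;
    });
  }
}
```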
## Cost Estimation

- 100k words ≈ 75k tokens ≈ $0.01 with `text-embedding-3-small`
- 1M words ≈ 750k tokens ≈ $0.10
## License
MIT
