baml-sap-ts
v0.1.2
Published
TypeScript implementation of BAML's Schema-Aligned Parsing (SAP) algorithm using TypeBox
Maintainers
Readme
baml-sap-ts
A TypeScript implementation of BAML's Schema-Aligned Parsing (SAP) algorithm using TypeBox for schema definition.
Note: This is an unofficial community migration of BAML's core SAP algorithm. For the official BAML project, visit boundaryml.com.
Overview
This package provides LLM framework independent structured output parsing - use it with any LLM client (OpenAI, Anthropic, Ollama, etc.).
What is Schema-Aligned Parsing (SAP)?
SAP is BAML's algorithm for reliably extracting structured outputs from LLMs:
- Prompt Rendering: Auto-generates schema instructions for the LLM
- Response Parsing: Handles malformed JSON, markdown extraction, chain-of-thought filtering
- Type Coercion: Validates and coerces parsed data to the target schema
Key Features
- ✅ Markdown code block extraction - Extract JSON from
```jsonblocks - ✅ JSON fixing - Auto-fix trailing commas, missing quotes, etc.
- ✅ Unicode quote normalization - Normalize smart quotes (
“ ” ‘ ’) before parsing - ✅ Chain-of-thought filtering - Remove reasoning text before JSON
- ✅ Union type matching - Automatically select best matching variant
- ✅ Optional fields with defaults - Handle missing fields gracefully
- ✅ Type coercion - Convert strings to numbers, etc.
- ✅ Partial/streaming support - Parse incomplete responses
- ✅ Detailed error reporting - Know exactly what went wrong
- ✅ Full TypeScript support - Type-safe schemas and responses
Installation
npm install baml-sap-ts @sinclair/typeboxQuick Start
import { Type } from "@sinclair/typebox";
import { createPromptWithSchema, parseResponse } from "baml-sap-ts";
// 1. Define your schema
const UserSchema = Type.Object({
name: Type.String(),
age: Type.Number(),
email: Type.Optional(Type.String()),
});
// 2. Create a prompt with schema instructions
const prompt = createPromptWithSchema(
'Extract user information from: "Alice is 30 years old"',
UserSchema
);
console.log(prompt);
// Output:
// Extract user information from: "Alice is 30 years old"
//
// Respond with a JSON object in the following format:
//
// ```json
// {
// "name": string,
// "age": number,
// "email": string (optional)
// }
// ```
// 3. Send prompt to LLM and parse response
const response = `Here's the information:
\`\`\`json
{
"name": "Alice",
"age": 30
}
\`\`\``;
const result = parseResponse(response, UserSchema);
if (result.success) {
console.log(result.value); // { name: "Alice", age: 30 }
} else {
console.error(result.errors);
}API Reference
createPromptWithSchema(basePrompt, schema, options?)
Appends schema instructions to a base prompt to guide the LLM toward producing correctly structured output.
Parameters:
basePrompt: string- Your prompt textschema: TSchema- TypeBox schemaoptions?: SchemaRenderOptions- Optional rendering options
Returns: string - Complete prompt with schema instructions
parseResponse(response, schema, options?)
Parse an LLM response into a typed value. This is the main entry point for the SAP algorithm.
Parameters:
response: string- Raw LLM response textschema: TSchema- TypeBox schema to validate againstoptions?: ParseOptions- Optional parsing options
Returns: ParseResult<T> - Parsed value with metadata
interface ParseResult<T> {
success: boolean;
value: T;
errors: Array<{ path: string; message: string }>;
isPartial: boolean;
meta: {
raw: string;
fromMarkdown?: boolean;
chainOfThoughtFiltered?: boolean;
fixes?: string[];
coercions?: string[];
};
}parsePartialResponse(response, schema, options?)
For streaming scenarios - parses partial responses with allowPartials: true.
extractJson(text, options?)
Low-level JSON extraction from text. Handles markdown blocks and malformed JSON.
coerceValue(value, schema, options?)
Low-level type coercion and validation against a TypeBox schema.
validateValue(value, schema)
Validates a value against a schema without coercion. Returns validation errors.
Usage Examples
Chain-of-Thought Reasoning
Automatically detected and filtered:
const response = `Let me think step by step...
First, I'll analyze the input. The user mentions they are 25.
Therefore the output JSON is:
\`\`\`json
{"name": "John", "age": 25}
\`\`\``;
const result = parseResponse(response, schema);
console.log(result.meta.chainOfThoughtFiltered); // true
console.log(result.value); // { name: "John", age: 25 }Malformed JSON
Auto-fixes common issues:
const response = '{"name": "test", "age": 25,}'; // trailing comma
const result = parseResponse(response, schema);
console.log(result.meta.fixes); // ['applied_auto_fixes']
console.log(result.value); // { name: "test", age: 25 }Smart/Unicode Quotes
Normalizes typographic quotes automatically:
const response = '{“name”: “test”, “age”: 25}';
const result = parseResponse(response, schema);
console.log(result.meta.fixes); // ['normalized_unicode_quotes']
console.log(result.value); // { name: "test", age: 25 }Markdown Code Blocks
Extracts JSON from markdown:
const response = '```json\n{"key": "value"}\n```';
const result = parseResponse(response, schema);
console.log(result.meta.fromMarkdown); // trueUnion Types
Selects best matching variant:
const Schema = Type.Union([
Type.Object({ type: Type.Literal("user"), name: Type.String() }),
Type.Object({ type: Type.Literal("bot"), id: Type.Number() }),
]);
const result = parseResponse('{"type": "user", "name": "Alice"}', Schema);
// Automatically matches the first variantComplex Nested Schema
const OrderSchema = Type.Object({
orderId: Type.String(),
status: Type.Enum({ ORDERED: "ORDERED", SHIPPED: "SHIPPED" }),
items: Type.Array(
Type.Object({
name: Type.String(),
quantity: Type.Integer({ minimum: 1 }),
price: Type.Number({ minimum: 0 }),
})
),
total: Type.Number({ minimum: 0 }),
});
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{
role: "user",
content: createPromptWithSchema("Extract order info", OrderSchema),
},
],
});
const result = parseResponse(response.choices[0].message.content, OrderSchema);Options
ParseOptions
interface ParseOptions {
// Extraction options
allowMarkdownJson?: boolean; // Extract from markdown blocks (default: true)
allowFixes?: boolean; // Fix malformed JSON (default: true)
allowAsString?: boolean; // Return string if all fails (default: true)
findAllJsonObjects?: boolean; // Find multiple JSON objects (default: true)
normalizeUnicodeQuotes?: boolean; // Normalize “ ” ‘ ’ to standard quotes (default: true)
maxDepth?: number; // Max parsing depth (default: 100)
// Coercion options
allowPartials?: boolean; // Allow incomplete objects (default: false)
useDefaults?: boolean; // Use default values (default: true)
strict?: boolean; // No type coercion (default: false)
trackCoercions?: boolean; // Track applied coercions (default: false)
// Parse options
filterChainOfThought?: boolean; // Filter reasoning text (default: true)
returnAllCandidates?: boolean; // Return all candidate parses (default: false)
}Architecture
┌─────────────────────────────────────────────────────────────┐
│ parseResponse │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ ┌───────────────────────────┐ │
│ │ 1. Filter CoT Text │───▶│ 2. Extract JSON │ │
│ │ (if detected) │ │ - Markdown blocks │ │
│ └─────────────────────┘ │ - Direct JSON │ │
│ │ - Fix malformed │ │
│ └───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────┐ │
│ │ 3. Type Coercion │ │
│ │ - Union matching │ │
│ │ - Field validation │ │
│ │ - Default values │ │
│ └───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────┐ │
│ │ 4. Return ParseResult │ │
│ │ with metadata │ │
│ └───────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘Comparison with Original BAML
| Feature | Original BAML (Rust) | This Migration (TS) |
| ----------------- | -------------------- | ------------------- |
| Schema Definition | BAML DSL | TypeBox |
| Prompt Rendering | ctx.output_format | renderSchema() |
| JSON Extraction | jsonish parser | extractJson() |
| Type Coercion | Coercer IR | coerceValue() |
| Streaming | Native | Supported |
| Partial Results | Native | Supported |
| Error Reporting | Detailed | Detailed |
| Performance | Native Rust | JavaScript |
Development
# Clone the repository
git clone https://github.com/aiamblichus/baml-sap-ts.git
cd baml-sap-ts
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
# Run example
npm run example
# Lint and format
npm run check
npm run check:fixContributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
License
MIT © Your Name
Acknowledgments
- Original BAML project by BoundaryML
- TypeBox for the excellent schema validation library
