# @layerscale/layerscale
TypeScript client for the LayerScale inference server.
## Install

```bash
npm install @layerscale/layerscale
```

## Get a license key
You need a LayerScale license key to authenticate. Grab a free one at layerscale.ai/get-license; it takes about 10 seconds.
## Usage

```ts
import { LayerScale } from '@layerscale/layerscale';
const client = new LayerScale('http://localhost:8080', {
apiKey: 'LS-...', // or set LAYERSCALE_LICENSE_KEY env var
});
```

## Complete example

End-to-end: create a session, push a few OHLCV candles, wait for them to be processed, and ask the model about the data.

```ts
import { LayerScale } from '@layerscale/layerscale';
const client = new LayerScale('http://localhost:8080', {
apiKey: process.env.LAYERSCALE_LICENSE_KEY,
});
async function main() {
// 1. Create a session with a system prompt and freeze it in the cache
// so it is not reprocessed on every incoming data update.
const session = await client.sessions.create({
type: 'ohlcv',
prompt:
'You are a real-time market analyst. You receive live OHLCV candles and ' +
'answer questions about market direction in a single word when possible.',
context: 4096,
markPrefix: true,
});
// 2. Push a few OHLCV candles and wait for them to be decoded into the cache.
// For a reactive alternative, subscribe to the `data_updated` event on
// client.sessions.events() or client.sessions.stream() instead of using `wait`.
const candles = [
{ o: 150.20, h: 150.80, l: 150.10, c: 150.70, v: 1_200_000, timestamp: 1_733_000_000, sym: 'AAPL' },
{ o: 150.70, h: 151.10, l: 150.60, c: 150.95, v: 900_000, timestamp: 1_733_000_060, sym: 'AAPL' },
{ o: 150.95, h: 151.40, l: 150.90, c: 151.30, v: 1_500_000, timestamp: 1_733_000_120, sym: 'AAPL' },
{ o: 151.30, h: 151.80, l: 151.20, c: 151.75, v: 1_100_000, timestamp: 1_733_000_180, sym: 'AAPL' },
{ o: 151.75, h: 152.10, l: 151.60, c: 152.00, v: 1_300_000, timestamp: 1_733_000_240, sym: 'AAPL' },
];
await client.sessions.push(session.session_id, candles, { wait: true });
// 3. Query the model about the data we just ingested.
const answer = await client.sessions.query(session.session_id, {
prompt: 'Is the market trending up or down?',
max_tokens: 8,
fast_answer: true,
});
console.log('Answer:', answer.text.trim());
await client.sessions.delete(session.session_id);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
```
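
As noted in the comment in step 2, a reactive alternative to `{ wait: true }` is to watch for the `data_updated` event instead of blocking the push call. A minimal sketch of that pattern using `client.sessions.events()` (the same event loop shown in the Session events section below), not a documented recipe:

```ts
// Push without blocking, then continue once the server reports the data was decoded.
await client.sessions.push(session.session_id, candles);

for await (const event of client.sessions.events(session.session_id)) {
  if (event.type === 'data_updated') {
    console.log('data processed, version:', event.data_version);
    break; // first update after our push; a long-lived consumer would keep listening
  }
}
```
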
## OpenAI-compatible chat

```ts
const chat = await client.chat({
messages: [{ role: 'user', content: 'Hello' }],
});
```

## Anthropic-compatible messages

```ts
const msg = await client.message({
messages: [{ role: 'user', content: 'Hello' }],
max_tokens: 256,
});
```

## Streaming

All streaming methods return async generators:

```ts
for await (const chunk of client.chatStream({ messages })) {
process.stdout.write(chunk.choices[0]?.delta.content ?? '');
}
for await (const chunk of client.messageStream({ messages, max_tokens: 256 })) {
if (chunk.type === 'content_block_delta') {
process.stdout.write(chunk.delta.text);
}
}
```

## Sessions

```ts
const session = await client.sessions.create({
type: 'ohlcv',
prompt: 'You are a trading analyst.',
flash: [
{ query: 'Is the trend bullish?' },
{ query: 'Is volume increasing?' },
],
markPrefix: true,
});
const answer = await client.sessions.query(session.session_id, {
prompt: 'What is the trend?',
max_tokens: 256,
});
// Stream generation token-by-token
for await (const chunk of client.sessions.queryStream(session.session_id, { max_tokens: 256 })) {
if (chunk.token) process.stdout.write(chunk.token);
}
```

## Tool calling in sessions

Pass OpenAI-format messages + tools. The server applies the model's Jinja chat template, diffs against the session's cached tokens, and only decodes the delta. The `tool_call_guide` is cached per session, so repeated turns with the same tool set skip re-tokenising tool names.

```ts
const tools = [{
type: 'function' as const,
function: {
name: 'read_file',
description: 'Read a file',
parameters: {
type: 'object',
properties: { path: { type: 'string' } },
required: ['path'],
},
},
}];
// Turn 1: ask the model to pick a tool.
const resp = await client.sessions.query(session.session_id, {
messages: [
{ role: 'system', content: 'You are a coding agent.' },
{ role: 'user', content: 'Read src/App.tsx' },
],
tools,
max_tokens: 256,
});
if (resp.tool_calls?.length) {
const call = resp.tool_calls[0];
console.log(`Tool: ${call.function.name}(${call.function.arguments})`);
// Turn 2: feed the tool result back. Include the assistant's prior
// tool_call so the template renders the matching tool_call_id header.
const followUp = await client.sessions.query(session.session_id, {
messages: [
{ role: 'system', content: 'You are a coding agent.' },
{ role: 'user', content: 'Read src/App.tsx' },
{ role: 'assistant', content: null, tool_calls: [call] },
{ role: 'tool', tool_call_id: call.id, content: "import React from 'react'; ..." },
],
tools,
max_tokens: 256,
});
console.log(followUp.text);
}
```

## Session management

```ts
const sessions = await client.sessions.list();
const state = await client.sessions.get(session.session_id);
await client.sessions.delete(session.session_id);
```

## Continuous streaming

Push OHLCV data into the lock-free ring buffer for background processing:

```ts
await client.sessions.push(session.session_id, [
{ o: 150.5, h: 151.0, l: 150.0, c: 150.75, v: 1_000_000, timestamp: 1704067200, sym: 'AAPL' },
]);
const status = await client.sessions.streamStatus(session.session_id);
const stats = await client.sessions.stats(session.session_id);
```

## Flash queries

Register questions that are automatically evaluated in the background after each data update. Answers are cached for near-instant retrieval:

```ts
const registered = await client.sessions.flash(session.session_id, 'Is the trend bullish?');
// optional third argument caps the answer length in tokens (default 32):
await client.sessions.flash(session.session_id, 'Summarise the trend', 64);
const queries = await client.sessions.listFlash(session.session_id);
await client.sessions.unflash(session.session_id, registered.id);
```
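
The cached answers can also be read back on demand via `listFlash()`. A small sketch; the entry field names (`query`, `value`, `confidence`) are assumptions borrowed from the `flash_ready` event payload in the next section, not a documented response shape for `listFlash`:

```ts
// Assumes listFlash() resolves to an array whose entries mirror the flash_ready payload.
const flashQueries = await client.sessions.listFlash(session.session_id);
for (const q of flashQueries) {
  console.log(`${q.query} -> ${q.value} (confidence: ${q.confidence})`);
}
```
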
## Session events

Subscribe to real-time session events via SSE:

```ts
for await (const event of client.sessions.events(session.session_id)) {
if (event.type === 'flash_ready') {
console.log(event.query, event.value, event.confidence);
} else if (event.type === 'data_updated') {
console.log('new data version:', event.data_version);
}
}
```

## WebSocket streaming

A WebSocket combines data-push and event subscription on one connection. It does not auto-reconnect; wrap it in your own backoff loop for long-lived consumers (see the sketch after this example):

```ts
const socket = client.sessions.stream(session.session_id);
socket.on('open', () => socket.push([candle]));
socket.on('flash_ready', (data) => console.log(data.query, data.value));
socket.on('data_updated', (data) => console.log('version:', data.data_version));
socket.on('error', (err) => console.error(err.message));
socket.on('close', () => {/* reconnect here if needed */});
```
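
A minimal reconnect wrapper with exponential backoff, along the lines suggested above. It uses only the `stream()`/`on()` calls shown in this section; the backoff policy itself (capped doubling delay) is just one reasonable choice:

```ts
function streamWithReconnect(sessionId: string, onEvent: (type: string, data: any) => void) {
  let delay = 1_000; // start at 1s, cap at 30s

  const connect = () => {
    const socket = client.sessions.stream(sessionId);
    socket.on('open', () => { delay = 1_000; }); // reset backoff once connected
    socket.on('flash_ready', (data) => onEvent('flash_ready', data));
    socket.on('data_updated', (data) => onEvent('data_updated', data));
    socket.on('error', (err) => console.error(err.message));
    socket.on('close', () => {
      setTimeout(connect, delay); // schedule a reconnect attempt
      delay = Math.min(delay * 2, 30_000);
    });
  };

  connect();
}

streamWithReconnect(session.session_id, (type, data) => console.log(type, data));
```
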
## Cancellation and timeouts

Every request method accepts an optional `{ signal }` for cancellation. Non-streaming requests also have a timeout (10 minutes by default, configurable via `timeoutMs` on the constructor; pass 0 to disable).

```ts
const client = new LayerScale(baseUrl, {
apiKey: '...',
timeoutMs: 30_000, // 30s timeout for non-streaming calls
});
const ac = new AbortController();
setTimeout(() => ac.abort(), 5_000);
await client.chat({ messages }, { signal: ac.signal });
```
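
Since the `timeoutMs` default only applies to non-streaming calls, `{ signal }` is the natural way to bound a stream. A short sketch, relying only on the statement above that every request method accepts an optional `{ signal }`:

```ts
// Abort the stream after 10 seconds of wall-clock time.
const streamAbort = new AbortController();
const timer = setTimeout(() => streamAbort.abort(), 10_000);

try {
  for await (const chunk of client.chatStream({ messages }, { signal: streamAbort.signal })) {
    process.stdout.write(chunk.choices[0]?.delta.content ?? '');
  }
} finally {
  clearTimeout(timer);
}
```
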
## Error handling

```ts
import { LayerScaleError } from '@layerscale/layerscale';
try {
await client.sessions.query(sessionId, { max_tokens: 256 });
} catch (err) {
if (err instanceof LayerScaleError) {
console.error(err.status, err.body.error.message);
}
}
```
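
The `status` field makes it straightforward to retry transient failures. A sketch assuming conventional HTTP semantics (retry 429 and 5xx responses); the server's actual status codes are not documented here:

```ts
// Hypothetical helper: retry a call a few times with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err instanceof LayerScaleError && (err.status === 429 || err.status >= 500);
      if (!retryable || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 500 * 2 ** i));
    }
  }
}

const answer = await withRetry(() =>
  client.sessions.query(sessionId, { prompt: 'What is the trend?', max_tokens: 256 }),
);
```
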
## API

| Method | Endpoint | Description |
|--------|----------|-------------|
| health() | GET /v1/health | Check whether the server is ready to serve requests. |
| models() | GET /v1/models | List the models the server has loaded. |
| chat(params) | POST /v1/chat/completions | OpenAI-compatible chat completion (single response). |
| chatStream(params) | POST /v1/chat/completions (streaming) | OpenAI-compatible chat completion streamed as SSE chunks. |
| complete(params) | POST /v1/completions | OpenAI-compatible legacy text completion. |
| message(params) | POST /v1/messages | Anthropic-compatible messages API (single response). |
| messageStream(params) | POST /v1/messages (streaming) | Anthropic-compatible messages API streamed as SSE chunks. |
| sessions.create(params) | POST /v1/sessions/init | Create a streaming session with a system prompt and optional flash queries. |
| sessions.list() | GET /v1/sessions | List all active sessions on the server. |
| sessions.get(id) | GET /v1/sessions/:id/state | Inspect a session's state, token counts, and decoded context. |
| sessions.delete(id) | DELETE /v1/sessions/:id | Delete a session and free its GPU context. |
| sessions.append(id, text) | POST /v1/sessions/:id/append | Tokenize and append raw text to the session context. |
| sessions.query(id, params) | POST /v1/sessions/:id/generate | Run a generation against the session's current context. |
| sessions.queryStream(id, params) | POST /v1/sessions/:id/generate (streaming) | Stream a generation token-by-token as SSE chunks. |
| sessions.markPrefix(id) | POST /v1/sessions/:id/mark_prefix | Freeze the current position as a reusable cache prefix. |
| sessions.push(id, data, { wait? }) | POST /v1/sessions/:id/stream/push | Push streaming data (OHLCV, IoT, vitals, etc.) into the session's ring buffer. |
| sessions.streamStatus(id) | GET /v1/sessions/:id/stream/status | Get processor status, queue depth, and ingestion metrics. |
| sessions.stats(id) | GET /v1/sessions/:id/stats | Get computed statistics over the session's ingested data. |
| sessions.flash(id, query, maxTokens?) | POST /v1/sessions/:id/flash | Register a flash query evaluated automatically after each data update. |
| sessions.listFlash(id) | GET /v1/sessions/:id/flash | List the session's flash queries and their cached answers. |
| sessions.unflash(id, queryId) | DELETE /v1/sessions/:id/flash/:queryId | Remove a flash query from the session. |
| sessions.events(id) | GET /v1/sessions/:id/events (SSE) | Subscribe to session events (flash_ready, data_updated) over SSE. |
| sessions.stream(id) | WS /v1/sessions/:id/ws | Open a WebSocket for bidirectional data push and event delivery. |
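
Not shown in the examples above: the plain health and model-listing endpoints from the table. A quick connectivity check (the response shapes are just logged, since they are not documented here):

```ts
const healthy = await client.health();  // GET /v1/health
console.log('server ready:', healthy);

const models = await client.models();   // GET /v1/models
console.log('loaded models:', models);
```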
