# @layerscale/layerscale
TypeScript client for the LayerScale inference server.
## Install

```bash
npm install @layerscale/layerscale
```

## Get a license key
You need a LayerScale license key to authenticate. Grab a free one at layerscale.ai/get-license; it takes about 10 seconds.
## Usage

```ts
import { LayerScale } from '@layerscale/layerscale';
const client = new LayerScale('http://localhost:8080', {
apiKey: 'LS-...', // or set LAYERSCALE_LICENSE_KEY env var
});
```

## Complete example

End-to-end: create a session, push a few OHLCV candles, wait for them to be processed, and ask the model about the data.

```ts
import { LayerScale } from '@layerscale/layerscale';
const client = new LayerScale('http://localhost:8080', {
apiKey: process.env.LAYERSCALE_LICENSE_KEY,
});
async function main() {
// 1. Create a session with a system prompt and freeze it in the cache
// so it is not reprocessed on every incoming data update.
const session = await client.sessions.create({
type: 'ohlcv',
prompt:
'You are a real-time market analyst. You receive live OHLCV candles and ' +
'answer questions about market direction in a single word when possible.',
context: 4096,
markPrefix: true,
});
// 2. Push a few OHLCV candles and wait for them to be decoded into the cache.
// For a reactive alternative, subscribe to the `data_updated` event on
// client.sessions.events() or client.sessions.stream() instead of using `wait`.
const candles = [
{ o: 150.20, h: 150.80, l: 150.10, c: 150.70, v: 1_200_000, timestamp: 1_733_000_000, sym: 'AAPL' },
{ o: 150.70, h: 151.10, l: 150.60, c: 150.95, v: 900_000, timestamp: 1_733_000_060, sym: 'AAPL' },
{ o: 150.95, h: 151.40, l: 150.90, c: 151.30, v: 1_500_000, timestamp: 1_733_000_120, sym: 'AAPL' },
{ o: 151.30, h: 151.80, l: 151.20, c: 151.75, v: 1_100_000, timestamp: 1_733_000_180, sym: 'AAPL' },
{ o: 151.75, h: 152.10, l: 151.60, c: 152.00, v: 1_300_000, timestamp: 1_733_000_240, sym: 'AAPL' },
];
await client.sessions.push(session.session_id, candles, { wait: true });
// 3. Query the model about the data we just ingested.
const answer = await client.sessions.query(session.session_id, {
prompt: 'Is the market trending up or down?',
max_tokens: 8,
fast_answer: true,
});
console.log('Answer:', answer.text.trim());
await client.sessions.delete(session.session_id);
}
main().catch((err) => {
console.error(err);
process.exit(1);
});
```
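
As noted in the comment in step 2, a reactive alternative to `{ wait: true }` is to watch for the `data_updated` event instead of blocking the push call. A minimal sketch of that pattern using `client.sessions.events()` (the same event loop shown in the Session events section below), not a documented recipe:

```ts
// Push without blocking, then continue once the server reports the data was decoded.
await client.sessions.push(session.session_id, candles);

for await (const event of client.sessions.events(session.session_id)) {
  if (event.type === 'data_updated') {
    console.log('data processed, version:', event.data_version);
    break; // first update after our push; a long-lived consumer would keep listening
  }
}
```
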
## OpenAI-compatible chat

```ts
const chat = await client.chat({
messages: [{ role: 'user', content: 'Hello' }],
});
```

## Anthropic-compatible messages

```ts
const msg = await client.message({
messages: [{ role: 'user', content: 'Hello' }],
max_tokens: 256,
});
```

## Streaming

All streaming methods return async generators:

```ts
for await (const chunk of client.chatStream({ messages })) {
process.stdout.write(chunk.choices[0]?.delta.content ?? '');
}
for await (const chunk of client.messageStream({ messages, max_tokens: 256 })) {
if (chunk.type === 'content_block_delta') {
process.stdout.write(chunk.delta.text);
}
}
```

## Sessions

```ts
const session = await client.sessions.create({
type: 'ohlcv',
prompt: 'You are a trading analyst.',
flash: [
{ query: 'Is the trend bullish?' },
{ query: 'Is volume increasing?' },
],
markPrefix: true,
});
const answer = await client.sessions.query(session.session_id, {
prompt: 'What is the trend?',
max_tokens: 256,
});
// Stream generation token-by-token
for await (const chunk of client.sessions.queryStream(session.session_id, { max_tokens: 256 })) {
if (chunk.token) process.stdout.write(chunk.token);
}
```

## Tool calling in sessions

Pass OpenAI-format messages + tools. The server applies the model's Jinja chat template, diffs against the session's cached tokens, and only decodes the delta. The `tool_call_guide` is cached per session, so repeated turns with the same tool set skip re-tokenising tool names.

```ts
const tools = [{
type: 'function' as const,
function: {
name: 'read_file',
description: 'Read a file',
parameters: {
type: 'object',
properties: { path: { type: 'string' } },
required: ['path'],
},
},
}];
// Turn 1: ask the model to pick a tool.
const resp = await client.sessions.query(session.session_id, {
messages: [
{ role: 'system', content: 'You are a coding agent.' },
{ role: 'user', content: 'Read src/App.tsx' },
],
tools,
max_tokens: 256,
});
if (resp.tool_calls?.length) {
const call = resp.tool_calls[0];
console.log(`Tool: ${call.function.name}(${call.function.arguments})`);
// Turn 2: feed the tool result back. Include the assistant's prior
// tool_call so the template renders the matching tool_call_id header.
const followUp = await client.sessions.query(session.session_id, {
messages: [
{ role: 'system', content: 'You are a coding agent.' },
{ role: 'user', content: 'Read src/App.tsx' },
{ role: 'assistant', content: null, tool_calls: [call] },
{ role: 'tool', tool_call_id: call.id, content: "import React from 'react'; ..." },
],
tools,
max_tokens: 256,
});
console.log(followUp.text);
}
```

## Session management

```ts
const sessions = await client.sessions.list();
const state = await client.sessions.get(session.session_id);
await client.sessions.delete(session.session_id);
```

## Continuous streaming

Push OHLCV data into the lock-free ring buffer for background processing:

```ts
await client.sessions.push(session.session_id, [
{ o: 150.5, h: 151.0, l: 150.0, c: 150.75, v: 1_000_000, timestamp: 1704067200, sym: 'AAPL' },
]);
const status = await client.sessions.streamStatus(session.session_id);
const stats = await client.sessions.stats(session.session_id);
```

## Flash queries

Register questions that are automatically evaluated in the background after each data update. Answers are cached for near-instant retrieval:

```ts
const registered = await client.sessions.flash(session.session_id, 'Is the trend bullish?');
// optional third argument caps the answer length in tokens (default 32):
await client.sessions.flash(session.session_id, 'Summarise the trend', 64);
const queries = await client.sessions.listFlash(session.session_id);
await client.sessions.unflash(session.session_id, registered.id);
```
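
The cached answers can also be read back on demand via `listFlash()`. A small sketch; the entry field names (`query`, `value`, `confidence`) are assumptions borrowed from the `flash_ready` event payload in the next section, not a documented response shape for `listFlash`:

```ts
// Assumes listFlash() resolves to an array whose entries mirror the flash_ready payload.
const flashQueries = await client.sessions.listFlash(session.session_id);
for (const q of flashQueries) {
  console.log(`${q.query} -> ${q.value} (confidence: ${q.confidence})`);
}
```
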
## Session events

Subscribe to real-time session events via SSE:

```ts
for await (const event of client.sessions.events(session.session_id)) {
if (event.type === 'flash_ready') {
console.log(event.query, event.value, event.confidence);
} else if (event.type === 'data_updated') {
console.log('new data version:', event.data_version);
}
}
```

## WebSocket streaming

A WebSocket combines data-push and event subscription on one connection. It does not auto-reconnect; wrap it in your own backoff loop for long-lived consumers (see the sketch after this example):

```ts
const socket = client.sessions.stream(session.session_id);
socket.on('open', () => socket.push([candle]));
socket.on('flash_ready', (data) => console.log(data.query, data.value));
socket.on('data_updated', (data) => console.log('version:', data.data_version));
socket.on('error', (err) => console.error(err.message));
socket.on('close', () => {/* reconnect here if needed */});
```
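
A minimal reconnect wrapper with exponential backoff, along the lines suggested above. It uses only the `stream()`/`on()` calls shown in this section; the backoff policy itself (capped doubling delay) is just one reasonable choice:

```ts
function streamWithReconnect(sessionId: string, onEvent: (type: string, data: any) => void) {
  let delay = 1_000; // start at 1s, cap at 30s

  const connect = () => {
    const socket = client.sessions.stream(sessionId);
    socket.on('open', () => { delay = 1_000; }); // reset backoff once connected
    socket.on('flash_ready', (data) => onEvent('flash_ready', data));
    socket.on('data_updated', (data) => onEvent('data_updated', data));
    socket.on('error', (err) => console.error(err.message));
    socket.on('close', () => {
      setTimeout(connect, delay); // schedule a reconnect attempt
      delay = Math.min(delay * 2, 30_000);
    });
  };

  connect();
}

streamWithReconnect(session.session_id, (type, data) => console.log(type, data));
```
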
## Cancellation and timeouts

Every request method accepts an optional `{ signal }` for cancellation. Non-streaming requests also have a timeout (10 minutes by default, configurable via `timeoutMs` on the constructor; pass 0 to disable).

```ts
const client = new LayerScale(baseUrl, {
apiKey: '...',
timeoutMs: 30_000, // 30s timeout for non-streaming calls
});
const ac = new AbortController();
setTimeout(() => ac.abort(), 5_000);
await client.chat({ messages }, { signal: ac.signal });
```
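
Since the `timeoutMs` default only applies to non-streaming calls, `{ signal }` is the natural way to bound a stream. A short sketch, relying only on the statement above that every request method accepts an optional `{ signal }`:

```ts
// Abort the stream after 10 seconds of wall-clock time.
const streamAbort = new AbortController();
const timer = setTimeout(() => streamAbort.abort(), 10_000);

try {
  for await (const chunk of client.chatStream({ messages }, { signal: streamAbort.signal })) {
    process.stdout.write(chunk.choices[0]?.delta.content ?? '');
  }
} finally {
  clearTimeout(timer);
}
```
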
## Error handling

```ts
import { LayerScaleError } from '@layerscale/layerscale';
try {
await client.sessions.query(sessionId, { max_tokens: 256 });
} catch (err) {
if (err instanceof LayerScaleError) {
console.error(err.status, err.body.error.message);
}
}
```
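
The `status` field makes it straightforward to retry transient failures. A sketch assuming conventional HTTP semantics (retry 429 and 5xx responses); the server's actual status codes are not documented here:

```ts
// Hypothetical helper: retry a call a few times with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err instanceof LayerScaleError && (err.status === 429 || err.status >= 500);
      if (!retryable || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 500 * 2 ** i));
    }
  }
}

const answer = await withRetry(() =>
  client.sessions.query(sessionId, { prompt: 'What is the trend?', max_tokens: 256 }),
);
```
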
## API

| Method | Endpoint | Description |
|--------|----------|-------------|
| health() | GET /v1/health | Check whether the server is ready to serve requests. |
| models() | GET /v1/models | List the models the server has loaded. |
| chat(params) | POST /v1/chat/completions | OpenAI-compatible chat completion (single response). |
| chatStream(params) | POST /v1/chat/completions (streaming) | OpenAI-compatible chat completion streamed as SSE chunks. |
| complete(params) | POST /v1/completions | OpenAI-compatible legacy text completion. |
| message(params) | POST /v1/messages | Anthropic-compatible messages API (single response). |
| messageStream(params) | POST /v1/messages (streaming) | Anthropic-compatible messages API streamed as SSE chunks. |
| sessions.create(params) | POST /v1/sessions/init | Create a streaming session with a system prompt and optional flash queries. |
| sessions.list() | GET /v1/sessions | List all active sessions on the server. |
| sessions.get(id) | GET /v1/sessions/:id/state | Inspect a session's state, token counts, and decoded context. |
| sessions.delete(id) | DELETE /v1/sessions/:id | Delete a session and free its GPU context. |
| sessions.append(id, text) | POST /v1/sessions/:id/append | Tokenize and append raw text to the session context. |
| sessions.query(id, params) | POST /v1/sessions/:id/generate | Run a generation against the session's current context. |
| sessions.queryStream(id, params) | POST /v1/sessions/:id/generate (streaming) | Stream a generation token-by-token as SSE chunks. |
| sessions.markPrefix(id) | POST /v1/sessions/:id/mark_prefix | Freeze the current position as a reusable cache prefix. |
| sessions.push(id, data, { wait? }) | POST /v1/sessions/:id/stream/push | Push streaming data (OHLCV, IoT, vitals, etc.) into the session's ring buffer. |
| sessions.streamStatus(id) | GET /v1/sessions/:id/stream/status | Get processor status, queue depth, and ingestion metrics. |
| sessions.stats(id) | GET /v1/sessions/:id/stats | Get computed statistics over the session's ingested data. |
| sessions.flash(id, query, maxTokens?) | POST /v1/sessions/:id/flash | Register a flash query evaluated automatically after each data update. |
| sessions.listFlash(id) | GET /v1/sessions/:id/flash | List the session's flash queries and their cached answers. |
| sessions.unflash(id, queryId) | DELETE /v1/sessions/:id/flash/:queryId | Remove a flash query from the session. |
| sessions.events(id) | GET /v1/sessions/:id/events (SSE) | Subscribe to session events (flash_ready, data_updated) over SSE. |
| sessions.stream(id) | WS /v1/sessions/:id/ws | Open a WebSocket for bidirectional data push and event delivery. |
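
Not shown in the examples above: the plain health and model-listing endpoints from the table. A quick connectivity check (the response shapes are just logged, since they are not documented here):

```ts
const healthy = await client.health();  // GET /v1/health
console.log('server ready:', healthy);

const models = await client.models();   // GET /v1/models
console.log('loaded models:', models);
```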
