inferis-vue

v1.0.0

Published

2 months ago

Vue 3 composables and plugin for inferis-ml

Downloads

0High
0Medium
0Low

pashunechka

inferis vue composables ml ai webgpu wasm

inferis-vue

Vue 3 composables and plugin for inferis-ml -- run AI models directly in the browser with WebGPU/WASM.

Install

npm install inferis-vue inferis-ml

Quick Start

<script setup lang="ts">
import { provideInferis, useModel, useStream } from 'inferis-vue';
import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';

provideInferis({ adapter: webLlmAdapter() });

const { model, state, progress } = useModel('text-generation', {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  autoLoad: true,
});

const { text, isStreaming, start, stop } = useStream(model);

function generate() {
  start({ prompt: 'Explain quantum computing in 3 sentences' });
}
</script>

<template>
  <div v-if="state === 'loading'">
    Loading model... {{ progress ? Math.round((progress.loaded / (progress.total || 1)) * 100) : 0 }}%
  </div>
  <div v-else-if="state === 'error'">Failed to load model</div>
  <div v-else>
    <button @click="generate">Generate</button>
    <button v-if="isStreaming" @click="stop">Stop</button>
    <p>{{ text }}</p>
  </div>
</template>

Setup

Two ways to provide the inferis context:

Option 1: Vue Plugin (global)

import { createApp } from 'vue';
import { inferisPlugin } from 'inferis-vue';
import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';
import App from './App.vue';

const app = createApp(App);
app.use(inferisPlugin, {
  adapter: webLlmAdapter(),
  poolConfig: { maxMemoryMB: 4096, maxWorkers: 2 },
});
app.mount('#app');

Option 2: provideInferis (scoped)

<script setup>
import { provideInferis } from 'inferis-vue';
import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';

provideInferis({ adapter: webLlmAdapter() });
</script>

Child components can then use any composable.

API Reference

`inferisPlugin`

Vue plugin. Installs via app.use(inferisPlugin, options).

| Option | Type | Description | |--------|------|-------------| | adapter | ModelAdapterFactory | Required. Adapter from inferis-ml/adapters/* | | poolConfig | Partial<PoolConfig> | Optional pool settings (memory limit, workers, device, etc.) |

`provideInferis(options)`

Call in a component's setup() to provide inferis context to descendants. Same options as the plugin. Automatically terminates the pool when the component scope is disposed.

`useInferis()`

Raw access to the worker pool.

const { pool, isReady } = useInferis();

| Field | Type | Description | |-------|------|-------------| | pool | ShallowRef<WorkerPool \| null> | Pool instance, null while initializing | | isReady | ComputedRef<boolean> | true when pool is created |

`useCapabilities()`

Device capability detection (WebGPU, WASM SIMD, SharedWorker, etc.).

const { capabilities, isLoading } = useCapabilities();

if (capabilities.value?.webgpu.supported) {
  console.log('GPU:', capabilities.value.webgpu.adapter?.vendor);
}

| Field | Type | Description | |-------|------|-------------| | capabilities | ShallowRef<CapabilityReport \| null> | Detection result | | isLoading | ComputedRef<boolean> | true while detecting |

`useModel(task, config)`

Load and manage a model lifecycle. Auto-disposes on scope destruction.

const { model, state, progress, error, load, dispose } = useModel('text-generation', {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  autoLoad: true,
});

| Config | Type | Default | Description | |--------|------|---------|-------------| | model | string | -- | Model ID (HuggingFace ID, URL, etc.) | | autoLoad | boolean | false | Load model when pool is ready | | estimatedMemoryMB | number | -- | Memory hint for budget pre-eviction |

| Return | Type | Description | |--------|------|-------------| | model | ShallowRef<ModelHandle \| null> | Model handle for inference | | state | Ref<ModelState \| 'pending'> | Current lifecycle state | | progress | ShallowRef<LoadProgressEvent \| null> | Download/load progress | | error | ShallowRef<Error \| null> | Load error | | load() | () => Promise<void> | Manually trigger loading | | dispose() | () => Promise<void> | Unload model and free memory |

Progress example:

<template>
  <div v-if="state === 'loading' && progress">
    <div :style="{ width: `${pct}%`, height: '4px', background: '#3b82f6' }" />
    <p>{{ progress.phase }} -- {{ pct }}%</p>
  </div>
</template>

<script setup>
import { computed } from 'vue';
import { useModel } from 'inferis-vue';

const { state, progress } = useModel('text-generation', {
  model: 'Llama-3.2-1B-Instruct-q4f16_1-MLC',
  autoLoad: true,
});

const pct = computed(() =>
  progress.value ? Math.round((progress.value.loaded / (progress.value.total || 1)) * 100) : 0
);
</script>

`useInference<T>(model)`

Single (non-streaming) inference request.

const { result, error, isLoading, run, reset } = useInference(model);

const output = await run({ text: 'This movie is great!' });

| Return | Type | Description | |--------|------|-------------| | result | ShallowRef<T \| null> | Last inference result | | error | ShallowRef<Error \| null> | Last error | | isLoading | Ref<boolean> | Request in flight | | run(input, options?) | (input, opts?) => Promise<T> | Execute inference | | reset() | () => void | Clear result and error |

Supports AbortSignal via options:

const controller = new AbortController();
run(input, { signal: controller.signal });
controller.abort();

`useStream<T>(model)`

Streaming inference with chunk accumulation.

const { chunks, text, isStreaming, start, stop, reset } = useStream(model);

start({ prompt: 'Explain quantum computing' });

| Return | Type | Description | |--------|------|-------------| | chunks | ShallowRef<T[]> | All received chunks | | text | Ref<string> | Accumulated text (for string chunks) | | isStreaming | Ref<boolean> | Stream active | | start(input, options?) | (input, opts?) => void | Start streaming | | stop() | () => void | Abort stream | | reset() | () => void | Clear chunks/text, stop if active |

`useMemoryBudget(intervalMs?)`

Monitor memory usage across loaded models. Polls at the given interval (default 1000ms).

const { totalMB, allocatedMB, availableMB } = useMemoryBudget();

Adapters

inferis-ml ships three adapters. Pass any of them to the plugin or provideInferis:

import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';
import { transformersAdapter } from 'inferis-ml/adapters/transformers';
import { onnxAdapter } from 'inferis-ml/adapters/onnx';

Examples

Image Classification

<script setup>
import { useModel, useInference } from 'inferis-vue';

const { model } = useModel('image-classification', {
  model: 'google/vit-base-patch16-224',
  autoLoad: true,
});

const { result, isLoading, run } = useInference(model);

async function classify(event) {
  const file = event.target.files?.[0];
  if (file) await run(file);
}
</script>

<template>
  <input type="file" accept="image/*" @change="classify" />
  <p v-if="isLoading">Classifying...</p>
  <p v-for="r in result" :key="r.label">{{ r.label }}: {{ (r.score * 100).toFixed(1) }}%</p>
</template>

Capability Gate

<script setup>
import { useCapabilities } from 'inferis-vue';

const { capabilities, isLoading } = useCapabilities();
</script>

<template>
  <p v-if="isLoading">Detecting device capabilities...</p>
  <p v-else-if="!capabilities?.webgpu.supported && !capabilities?.wasm.supported">
    Your browser does not support WebGPU or WASM.
  </p>
  <slot v-else />
</template>

Memory Monitor

<script setup>
import { computed } from 'vue';
import { useMemoryBudget } from 'inferis-vue';

const { totalMB, allocatedMB } = useMemoryBudget();
const pct = computed(() => totalMB.value ? Math.round((allocatedMB.value / totalMB.value) * 100) : 0);
</script>

<template>
  <div>
    <div :style="{ width: `${pct}%`, height: '4px', background: pct > 80 ? '#ef4444' : '#22c55e' }" />
    <p>{{ allocatedMB }} / {{ totalMB }} MB</p>
  </div>
</template>

Nuxt 3

ML inference runs in the browser via Web Workers -- there is no server-side model loading. Use <ClientOnly> to prevent SSR rendering of components that use inferis composables.

<!-- pages/ai.vue -->
<template>
  <ClientOnly>
    <AiChat />
  </ClientOnly>
</template>

Create a Nuxt plugin for global setup:

// plugins/inferis.client.ts
import { inferisPlugin } from 'inferis-vue';
import { webLlmAdapter } from 'inferis-ml/adapters/web-llm';

export default defineNuxtPlugin((nuxtApp) => {
  nuxtApp.vueApp.use(inferisPlugin, {
    adapter: webLlmAdapter(),
  });
});

The .client.ts suffix ensures the plugin only runs in the browser.

Requirements

Vue 3.3+
inferis-ml 1.0+
Browser with WebGPU or WASM support

License

MIT

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

inferis-vue

Install

Quick Start

Setup

Option 1: Vue Plugin (global)

Option 2: provideInferis (scoped)

API Reference

inferisPlugin

provideInferis(options)

useInferis()

useCapabilities()

useModel(task, config)

useInference<T>(model)

useStream<T>(model)

useMemoryBudget(intervalMs?)

Adapters

Examples

Image Classification

Capability Gate

Memory Monitor

Nuxt 3

Requirements

License

`inferisPlugin`

`provideInferis(options)`

`useInferis()`

`useCapabilities()`

`useModel(task, config)`

`useInference<T>(model)`

`useStream<T>(model)`

`useMemoryBudget(intervalMs?)`