kvat

v0.1.4

Published

25 days ago

Automatic KV-Cache Optimization for HuggingFace Transformers - Find the optimal cache strategy, attention backend, and configuration for your model and hardware.

0High
0Medium
0Low

german-ocr

transformers llm kv-cache optimization inference huggingface deep-learning machine-learning pytorch flash-attention ai ml

kvat - KVCache Auto-Tuner

Automatic KV-Cache Optimization for HuggingFace Transformers

Find the optimal cache strategy, attention backend, and configuration for your model and hardware.

Requirements

Node.js 14.0+
Python 3.9+
PyTorch 2.0+
Transformers 4.35+

Installation

npm install kvat

Then install the Python package:

pip install kvat[full]

CLI Usage

# Optimize any HuggingFace model
kvat tune meta-llama/Llama-3.2-1B --profile chat-agent

# Quick test
kvat tune gpt2 --profile ci-micro -v

# Show system info
kvat info

JavaScript API

const kvat = require('kvat');

// Check if kvat is installed
if (!kvat.isKvatInstalled()) {
  await kvat.installKvat();  // Install Python package
}

// Run tuning
const result = await kvat.tune('gpt2', {
  profile: 'chat-agent',
  outputDir: './results',
  verbose: true
});

console.log('Results saved to:', result.outputDir);

// Get system info
const info = await kvat.info();
console.log(info);

// Run arbitrary command
const { stdout, stderr, code } = await kvat.run(['profiles']);

API Reference

`isKvatInstalled()`

Check if the kvat Python package is installed.

Returns: boolean

`installKvat(full = true)`

Install the kvat Python package.

full (boolean): Install with full dependencies (default: true)

Returns: Promise<void>

`tune(modelId, options)`

Run kvat tune command.

modelId (string): HuggingFace model ID
options.profile (string): Profile name (default: 'chat-agent')
options.device (string): Device cuda/cpu/mps (default: 'cuda')
options.outputDir (string): Output directory (default: './kvat_results')
options.verbose (boolean): Verbose output (default: false)

Returns: Promise<{success: boolean, outputDir: string, stdout: string, stderr: string}>

`info()`

Get system information.

Returns: Promise<string>

`run(args)`

Run arbitrary kvat command.

args (string[]): Command arguments

Returns: Promise<{stdout: string, stderr: string, code: number}>

Available Profiles

| Profile | Context | Output | Focus | |---------|---------|--------|-------| | chat-agent | 2-8K | 64-256 | TTFT (latency) | | rag | 8-32K | 256-512 | Balanced | | longform | 4-8K | 1-2K | Throughput | | ci-micro | 512 | 32 | Quick testing |

License

Apache 2.0

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme