
@fugood/buttress-server

v2.24.2


Buttress Server

A high-performance RPC server for managing GGML LLM generators with configurable defaults and runtime management.

Installation

npm install -g @fugood/buttress-server

Quick Start

Using CLI

# Start with config file
npx bricks-buttress --config ./config.toml

# Start without config (uses env vars and defaults)
npx bricks-buttress

Configuration

Configuration can be provided via:

  • --config / -c flag with a TOML file path or an inline TOML string

Configuration Format (TOML)

# Environment variables (only set if not already defined in system)
[env]
HF_TOKEN = "your_huggingface_token_here"
CUDA_VISIBLE_DEVICES = "0"

[server]
port = 2080
log_level = "info"

[runtime]
cache_dir = "~/.buttress/models"

# Session state cache for ggml-llm (saves KV cache to disk for prompt reuse)
[runtime.session_cache]
enabled = true
max_size_bytes = "10GB"  # Supports string (e.g., "10GB", "500MB") or number
max_entries = 1000

# GGML LLM generator
[[generators]]
type = "ggml-llm"
[generators.backend]
variant_preference = ["cuda", "vulkan", "default"]
[generators.model]
repo_id = "ggml-org/gpt-oss-20b-GGUF"
quantization = "mxfp4"
n_ctx = 12800

# GGML STT (Speech-to-Text) generator
[[generators]]
type = "ggml-stt"
[generators.backend]
variant_preference = ["coreml", "default"]
[generators.model]
repo_id = "BricksDisplay/whisper-ggml"
filename = "ggml-small.bin"

Programmatic Usage

import { startServer } from '@fugood/buttress-server'

startServer({
  port: 3000,
  defaultConfig: {
    runtime: {
      cache_dir: './.buttress-cache'
    },
    generators: [
      {
        type: 'ggml-llm',
        model: {
          repo_id: 'ggml-org/gemma-3-270m-qat-GGUF',
          quantization: 'mxfp4',
        }
      }
    ]
  }
})
  .then(({ port }) => {
    console.log(`Server running on port ${port}`)
  })
  .catch(console.error)

Environment Variable Priority

Environment variables can be set in the [env] section of the TOML config. These values will only be applied if the environment variable is not already set in the system. This allows:

  1. Default values in config file
  2. System environment variables to override config values
  3. Command-line exports to have highest priority

Example:

# Config has: [env] HF_TOKEN = "default_token"

# This will use the system env variable (highest priority)
HF_TOKEN=my_token npx bricks-buttress

# This will use the config value
npx bricks-buttress

Port Priority

Port can be configured via multiple sources (highest priority first):

  1. Command-line flag: --port 3000
  2. Config file: [server] port = 2080
  3. Default: 2080
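The precedence chain above amounts to a first-defined-wins lookup, which could be expressed like this (a sketch; `resolvePort` is a hypothetical name, not the package's internal function):

```javascript
// Resolve the effective port: CLI flag > config file > built-in default.
const DEFAULT_PORT = 2080

function resolvePort(cliPort, configPort) {
  return cliPort ?? configPort ?? DEFAULT_PORT
}

// --port 3000 overrides [server] port = 9090 from the config file.
resolvePort(3000, 9090)           // 3000
resolvePort(undefined, 9090)      // 9090
resolvePort(undefined, undefined) // 2080
```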

CLI Reference

bricks-buttress v2.23.0-beta.22

Buttress server for remote inference with GGML backends.

Usage:
  bricks-buttress [options]

Options:
  -h, --help                    Show this help message
  -v, --version                 Show version number
  -p, --port <port>             Port to listen on (default: 2080)
  -c, --config <path|toml>      Path to TOML config file or inline TOML string

Testing Options:
  --test-caps <backend>         Test model capabilities (ggml-llm or ggml-stt)
  --test-caps-model-id <id>     Model ID to test (used with --test-caps)
  --test-models <ids>           Comma-separated list of model IDs to test
  --test-models-default         Test default set of models

  Note: --test-models and --test-models-default output a markdown report
        file (e.g., ggml-llm-model-capabilities-YYYY-MM-DD.md)

Environment Variables:
  NODE_ENV                      Set to 'development' for dev mode

Examples:
  bricks-buttress
  bricks-buttress --port 3000
  bricks-buttress --config ./config.toml
  bricks-buttress --test-caps ggml-llm --test-models-default
  bricks-buttress --test-caps ggml-stt --test-caps-model-id BricksDisplay/whisper-ggml:ggml-small.bin

Session State Cache

The server supports session state caching for ggml-llm generators, which saves KV cache state to disk after completions. This enables:

  • Prompt reuse: Same or similar prompts can reuse cached state, skipping prompt processing
  • Multi-turn conversations: Conversation history state is preserved across requests

Configuration

[runtime.session_cache]
enabled = true                  # Enable/disable session caching (default: true)
max_size_bytes = "10GB"         # Supports string (e.g., "10GB", "500MB") or number (default: 10GB)
max_entries = 1000              # Max number of cached entries (default: 1000)
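A size string like "10GB" presumably resolves to a byte count along these lines (a sketch under the assumption of binary units, i.e. 1 GB = 1024³ bytes; `parseSizeBytes` is a hypothetical name, and the package's actual parser and unit base may differ):

```javascript
// Convert a size value ("10GB", "500MB", or a plain number) to bytes.
// Assumes binary units: 1 KB = 1024 bytes, and so on.
const UNITS = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 }

function parseSizeBytes(value) {
  if (typeof value === 'number') return value
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)$/i.exec(value.trim())
  if (!match) throw new Error(`Invalid size: ${value}`)
  return Math.round(Number(match[1]) * UNITS[match[2].toUpperCase()])
}

parseSizeBytes('500MB') // 524288000
parseSizeBytes(1000)    // 1000 (numbers pass through as bytes)
```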

How it works

  1. After a successful completion, the KV cache state is saved to disk
  2. On new completions, the server checks if any cached state matches the prompt prefix
  3. If a match is found, the cached state is loaded, skipping redundant prompt processing
  4. LRU eviction removes oldest entries when limits are exceeded
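The lookup-and-evict flow above might look roughly like this (an illustrative sketch; the real on-disk format, matching logic, and function names are internal to the server):

```javascript
// Find the cached entry whose saved prompt is the longest prefix of the
// new prompt; its KV state can then be loaded instead of re-processing.
function findBestPrefixMatch(entries, prompt) {
  let best = null
  for (const entry of entries) {
    if (prompt.startsWith(entry.prompt) &&
        (!best || entry.prompt.length > best.prompt.length)) {
      best = entry
    }
  }
  return best
}

// LRU eviction: keep only the most recently used entries.
function evictLru(entries, maxEntries) {
  return [...entries]
    .sort((a, b) => b.lastUsed - a.lastUsed)
    .slice(0, maxEntries)
}
```

For a multi-turn conversation, the cached prompt from the previous turn is a prefix of the new, longer prompt, so the longest-prefix match recovers the prior state.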

Cache location

Cache files are stored in {cache_dir}/.session-state-cache/:

  • cache-map.json - Index of cached entries
  • states/ - Binary state files
  • temp/ - Temporary files (auto-cleaned after 1 hour)

Tips

  • macOS: Use sudo sysctl iogpu.wired_limit_mb=<number> to increase GPU memory allocation. The default available memory of GPU is about ~70%. For example, if the hardware have 128GB memory, you can use sudo sysctl iogpu.wired_limit_mb=137438 to increase to 128GB. Run sudo sysctl iogpu.wired_limit_mb=0 if you want to back to default.