vsegments (Node.js)
v0.1.6
Visual segmentation and bounding box detection using Google Gemini AI
vsegments is a powerful Node.js library and CLI tool that leverages Google's Gemini AI models to perform advanced visual segmentation and object detection on images. It provides an easy-to-use interface for detecting bounding boxes and generating segmentation masks with high accuracy.
Features
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 📐 SVG Support: Automatic conversion of SVG files to raster format
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean JavaScript API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Installation
From npm (Recommended)
```bash
npm install vsegments
```

Global Installation (for CLI)

```bash
npm install -g vsegments
```

From Source
```bash
git clone git@github.com:nxtphaseai/vsegments.git
cd node_vsegments
npm install
npm link
```

Quick Start
Prerequisites
You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:
```bash
export GOOGLE_API_KEY="your-api-key-here"
```

CLI Usage
Basic Bounding Box Detection
```bash
vsegments -f image.jpg
```

Save Output Image

```bash
vsegments -f image.jpg -o output.jpg
```

Perform Segmentation

```bash
vsegments -f image.jpg --segment -o segmented.jpg
```

Custom Prompt

```bash
vsegments -f image.jpg -p "Find all people wearing red shirts"
```

Export JSON Results

```bash
vsegments -f image.jpg --json results.json
```

Compact Output

```bash
vsegments -f image.jpg --compact
```

Library Usage
Basic Detection
```javascript
const VSegments = require('vsegments');

// Initialize
const vs = new VSegments({ apiKey: 'your-api-key' });

// Detect bounding boxes
const result = await vs.detectBoxes('image.jpg');

// Print results
console.log(`Found ${result.boxes.length} objects`);
result.boxes.forEach(box => {
  console.log(`  - ${box.label}`);
});

// Visualize
await vs.visualize('image.jpg', result, { outputPath: 'output.jpg' });
```

Advanced Detection
```javascript
const VSegments = require('vsegments');

// Initialize with custom settings
const vs = new VSegments({
  apiKey: 'your-api-key',
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxObjects: 50
});

// Detect with custom prompt and instructions
const result = await vs.detectBoxes('image.jpg', {
  prompt: 'Find all vehicles in the image',
  customInstructions: 'Focus on cars, trucks, and motorcycles. Ignore bicycles.'
});

// Access individual boxes
result.boxes.forEach(box => {
  console.log(`${box.label}: [${box.x1}, ${box.y1}] -> [${box.x2}, ${box.y2}]`);
});
```

Segmentation
```javascript
const VSegments = require('vsegments');

const vs = new VSegments({ apiKey: 'your-api-key' });

// Perform segmentation
const result = await vs.segment('image.jpg');

// Visualize with custom settings
await vs.visualize('image.jpg', result, {
  outputPath: 'segmented.jpg',
  lineWidth: 6,
  fontSize: 18,
  alpha: 0.6
});
```

CLI Reference
Required Arguments
- -f, --file <image>: Path to input image file
Mode Options
- --segment: Perform segmentation instead of bounding box detection
API Options
- --api-key <key>: Google API key (default: GOOGLE_API_KEY env var)
- -m, --model <model>: Model name (default: gemini-3-pro-preview)
- --temperature <temp>: Sampling temperature 0.0-1.0 (default: 0.5)
- --max-objects <n>: Maximum objects to detect (default: 25)
Prompt Options
- -p, --prompt <text>: Custom detection prompt
- --instructions <text>: Additional system instructions for grounding
Output Options
- -o, --output <file>: Save visualized output to file
- --json <file>: Export results as JSON
- --no-show: Don't display the output image
- --raw: Print raw API response
Visualization Options
- --line-width <n>: Bounding box line width (default: 4)
- --font-size <n>: Label font size (default: 14)
- --alpha <a>: Mask transparency 0.0-1.0 (default: 0.7)
- --max-size <n>: Maximum image dimension for processing (default: 1024)
Other Options
- -V, --version: Show version information
- -q, --quiet: Suppress informational output
- --compact: Compact output format
- -h, --help: Show help message
API Reference
VSegments Class
Constructor
```javascript
new VSegments({
  apiKey: String,      // Optional (defaults to GOOGLE_API_KEY env var)
  model: String,       // Optional (default: 'gemini-flash-latest')
  temperature: Number, // Optional (default: 0.5)
  maxObjects: Number   // Optional (default: 25)
})
```

Methods
detectBoxes()
Detect bounding boxes in an image.
```javascript
await vs.detectBoxes(imagePath, {
  prompt: String,             // Optional custom prompt
  customInstructions: String, // Optional system instructions
  maxSize: Number             // Optional (default: 1024)
})
```

Returns: Promise<SegmentationResult>
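Per the Data Models section below, box coordinates come back normalized to 0-1000, and BoundingBox.toAbsolute() maps them to pixel coordinates. A minimal sketch of that conversion as a standalone function, assuming simple linear scaling with rounding (the helper itself is illustrative, not part of the vsegments API):

```javascript
// Illustrative sketch of the normalized-to-pixel mapping documented for
// BoundingBox.toAbsolute(): coordinates are normalized to the range 0-1000.
// Not part of the vsegments API; rounding behavior is an assumption.
function toAbsolute(box, imgWidth, imgHeight) {
  const scaleX = imgWidth / 1000;
  const scaleY = imgHeight / 1000;
  return [
    Math.round(box.x1 * scaleX), // absX1
    Math.round(box.y1 * scaleY), // absY1
    Math.round(box.x2 * scaleX), // absX2
    Math.round(box.y2 * scaleY)  // absY2
  ];
}

// Example: a detected box on a 2000x1000 image
const box = { label: 'car', x1: 250, y1: 100, x2: 750, y2: 900 };
console.log(toAbsolute(box, 2000, 1000)); // [500, 100, 1500, 900]
```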
segment()
Perform segmentation on an image.
```javascript
await vs.segment(imagePath, {
  prompt: String, // Optional custom prompt
  maxSize: Number // Optional (default: 1024)
})
```

Returns: Promise<SegmentationResult>
visualize()
Visualize detection/segmentation results.
```javascript
await vs.visualize(imagePath, result, {
  outputPath: String, // Optional output file path
  lineWidth: Number,  // Optional (default: 4)
  fontSize: Number,   // Optional (default: 14)
  alpha: Number       // Optional (default: 0.7)
})
```

Returns: Promise<Canvas>
Data Models
BoundingBox
```javascript
{
  label: String,
  y1: Number, // Normalized 0-1000
  x1: Number,
  y2: Number,
  x2: Number,
  toAbsolute(imgWidth, imgHeight) // Returns [absX1, absY1, absX2, absY2]
}
```

SegmentationResult
```javascript
{
  boxes: BoundingBox[],
  masks: SegmentationMask[] | null,
  rawResponse: String | null,
  length: Number // Number of detected objects
}
```

Examples
See the examples/ directory for complete working examples:
- basic.js - Basic object detection
- segmentation.js - Image segmentation with masks
Run examples:
```bash
cd examples
node basic.js path/to/image.jpg
node segmentation.js path/to/image.jpg
```

Supported Models
- gemini-flash-latest (default, fastest)
- gemini-2.0-flash
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro (best quality, slower)
Note: Segmentation features require 2.5 models or later.
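Given the note above, code that toggles between box detection and segmentation may want to pick the model programmatically so segmentation never lands on a pre-2.5 model. A hypothetical sketch (the helper and its selection policy are illustrative, not part of vsegments; model names are taken from the list above):

```javascript
// Illustrative model picker, not part of the vsegments API.
// Honors the note that segmentation requires a 2.5-generation model or later.
function pickModel({ segment = false, quality = false } = {}) {
  if (segment) {
    // Segmentation masks need a 2.5+ model
    return quality ? 'gemini-2.5-pro' : 'gemini-2.5-flash';
  }
  // Bounding boxes work with the fast default
  return quality ? 'gemini-2.5-pro' : 'gemini-flash-latest';
}

console.log(pickModel());                  // 'gemini-flash-latest'
console.log(pickModel({ segment: true })); // 'gemini-2.5-flash'
```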
Requirements
- Node.js 16.0.0 or higher
- Dependencies:
- @google/generative-ai ^0.21.0
- canvas ^2.11.2
- commander ^12.0.0
- sharp ^0.33.0 (for SVG support and better compatibility)
Publishing to npm
1. Build and Test

```bash
npm install
npm test
```

2. Update Version

Edit package.json and update the version number.

3. Login to npm

```bash
npm login
```

4. Publish

```bash
npm publish
```

5. Verify

```bash
npm info vsegments
```

Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Troubleshooting
Common Issues
500 Internal Server Error
If you get a 500 error from the Google Gemini API:
Try a different model:

```javascript
const vs = new VSegments({
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview' // default model
});
```

Check your image: Ensure it's under 4MB and in a supported format (JPG, PNG, GIF, WEBP)
Wait and retry: The API may be experiencing temporary issues
Verify API key: Make sure your API key is valid and has proper permissions
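The wait-and-retry advice above can be wrapped in a small exponential-backoff helper. A minimal sketch, illustrative only (vsegments does not ship such a helper, and the retry counts and delays are arbitrary):

```javascript
// Minimal retry-with-exponential-backoff wrapper for transient API errors.
// Illustrative only; not part of the vsegments API.
async function withRetry(fn, { retries = 3, baseMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Backoff: baseMs, 2*baseMs, 4*baseMs, ...
        await new Promise(resolve => setTimeout(resolve, baseMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage (assuming an initialized `vs` instance):
// const result = await withRetry(() => vs.detectBoxes('image.jpg'));
```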
For more detailed troubleshooting, see TROUBLESHOOTING.md
Recommended Models
- Default (high quality): gemini-3-pro-preview
- Alternative: gemini-2.5-flash
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
Support
- Issues: GitHub Issues
- Documentation: GitHub README
- Troubleshooting: TROUBLESHOOTING.md
Made with ❤️ by Marco Kotrotsos
