vsegments (Node.js)
v0.1.6
Visual segmentation and bounding box detection using Google Gemini AI
vsegments is a powerful Node.js library and CLI tool that leverages Google's Gemini AI models to perform advanced visual segmentation and object detection on images. It provides an easy-to-use interface for detecting bounding boxes and generating segmentation masks with high accuracy.
Features
- 🎯 Bounding Box Detection: Automatically detect and label objects in images
- 🎨 Segmentation Masks: Generate precise segmentation masks for identified objects
- 🖼️ Visualization: Beautiful visualization with customizable colors, fonts, and transparency
- 📐 SVG Support: Automatic conversion of SVG files to raster format
- 🛠️ CLI Tool: Powerful command-line interface for batch processing
- 📦 Library: Clean JavaScript API for integration into your projects
- 🚀 Multiple Models: Support for various Gemini models (Flash, Pro, etc.)
- ⚙️ Customizable: Fine-tune prompts, system instructions, and output settings
- 📊 JSON Export: Export detection results in structured JSON format
Installation
From npm (Recommended)
```bash
npm install vsegments
```

Global Installation (for CLI)

```bash
npm install -g vsegments
```

From Source
```bash
git clone git@github.com:nxtphaseai/vsegments.git
cd node_vsegments
npm install
npm link
```

Quick Start
Prerequisites
You need a Google API key to use this library. Get one from Google AI Studio.
Set your API key as an environment variable:
```bash
export GOOGLE_API_KEY="your-api-key-here"
```

CLI Usage
Basic Bounding Box Detection
```bash
vsegments -f image.jpg
```

Save Output Image

```bash
vsegments -f image.jpg -o output.jpg
```

Perform Segmentation

```bash
vsegments -f image.jpg --segment -o segmented.jpg
```

Custom Prompt

```bash
vsegments -f image.jpg -p "Find all people wearing red shirts"
```

Export JSON Results

```bash
vsegments -f image.jpg --json results.json
```

Compact Output

```bash
vsegments -f image.jpg --compact
```

Library Usage
Basic Detection
```javascript
const VSegments = require('vsegments');

// Initialize
const vs = new VSegments({ apiKey: 'your-api-key' });

// Detect bounding boxes
const result = await vs.detectBoxes('image.jpg');

// Print results
console.log(`Found ${result.boxes.length} objects`);
result.boxes.forEach(box => {
  console.log(`  - ${box.label}`);
});

// Visualize
await vs.visualize('image.jpg', result, { outputPath: 'output.jpg' });
```

Advanced Detection
```javascript
const VSegments = require('vsegments');

// Initialize with custom settings
const vs = new VSegments({
  apiKey: 'your-api-key',
  model: 'gemini-2.5-pro',
  temperature: 0.7,
  maxObjects: 50
});

// Detect with custom prompt and instructions
const result = await vs.detectBoxes('image.jpg', {
  prompt: 'Find all vehicles in the image',
  customInstructions: 'Focus on cars, trucks, and motorcycles. Ignore bicycles.'
});

// Access individual boxes
result.boxes.forEach(box => {
  console.log(`${box.label}: [${box.x1}, ${box.y1}] -> [${box.x2}, ${box.y2}]`);
});
```

Segmentation
```javascript
const VSegments = require('vsegments');

const vs = new VSegments({ apiKey: 'your-api-key' });

// Perform segmentation
const result = await vs.segment('image.jpg');

// Visualize with custom settings
await vs.visualize('image.jpg', result, {
  outputPath: 'segmented.jpg',
  lineWidth: 6,
  fontSize: 18,
  alpha: 0.6
});
```

CLI Reference
Required Arguments
- -f, --file <image>: Path to input image file
Mode Options
- --segment: Perform segmentation instead of bounding box detection
API Options
- --api-key <key>: Google API key (default: GOOGLE_API_KEY env var)
- -m, --model <model>: Model name (default: gemini-3-pro-preview)
- --temperature <temp>: Sampling temperature 0.0-1.0 (default: 0.5)
- --max-objects <n>: Maximum objects to detect (default: 25)
Prompt Options
- -p, --prompt <text>: Custom detection prompt
- --instructions <text>: Additional system instructions for grounding
Output Options
- -o, --output <file>: Save visualized output to file
- --json <file>: Export results as JSON
- --no-show: Don't display the output image
- --raw: Print raw API response
Visualization Options
- --line-width <n>: Bounding box line width (default: 4)
- --font-size <n>: Label font size (default: 14)
- --alpha <a>: Mask transparency 0.0-1.0 (default: 0.7)
- --max-size <n>: Maximum image dimension for processing (default: 1024)
Other Options
- -V, --version: Show version information
- -q, --quiet: Suppress informational output
- --compact: Compact output format
- -h, --help: Show help message
API Reference
VSegments Class
Constructor
```javascript
new VSegments({
  apiKey: String,      // Optional (defaults to GOOGLE_API_KEY env var)
  model: String,       // Optional (default: 'gemini-flash-latest')
  temperature: Number, // Optional (default: 0.5)
  maxObjects: Number   // Optional (default: 25)
})
```

Methods
detectBoxes()
Detect bounding boxes in an image.
```javascript
await vs.detectBoxes(imagePath, {
  prompt: String,             // Optional custom prompt
  customInstructions: String, // Optional system instructions
  maxSize: Number             // Optional (default: 1024)
})
```

Returns: Promise<SegmentationResult>
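Per the Data Models section below, box coordinates come back normalized to 0-1000, and BoundingBox.toAbsolute() maps them to pixel coordinates. A minimal sketch of that conversion as a standalone function, assuming simple linear scaling with rounding (the helper itself is illustrative, not part of the vsegments API):

```javascript
// Illustrative sketch of the normalized-to-pixel mapping documented for
// BoundingBox.toAbsolute(): coordinates are normalized to the range 0-1000.
// Not part of the vsegments API; rounding behavior is an assumption.
function toAbsolute(box, imgWidth, imgHeight) {
  const scaleX = imgWidth / 1000;
  const scaleY = imgHeight / 1000;
  return [
    Math.round(box.x1 * scaleX), // absX1
    Math.round(box.y1 * scaleY), // absY1
    Math.round(box.x2 * scaleX), // absX2
    Math.round(box.y2 * scaleY)  // absY2
  ];
}

// Example: a detected box on a 2000x1000 image
const box = { label: 'car', x1: 250, y1: 100, x2: 750, y2: 900 };
console.log(toAbsolute(box, 2000, 1000)); // [500, 100, 1500, 900]
```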
segment()
Perform segmentation on an image.
```javascript
await vs.segment(imagePath, {
  prompt: String, // Optional custom prompt
  maxSize: Number // Optional (default: 1024)
})
```

Returns: Promise<SegmentationResult>
visualize()
Visualize detection/segmentation results.
```javascript
await vs.visualize(imagePath, result, {
  outputPath: String, // Optional output file path
  lineWidth: Number,  // Optional (default: 4)
  fontSize: Number,   // Optional (default: 14)
  alpha: Number       // Optional (default: 0.7)
})
```

Returns: Promise<Canvas>
Data Models
BoundingBox
```javascript
{
  label: String,
  y1: Number, // Normalized 0-1000
  x1: Number,
  y2: Number,
  x2: Number,
  toAbsolute(imgWidth, imgHeight) // Returns [absX1, absY1, absX2, absY2]
}
```

SegmentationResult
```javascript
{
  boxes: BoundingBox[],
  masks: SegmentationMask[] | null,
  rawResponse: String | null,
  length: Number // Number of detected objects
}
```

Examples
See the examples/ directory for complete working examples:
- basic.js - Basic object detection
- segmentation.js - Image segmentation with masks
Run examples:
```bash
cd examples
node basic.js path/to/image.jpg
node segmentation.js path/to/image.jpg
```

Supported Models
- gemini-flash-latest (default, fastest)
- gemini-2.0-flash
- gemini-2.5-flash-lite
- gemini-2.5-flash
- gemini-2.5-pro (best quality, slower)
Note: Segmentation features require 2.5 models or later.
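Given the note above, code that toggles between box detection and segmentation may want to pick the model programmatically so segmentation never lands on a pre-2.5 model. A hypothetical sketch (the helper and its selection policy are illustrative, not part of vsegments; model names are taken from the list above):

```javascript
// Illustrative model picker, not part of the vsegments API.
// Honors the note that segmentation requires a 2.5-generation model or later.
function pickModel({ segment = false, quality = false } = {}) {
  if (segment) {
    // Segmentation masks need a 2.5+ model
    return quality ? 'gemini-2.5-pro' : 'gemini-2.5-flash';
  }
  // Bounding boxes work with the fast default
  return quality ? 'gemini-2.5-pro' : 'gemini-flash-latest';
}

console.log(pickModel());                  // 'gemini-flash-latest'
console.log(pickModel({ segment: true })); // 'gemini-2.5-flash'
```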
Requirements
- Node.js 16.0.0 or higher
- Dependencies:
- @google/generative-ai ^0.21.0
- canvas ^2.11.2
- commander ^12.0.0
- sharp ^0.33.0 (for SVG support and better compatibility)
Publishing to npm
1. Build and Test

```bash
npm install
npm test
```

2. Update Version

Edit package.json and update the version number.

3. Login to npm

```bash
npm login
```

4. Publish

```bash
npm publish
```

5. Verify

```bash
npm info vsegments
```

Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Troubleshooting
Common Issues
500 Internal Server Error
If you get a 500 error from the Google Gemini API:
Try a different model:

```javascript
const vs = new VSegments({
  apiKey: 'YOUR_API_KEY',
  model: 'gemini-3-pro-preview' // default model
});
```

Check your image: Ensure it's under 4MB and in a supported format (JPG, PNG, GIF, WEBP)
Wait and retry: The API may be experiencing temporary issues
Verify API key: Make sure your API key is valid and has proper permissions
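The wait-and-retry advice above can be wrapped in a small exponential-backoff helper. A minimal sketch, illustrative only (vsegments does not ship such a helper, and the retry counts and delays are arbitrary):

```javascript
// Minimal retry-with-exponential-backoff wrapper for transient API errors.
// Illustrative only; not part of the vsegments API.
async function withRetry(fn, { retries = 3, baseMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        // Backoff: baseMs, 2*baseMs, 4*baseMs, ...
        await new Promise(resolve => setTimeout(resolve, baseMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Usage (assuming an initialized `vs` instance):
// const result = await withRetry(() => vs.detectBoxes('image.jpg'));
```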
For more detailed troubleshooting, see TROUBLESHOOTING.md
Recommended Models
- Default (high quality): gemini-3-pro-preview
- Alternative: gemini-2.5-flash
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built using Google Gemini AI
- Inspired by the Google AI Cookbook
Support
- Issues: GitHub Issues
- Documentation: GitHub README
- Troubleshooting: TROUBLESHOOTING.md
Made with ❤️ by Marco Kotrotsos
