lyf-vision-understand

v1.0.4

Published

25 days ago

AI vision understanding tool supporting image analysis, OCR, scene understanding, and grounding (object localization)


lyf1351

vision image-analysis ocr glm-4v zhipuai mcp claude-code openclaw grounding

lyf-vision-understand

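AI vision understanding tool supporting image analysis, OCR, scene understanding, and grounding (object localization). It uses the free GLM-4.6V-Flash model by default; the paid GLM-4.6V model can be selected instead.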

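Features

🔍 Multimodal vision understanding
📝 OCR text recognition
🎯 Precise grounding (localization)
📊 Chart data analysis
📄 Document parsing
🚀 Zero runtime dependencies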

Supported models

| Model | Description |
| ---------------- | ----------------------------- |
| glm-4.6v-flash | Zhipu AI high-speed vision model (free) |
| glm-4.6v | Zhipu AI high-quality vision model (paid) |
| glm-oce | Zhipu AI OCR model (paid) |

Installation

npm install -g lyf-vision-understand

CLI usage

# Analyze an image
ccm-vision analyze image.jpg "Describe this image"

# OCR text recognition
ccm-vision ocr receipt.jpg

# Object detection
ccm-vision detect photo.jpg "cat"

# Grounding (localization)
ccm-vision grounding photo.jpg "red car"

# Chart analysis
ccm-vision chart chart.png

Environment variables

export VISION_API_KEY="your-api-key"
export VISION_MODEL="glm-4.6v-flash"
export VISION_API_URL="https://..."
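
The three variables above can be gathered into a single settings object before a client is created. The following is a minimal sketch, not the package's own logic: `VisionConfig`, `Env`, and `resolveVisionConfig` are hypothetical names, and the fallback to `glm-4.6v-flash` simply mirrors the default model stated in this README.

```typescript
// Hypothetical helper, not part of lyf-vision-understand.
interface VisionConfig {
  apiKey: string;
  model: string;
  apiUrl?: string;
}

// Shape of an environment map such as Node's process.env.
type Env = Record<string, string | undefined>;

function resolveVisionConfig(env: Env): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) {
    // Fail early with a clear message instead of sending an unauthenticated request.
    throw new Error("VISION_API_KEY is not set");
  }
  return {
    apiKey,
    // Assumption: fall back to the free model that the README names as the default.
    model: env["VISION_MODEL"] ?? "glm-4.6v-flash",
    // Left undefined when unset; this sketch does not guess an endpoint.
    apiUrl: env["VISION_API_URL"],
  };
}
```

In Node.js, calling `resolveVisionConfig(process.env)` would read the variables exported above.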

MCP server

Configure it in claude_desktop_config.json:

{
  "mcpServers": {
    "vision": {
      "command": "ccm-vision-mcp",
      "env": {
        "VISION_API_KEY": "your-api-key"
      }
    }
  }
}
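
If the configuration is generated by a script rather than edited by hand, the same structure can be built programmatically. This is only a sketch: `McpServerEntry` and `buildVisionMcpConfig` are hypothetical names, and passing `VISION_MODEL` through to the MCP server is an assumption based on the environment variables listed earlier, not something this README confirms.

```typescript
// Hypothetical helper that reproduces the JSON structure shown above.
interface McpServerEntry {
  command: string;
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildVisionMcpConfig(apiKey: string, model?: string): McpConfig {
  const env: Record<string, string> = { VISION_API_KEY: apiKey };
  if (model) {
    // Assumption: the MCP server honours VISION_MODEL like the CLI does.
    env["VISION_MODEL"] = model;
  }
  return {
    mcpServers: {
      vision: { command: "ccm-vision-mcp", env },
    },
  };
}

// Serialize with two-space indentation, matching the hand-written example.
const configJson: string = JSON.stringify(buildVisionMcpConfig("your-api-key"), null, 2);
console.log(configJson);
```

The printed text can be merged into an existing claude_desktop_config.json under the `mcpServers` key.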

Programmatic usage

import { VisionClient } from "lyf-vision-understand";

const client = new VisionClient("your-api-key", {
  model: "glm-4.6v-flash",
});

// Analyze an image
const result1 = await client.analyze("image.jpg", "Describe this image");

// OCR recognition
const result2 = await client.ocr("receipt.jpg");

// Object detection
const result3 = await client.detect("photo.jpg", "cat");

// Grounding (localization)
const result4 = await client.grounding("photo.jpg", "Find the position of the red vehicle");
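
When several images are processed in one run, it can help to collect failures per file instead of stopping at the first error. The sketch below is one possible pattern, not part of the package: `Analyzer`, `BatchResult`, and `analyzeAll` are hypothetical names, the result type is left as `unknown` because this README does not document the response shape, and it assumes that `analyze` rejects its promise on failure.

```typescript
// Structural type covering only the method this sketch calls.
// Assumption: VisionClient.analyze(path, prompt) returns a promise.
interface Analyzer {
  analyze(imagePath: string, prompt: string): Promise<unknown>;
}

interface BatchResult {
  path: string;
  ok: boolean;
  result?: unknown;
  error?: string;
}

// Runs the same prompt over each image sequentially and records
// a success or failure entry for every path, in input order.
async function analyzeAll(
  client: Analyzer,
  paths: string[],
  prompt: string,
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const path of paths) {
    try {
      const result = await client.analyze(path, prompt);
      results.push({ path, ok: true, result });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ path, ok: false, error: message });
    }
  }
  return results;
}
```

With the `client` created above, a call such as `await analyzeAll(client, ["a.jpg", "b.jpg"], "Describe this image")` would return one entry per file. The loop is sequential on purpose, which keeps request volume low; whether the API tolerates parallel calls is not stated here.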

API documentation

Get a Zhipu AI API key: https://open.bigmodel.cn/usercenter/apikeys

License

MIT
