Add visual understanding capabilities


Some models in Model Studio Coding Plan, such as qwen3.6-plus, qwen3.5-plus, and kimi-k2.5, natively support visual understanding and can process images directly. For text-only models such as glm-5 and MiniMax-M2.5, add a local Skill (Claude Code) or Agent (OpenCode) that hands image analysis off to a vision-capable model.

Note

Running an image understanding Skill consumes your Coding Plan quota. No other charges apply.

Prerequisites

  1. You have subscribed to Coding Plan. See Getting started.

  2. You have connected your coding tool to Coding Plan and can start conversations. See Clients and developer tools.

Visual support status

| Model | Visual support | Description |
|---|---|---|
| qwen3.6-plus, qwen3.5-plus, kimi-k2.5 | Yes | No extra configuration required. Pass images directly. |
| qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, MiniMax-M2.5 | No | Requires a Skill or Agent for vision. |

Method 1: Use a visual model directly (recommended)

qwen3.6-plus, qwen3.5-plus, and kimi-k2.5 have built-in vision support and are recommended if you frequently work with images.

| Tool | How to switch models |
|---|---|
| Claude Code | /model qwen3.6-plus, /model qwen3.5-plus, or /model kimi-k2.5 |
| OpenCode | /models → select qwen3.6-plus, qwen3.5-plus, or kimi-k2.5 |
| Qwen Code | /model → select qwen3.6-plus, qwen3.5-plus, or kimi-k2.5 |

To switch models in other tools, see Clients and developer tools. After switching, reference an image by its file path, or drag and drop or paste the image into the conversation.

Method 2: Add visual capabilities using a Skill or Agent

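To process images with models that lack vision support, such as glm-5 and MiniMax-M2.5, configure a Skill (Claude Code) or an Agent (OpenCode) that runs on a vision-capable model. The examples below use qwen3.6-plus.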

Claude Code

  1. Add a Skill

    In your project directory, create a skills/image-analyzer directory in the .claude folder:

    mkdir -p .claude/skills/image-analyzer

    In this directory, create a SKILL.md file with the following content:

    ---
    name: image-analyzer
    description: Helps models without visual capabilities understand images. Use this skill to analyze image content, extract information, text, and UI elements from an image, or understand any visual content such as screenshots, charts, or architecture diagrams. Pass the image path to get a description.
    model: qwen3.6-plus
    ---
    qwen3.6-plus has visual understanding capabilities. Use the qwen3.6-plus model directly for image understanding.

    Directory structure:

    .claude/
    └── skills/
        └── image-analyzer/
            └── SKILL.md
  2. Get started

    1. In your project directory, run claude to start Claude Code, and then run /model glm-5 to switch to glm-5.

    2. Download aliyun.png to your project directory and ask: "Load the image-analyzer skill and describe the information in the aliyun.png banner." The reply is similar to the following:

      [Screenshot: example reply describing the aliyun.png banner]
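Step 1 can also be done in a single command. The following shell sketch creates the same directory and writes the same SKILL.md shown above using a heredoc. It assumes a POSIX shell, that you run it from your project root, and that overwriting an existing SKILL.md is acceptable.

```shell
# Sketch: create the image-analyzer Skill for Claude Code in the current project.
# Assumes it is run from the project root; an existing SKILL.md is overwritten.
set -eu

skill_dir=".claude/skills/image-analyzer"
mkdir -p "$skill_dir"

# The quoted 'EOF' stops the shell from expanding anything in the file body.
cat > "$skill_dir/SKILL.md" <<'EOF'
---
name: image-analyzer
description: Helps models without visual capabilities understand images. Use this skill to analyze image content, extract information, text, and UI elements from an image, or understand any visual content such as screenshots, charts, or architecture diagrams. Pass the image path to get a description.
model: qwen3.6-plus
---
qwen3.6-plus has visual understanding capabilities. Use the qwen3.6-plus model directly for image understanding.
EOF

echo "Created $skill_dir/SKILL.md"
```

Because the file body is written verbatim, you can edit the frontmatter in the script (for example, the model value) and rerun it to regenerate the Skill.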

OpenCode

  1. Add an Agent

    In your project directory, create an agents directory in the .opencode folder:

    mkdir -p .opencode/agents

    In this directory, create an image-analyzer.md file with the following content:

    Note

    The model field must use the provider ID/model ID format, with both names exactly as defined in your OpenCode config file. For example, with the configuration in the OpenCode documentation, the value is bailian-coding-plan/qwen3.6-plus.

    ---
    description: Analyzes images using a vision-capable model. Use this agent when the user needs to understand image content, extract information from screenshots, diagrams, UI mockups, or any visual content. Invoke with @image-analyzer followed by the image path and your question.
    mode: subagent
    model: bailian-coding-plan/qwen3.6-plus
    tools:
      write: false
      edit: false
    ---
    You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.

    Directory structure:

    .opencode/
    └── agents/
        └── image-analyzer.md
  2. Get started

    1. In your project directory, run opencode to start OpenCode, and then switch to glm-5 (for example, with /models).

    2. Download aliyun.png to your project directory. Invoke the agent by typing @image-analyzer, and ask: "@image-analyzer describe the information in the aliyun.png banner." The reply is similar to the following:

      [Screenshot: example reply describing the aliyun.png banner]
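As with Claude Code, step 1 can be scripted. This shell sketch writes the same .opencode/agents/image-analyzer.md file shown above. It assumes a POSIX shell, that you run it from your project root, and that your OpenCode config defines a provider named bailian-coding-plan with a qwen3.6-plus model; if your names differ, change the model line before running it.

```shell
# Sketch: create the image-analyzer subagent for OpenCode in the current project.
# Assumes the OpenCode config defines provider "bailian-coding-plan" with model
# "qwen3.6-plus"; adjust the model line to match your own provider/model IDs.
set -eu

agent_dir=".opencode/agents"
mkdir -p "$agent_dir"

# The quoted 'EOF' keeps the frontmatter and prompt exactly as written.
cat > "$agent_dir/image-analyzer.md" <<'EOF'
---
description: Analyzes images using a vision-capable model. Use this agent when the user needs to understand image content, extract information from screenshots, diagrams, UI mockups, or any visual content. Invoke with @image-analyzer followed by the image path and your question.
mode: subagent
model: bailian-coding-plan/qwen3.6-plus
tools:
  write: false
  edit: false
---
You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.
EOF

echo "Created $agent_dir/image-analyzer.md"
```

Disabling write and edit keeps the subagent read-only, so it can describe images but cannot modify files in your project.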

FAQ

Why can't a vision model understand images in OpenCode?

Cause: OpenCode does not enable image input for a model unless the config file declares it in the model's modalities field.

Solution: In the model definition of the OpenCode config file, add the modalities field and set input to ["text", "image"]:

Replace sk-sp-xxx with your Coding Plan API key.
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "bailian-coding-plan": {
      "npm": "@ai-sdk/anthropic",
      "name": "Model Studio Coding Plan",
      "options": {
        "baseURL": "https://coding.dashscope.aliyuncs.com/apps/anthropic/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.6-plus": {
          "name": "Qwen3.6 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}

Why can't a vision model understand images in OpenClaw?

Cause: OpenClaw uses the input field in the config file to determine whether a model supports vision. If a model's input field does not include "image", OpenClaw treats the model as text-only.

Solution:

  1. In the ~/.openclaw/openclaw.json config file, make sure the definition of each vision model includes "input": ["text", "image"]. Replace YOUR_API_KEY with your Coding Plan API key.

    {
      "models": {
        "mode": "merge",
        "providers": {
          "bailian": {
            "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
            "apiKey": "YOUR_API_KEY",
            "api": "openai-completions",
            "models": [
              {
                "id": "qwen3.6-plus",
                "name": "qwen3.6-plus",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 1000000,
                "maxTokens": 65536
              },
              {
                "id": "qwen3.5-plus",
                "name": "qwen3.5-plus",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 1000000,
                "maxTokens": 65536
              },
              {
                "id": "kimi-k2.5",
                "name": "kimi-k2.5",
                "reasoning": false,
                "input": ["text", "image"],
                "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
                "contextWindow": 262144,
                "maxTokens": 32768
              }
            ]
          }
        }
      },
      "agents": {
        "defaults": {
          "model": {
            "primary": "bailian/qwen3.6-plus"
          },
          "models": {
            "bailian/qwen3.6-plus": {},
            "bailian/qwen3.5-plus": {},
            "bailian/kimi-k2.5": {}
          }
        }
      },
      "gateway": {
        "mode": "local"
      }
    }
  2. After you change the configuration, delete the cached model list and restart the OpenClaw gateway. Otherwise, OpenClaw keeps using the old configuration.

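    # Delete the cached model list (-f: no error if it is already gone)
    rm -f ~/.openclaw/agents/main/agent/models.json
    # Restart the gateway so it reloads openclaw.json
    openclaw gateway restart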
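If you edit the OpenClaw config often, you can wrap step 2 in a small shell function. This is a sketch: the function name clear_openclaw_model_cache is hypothetical and not part of OpenClaw, and the "image" check is only a rough text search of openclaw.json, not a real validation of each model entry. It uses only the cache path and the openclaw gateway restart command given above.

```shell
# Sketch: clear the cached OpenClaw model list, then restart the gateway.
# clear_openclaw_model_cache is a hypothetical helper, not an OpenClaw command.
# Optional $1: OpenClaw home directory (defaults to ~/.openclaw).
clear_openclaw_model_cache() {
  openclaw_home="${1:-$HOME/.openclaw}"
  config_file="$openclaw_home/openclaw.json"
  cache_file="$openclaw_home/agents/main/agent/models.json"

  # Rough heuristic: warn if the config never mentions "image" at all.
  if [ -f "$config_file" ] && ! grep -q '"image"' "$config_file"; then
    echo "Warning: $config_file does not mention \"image\"; vision input may be disabled." >&2
  fi

  # -f: succeed even if the cache file does not exist.
  rm -f "$cache_file"

  # Restart only if the openclaw CLI is available on PATH.
  if command -v openclaw >/dev/null 2>&1; then
    openclaw gateway restart
  else
    echo "openclaw not found on PATH; restart the gateway manually." >&2
  fi
}
```

After editing ~/.openclaw/openclaw.json, run clear_openclaw_model_cache with no arguments to apply the change.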