Add visual understanding capabilities

Prerequisites

You have subscribed to Token Plan.
You have completed the integration configuration in an AI tool and can chat normally. For details, see Clients and Developer Tools.

Vision support

Model	Vision support	Description
qwen3.8-max-preview, qwen3.7-plus, qwen3.6-plus, kimi-k2.5, and others	Yes	No additional configuration required. You can pass images directly.
qwen3-max-2026-01-23, qwen3-coder-next, qwen3-coder-plus, glm-5, glm-4.7, MiniMax-M2.5	No	Requires a Skill or Agent to enable visual capabilities.

Method 1: Use a vision model directly (recommended)

Models such as qwen3.7-plus have built-in visual understanding capabilities. If you process images frequently, switching to one of these models is the simplest approach and is recommended.

Tool	How to switch models
Claude Code	`/model qwen3.7-plus` or `/model qwen3.6-plus` or `/model qwen3.5-plus` or `/model kimi-k2.5`
OpenCode	`/models` then search and select `qwen3.7-plus` or `qwen3.6-plus` or `qwen3.5-plus` or `kimi-k2.5`
Qwen Code	`/model` then select `qwen3.7-plus` or `qwen3.6-plus` or `qwen3.5-plus` or `kimi-k2.5`

For information about switching models in other coding tools, see Clients and Developer Tools. After switching, you can reference image paths directly in your conversation, or drag and drop or paste images into it.

Method 2: Add visual capabilities via Skill or Agent

If you need to use text-only models such as glm-5 or MiniMax-M2.5 for image processing, you can configure a Skill or Agent to enable visual capabilities.

Claude Code

Add the Skill

Create a .claude folder in your project directory, then create a skills/image-analyzer directory inside it:

mkdir -p .claude/skills/image-analyzer

Create a SKILL.md file in that directory with the following content:

---
name: image-analyzer
description: Helps models without vision capabilities understand images. Use this skill when you need to analyze image content, extract information, text, or UI elements from images, or understand screenshots, charts, architecture diagrams, or any visual content. Simply pass in the image path to get a description.
model: qwen3.7-plus
---
qwen3.7-plus has visual understanding capabilities. Use qwen3.7-plus directly for image understanding.

The resulting directory structure is as follows:

.claude/
└── skills/
    └── image-analyzer/
        └── SKILL.md
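
The two manual steps above (creating the directory and writing SKILL.md) could also be scripted. The following is a minimal sketch, assuming Python 3.9 or later is available. The function name `write_skill` and the constants are hypothetical helpers introduced for illustration, not part of Claude Code. The file content is copied from the SKILL.md shown above.

```python
from pathlib import Path

# Description text copied from the SKILL.md front matter shown above.
DESCRIPTION = (
    "Helps models without vision capabilities understand images. "
    "Use this skill when you need to analyze image content, extract "
    "information, text, or UI elements from images, or understand "
    "screenshots, charts, architecture diagrams, or any visual content. "
    "Simply pass in the image path to get a description."
)

SKILL_TEMPLATE = """---
name: image-analyzer
description: {description}
model: {model}
---
{model} has visual understanding capabilities. Use {model} directly for image understanding.
"""


def write_skill(project_dir: Path, model: str = "qwen3.7-plus") -> Path:
    """Create .claude/skills/image-analyzer/SKILL.md under project_dir."""
    skill_dir = project_dir / ".claude" / "skills" / "image-analyzer"
    skill_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(
        SKILL_TEMPLATE.format(description=DESCRIPTION, model=model),
        encoding="utf-8",
    )
    return skill_file
```

Calling `write_skill(Path.cwd())` from the project root should produce the directory structure shown above. Passing a different `model` value is only meaningful if that model supports vision.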

Get started
1. Run claude in your project directory to start Claude Code, then run /model glm-5 to switch to the glm-5 model.
2. Download aliyun.png to your project directory, then ask: Load the image-analyzer skill and describe the information in the banner area of aliyun.png. You will receive a response similar to the following:

  aliyun.png is a screenshot of the Alibaba Cloud homepage. The banner title is "Coding Plan now supports Qwen3.5". The banner text says that Alibaba Cloud Model Studio supports models such as Qwen3.5, Kimi-k2.5, and GLM-4.7, and that new customers get the first month for only 7.9 yuan. The page provides "Subscribe now" and "Online consultation" entry points.

OpenCode

Add the Agent

Create a .opencode folder in your project directory, then create an agents directory inside it:

mkdir -p .opencode/agents

Create an image-analyzer.md file in that directory with the following content:

Note

The model field must use the provider and model name defined in your OpenCode configuration file, in the form provider/model. If you followed the configuration example in the OpenCode documentation, the value is bailian-token-plan/qwen3.7-plus.

---
description: Analyzes images using a vision-capable model. Use this agent when the user needs to understand image content, extract information from screenshots, diagrams, UI mockups, or any visual content. Invoke with @image-analyzer followed by the image path and your question.
mode: subagent
model: bailian-token-plan/qwen3.7-plus
tools:
  write: false
  edit: false
---
You have vision capabilities. Analyze the provided image and return a clear, structured description focused on what the user is asking about.

The resulting directory structure is as follows:

.opencode/
└── agents/
    └── image-analyzer.md
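
Because the model field must follow the provider/model form, a quick check of the agent file can catch a missing provider prefix before you start OpenCode. The sketch below is illustrative only: `parse_front_matter` and `split_model_ref` are hypothetical helpers, and the parser reads only top-level `key: value` lines rather than full YAML.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading '---' block.

    Indented lines (such as the children of `tools`) are skipped. This is a
    deliberately small parser, not a YAML implementation.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        if line[:1].isspace() or ":" not in line:
            continue  # nested entry or not a key/value line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {}  # no closing delimiter found


def split_model_ref(fields: dict[str, str]) -> tuple[str, str]:
    """Split the `model` field into (provider, model); raise if malformed."""
    provider, sep, model = fields.get("model", "").partition("/")
    if not (provider and sep and model):
        raise ValueError("model must look like '<provider>/<model>'")
    return provider, model
```

For the image-analyzer.md content shown above, `split_model_ref(parse_front_matter(text))` returns the provider and model name separately, which you can then compare with the entries in your OpenCode configuration file.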

Get started

Run opencode in your project directory to start OpenCode, then switch to the glm-5 model.

Download aliyun.png to your project directory, then invoke the agent by asking: @image-analyzer Describe the information in the banner area of aliyun.png. You will receive a response similar to the following:

The banner area contains:
Left side:
- Hamburger menu icon
- Alibaba Cloud orange logo and text
- Navigation menu: Large Models, Products, Solutions, Benefits, Pricing, Marketplace, Partners, Services, About Alibaba Cloud
Right side:
- Search box (displaying "Large Models")
- Icons: blue circle, globe, headset
- Links: Documentation, ICP Filing, Console
Build · glm-5 · 37.0s

FAQ

Why can't OpenCode + a vision model understand images?

Cause: OpenCode does not enable a model's vision capabilities by default. You must explicitly declare the modalities parameter in the configuration file.

Solution: Add a modalities field to the model definition in your OpenCode configuration file, and set input to ["text", "image"], as shown below:

Replace sk-sp-xxx with your Token Plan API Key.

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "bailian-token-plan": {
      "npm": "@ai-sdk/anthropic",
      "name": "Model Studio Token Plan",
      "options": {
        "baseURL": "https://token-plan.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
        "apiKey": "sk-sp-xxx"
      },
      "models": {
        "qwen3.6-plus": {
          "name": "Qwen3.6 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "qwen3.5-plus": {
          "name": "Qwen3.5 Plus",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        },
        "kimi-k2.5": {
          "name": "Kimi K2.5",
          "modalities": {
            "input": [
              "text",
              "image"
            ],
            "output": [
              "text"
            ]
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 1024
            }
          }
        }
      }
    }
  }
}
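
To confirm that a configuration actually declares image input, you could run a small check such as the one below. It is a sketch under stated assumptions: the configuration file is plain JSON with the `provider` → `models` → `modalities` layout shown above, and the helper names `models_missing_image_input` and `check_file` are hypothetical. Text-only models are expected to appear in the result. The goal is to make sure your vision model does not.

```python
import json


def models_missing_image_input(config: dict) -> list[str]:
    """Return "provider/model" IDs whose modalities.input does not list "image"."""
    missing = []
    for provider_id, provider in config.get("provider", {}).items():
        for model_id, model in provider.get("models", {}).items():
            inputs = model.get("modalities", {}).get("input", [])
            if "image" not in inputs:
                missing.append(f"{provider_id}/{model_id}")
    return missing


def check_file(path: str) -> list[str]:
    """Load a configuration file (plain JSON assumed) and run the check."""
    with open(path, encoding="utf-8") as handle:
        return models_missing_image_input(json.load(handle))


# Trimmed-down sample: one model declares image input, one does not.
SAMPLE = {
    "provider": {
        "bailian-token-plan": {
            "models": {
                "qwen3.6-plus": {
                    "modalities": {"input": ["text", "image"], "output": ["text"]}
                },
                "glm-5": {},
            }
        }
    }
}

print(models_missing_image_input(SAMPLE))  # ['bailian-token-plan/glm-5']
```

Pass the path of your own OpenCode configuration file to `check_file` to run the same check against it.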

Why can't OpenClaw + a vision model understand images?

Cause: OpenClaw reads the input field of each model definition in the configuration file to determine whether the model supports vision capabilities.

Solution:

In the ~/.openclaw/openclaw.json configuration file, ensure the model definition includes the "input": ["text", "image"] field.

{
  "models": {
    "mode": "merge",
    "providers": {
      "bailian": {
        "baseUrl": "https://token-plan.cn-beijing.maas.aliyuncs.com/compatible-mode/v1",
        "apiKey": "YOUR_API_KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3.6-plus",
            "name": "qwen3.6-plus",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 1000000,
            "maxTokens": 65536
          },
          {
            "id": "qwen3.5-plus",
            "name": "qwen3.5-plus",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 1000000,
            "maxTokens": 65536
          },
          {
            "id": "kimi-k2.5",
            "name": "kimi-k2.5",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 262144,
            "maxTokens": 32768
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "bailian/qwen3.6-plus"
      },
      "models": {
        "bailian/qwen3.6-plus": {},
        "bailian/qwen3.5-plus": {},
        "bailian/kimi-k2.5": {}
      }
    }
  },
  "gateway": {
    "mode": "local"
  }
}

After modifying the configuration, you must clear the OpenClaw model cache and restart. Otherwise, the old configuration will remain in effect.
```
rm ~/.openclaw/agents/main/agent/models.json
openclaw gateway restart
```
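
Before clearing the cache and restarting, it can help to confirm that the primary model really lists "image" in its input field. The following sketch assumes the layout of the openclaw.json example above, and that agents refer to models as provider/id, as in "bailian/qwen3.6-plus". The function name `vision_problems` is hypothetical and is not part of OpenClaw.

```python
def vision_problems(config: dict) -> list[str]:
    """Return problems found; an empty list means the primary model accepts images."""
    declared: dict[str, list[str]] = {}
    providers = config.get("models", {}).get("providers", {})
    for provider_id, provider in providers.items():
        for model in provider.get("models", []):
            # Assumption: agents refer to models as "<provider>/<id>".
            declared[f"{provider_id}/{model.get('id')}"] = model.get("input", [])

    primary = (
        config.get("agents", {}).get("defaults", {}).get("model", {}).get("primary")
    )
    if primary is None:
        return ["agents.defaults.model.primary is not set"]
    if primary not in declared:
        return [f"{primary} is not defined under models.providers"]
    if "image" not in declared[primary]:
        return [f'{primary} does not list "image" in its input field']
    return []


# Reduced version of the configuration above, keeping only the relevant keys.
SAMPLE = {
    "models": {
        "providers": {
            "bailian": {
                "models": [{"id": "qwen3.6-plus", "input": ["text", "image"]}]
            }
        }
    },
    "agents": {"defaults": {"model": {"primary": "bailian/qwen3.6-plus"}}},
}

print(vision_problems(SAMPLE))  # []
```

To check your own setup, load ~/.openclaw/openclaw.json with `json.load` and pass the result to `vision_problems`. This assumes the file is plain JSON without comments.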