After you integrate your self-developed agent with the multimodal interaction developer suite, you can extend the A2A protocol as described in this document. With this extension, the multimodal interaction developer suite matches user intents to your agent's skills and passes the detected skill parameters to your agent in each request.
Extension protocol declaration
To declare support for the extension protocol, add an AgentExtension object whose uri is the URI of this document to the capabilities.extensions array of your AgentCard.
The URI of this document is https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent.
Example:
{
...
"capabilities": {
"extensions": [
{
"uri": "https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent",
"params": {
"skills": [
{
"id": "ai-calculate",
"inputSchema": {
"type": "object",
"properties": {
"num1": {
"type": "integer",
"description": "The first number"
},
"num2": {
"type": "integer",
"description": "The second number"
}
}
}
}
]
}
},
...
],
...
},
...
}
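If your agent assembles its AgentCard in code, the extension entry can be generated instead of written by hand. The following is a minimal Python sketch, assuming the AgentCard is built as a plain dictionary and serialized to JSON; the helper name `intent_extension` is hypothetical and not part of any SDK.

```python
import json

# URI that identifies this extension protocol, as given in this document.
INTENT_EXTENSION_URI = (
    "https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent"
)


def intent_extension(skills: list[dict]) -> dict:
    """Build the AgentExtension entry for capabilities.extensions.

    Each item in `skills` is a SkillExtension object with "id" and
    "inputSchema". The helper name is illustrative only.
    """
    return {"uri": INTENT_EXTENSION_URI, "params": {"skills": skills}}


# SkillExtension for the ai-calculate skill used in this document's examples.
calculate_skill = {
    "id": "ai-calculate",
    "inputSchema": {
        "type": "object",
        "properties": {
            "num1": {"type": "integer", "description": "The first number"},
            "num2": {"type": "integer", "description": "The second number"},
        },
    },
}

# Only the capabilities fragment is built here; the other AgentCard fields
# (name, url, skills, and so on) are left out of this sketch.
agent_card = {"capabilities": {"extensions": [intent_extension([calculate_skill])]}}
print(json.dumps(agent_card, indent=2))
```

Keeping the URI in one constant avoids typos in a value that must match exactly.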
Field descriptions
AgentCard response fields
AgentExtension
| Field name | Type | Required | Description |
|---|---|---|---|
| uri | String | Yes | Must be set to "https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent". |
| params | Map<String, SkillExtension[]> | No | The configuration parameters of the extension. Use this field to declare the skills whose parameters the multimodal application should detect. The map key must be "skills", and its value is an array of SkillExtension objects. |
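The rules in the table above can be checked before the AgentCard is published. The following Python function is a hypothetical helper, not part of any SDK. It assumes that "skills" is the only key permitted in params, which is one reading of the rule that the map key must be "skills".

```python
# URI that identifies this extension protocol, as given in this document.
INTENT_EXTENSION_URI = (
    "https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent"
)


def check_agent_extension(ext: dict) -> list[str]:
    """Return the problems found in an AgentExtension object (empty if none)."""
    problems = []
    # uri is required and must be exactly this document's URI.
    if ext.get("uri") != INTENT_EXTENSION_URI:
        problems.append("uri must be " + INTENT_EXTENSION_URI)
    # params is optional; when present, this sketch requires "skills" to be
    # its only key and the value to be an array.
    params = ext.get("params")
    if params is not None:
        if not isinstance(params, dict) or set(params) != {"skills"}:
            problems.append('params must contain exactly one key, "skills"')
        elif not isinstance(params["skills"], list):
            problems.append("params.skills must be an array of SkillExtension objects")
    return problems


print(check_agent_extension({"uri": INTENT_EXTENSION_URI, "params": {"skills": []}}))  # []
```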
SkillExtension
| Field name | Type | Required | Description |
|---|---|---|---|
| id | String | Yes | The unique identifier of the skill within the agent. It must be identical to the AgentSkill.id of the corresponding skill in the AgentCard. |
| inputSchema | Object | Yes | The schema of the skill parameters that the multimodal application detects when the extension is enabled. Its structure is the same as the inputSchema of a tool in the MCP Tools protocol. |
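Because each SkillExtension.id must match an AgentSkill.id, a mismatch means the multimodal application would route intents to a skill your agent does not declare. The following Python sketch looks for such mismatches. It assumes the AgentSkill objects sit in the AgentCard's top-level skills array, as in the A2A protocol; the function name and the "ai-weather" skill ID in the usage example are hypothetical.

```python
# URI that identifies this extension protocol, as given in this document.
INTENT_EXTENSION_URI = (
    "https://help.aliyun.com/en/model-studio/multimodal-integration-a2a-intent"
)


def undeclared_skill_ids(agent_card: dict) -> set[str]:
    """Return SkillExtension ids that match no AgentSkill.id in the AgentCard."""
    # Skills declared in the AgentCard's top-level skills array.
    declared = {skill["id"] for skill in agent_card.get("skills", [])}
    # Skills referenced by this extension's params.skills.
    referenced = set()
    for ext in agent_card.get("capabilities", {}).get("extensions", []):
        if ext.get("uri") == INTENT_EXTENSION_URI:
            for skill_ext in ext.get("params", {}).get("skills", []):
                referenced.add(skill_ext["id"])
    return referenced - declared


card = {
    "skills": [{"id": "ai-calculate"}],
    "capabilities": {
        "extensions": [
            {
                "uri": INTENT_EXTENSION_URI,
                "params": {
                    "skills": [
                        {"id": "ai-calculate", "inputSchema": {"type": "object"}},
                        {"id": "ai-weather", "inputSchema": {"type": "object"}},
                    ]
                },
            }
        ]
    },
}
print(undeclared_skill_ids(card))  # {'ai-weather'}
```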
Agent call request fields
Message
| Field name | Type | Required | Description |
|---|---|---|---|
| metadata | Map<String, IntentInfo[]> | Yes | The metadata associated with the message. The multimodal application injects it automatically. The map key is "intentInfos", and its value is an array of IntentInfo objects. |
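On the agent side, the first step is to read the IntentInfo array from the incoming A2A Message. The following Python sketch shows one way to do that with the message as a plain dictionary; the function name is hypothetical, and the empty-list fallback is a defensive choice of this sketch rather than a documented behavior.

```python
def intent_infos(message: dict) -> list[dict]:
    """Return the IntentInfo objects from an A2A Message's metadata."""
    # The multimodal application injects metadata; fall back to an empty list
    # so that messages without it can still go through normal handling.
    metadata = message.get("metadata") or {}
    return metadata.get("intentInfos", [])


message = {
    "messageId": "msg-1",
    "kind": "message",
    "role": "user",
    "parts": [{"kind": "text", "text": "What is 101 plus 102?"}],
    "metadata": {"intentInfos": [{"intent": "ai-calculate"}]},
}
print([info["intent"] for info in intent_infos(message)])  # ['ai-calculate']
```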
IntentInfo
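Describes the skill that a user intent is routed to and the parameters detected for that skill.

| Field name | Type | Required | Description |
|---|---|---|---|
| intent | String | Yes | The ID of the skill that the intent is routed to. It matches a SkillExtension.id declared in the AgentCard. |
| slots | Slot[] | No | The skill parameters detected in the user input. |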
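Since intent carries a skill ID, an agent can dispatch each IntentInfo to the code that implements that skill. The following Python sketch does this with a lookup table. The handler signature, the rule that the first recognized intent wins, and the fallback for unrecognized intents are all assumptions of the sketch, not requirements of the extension.

```python
from typing import Callable

# A skill handler receives the slots of one IntentInfo and returns a reply.
# This signature is an assumption made for the sketch.
Handler = Callable[[list], str]


def route(infos: list[dict], handlers: dict[str, Handler], fallback: Callable[[], str]) -> str:
    """Dispatch to the handler of the first IntentInfo whose intent is known."""
    for info in infos:
        handler = handlers.get(info["intent"])
        if handler is not None:
            # slots is optional, so default to an empty list.
            return handler(info.get("slots", []))
    # No recognized intent: handle the message without intent information.
    return fallback()


handlers = {"ai-calculate": lambda slots: f"calculate with {len(slots)} slot(s)"}
print(route([{"intent": "ai-calculate", "slots": []}], handlers, lambda: "fallback"))
```

Keying the table by skill ID keeps routing aligned with the IDs declared in the AgentCard.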
Slot
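| Field name | Type | Required | Description |
|---|---|---|---|
| name | String | Yes | The name of the skill parameter. It corresponds to a property name in the skill's inputSchema. |
| value | String | Yes | The value of the skill parameter. Values are always strings, even when inputSchema declares another type such as integer. |
| normValue | String | No | The normalized value of the skill parameter. |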
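Because slot values arrive as strings while inputSchema may declare numeric types, an agent usually converts them before use. The following Python sketch converts slots according to the JSON Schema type of each property. Preferring normValue over value when both are present is an assumption of this sketch; this document does not say which one to use.

```python
def slot_arguments(slots: list[dict], input_schema: dict) -> dict:
    """Convert Slot objects to arguments typed per the skill's inputSchema."""
    properties = input_schema.get("properties", {})
    args = {}
    for slot in slots:
        # Assumption: use normValue when it is present and non-empty,
        # otherwise the raw value.
        raw = slot.get("normValue") or slot["value"]
        declared = properties.get(slot["name"], {}).get("type")
        # Slot values are always strings, so convert JSON Schema numeric types.
        if declared == "integer":
            args[slot["name"]] = int(raw)
        elif declared == "number":
            args[slot["name"]] = float(raw)
        else:
            args[slot["name"]] = raw
    return args


schema = {
    "type": "object",
    "properties": {
        "num1": {"type": "integer", "description": "The first number"},
        "num2": {"type": "integer", "description": "The second number"},
    },
}
slots = [
    {"name": "num1", "value": "101", "normValue": "101"},
    {"name": "num2", "value": "102", "normValue": "102"},
]
print(slot_arguments(slots, schema))  # {'num1': 101, 'num2': 102}
```

Here int() raises ValueError for a non-numeric value; how to handle that case is up to your agent.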
Example
{
"jsonrpc": "2.0",
"id": "request-1",
"method": "message/send",
"params": {
"message": {
"messageId": "msg-1",
"kind": "message",
"role": "user",
"parts": [
{
"kind": "text",
"text": "What is 101 plus 102?"
}
],
"metadata": {
"intentInfos": [
{
"intent": "ai-calculate",
"slots": [
{
"name": "num1",
"value": "101",
"normValue": "101"
},
{
"name": "num2",
"value": "102",
"normValue": "102"
}
]
}
]
}
}
}
}
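Putting the pieces together, the following Python sketch answers the example request above. It assumes the ai-calculate skill adds its two operands, which matches the "plus" in the example utterance but is otherwise the agent's own choice. It returns only the reply text and does not build a JSON-RPC response; the function name is hypothetical.

```python
import json


def handle_message_send(request: dict) -> str:
    """Answer the ai-calculate intent carried by a message/send request."""
    message = request["params"]["message"]
    for info in message.get("metadata", {}).get("intentInfos", []):
        if info["intent"] == "ai-calculate":
            # Slot values are strings; prefer normValue when it is present.
            args = {
                slot["name"]: int(slot.get("normValue") or slot["value"])
                for slot in info.get("slots", [])
            }
            # Assumption: ai-calculate adds its two operands.
            return str(args["num1"] + args["num2"])
    # No matching intent: a real agent would process the text parts instead.
    return "no matching skill"


request = json.loads("""
{
  "jsonrpc": "2.0",
  "id": "request-1",
  "method": "message/send",
  "params": {
    "message": {
      "messageId": "msg-1",
      "kind": "message",
      "role": "user",
      "parts": [{"kind": "text", "text": "What is 101 plus 102?"}],
      "metadata": {
        "intentInfos": [
          {
            "intent": "ai-calculate",
            "slots": [
              {"name": "num1", "value": "101", "normValue": "101"},
              {"name": "num2", "value": "102", "normValue": "102"}
            ]
          }
        ]
      }
    }
  }
}
""")
print(handle_message_send(request))  # 203
```

Because the multimodal application has already detected num1 and num2, the agent does not need to parse the utterance text itself.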