Queries a smart tagging job.
RAM authorization
| Action | Access level | Resource type | Condition key | Dependent action |
| ice:QuerySmarttagJob | get | *All Resource | None | None |
Request parameters
| Parameter | Type | Required | Description | Example |
| JobId | string | Yes | The ID of the smart tagging job. You can obtain this ID from the response to the SubmitSmarttagJob call. | 88c6ca184c0e47098a5b665e2**** |
| Params | string | No | Additional request parameters, formatted as a JSON string. | {"labelResultType":"auto"} |
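Because Params is a JSON string rather than a nested object, it has to be serialized before it is sent. The following Python sketch shows one way to assemble the two request parameters as a plain dictionary. The helper name build_query_params is hypothetical and not part of any SDK; how the dictionary is actually sent depends on the client you use.

```python
import json


def build_query_params(job_id, extra=None):
    """Assemble QuerySmarttagJob request parameters (hypothetical helper)."""
    if not job_id:
        raise ValueError("JobId is required")
    params = {"JobId": job_id}
    if extra is not None:
        # Params must be a JSON string, so serialize the nested object.
        params["Params"] = json.dumps(extra, separators=(",", ":"))
    return params


request = build_query_params(
    "88c6ca184c0e47098a5b665e2****", {"labelResultType": "auto"}
)
print(request["Params"])
```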
Response elements
| Element | Type | Description | Example |
| JobStatus | string | The job status. | Success |
| RequestId | string | The request ID. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The custom data passed through the MNS callback. For details on the message format, see the callback message format definitions below. | {"userId":"123432412831"} |
| Results | object | | |
| Results.Result | array<object> | An array of analysis result objects. | |
| Results.Result.Type | string | The type of the analysis result. | Meta |
| Results.Result.Data | string | The analysis result data, formatted as a JSON string. The data structure depends on the value of Type; see Result parameters. | {"title":"example-title-****"} |
| Usages | object | | |
| Usages.Usage | array<object> | | |
| Usages.Usage.Type | string | | |
| Usages.Usage.Quota | integer | | |
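Each Data value is a JSON string nested inside the JSON response, so it needs a second decoding pass. The following Python sketch shows one way to do that. The function name decode_results is hypothetical, and the sketch assumes at most one result per Type (a later entry with the same Type would overwrite an earlier one).

```python
import json


def decode_results(response):
    """Map each result Type to its parsed Data payload."""
    decoded = {}
    results = (response.get("Results") or {}).get("Result") or []
    for item in results:
        # Data is a JSON string; json.loads tolerates trailing whitespace.
        decoded[item["Type"]] = json.loads(item["Data"])
    return decoded


# Response fragment taken from the documented example values.
sample = {
    "JobStatus": "Success",
    "Results": {
        "Result": [
            {"Type": "Meta", "Data": "{\"title\":\"example-title-****\"}\t\n"}
        ]
    },
}
print(decode_results(sample))
```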
Callback message format
When the execution status of a smart tag job changes, MPS sends a callback message to the queue that you specified by calling the MPS UpdatePipeline operation. The message body is a JSON string that contains the following fields:
| Parameter | Type | Description |
| Type | string | A fixed string with the value smarttag, indicating a smart tag job. |
| JobId | string | The unique ID of the job. |
| State | string | The current state of the job. This value matches the JobStatus returned by the QuerySmarttagJob operation. |
| UserData | string | The user data passed to the SubmitSmarttagJob operation. |
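A consumer of the queue might handle the message body roughly as follows. This is a minimal Python sketch that assumes the body has already been read from the queue as JSON text. The function name parse_callback and the sample message are illustrative, built only from the field names in the table above.

```python
import json


def parse_callback(body):
    """Parse a callback body; return None if it is not a smart tag message."""
    message = json.loads(body)
    if message.get("Type") != "smarttag":
        return None
    return {
        "job_id": message.get("JobId"),
        "state": message.get("State"),
        "user_data": message.get("UserData"),
    }


# Illustrative message assembled from the documented fields.
body = json.dumps({
    "Type": "smarttag",
    "JobId": "88c6ca184c0e47098a5b665e2****",
    "State": "Success",
    "UserData": "{\"userId\":\"123432412831\"}",
})
event = parse_callback(body)
print(event["job_id"], event["state"])
```

Because State matches JobStatus, a message with the state Success is a reasonable trigger for calling QuerySmarttagJob to fetch the results.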
Result parameters
VideoLabel data structure
| Parameter | Type | Description |
| persons | JSONArray | An array of objects, each representing a detected person. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The person's category. Possible values are: celebrity, politician, sensitive (sensitive person), and unknown (unknown person). For a person from a custom person library, this field contains the library's ID. |
| persons.ratio | double | The person's appearance ratio. The value ranges from 0 to 1. |
| persons.occurrences | JSONArray | An array of objects, each detailing an occurrence of the person. |
| persons.occurrences.score | double | The confidence score for the recognition. |
| persons.occurrences.from | double | The start time of the person's appearance, in seconds. |
| persons.occurrences.to | double | The end time of the person's appearance, in seconds. |
| persons.occurrences.position | JSONObject | The face coordinates. |
| persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner of the face's bounding box. |
| persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner of the face's bounding box. |
| persons.occurrences.timestamp | double | The timestamp, in seconds, corresponding to the face coordinates. |
| persons.occurrences.scene | String | The shot type. Possible values are: closeUp (close-up shot), medium-closeUp (medium close-up shot), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | An array of objects representing detected tags, such as objects and scenes. See the table below for examples. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The sub-tag. |
| tags.ratio | double | The tag's appearance ratio. The value ranges from 0 to 1. |
| tags.occurrences | JSONArray | An array of objects, each detailing an occurrence of the tag. |
| tags.occurrences.score | double | The confidence score for the tag. |
| tags.occurrences.from | double | The start time of the occurrence, in seconds. |
| tags.occurrences.to | double | The end time of the occurrence, in seconds. |
| classifications | JSONArray | An array of objects, each representing a video classification. |
| classifications.score | double | The confidence score for the classification. |
| classifications.category1 | String | The level-1 category, such as Lifestyle, Anime, or Automotive. |
| classifications.category2 | String | The level-2 category, a sub-category of the level-1 category. For example, the Lifestyle category includes sub-categories like Health or Home. |
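To show how the nested persons and occurrences fields fit together, the following Python sketch collects the time ranges in which each person appears. The function name person_segments, the 0.5 score threshold, and the sample data are assumptions for illustration; the table does not specify a recommended threshold.

```python
def person_segments(video_label, min_score=0.5):
    """Return {name: [(from, to), ...]} for occurrences at or above min_score."""
    segments = {}
    for person in video_label.get("persons", []):
        spans = [
            (occ["from"], occ["to"])
            for occ in person.get("occurrences", [])
            if occ.get("score", 0.0) >= min_score
        ]
        if spans:
            segments[person["name"]] = spans
    return segments


# Illustrative VideoLabel fragment; values are made up.
sample = {
    "persons": [
        {
            "name": "example-person",
            "category": "celebrity",
            "ratio": 0.4,
            "occurrences": [
                {"score": 0.9, "from": 1.0, "to": 3.5},
                {"score": 0.2, "from": 10.0, "to": 11.0},
            ],
        }
    ]
}
print(person_segments(sample))
```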
Video tag examples
| Tag | Examples |
| program | "Where Are We Going, Dad?", "Top Funny Comedian" |
| character | doctor, nurse, teacher |
| object | piano, cup, table, scrambled eggs with tomatoes, car, cosmetics |
| logo | CCTV-1, CCTV-2, Youku, Dragon TV |
| action | dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting |
| location | Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, the United States |
| scene | bedroom, subway station, terraced fields, beach, desert |
ImageLabel data structure
| Parameter | Type | Description |
| persons | JSONArray | An array of objects, each representing a detected person. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The person category. Valid values are celebrity, politician, and sensitive. |
| persons.score | double | The confidence score for the detected person. |
| persons.position | JSONObject | The face coordinates. |
| persons.position.leftTop | int[] | The x and y coordinates for the top-left corner of the bounding box. |
| persons.position.rightBottom | int[] | The x and y coordinates for the bottom-right corner of the bounding box. |
| persons.scene | String | The camera shot. Valid values include: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | Tags for detected objects and scenes. For more information, see the examples in the table below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The sub-tag. |
| tags.score | double | The confidence score for the tag. |
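The leftTop and rightBottom corners are enough to derive the size of a face bounding box. The following Python sketch computes the box size and picks the person with the largest box. The helper names and the sample data are hypothetical, and the sketch assumes each corner is an [x, y] pair as described in the table.

```python
def box_size(position):
    """Return (width, height) for a position with leftTop and rightBottom."""
    left, top = position["leftTop"]
    right, bottom = position["rightBottom"]
    return right - left, bottom - top


def largest_face(image_label):
    """Return the person entry with the largest bounding-box area, or None."""
    persons = image_label.get("persons", [])
    if not persons:
        return None

    def area(person):
        width, height = box_size(person["position"])
        return width * height

    return max(persons, key=area)


# Illustrative ImageLabel fragment; names and coordinates are made up.
sample = {
    "persons": [
        {"name": "person-a", "score": 0.91,
         "position": {"leftTop": [10, 20], "rightBottom": [110, 220]}},
        {"name": "person-b", "score": 0.88,
         "position": {"leftTop": [300, 40], "rightBottom": [340, 90]}},
    ]
}
print(largest_face(sample)["name"])
```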
Image tag examples
| Main tag | Sub-tag |
| character | e.g., doctor, nurse, teacher |
| location | e.g., Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States |
| action | e.g., speaking |
| logo | e.g., CCTV1, CCTV2, Youku, Dragon TV |
| action | e.g., dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting |
| object | e.g., piano, cup, table, scrambled eggs with tomatoes, car, cosmetics |
| scene | e.g., bedroom, subway station, terraced fields, beach, desert |
TextLabel data structure (source: ASR and OCR)
| Parameter | Type | Description |
| tags | JSONArray | A list of tags for the resource. |
| tags.name | String | The key of the tag. |
| tags.value | String | The value of the tag. Use a comma to separate multiple values. |
Text tag examples
| Parameter | Value |
| region | Examples: Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States |
| organization | Examples: China Wildlife Conservation Association, China Media Group |
| logo | Examples: Nike, Li-Ning |
| keyword | Example: core strength |
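Since tags.value packs multiple values into one comma-separated string, callers usually split it before use. The following Python sketch is one way to do that. The function name text_tags_to_dict and the sample data are illustrative, and the sketch assumes the separator is an ASCII comma.

```python
def text_tags_to_dict(text_label):
    """Return {tag name: [values]} with comma-separated values split apart."""
    result = {}
    for tag in text_label.get("tags", []):
        values = [v.strip() for v in tag.get("value", "").split(",") if v.strip()]
        result.setdefault(tag["name"], []).extend(values)
    return result


# Illustrative TextLabel fragment using the example values above.
sample = {
    "tags": [
        {"name": "logo", "value": "Nike,Li-Ning"},
        {"name": "keyword", "value": "core strength"},
    ]
}
print(text_tags_to_dict(sample))
```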
CPVLabel data structure
- cates: Hierarchical categories (e.g., level-1, level-2, and level-3).
- entities: Category attributes (enriched with knowledge graph information).
- hotwords: Popular keywords (terms reflecting user interest).
- freeTags: Free-form tags (user-defined keywords).
| Parameter | Type | Example value | Description |
| type | String | hmi | The result type. Valid values: hmi (human-machine interaction result) and auto (automated tagging result). |
| cates | JSONArray | - | The category classification results. |
| cates.labelLevel1 | String | Travel | The level-1 label. |
| cates.labelLevel2 | String | Travel Scenery | The level-2 label. |
| cates.label | String | "" | The label name. The algorithm may return an empty string. |
| cates.appearanceProbability | double | 0.96 | The appearance probability. |
| cates.detailInfo | JSONArray | - | Detailed information about the category. |
| cates.detailInfo.score | double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | double | 0.021 | The start time. |
| cates.detailInfo.endTime | double | 29.021 | The end time. |
| entities | JSONArray | - | Detected entities. |
| entities.labelLevel1 | String | Region | The level-1 label. |
| entities.labelLevel2 | String | Landmark | The level-2 label. |
| entities.label | String | Huangguoshu Waterfall | The label name. |
| entities.appearanceProbability | double | 0.067 | The appearance probability. |
| entities.knowledgeInfo | String | {"name": "黄果树瀑布", "nameEn": "Huangguoshu Waterfall", "description": "亚洲四大瀑布之一"} | Information from the knowledge graph. For a complete list of fields, refer to the tables for the following knowledge graph types: film and television IP, music, person, landmark, and object. |
| entities.detailInfo | JSONArray | - | Detailed information about the entity. |
| entities.detailInfo.score | double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | double | 6.021 | The start time. |
| entities.detailInfo.endTime | double | 8.021 | The end time. |
| entities.detailInfo.trackData | JSONArray | - | The structured information of the entity label. |
| entities.detailInfo.trackData.score | double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | integer[] | 23,43,45,67 | The bounding box coordinates. |
| entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp. |
| hotwords | JSONArray | - | Detected hotwords. |
| hotwords.labelLevel1 | String | keyword | The level-1 label. |
| hotwords.labelLevel2 | String | "" | The level-2 label. |
| hotwords.label | String | 中国气象局 | The hotword content. |
| hotwords.appearanceProbability | double | 0.96 | The appearance probability. |
| hotwords.detailInfo | JSONArray | - | Detailed information about the hotword. |
| hotwords.detailInfo.score | double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | double | 0.021 | The start time. |
| hotwords.detailInfo.endTime | double | 29.021 | The end time. |
| freeTags | JSONArray | - | The free-form tags. |
| freeTags.labelLevel1 | String | keyword | The level-1 label. |
| freeTags.labelLevel2 | String | "" | The level-2 label. |
| freeTags.label | String | 中央气象台 | The tag content. |
| freeTags.appearanceProbability | double | 0.96 | The appearance probability. |
| freeTags.detailInfo | JSONArray | - | Detailed information about the free-form tag. |
| freeTags.detailInfo.score | double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | double | 0.021 | The start time. |
| freeTags.detailInfo.endTime | double | 29.021 | The end time. |
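The knowledgeInfo field is typed as a String, so it is a JSON document embedded in the CPVLabel result and must be decoded separately. The following Python sketch filters entities by appearanceProbability and decodes knowledgeInfo. The function name entity_knowledge and the 0.05 threshold are assumptions; the sample reuses the example values from the table.

```python
import json


def entity_knowledge(cpv_label, min_probability=0.05):
    """Return (label, English name) pairs for sufficiently likely entities."""
    pairs = []
    for entity in cpv_label.get("entities", []):
        if entity.get("appearanceProbability", 0.0) < min_probability:
            continue
        raw = entity.get("knowledgeInfo")
        # knowledgeInfo is a JSON string; it may be absent or empty.
        info = json.loads(raw) if raw else {}
        pairs.append((entity["label"], info.get("nameEn")))
    return pairs


sample = {
    "type": "hmi",
    "entities": [
        {
            "labelLevel1": "Region",
            "labelLevel2": "Landmark",
            "label": "Huangguoshu Waterfall",
            "appearanceProbability": 0.067,
            "knowledgeInfo": '{"name": "黄果树瀑布", "nameEn": "Huangguoshu Waterfall", "description": "亚洲四大瀑布之一"}',
        }
    ],
}
print(entity_knowledge(sample))
```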
ASR results
| Parameter | Type | Description |
| details | JSONArray | Detailed task results. |
| details.from | double | Start timestamp of the speech segment, in seconds. |
| details.to | double | End timestamp of the speech segment, in seconds. |
| details.content | String | Transcribed text from the speech segment. |
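The ASR segments can be turned into a readable, time-ordered transcript with a few lines of code. The following Python sketch is one possible rendering. The function name asr_to_lines, the output format, and the sample data are illustrative choices, not part of the API.

```python
def asr_to_lines(asr_result):
    """Return one '[from-to] content' line per speech segment, sorted by start."""
    segments = sorted(asr_result.get("details", []), key=lambda s: s["from"])
    return [
        f"[{seg['from']:.2f}-{seg['to']:.2f}] {seg['content']}"
        for seg in segments
    ]


# Illustrative ASR fragment; values are made up.
sample = {
    "details": [
        {"from": 2.5, "to": 4.0, "content": "second segment"},
        {"from": 0.0, "to": 2.5, "content": "first segment"},
    ]
}
for line in asr_to_lines(sample):
    print(line)
```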
OCR results
| Parameter | Type | Description |
| details | JSONArray | An array of objects containing the task details. |
| details.timestamp | double | The timestamp, in seconds. |
| details.info | JSONArray | An array of objects, each containing recognized information for the corresponding timestamp. |
| details.info.score | double | The confidence score. |
| details.info.position | JSONObject | The text coordinates. |
| details.info.position.leftTop | int[] | The coordinates of the top-left corner. |
| details.info.position.rightBottom | int[] | The coordinates of the bottom-right corner. |
| details.info.content | String | The recognized text content. |
Meta annotation results
If you do not use human annotation and you specify the needMetaData parameter when you submit a job with SubmitSmarttagJob, QuerySmarttagJob returns the original title that you entered.
| Parameter | Type | Description |
| title | String | The title. |
Subtitle extraction results
| Parameter | Type | Description |
| details | JSONArray | Detailed results of the task. |
| details.allResultUrl | String | URL for all subtitle results. The URL is valid for six months after the task is completed. |
| details.chResultUrl | String | URL for the Chinese subtitle results. The URL is valid for six months after the task is completed. |
| details.engResultUrl | String | URL for the English subtitle results. The URL is valid for six months after the task is completed. |
The subtitle result URL returns content in the following format: sequence number + time range + subtitle content (one subtitle per line).
NLP results
| Parameter | Type | Description |
| transcription | object | The speech-to-text output. |
| autoChapters | object | The automatically generated chapters. |
| summarization | object | The summary generated by the large model. |
| meetingAssistance | object | The intelligent meeting minutes. |
| translation | object | The translated text. |
Transcription
| Parameter | Type | Description |
| transcription | object | The speech transcription result. |
| transcription.paragraphs | list[] | A list of paragraphs in the speech transcription. |
| transcription.paragraphs[i].paragraphId | string | The paragraph-level ID. |
| transcription.paragraphs[i].speakerId | string | The speaker ID. |
| transcription.paragraphs[i].words | list[] | A list of words in the paragraph. |
| transcription.paragraphs[i].words[i].id | int | The sequence number of the word. This field can typically be ignored. |
| transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. Words with the same sentenceId form a sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word, in milliseconds, relative to the beginning of the audio. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word, in milliseconds, relative to the beginning of the audio. |
| transcription.paragraphs[i].words[i].text | string | The text of the word. |
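Because words that share a sentenceId form a sentence, sentences can be rebuilt by grouping the words list. The following Python sketch shows the idea. The function name sentences_from_paragraph is hypothetical, and the sketch assumes that each word's text already carries its own spacing and punctuation, which the table does not confirm.

```python
def sentences_from_paragraph(paragraph):
    """Group a paragraph's words by sentenceId and rebuild each sentence."""
    grouped = {}
    for word in paragraph.get("words", []):
        grouped.setdefault(word["sentenceId"], []).append(word)
    sentences = []
    for sentence_id, words in grouped.items():  # dicts keep insertion order
        sentences.append({
            "sentenceId": sentence_id,
            "start": min(w["start"] for w in words),  # milliseconds
            "end": max(w["end"] for w in words),      # milliseconds
            "text": "".join(w["text"] for w in words),
        })
    return sentences


# Illustrative paragraph; values are made up.
sample = {
    "paragraphId": "p1",
    "speakerId": "1",
    "words": [
        {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello "},
        {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "world."},
        {"id": 3, "sentenceId": 2, "start": 1000, "end": 1500, "text": "Bye."},
    ],
}
print(sentences_from_paragraph(sample))
```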
Summarization: full-text summary, speaker summary, and question summary
| Parameter | Type | Description |
| summarization | object | The summary result object, which can contain results for multiple summary types. |
| summarization.paragraphSummary | string | The paragraph summary result. |
| summarization.conversationalSummary | list[] | A list of conversational summary results. |
| summarization.conversationalSummary[i].speakerId | string | The ID of the speaker. |
| summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
| summarization.conversationalSummary[i].summary | string | The summary for the speaker. |
| summarization.questionsAnsweringSummary | list[] | A list of question-answering summary results. |
| summarization.questionsAnsweringSummary[i].question | string | The question extracted from the speech transcription. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | IDs of sentences from the original speech transcription that form the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | The IDs of sentences from the original speech transcription that are summarized in the answer. |
| summarization.mindMapSummary | list[object] | A list of mind map summary results, which may include topic summaries and their relationships. |
| summarization.mindMapSummary[i].title | string | The title of the mind map. |
| summarization.mindMapSummary[i].topic | list[object] | A list of the main topics, each with their respective sub-topics. |
| summarization.mindMapSummary[i].topic[i].title | string | The topic title. |
| summarization.mindMapSummary[i].topic[i].topic | list[object] | A list of sub-topics for the current topic, which can be empty. |
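The mind map is a recursive structure: every node has a title and a topic list of child nodes. A short recursive walk can flatten it into an indented outline, as in the following Python sketch. The function name outline and the sample data are illustrative.

```python
def outline(nodes, depth=0):
    """Flatten mind map nodes into indented lines, two spaces per level."""
    lines = []
    for node in nodes or []:
        lines.append("  " * depth + node["title"])
        # topic holds the child nodes and may be empty or missing.
        lines.extend(outline(node.get("topic"), depth + 1))
    return lines


# Illustrative mindMapSummary value; titles are made up.
sample = [
    {
        "title": "Root",
        "topic": [
            {"title": "A", "topic": [{"title": "A1", "topic": []}]},
            {"title": "B", "topic": []},
        ],
    }
]
print("\n".join(outline(sample)))
```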
Full-text translation
| Parameter | Type | Description |
| translation | object | The translation object. |
| translation.paragraphs | list[] | A list of translated paragraphs, corresponding to the speech recognition result. |
| translation.paragraphs[i].paragraphId | string | The paragraph ID, which corresponds to the paragraphId from the speech recognition result. |
| translation.paragraphs[i].sentences | list[] | The sentences that make up the paragraph. |
| translation.paragraphs[i].sentences[i].sentenceId | long | The unique ID for the sentence. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text of the sentence. |
autoChapters (chapter recognition)
| Parameter | Type | Description |
| autoChapters | list[] | A list of chapter overview objects. |
| autoChapters[i].id | int | Chapter ID. |
| autoChapters[i].start | long | The start time of the chapter, as a relative timestamp in milliseconds from the start of the audio. |
| autoChapters[i].end | long | The end time of the chapter, as a relative timestamp in milliseconds from the start of the audio. |
| autoChapters[i].headline | string | Chapter headline. |
| autoChapters[i].summary | string | Chapter summary. |
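Chapter start and end times are millisecond offsets, so they usually need converting before display. The following Python sketch formats them as hh:mm:ss.mmm and builds a simple chapter list. The function names and the sample data are illustrative.

```python
def format_ms(ms):
    """Format a millisecond offset as hh:mm:ss.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def chapter_list(auto_chapters):
    """Return 'start headline' lines ordered by chapter start time."""
    ordered = sorted(auto_chapters, key=lambda c: c["start"])
    return [f"{format_ms(c['start'])} {c['headline']}" for c in ordered]


# Illustrative autoChapters value; headlines are made up.
sample = [
    {"id": 2, "start": 65000, "end": 120000, "headline": "Details", "summary": ""},
    {"id": 1, "start": 0, "end": 65000, "headline": "Intro", "summary": ""},
]
print("\n".join(chapter_list(sample)))
```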
meetingAssistance (summary, keyword, key sentence, and action item extraction)
| Parameter | Type | Description |
| meetingAssistance | object | The intelligent minutes results object, which may contain zero or more results of various types. |
| meetingAssistance.keywords | list[] | The keyword extraction results. |
| meetingAssistance.keySentences | list[] | The key sentence extraction results, also known as key content. |
| meetingAssistance.keySentences[i].id | long | The ID of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.keySentences[i].start | long | The start time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.keySentences[i].end | long | The end time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.keySentences[i].text | string | The text of the key sentence. |
| meetingAssistance.actions | list[] | A list of action items. |
| meetingAssistance.actions[i].id | long | The ID of the action item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.actions[i].start | long | The start time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.actions[i].end | long | The end time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.actions[i].text | string | The text of the action item. |
| meetingAssistance.classifications | object | The scene classification results. Currently, only three scene types are supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
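The classifications object holds one confidence score per scene type, so the most likely scene is the key with the highest score. The following Python sketch picks it. The function name top_scene and the sample scores are illustrative.

```python
def top_scene(meeting_assistance):
    """Return (scene, score) for the highest-scoring scene, or None."""
    scores = meeting_assistance.get("classifications") or {}
    if not scores:
        return None
    scene = max(scores, key=scores.get)
    return scene, scores[scene]


# Illustrative meetingAssistance fragment; scores are made up.
sample = {"classifications": {"interview": 0.1, "lecture": 0.7, "meeting": 0.2}}
print(top_scene(sample))
```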
Examples
Success response
JSON format
{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "Result": [
      {
        "Type": "Meta",
        "Data": "{\"title\":\"example-title-****\"}\t\n"
      }
    ]
  },
  "Usages": {
    "Usage": [
      {
        "Type": "",
        "Quota": 0
      }
    ]
  }
}
Error codes
See Error Codes for a complete list.
Release notes
See Release Notes for a complete list.