Querying video understanding task results
RAM authorization
| Action | Access level | Resource type | Condition key | Dependent action |
| ice:QueryVideoCognitionJob | get | *All Resource | None | None |
Request parameters
| Parameter | Type | Required | Description | Example |
| JobId | string | Yes | The ID of the intelligent tagging job. You can obtain this ID from the response of the SubmitIntelligentTaggingJob operation. | ****20b48fb04483915d4f2cd8ac**** |
| Params | string | No | Additional request parameters, specified as a JSON string. | {} |
| IncludeResults | object | No | A container for parameters that determine which algorithm results to include in the response. | |
| NeedAsr | boolean | No | Specifies whether to return the ASR results. | true |
| NeedOcr | boolean | No | Specifies whether to return the OCR results. | true |
| NeedProcess | boolean | No | Specifies whether to return a link to the raw operator results. | true |
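As a minimal sketch, the request parameters above could be assembled in Python as follows. The helper name `build_query_params` is hypothetical. Nesting NeedAsr, NeedOcr, and NeedProcess inside IncludeResults, and sending IncludeResults as a JSON string, are assumptions about the wire format that this page does not confirm; check the SDK you use.

```python
import json


def build_query_params(job_id, need_asr=False, need_ocr=False, need_process=False):
    """Assemble QueryVideoCognitionJob request parameters as a plain dict.

    Hypothetical helper: the JSON-string encoding of IncludeResults is an
    assumption, not something this reference confirms.
    """
    if not job_id:
        raise ValueError("JobId is required")
    include_results = {
        "NeedAsr": need_asr,
        "NeedOcr": need_ocr,
        "NeedProcess": need_process,
    }
    return {
        "JobId": job_id,
        "Params": "{}",  # additional request parameters, as a JSON string
        "IncludeResults": json.dumps(include_results),
    }


params = build_query_params("****20b48fb04483915d4f2cd8ac****", need_asr=True)
```

The resulting dict can then be passed to whichever client or signing layer you already use to call the operation.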
Response elements
| Element | Type | Description | Example |
| | object | | |
| JobStatus | string | The job status. | Success |
| RequestId | string | The request ID. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The user data. | {"userId":"123432412831"} |
| Results | object | | |
| result | array<object> | An array of analysis result objects. | |
| | object | | |
| Type | string | The type of the analysis result. | ASR |
| Data | string | The analysis result data, formatted as a JSON string. The data structure varies based on the Type value. For more information, see the descriptions of the Result parameter. | {"title":"example-title-****"} |
| TemplateId | string | The template ID. | |
| Params | string | The request parameters. | |
| Input | object | The input file. | |
| Type | string | The type of the input file. Valid value: OSS. | |
| Media | string | The URL of the input file. | |
Result parameter
VideoLabel data structure
| Parameter | Type | Description |
| persons | JSONArray | A list of detected persons. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The category of the person. Possible values: celebrity, politician, sensitive, and unknown. If recognized from a custom figure library, this field returns the library's ID. |
| persons.ratio | double | The occurrence rate of the person, ranging from 0 to 1. |
| persons.occurrences | JSONArray | A list of the person's occurrences. |
| persons.occurrences.score | double | The confidence score. |
| persons.occurrences.from | double | The start time of the person's occurrence, in seconds. |
| persons.occurrences.to | double | The end time of the person's occurrence, in seconds. |
| persons.occurrences.position | JSONObject | The face coordinates. |
| persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| persons.occurrences.timestamp | double | The timestamp of the face coordinates, in seconds. |
| persons.occurrences.scene | String | The camera shot type. Possible values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | A list of tags for detected elements, such as objects and scenes. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.ratio | double | The occurrence rate of the tag, ranging from 0 to 1. |
| tags.occurrences | JSONArray | A list of the tag's occurrences. |
| tags.occurrences.score | double | The confidence score. |
| tags.occurrences.from | double | The start time of the occurrence, in seconds. |
| tags.occurrences.to | double | The end time of the occurrence, in seconds. |
| classifications | JSONArray | A list of video classifications. |
| classifications.score | double | The confidence score for the classification. |
| classifications.category1 | String | The primary category. For example: Lifestyle, Animation, or Automotive. |
| classifications.category2 | String | The secondary category. For example, a video with the Lifestyle primary category might have a secondary category of Health or Home. |
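The following Python sketch shows one way to walk the VideoLabel structure and keep only the person occurrences whose confidence score meets a threshold. The function name, the threshold, and the sample data are made up for illustration; only the field names come from the table above.

```python
def confident_person_spans(video_label, min_score=0.8):
    """Return (name, from, to) tuples for person occurrences at or above min_score."""
    spans = []
    for person in video_label.get("persons", []):
        for occ in person.get("occurrences", []):
            if occ.get("score", 0.0) >= min_score:
                spans.append((person.get("name"), occ["from"], occ["to"]))
    return spans


# Made-up sample that follows the field names in the table above.
sample_video_label = {
    "persons": [
        {
            "name": "example-person",
            "category": "unknown",
            "ratio": 0.25,
            "occurrences": [
                {"score": 0.95, "from": 1.0, "to": 3.5},
                {"score": 0.40, "from": 7.0, "to": 8.0},
            ],
        }
    ]
}

spans = confident_person_spans(sample_video_label)
print(spans)  # [('example-person', 1.0, 3.5)]
```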
Video tag examples
| Category | Example |
| program | e.g., The Amazing Race or America's Got Talent |
| role | e.g., doctor, nurse, or teacher |
| object | e.g., piano, cup, table, car, cosmetics, or food |
| logo | e.g., CNN, BBC, YouTube, or Netflix |
| action | e.g., dancing, kissing, hugging, meeting, singing, making a phone call, riding a horse, or fighting |
| location | e.g., Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States |
| scene | e.g., a bedroom, subway station, terraced fields, beach, or desert |
ImageLabel data structure
| Parameter | Type | Description |
| persons | JSONArray | The detected persons. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The person category. Valid values: celebrity, politician, and sensitive person. |
| persons.score | double | The confidence score for the person recognition. |
| persons.position | JSONObject | The bounding box of the detected person. |
| persons.position.leftTop | int[] | The x and y coordinates of the top-left corner of the bounding box. |
| persons.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner of the bounding box. |
| persons.scene | String | The shot type. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | The detected tags for elements such as objects and scenes. See the table below for examples. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The subtag. |
| tags.score | double | The confidence score. |
Examples of image tags
| Main tag name | Sub tag name |
| character | such as doctor, nurse, or teacher |
| location | such as Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States |
| action event | such as speaking |
| logo | such as CCTV-1, CCTV-2, Youku, or Dragon TV |
| action event | such as dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, or fighting |
| object | such as a piano, cup, table, scrambled eggs with tomatoes, car, or cosmetics |
| scene | such as a bedroom, subway station, terraced fields, beach, or desert |
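The position fields in the ImageLabel structure give two corner points, so the size of a bounding box follows from simple subtraction. The sketch below assumes each corner is an [x, y] pair and that y grows downward, which this page does not state explicitly; `box_size` is a hypothetical helper name.

```python
def box_size(position):
    """Return (width, height) of a box given leftTop and rightBottom [x, y] pairs."""
    left, top = position["leftTop"]
    right, bottom = position["rightBottom"]
    return right - left, bottom - top


# Made-up coordinates for illustration.
size = box_size({"leftTop": [10, 20], "rightBottom": [110, 80]})
print(size)  # (100, 60)
```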
The TextLabel data structure (output from ASR and OCR)
| Parameter | Type | Description |
| tags | JSONArray | An array of tags. |
| tags.name | String | The tag key. |
| tags.value | String | The tag value. Separate multiple values with a comma (,). |
Examples of text tags
| Parameter | Value |
| region | e.g., us-east-1, eu-west-2, or ap-southeast-1 |
| organization | e.g., Marketing Department, Development Team, or Acme Corp |
| identifier | e.g., user-12345, prod-db-01, or f7b4c278-6e54-4a8f-8f9d-3e58a9c8b4e7 |
| keyword | e.g., database, billing, or error |
CPVLabel data structure
cates: A list of categories (primary, secondary, or tertiary).
entities: A list of category attributes containing information from a knowledge graph.
hotwords: A list of popular terms.
freeTags: A list of free tags (keywords).
| Parameter | Type | Example value | Description |
| type | String | hmi | The result type. Valid values: hmi (human-machine collaboration result) and auto (automated tagging result). |
| cates | JSONArray | - | An array of categorization results. |
| cates.labelLevel1 | String | Travel | The level-1 tag. |
| cates.labelLevel2 | String | Scenic Travel | The level-2 tag. |
| cates.label | String | "" | The tag name. This field may be empty if the algorithm does not return a value. |
| cates.appearanceProbability | double | 0.96 | The occurrence probability. |
| cates.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the category. |
| cates.detailInfo.score | double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | double | 0.021 | The start time. |
| cates.detailInfo.endTime | double | 29.021 | The end time. |
| entities | JSONArray | - | An array of detected entities. |
| entities.labelLevel1 | String | location | The level-1 tag. |
| entities.labelLevel2 | String | landmark | The level-2 tag. |
| entities.label | String | Huangguoshu Waterfall | The tag name. |
| entities.appearanceProbability | double | 0.067 | The occurrence probability. |
| entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four major waterfalls in Asia"} | The knowledge graph information. For a complete list of fields, refer to the corresponding tables for Entertainment IPs, Music, Persons, Landmarks, and Objects. |
| entities.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the entity. |
| entities.detailInfo.score | double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | double | 6.021 | The start time. |
| entities.detailInfo.endTime | double | 8.021 | The end time. |
| entities.detailInfo.trackData | JSONArray | - | An array of objects that contains structured tracking data for the entity. |
| entities.detailInfo.trackData.score | double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The bounding box. |
| entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp. |
| hotwords | JSONArray | - | An array of detected hotwords. |
| hotwords.labelLevel1 | String | keyword | The level-1 tag. |
| hotwords.labelLevel2 | String | "" | The level-2 tag. |
| hotwords.label | String | China Meteorological Administration | The hotword. |
| hotwords.appearanceProbability | double | 0.96 | The occurrence probability. |
| hotwords.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the hotword. |
| hotwords.detailInfo.score | double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | double | 0.021 | The start time. |
| hotwords.detailInfo.endTime | double | 29.021 | The end time. |
| freeTags | JSONArray | - | An array of detected free tags. |
| freeTags.labelLevel1 | String | keyword | The level-1 tag. |
| freeTags.labelLevel2 | String | "" | The level-2 tag. |
| freeTags.label | String | National Meteorological Center | The tag name. |
| freeTags.appearanceProbability | double | 0.96 | The occurrence probability. |
| freeTags.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the free tag. |
| freeTags.detailInfo.score | double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | double | 0.021 | The start time. |
| freeTags.detailInfo.endTime | double | 29.021 | The end time. |
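Because cates, entities, hotwords, and freeTags share the same label and detailInfo layout, they can be flattened into one list of rows for filtering or export. This is a sketch only: `flatten_cpv_labels` is a hypothetical helper, and the sample reuses example values from the table above rather than a real response.

```python
def flatten_cpv_labels(cpv_label):
    """Flatten the four CPVLabel groups into one row per detailInfo entry."""
    rows = []
    for group in ("cates", "entities", "hotwords", "freeTags"):
        for item in cpv_label.get(group, []):
            for detail in item.get("detailInfo", []):
                rows.append({
                    "group": group,
                    "label": item.get("label", ""),
                    "score": detail.get("score"),
                    "startTime": detail.get("startTime"),
                    "endTime": detail.get("endTime"),
                })
    return rows


# Sample built from the example values in the table above.
sample_cpv = {
    "type": "hmi",
    "entities": [
        {
            "labelLevel1": "location",
            "labelLevel2": "landmark",
            "label": "Huangguoshu Waterfall",
            "appearanceProbability": 0.067,
            "detailInfo": [{"score": 0.33, "startTime": 6.021, "endTime": 8.021}],
        }
    ],
    "hotwords": [
        {
            "labelLevel1": "keyword",
            "label": "China Meteorological Administration",
            "detailInfo": [{"score": 1.0, "startTime": 0.021, "endTime": 29.021}],
        }
    ],
}

rows = flatten_cpv_labels(sample_cpv)
for row in rows:
    print(row["group"], row["label"], row["startTime"], row["endTime"])
```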
Automatic speech recognition (ASR) results
| Parameter | Type | Description |
| details | JSONArray | Detailed results of the task. |
| details.from | double | Start timestamp, in seconds. |
| details.to | double | End timestamp, in seconds. |
| details.content | String | The recognized text. |
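As an illustration of the ASR structure, the sketch below sorts the segments by start time and renders each one as a timestamped line. The function name and the output format are choices made for this example, and the sample segments are invented.

```python
def asr_to_lines(asr_result):
    """Render ASR segments as '[from-to] content' lines, ordered by start time."""
    segments = sorted(asr_result.get("details", []), key=lambda seg: seg["from"])
    return [
        f"[{seg['from']:.2f}-{seg['to']:.2f}] {seg['content']}"
        for seg in segments
    ]


# Invented segments that follow the field names in the table above.
sample_asr = {
    "details": [
        {"from": 2.5, "to": 4.0, "content": "second"},
        {"from": 0.0, "to": 2.5, "content": "first"},
    ]
}

asr_lines = asr_to_lines(sample_asr)
print("\n".join(asr_lines))
```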
Text recognition (OCR) results
| Parameter | Type | Description |
| details | JSONArray | Detailed task results. |
| details.timestamp | double | The timestamp, in seconds. |
| details.info | JSONArray | An array of recognition results for the corresponding timestamp. |
| details.info.score | double | The confidence score for the recognition. |
| details.info.position | JSONObject | Coordinates of the recognized text. |
| details.info.position.leftTop | int[] | The x and y coordinates of the top-left corner. |
| details.info.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner. |
| details.info.content | String | The recognized text. |
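OCR results are grouped by timestamp, with several recognized text items per frame. A short sketch of collecting the text per timestamp while dropping low-confidence items might look like this; the helper name, the 0.5 threshold, and the sample data are assumptions for illustration.

```python
def ocr_text_by_timestamp(ocr_result, min_score=0.5):
    """Map each timestamp to the recognized texts whose score is at least min_score."""
    texts_by_time = {}
    for frame in ocr_result.get("details", []):
        texts = [
            item["content"]
            for item in frame.get("info", [])
            if item.get("score", 0.0) >= min_score
        ]
        if texts:  # skip timestamps where nothing passed the threshold
            texts_by_time[frame["timestamp"]] = texts
    return texts_by_time


# Invented sample that follows the field names in the table above.
sample_ocr = {
    "details": [
        {
            "timestamp": 1.0,
            "info": [
                {"score": 0.9, "content": "EXIT",
                 "position": {"leftTop": [5, 5], "rightBottom": [60, 25]}},
                {"score": 0.2, "content": "noise"},
            ],
        },
        {"timestamp": 2.0, "info": [{"score": 0.1, "content": "blur"}]},
    ]
}

ocr_texts = ocr_text_by_timestamp(sample_ocr)
print(ocr_texts)  # {1.0: ['EXIT']}
```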
Metadata annotation results
If you specify needMetaData in the SubmitSmarttagJob request and do not use manual labeling, QuerySmarttagJob returns the original title.
| Parameter | Type | Description |
| title | String | The title. |
Subtitle
| Parameter | Type | Description |
| details | JSONArray | Detailed task results. |
| details.allResultUrl | String | URL for all subtitle results. The URL is valid for six months after task completion. |
| details.chResultUrl | String | URL for the Chinese subtitle results. The URL is valid for six months after task completion. |
| details.engResultUrl | String | URL for the English subtitle results. The URL is valid for six months after task completion. |
The content at the subtitle results URL uses the following format: sequence number+time period+subtitle content (one subtitle per line).
NLP results
| Parameter | Type | Description |
| transcription | object | The generated transcription. |
| autoChapters | object | The generated chapter overview. |
| summarization | object | The summary generated by the large model. |
| meetingAssistance | object | The generated intelligent meeting minutes. |
| translation | object | The generated text translation. |
Transcription
| Parameter | Type | Description |
| transcription | object | The transcription result object. |
| transcription.paragraphs | list[] | A list of paragraph objects that make up the transcription. |
| transcription.paragraphs[i].paragraphId | string | The paragraph ID. |
| transcription.paragraphs[i].speakerId | string | The speaker ID. |
| transcription.paragraphs[i].words | list[] | A list of word objects in the paragraph. |
| transcription.paragraphs[i].words[i].id | int | The sequence number of the word. This field can typically be ignored. |
| transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. Words with the same sentenceId form a sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word as a relative timestamp in milliseconds. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word as a relative timestamp in milliseconds. |
| transcription.paragraphs[i].words[i].text | string | The text of the word. |
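Since words that share a sentenceId form a sentence, sentences can be rebuilt by grouping consecutive words. The sketch below assumes the words of a paragraph arrive in order, so that words of one sentence are adjacent, and that word texts already carry any spacing they need; neither is stated on this page. `sentences_from_paragraph` is a hypothetical helper.

```python
from itertools import groupby


def sentences_from_paragraph(paragraph, sep=""):
    """Group consecutive words by sentenceId and rebuild each sentence."""
    sentences = []
    words = paragraph.get("words", [])
    for sentence_id, group in groupby(words, key=lambda w: w["sentenceId"]):
        sentence_words = list(group)
        sentences.append({
            "sentenceId": sentence_id,
            "start": sentence_words[0]["start"],  # milliseconds
            "end": sentence_words[-1]["end"],     # milliseconds
            "text": sep.join(w["text"] for w in sentence_words),
        })
    return sentences


# Invented paragraph that follows the field names in the table above.
sample_paragraph = {
    "paragraphId": "p1",
    "speakerId": "1",
    "words": [
        {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello "},
        {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "world."},
        {"id": 3, "sentenceId": 2, "start": 1000, "end": 1500, "text": "Next."},
    ],
}

sentences = sentences_from_paragraph(sample_paragraph)
for s in sentences:
    print(s["sentenceId"], s["start"], s["end"], s["text"])
```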
Summarization (full-text, speaker-based, and question-based)
| Parameter | Type | Description |
| summarization | object | The summary result object, which contains results for zero or more summary types. |
| summarization.paragraphSummary | string | The full-text summary. |
| summarization.conversationalSummary | list[] | A list of conversational summaries. |
| summarization.conversationalSummary[i].speakerId | string | The speaker ID. |
| summarization.conversationalSummary[i].speakerName | string | The speaker name. |
| summarization.conversationalSummary[i].summary | string | The summary for this speaker. |
| summarization.questionsAnsweringSummary | list[] | A list of Q&A summaries. |
| summarization.questionsAnsweringSummary[i].question | string | The question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | A list of SentenceId values from the original transcription corresponding to the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | A list of SentenceId values from the original transcription corresponding to the answer. |
| summarization.mindMapSummary | list[object] | A list of mind map summaries, which represent topic hierarchies. |
| summarization.mindMapSummary[i].title | string | The title of the mind map. |
| summarization.mindMapSummary[i].topic | list[object] | An array of top-level topics, each of which can contain subtopics. |
| summarization.mindMapSummary[i].topic[i].title | string | The title of the topic. |
| summarization.mindMapSummary[i].topic[i].topic | list[object] | An array of subtopics for the parent topic. This array can be empty. |
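The mind map summary is a tree: each node has a title and a topic array of child nodes that may be empty. A recursive walk, sketched below with a hypothetical `outline` helper and invented sample data, turns one mind map into an indented outline.

```python
def outline(node, depth=0):
    """Return the titles of a mind map node and its subtopics as indented lines."""
    lines = ["  " * depth + node.get("title", "")]
    for child in node.get("topic") or []:  # topic may be empty or missing
        lines.extend(outline(child, depth + 1))
    return lines


# Invented mind map that follows the field names in the table above.
sample_mind_map = {
    "title": "Root",
    "topic": [
        {"title": "A", "topic": [{"title": "A1", "topic": []}]},
        {"title": "B", "topic": []},
    ],
}

outline_lines = outline(sample_mind_map)
print("\n".join(outline_lines))
```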
Translation
| Parameter | Type | Description |
| translation | object | The translation result object. |
| translation.paragraphs | list[] | A list of translated paragraphs corresponding to the speech recognition result. |
| translation.paragraphs[i].paragraphId | string | The paragraph ID, which corresponds to the paragraphId in the speech recognition result. |
| translation.paragraphs[i].sentences | list[] | A list of sentence objects. |
| translation.paragraphs[i].sentences[i].sentenceId | long | The sentence ID. |
| translation.paragraphs[i].sentences[i].start | long | The start time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs[i].sentences[i].end | long | The end time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs[i].sentences[i].text | string | The translated text, which corresponds to the sentence text in the speech recognition result. |
autoChapters
| Parameter | Type | Description |
| autoChapters | list[] | A list of chapter overview objects. |
| autoChapters[i].id | int | The chapter ID. |
| autoChapters[i].start | long | The chapter's start time, as a relative timestamp in milliseconds from the beginning of the audio. |
| autoChapters[i].end | long | The chapter's end time, as a relative timestamp in milliseconds from the beginning of the audio. |
| autoChapters[i].headline | string | The chapter's headline. |
| autoChapters[i].summary | string | The chapter's summary. |
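Chapter start and end times are millisecond offsets, so a small conversion makes them readable. The sketch below formats each chapter as an HH:MM:SS range followed by its headline; the helper names and the sample chapters are invented for this example.

```python
def format_ms(ms):
    """Convert a millisecond offset to HH:MM:SS, discarding the fractional second."""
    seconds = ms // 1000
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_lines(auto_chapters):
    """Render chapters as 'start-end headline' lines, ordered by start time."""
    ordered = sorted(auto_chapters, key=lambda c: c["start"])
    return [
        f"{format_ms(c['start'])}-{format_ms(c['end'])} {c['headline']}"
        for c in ordered
    ]


# Invented chapters that follow the field names in the table above.
sample_chapters = [
    {"id": 2, "start": 90000, "end": 3723500, "headline": "Discussion", "summary": ""},
    {"id": 1, "start": 0, "end": 90000, "headline": "Opening", "summary": ""},
]

chapters = chapter_lines(sample_chapters)
print("\n".join(chapters))
```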
meetingAssistance (intelligent meeting summary extraction: keywords, key sentences, and action items)
| Parameter | Type | Description |
| meetingAssistance | object | A container for the meeting assistance results. This object can hold multiple result types or be empty. |
| meetingAssistance.keywords | list[] | The keyword extraction results. |
| meetingAssistance.keySentences | list[] | The key sentence extraction results, also known as key content. |
| meetingAssistance.keySentences[i].id | long | The ID of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.keySentences[i].start | long | The start time in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.keySentences[i].end | long | The end time in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.keySentences[i].text | string | The key sentence text. |
| meetingAssistance.actions | list[] | A list of action items and action item summaries. |
| meetingAssistance.actions[i].id | long | The ID of the action item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.actions[i].start | long | The start time in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.actions[i].end | long | The end time in milliseconds, relative to the beginning of the audio. |
| meetingAssistance.actions[i].text | string | The action item text. |
| meetingAssistance.classifications | object | The scene classification results. Currently, three scene classes are supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview class. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture class. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting class. |
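The classifications object holds one confidence score per scene class, so the most likely scene is simply the key with the highest score. A minimal sketch, with a hypothetical `top_scene` helper and invented scores:

```python
def top_scene(meeting_assistance):
    """Return (class_name, score) for the highest-scoring scene class, or None."""
    classes = meeting_assistance.get("classifications") or {}
    if not classes:
        return None
    return max(classes.items(), key=lambda pair: pair[1])


# Invented scores for the three classes listed in the table above.
sample_meeting = {
    "classifications": {"interview": 0.1, "lecture": 0.7, "meeting": 0.2}
}

best = top_scene(sample_meeting)
print(best)  # ('lecture', 0.7)
```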
Examples
Success response
JSON format
{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "result": [
      {
        "Type": "ASR",
        "Data": "{\"title\":\"example-title-****\"}"
      }
    ]
  },
  "TemplateId": "",
  "Params": "",
  "Input": {
    "Type": "",
    "Media": ""
  }
}
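Because each Data value is itself a JSON string, a response like the one above needs a second decoding step per result. The following sketch shows that step under a few assumptions: `decode_results` is a hypothetical helper, a response whose JobStatus is not Success is treated as having no results, and if two results share a Type the later one wins.

```python
import json


def decode_results(response):
    """Return {Type: decoded Data} for a successful job, else an empty dict."""
    if response.get("JobStatus") != "Success":
        return {}
    decoded = {}
    for item in response.get("Results", {}).get("result", []):
        # Data is a JSON string whose structure depends on Type.
        decoded[item["Type"]] = json.loads(item["Data"])
    return decoded


# Sample modeled on the success response shown above.
sample_response = {
    "JobStatus": "Success",
    "RequestId": "******11-DB8D-4A9A-875B-275798******",
    "UserData": "{\"userId\":\"123432412831\"}",
    "Results": {
        "result": [
            {"Type": "ASR", "Data": "{\"title\":\"example-title-****\"}"}
        ]
    },
}

decoded = decode_results(sample_response)
print(decoded)  # {'ASR': {'title': 'example-title-****'}}
```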
Error codes
See Error Codes for a complete list.
Release notes
See Release Notes for a complete list.