QueryVideoCognitionJob - Alibaba Cloud Help Center

Querying video understanding task results

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.
API: The API that you can call to perform the action.
Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.
Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.
- For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.
- For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.
Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.
Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action	Access level	Resource type	Condition key	Dependent action
ice:QueryVideoCognitionJob	get	All Resources (*)	None	

Request parameters

Parameter	Type	Required	Description	Example
JobId	string	Yes	The ID of the intelligent tagging job. You can obtain this ID from the response of the SubmitIntelligentTaggingJob operation.	**20b48fb04483915d4f2cd8ac**
Params	string	No	Additional request parameters, specified as a JSON string.	{}
IncludeResults	object	No	A container for parameters that determine which algorithm results to include in the response.
NeedAsr	boolean	No	Specifies whether to return the ASR results.	true
NeedOcr	boolean	No	Specifies whether to return the OCR results.	true
NeedProcess	boolean	No	Specifies whether to return a link to the raw operator results.	true
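
The request parameters above can be assembled as in the following sketch. The helper name `build_query_params` is hypothetical, and sending `IncludeResults` as a JSON string is an assumption; the SDK you use may expect a nested object instead.

```python
import json


def build_query_params(job_id, need_asr=False, need_ocr=False,
                       need_process=False, params=None):
    """Assemble the documented request parameters into a plain dict."""
    if not job_id:
        raise ValueError("JobId is required")
    request = {"JobId": job_id}
    if params is not None:
        # Params is documented as a JSON string, so serialize the dict.
        request["Params"] = json.dumps(params)
    include = {
        "NeedAsr": need_asr,
        "NeedOcr": need_ocr,
        "NeedProcess": need_process,
    }
    # Only send IncludeResults when at least one raw result is requested.
    if any(include.values()):
        # Assumption: the nested object is transmitted as a JSON string.
        request["IncludeResults"] = json.dumps(include)
    return request


example = build_query_params("example-job-id", need_asr=True, params={})
print(example)
```

Leaving `IncludeResults` out entirely when no flag is set keeps the request minimal, since all three flags are optional.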

Response elements

Element	Type	Description	Example
	object
JobStatus	string	The job status. Valid values: `Success` (the job succeeded), `Fail` (the job failed), `Processing` (the job is in progress), and `Submitted` (the job has been submitted and is awaiting processing).	Success
RequestId	string	The request ID.	****11-DB8D-4A9A-875B-275798****
UserData	string	The user data.	{"userId":"123432412831"}
Results	object
result	array<object>	An array of analysis result objects.
	object
Type	string	The type of the analysis result. Valid values: `TextLabel` (text label), `VideoLabel` (video label), `ASR` (raw ASR result), `OCR` (raw OCR result), `NLP` (raw NLP result), and `Process` (URL to the raw operator result). `ASR`, `OCR`, `NLP`, and `Process` are not returned by default.	ASR
Data	string	The analysis result data, formatted as a JSON string. The data structure varies based on the Type value. For more information, see the descriptions of the Result parameter.	{"title":"example-title-****"}
TemplateId	string	The template ID.
Params	string	The request parameters.
Input	object	The input file.
Type	string	The type of the input file. Valid value: OSS.
Media	string	The URL of the input file.
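
Since `Submitted` and `Processing` are non-terminal values of `JobStatus`, a caller will usually need to poll until the job reaches `Success` or `Fail`. The sketch below shows one way to do that. The `query` callable is a hypothetical stand-in for whatever client call you use to invoke QueryVideoCognitionJob, and the interval and attempt limits are arbitrary assumptions.

```python
import time

TERMINAL_STATUSES = {"Success", "Fail"}


def wait_for_job(query, job_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll query(job_id) until JobStatus is terminal or attempts run out."""
    for attempt in range(max_attempts):
        response = query(job_id)
        status = response.get("JobStatus")
        if status in TERMINAL_STATUSES:
            return response
        if status not in ("Processing", "Submitted"):
            raise ValueError(f"unexpected JobStatus: {status!r}")
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} attempts")


# Usage with a stub in place of a real API call.
_responses = iter([
    {"JobStatus": "Submitted"},
    {"JobStatus": "Processing"},
    {"JobStatus": "Success", "Results": {"result": []}},
])
final = wait_for_job(lambda job_id: next(_responses), "example-job-id",
                     sleep=lambda seconds: None)
print(final["JobStatus"])
```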

Result parameter

VideoLabel data structure

Parameter	Type	Description
persons	JSONArray	A list of detected persons.
persons.name	String	The name of the recognized person.
persons.category	String	The category of the person. Possible values: `celebrity`, `politician`, `sensitive`, and `unknown`. If recognized from a custom figure library, this field returns the library's ID.
persons.ratio	double	The occurrence rate of the person, ranging from 0 to 1.
persons.occurrences	JSONArray	A list of the person's occurrences.
persons.occurrences.score	double	The confidence score.
persons.occurrences.from	double	The start time of the person's occurrence, in seconds.
persons.occurrences.to	double	The end time of the person's occurrence, in seconds.
persons.occurrences.position	JSONObject	The face coordinates.
persons.occurrences.position.leftTop	int[]	The x and y coordinates of the top-left corner.
persons.occurrences.position.rightBottom	int[]	The x and y coordinates of the bottom-right corner.
persons.occurrences.timestamp	double	The timestamp of the face coordinates, in seconds.
persons.occurrences.scene	String	The camera shot type. Possible values: `closeUp` (close-up), `medium-closeUp` (medium close-up), `medium` (medium shot), and `medium-long` (medium-long shot).
tags	JSONArray	A list of tags for detected elements, such as objects and scenes.
tags.mainTagName	String	The main tag.
tags.subTagName	String	The subtag.
tags.ratio	double	The occurrence rate of the tag, ranging from 0 to 1.
tags.occurrences	JSONArray	A list of the tag's occurrences.
tags.occurrences.score	double	The confidence score.
tags.occurrences.from	double	The start time of the occurrence, in seconds.
tags.occurrences.to	double	The end time of the occurrence, in seconds.
classifications	JSONArray	A list of video classifications.
classifications.score	double	The confidence score for the classification.
classifications.category1	String	The primary category. For example: `Lifestyle`, `Animation`, or `Automotive`.
classifications.category2	String	The secondary category. For example, a video with the `Lifestyle` primary category might have a secondary category of `Health` or `Home`.
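
As an illustration of how the VideoLabel fields nest, the following sketch lists the time ranges in which a person was detected with a confidence at or above a threshold. The function name, the threshold, and the sample values are made up for the example; only the field names come from the table above.

```python
def confident_appearances(video_label, min_score=0.8):
    """Yield (name, from, to) for person occurrences at or above min_score."""
    for person in video_label.get("persons", []):
        for occurrence in person.get("occurrences", []):
            if occurrence.get("score", 0.0) >= min_score:
                yield person.get("name"), occurrence["from"], occurrence["to"]


# Made-up sample shaped like the VideoLabel structure.
sample_video_label = {
    "persons": [
        {
            "name": "example-person",
            "category": "unknown",
            "ratio": 0.4,
            "occurrences": [
                {"score": 0.95, "from": 1.0, "to": 4.5},
                {"score": 0.42, "from": 10.0, "to": 11.0},
            ],
        }
    ]
}
appearances = list(confident_appearances(sample_video_label))
print(appearances)
```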

Video tag examples

Category	Example
program	e.g., The Amazing Race or America's Got Talent
role	e.g., doctor, nurse, or teacher
object	e.g., piano, cup, table, car, cosmetics, or food
logo	e.g., CNN, BBC, YouTube, or Netflix
action	e.g., dancing, kissing, hugging, meeting, singing, making a phone call, riding a horse, or fighting
location	e.g., Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States
scene	e.g., a bedroom, subway station, terraced fields, beach, or desert

ImageLabel data structure

Parameter	Type	Description
persons	JSONArray	The detected persons.
persons.name	String	The name of the recognized person.
persons.category	String	The person category. Valid values: `celebrity`, `politician`, and `sensitive person`.
persons.score	double	The confidence score for the person recognition.
persons.position	JSONObject	The bounding box of the detected person.
persons.position.leftTop	int[]	The x and y coordinates of the top-left corner of the bounding box.
persons.position.rightBottom	int[]	The x and y coordinates of the bottom-right corner of the bounding box.
persons.scene	String	The shot type. Valid values: `closeUp` (close-up), `medium-closeUp` (medium close-up), `medium` (medium shot), and `medium-long` (medium-long shot).
tags	JSONArray	The detected tags for elements such as objects and scenes. See the table below for examples.
tags.mainTagName	String	The main tag.
tags.subTagName	String	The sub-tag.
tags.score	double	The confidence score.

Examples of image tags

Main tag name	Sub tag name
character	such as doctor, nurse, or teacher
location	such as Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States
action event	such as speaking
logo	such as CCTV-1, CCTV-2, Youku, or Dragon TV
action event	such as dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, or fighting
object	such as a piano, cup, table, scrambled eggs with tomatoes, car, or cosmetics
scene	such as a bedroom, subway station, terraced fields, beach, or desert
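
The `persons.position` object in the ImageLabel structure describes a box by two corner points. A minimal sketch of turning those corners into a width and height follows. It assumes that each corner is an `[x, y]` pair and that x grows to the right and y grows downward, which the reference does not state explicitly.

```python
def box_size(position):
    """Return (width, height) for a box given leftTop and rightBottom corners."""
    left, top = position["leftTop"]
    right, bottom = position["rightBottom"]
    return right - left, bottom - top


# Made-up coordinates for illustration.
size = box_size({"leftTop": [10, 20], "rightBottom": [110, 220]})
print(size)
```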

The TextLabel data structure (output from ASR and OCR)

Parameter	Type	Description
tags	JSONArray	An array of tags.
tags.name	String	The tag key.
tags.value	String	The tag value. Separate multiple values with a comma (`,`).

Examples of text tags

Parameter	Value
region	e.g., us-east-1, eu-west-2, or ap-southeast-1
organization	e.g., Marketing Department, Development Team, or Acme Corp
identifier	e.g., user-12345, prod-db-01, or f7b4c278-6e54-4a8f-8f9d-3e58a9c8b4e7
keyword	e.g., database, billing, or error
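
Because `tags.value` packs multiple values into one comma-separated string, consumers usually split it before use. The sketch below is one way to do so; the function name and the sample data are illustrative only.

```python
def tags_to_dict(text_label):
    """Map each tag name to a list of its comma-separated values."""
    result = {}
    for tag in text_label.get("tags", []):
        values = [v.strip() for v in tag.get("value", "").split(",") if v.strip()]
        # Merge values if the same tag name appears more than once.
        result.setdefault(tag["name"], []).extend(values)
    return result


sample_text_label = {
    "tags": [
        {"name": "keyword", "value": "database,billing"},
        {"name": "organization", "value": "Acme Corp"},
    ]
}
parsed_tags = tags_to_dict(sample_text_label)
print(parsed_tags)
```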

CPVLabel data structure

cates: A list of categories (primary, secondary, or tertiary).
entities: A list of category attributes containing information from a knowledge graph.
hotwords: A list of popular terms.
freeTags: A list of free tags (keywords).

Parameter	Type	Example value	Description
type	String	hmi	The result type. Valid values: `hmi` (human-machine collaboration result) and `auto` (automated tagging result).
cates	JSONArray	-	An array of categorization results.
cates.labelLevel1	String	Travel	The level-1 tag.
cates.labelLevel2	String	Scenic Travel	The level-2 tag.
cates.label	String	""	The tag name. This field may be empty if the algorithm does not return a value.
cates.appearanceProbability	double	0.96	The occurrence probability.
cates.detailInfo	JSONArray	-	An array of objects, each detailing an occurrence of the category.
cates.detailInfo.score	double	0.9	The confidence score.
cates.detailInfo.startTime	double	0.021	The start time.
cates.detailInfo.endTime	double	29.021	The end time.
entities	JSONArray	-	An array of detected entities.
entities.labelLevel1	String	location	The level-1 tag.
entities.labelLevel2	String	landmark	The level-2 tag.
entities.label	String	Huangguoshu Waterfall	The tag name.
entities.appearanceProbability	double	0.067	The occurrence probability.
entities.knowledgeInfo	String	{"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four major waterfalls in Asia"}	The knowledge graph information. For a complete list of fields, refer to the corresponding tables for Entertainment IPs, Music, Persons, Landmarks, and Objects.
entities.detailInfo	JSONArray	-	An array of objects, each detailing an occurrence of the entity.
entities.detailInfo.score	double	0.33292606472969055	The confidence score.
entities.detailInfo.startTime	double	6.021	The start time.
entities.detailInfo.endTime	double	8.021	The end time.
entities.detailInfo.trackData	JSONArray	-	An array of objects that contains structured tracking data for the entity.
entities.detailInfo.trackData.score	double	0.32	The confidence score.
entities.detailInfo.trackData.bbox	integer[]	23, 43, 45, 67	The bounding box.
entities.detailInfo.trackData.timestamp	double	7.9	The timestamp.
hotwords	JSONArray	-	An array of detected hotwords.
hotwords.labelLevel1	String	keyword	The level-1 tag.
hotwords.labelLevel2	String	""	The level-2 tag.
hotwords.label	String	China Meteorological Administration	The hotword.
hotwords.appearanceProbability	double	0.96	The occurrence probability.
hotwords.detailInfo	JSONArray	-	An array of objects, each detailing an occurrence of the hotword.
hotwords.detailInfo.score	double	1.0	The confidence score.
hotwords.detailInfo.startTime	double	0.021	The start time.
hotwords.detailInfo.endTime	double	29.021	The end time.
freeTags	JSONArray	-	An array of detected free tags.
freeTags.labelLevel1	String	keyword	The level-1 tag.
freeTags.labelLevel2	String	""	The level-2 tag.
freeTags.label	String	National Meteorological Center	The tag name.
freeTags.appearanceProbability	double	0.96	The occurrence probability.
freeTags.detailInfo	JSONArray	-	An array of objects, each detailing an occurrence of the free tag.
freeTags.detailInfo.score	double	0.9	The confidence score.
freeTags.detailInfo.startTime	double	0.021	The start time.
freeTags.detailInfo.endTime	double	29.021	The end time.
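
Note that `entities.knowledgeInfo` is itself a JSON string nested inside the CPVLabel data, so it needs a second decoding step. The following sketch shows that step using the example values from the table above; the function name is hypothetical.

```python
import json


def entity_knowledge(cpv_label):
    """Map each entity label to its parsed knowledgeInfo dict."""
    knowledge = {}
    for entity in cpv_label.get("entities", []):
        # knowledgeInfo is a JSON string; treat a missing value as empty.
        raw = entity.get("knowledgeInfo") or "{}"
        knowledge[entity["label"]] = json.loads(raw)
    return knowledge


sample_cpv_label = {
    "type": "hmi",
    "entities": [
        {
            "labelLevel1": "location",
            "labelLevel2": "landmark",
            "label": "Huangguoshu Waterfall",
            "appearanceProbability": 0.067,
            "knowledgeInfo": json.dumps({
                "name": "Huangguoshu Waterfall",
                "nameEn": "Huangguoshu Waterfall",
                "description": "One of the four major waterfalls in Asia",
            }),
        }
    ],
}
knowledge = entity_knowledge(sample_cpv_label)
print(knowledge["Huangguoshu Waterfall"]["description"])
```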

Automatic speech recognition (ASR) results

Parameter	Type	Description
details	JSONArray	Detailed results of the task.
details.from	double	Start timestamp, in seconds.
details.to	double	End timestamp, in seconds.
details.content	String	The recognized text.
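
A small sketch of turning the ASR `details` array into time-stamped transcript lines is shown below. The output format and the sample segments are assumptions made for the example.

```python
def asr_to_lines(asr):
    """Return one '[from-to] content' line per ASR segment, ordered by start time."""
    details = sorted(asr.get("details", []), key=lambda d: d["from"])
    return [f"[{d['from']:.2f}-{d['to']:.2f}] {d['content']}" for d in details]


# Made-up segments, deliberately out of order.
sample_asr = {
    "details": [
        {"from": 3.5, "to": 5.0, "content": "second"},
        {"from": 0.0, "to": 3.2, "content": "first"},
    ]
}
lines = asr_to_lines(sample_asr)
print("\n".join(lines))
```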

Text recognition (OCR) results

Parameter	Type	Description
details	JSONArray	Detailed task results.
details.timestamp	double	The timestamp, in seconds.
details.info	JSONArray	An array of recognition results for the corresponding timestamp.
details.info.score	double	The confidence score for the recognition.
details.info.position	JSONObject	Coordinates of the recognized text.
details.info.position.leftTop	int[]	The x and y coordinates of the top-left corner.
details.info.position.rightBottom	int[]	The x and y coordinates of the bottom-right corner.
details.info.content	String	The recognized text.
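
The OCR result nests recognized text under each timestamp. As a sketch, the following groups the recognized strings by timestamp and drops low-confidence entries. The threshold and the sample data are illustrative assumptions.

```python
def ocr_text_by_timestamp(ocr, min_score=0.5):
    """Map each timestamp to the texts recognized at or above min_score."""
    grouped = {}
    for detail in ocr.get("details", []):
        texts = [
            item["content"]
            for item in detail.get("info", [])
            if item.get("score", 0.0) >= min_score
        ]
        if texts:
            grouped[detail["timestamp"]] = texts
    return grouped


sample_ocr = {
    "details": [
        {"timestamp": 1.0, "info": [
            {"score": 0.9, "content": "Breaking news"},
            {"score": 0.2, "content": "noise"},
        ]},
        {"timestamp": 2.0, "info": [{"score": 0.1, "content": "blur"}]},
    ]
}
ocr_text = ocr_text_by_timestamp(sample_ocr)
print(ocr_text)
```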

Metadata annotation results

Note

If needMetaData is specified in a SubmitSmarttagJob call and manual labeling is not used, QuerySmarttagJob returns the original title.

Parameter	Type	Description
title	String	The title.

Subtitle

Parameter	Type	Description
details	JSONArray	Detailed task results.
details.allResultUrl	String	URL for all subtitle results. The URL is valid for six months after task completion.
details.chResultUrl	String	URL for the Chinese subtitle results. The URL is valid for six months after task completion.
details.engResultUrl	String	URL for the English subtitle results. The URL is valid for six months after task completion.

Note

The content at each subtitle results URL uses the following format, with one subtitle per line: sequence number + time period + subtitle content.

NLP results

Parameter	Type	Description
transcription	object	The generated transcription.
autoChapters	object	The generated chapter overview.
summarization	object	The generated large model summary.
meetingAssistance	object	The generated intelligent meeting minutes.
translation	object	The generated text translation.

Transcription

Parameter	Type	Description
transcription	object	The `transcription` result object.
transcription.paragraphs	list[]	A list of `paragraph` objects that make up the transcription.
transcription.paragraphs[i].paragraphId	string	The `paragraph` id.
transcription.paragraphs[i].speakerId	string	The `speaker` id.
transcription.paragraphs[i].words	list[]	A list of `word` objects in the `paragraph`.
transcription.paragraphs[i].words[i].id	int	The sequence number of the `word`. This field can typically be ignored.
transcription.paragraphs[i].words[i].sentenceId	int	The `sentence` id. Words with the same `sentenceId` form a sentence.
transcription.paragraphs[i].words[i].start	long	The start time of the `word` as a `relative timestamp` in milliseconds.
transcription.paragraphs[i].words[i].end	long	The end time of the `word` as a `relative timestamp` in milliseconds.
transcription.paragraphs[i].words[i].text	string	The text of the `word`.
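
Since words that share a `sentenceId` form a sentence, sentences can be rebuilt from the word list. The sketch below assumes that words arrive in order, that words of one sentence are adjacent, and that each word's `text` already carries any spacing or punctuation it needs; none of these is guaranteed by the table above.

```python
from itertools import groupby


def sentences(transcription):
    """Return (speakerId, sentenceId, text, start_ms, end_ms) per sentence."""
    rebuilt = []
    for paragraph in transcription.get("paragraphs", []):
        words = paragraph.get("words", [])
        # groupby only merges adjacent words with the same sentenceId.
        for sentence_id, group in groupby(words, key=lambda w: w["sentenceId"]):
            group = list(group)
            text = "".join(w["text"] for w in group)
            rebuilt.append((paragraph.get("speakerId"), sentence_id, text,
                            group[0]["start"], group[-1]["end"]))
    return rebuilt


sample_transcription = {
    "paragraphs": [
        {
            "paragraphId": "p1",
            "speakerId": "1",
            "words": [
                {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello "},
                {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "world."},
                {"id": 3, "sentenceId": 2, "start": 1000, "end": 1500, "text": "Bye."},
            ],
        }
    ]
}
rebuilt = sentences(sample_transcription)
print(rebuilt)
```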

Summarization (full-text, speaker-based, and question-based)

Parameter	Type	Description
summarization	object	The summary result object, which contains results for zero or more summary types.
summarization.paragraphSummary	string	The full-text summary.
summarization.conversationalSummary	list[]	A list of conversational summaries.
summarization.conversationalSummary[i].speakerId	string	The speaker ID.
summarization.conversationalSummary[i].speakerName	string	The speaker name.
summarization.conversationalSummary[i].summary	string	The summary for this speaker.
summarization.questionsAnsweringSummary	list[]	A list of Q&A summaries.
summarization.questionsAnsweringSummary[i].question	string	The question.
summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion	list[]	A list of SentenceId values from the original transcription corresponding to the question.
summarization.questionsAnsweringSummary[i].answer	string	The answer to the question.
summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer	list[]	A list of SentenceId values from the original transcription corresponding to the answer.
summarization.mindMapSummary	list[object]	A list of mind map summaries, which represent topic hierarchies.
summarization.mindMapSummary[i].title	string	The title of the mind map.
summarization.mindMapSummary[i].topic	list[object]	An array of top-level topics, each of which can contain subtopics.
summarization.mindMapSummary[i].topic[i].title	string	The title of the topic.
summarization.mindMapSummary[i].topic[i].topic	list[object]	An array of subtopics for the parent topic. This array can be empty.
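
Each `mindMapSummary` entry is a tree in which every node has a `title` and a possibly empty `topic` array. A recursive walk, sketched below with made-up sample data, is one way to flatten such a tree into an indented outline.

```python
def outline(node, depth=0):
    """Return indented title lines for a mind map node and its subtopics."""
    lines = ["  " * depth + node.get("title", "")]
    for child in node.get("topic") or []:
        lines.extend(outline(child, depth + 1))
    return lines


sample_mind_map = {
    "title": "Weekly sync",
    "topic": [
        {"title": "Budget", "topic": [{"title": "Q3 forecast", "topic": []}]},
        {"title": "Hiring", "topic": []},
    ],
}
outline_lines = outline(sample_mind_map)
print("\n".join(outline_lines))
```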

Translation

Parameter	Type	Description
translation	object	The translation result object.
translation.paragraphs	list[]	A list of translated paragraphs corresponding to the speech recognition result.
translation.paragraphs[i].paragraphId	string	The paragraph ID, which corresponds to the `ParagraphId` in the speech recognition result.
translation.paragraphs[i].sentences	list[]	A list of sentence objects.
translation.paragraphs[i].sentences[i].sentenceId	long	The sentence ID.
translation.paragraphs[i].sentences[i].start	long	The start time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs[i].sentences[i].end	long	The end time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs[i].sentences[i].text	string	The translated text, which corresponds to the sentence text in the speech recognition result.

autoChapters

Parameter	Type	Description
autoChapters	list[]	A list of chapter overview objects.
autoChapters[i].id	int	The chapter ID.
autoChapters[i].start	long	The chapter's start time, as a relative timestamp in milliseconds from the beginning of the audio.
autoChapters[i].end	long	The chapter's end time, as a relative timestamp in milliseconds from the beginning of the audio.
autoChapters[i].headline	string	The chapter's headline.
autoChapters[i].summary	string	The chapter's summary.
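
Chapter times are given in milliseconds, so they usually need converting before display. The following sketch builds a simple chapter list with `HH:MM:SS` offsets. The helper names and the sample chapters are illustrative.

```python
def format_ms(ms):
    """Convert a millisecond offset to HH:MM:SS, discarding the fraction."""
    seconds = ms // 1000
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_list(auto_chapters):
    """Return 'HH:MM:SS headline' lines ordered by chapter start time."""
    ordered = sorted(auto_chapters, key=lambda c: c["start"])
    return [f"{format_ms(c['start'])} {c['headline']}" for c in ordered]


sample_chapters = [
    {"id": 2, "start": 65000, "end": 130000, "headline": "Details", "summary": "..."},
    {"id": 1, "start": 0, "end": 65000, "headline": "Intro", "summary": "..."},
]
chapters = chapter_list(sample_chapters)
print("\n".join(chapters))
```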

meetingAssistance (intelligent meeting summary extraction: keywords, key sentences, and action items)

Parameter	Type	Description
meetingAssistance	object	A container for the meeting assistance results. This object can hold multiple result types or be empty.
meetingAssistance.keywords	list[]	The keyword extraction results.
meetingAssistance.keySentences	list[]	The key sentence extraction results, also known as key content.
meetingAssistance.keySentences[i].id	long	The ID of the key sentence.
meetingAssistance.keySentences[i].sentenceId	long	The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.keySentences[i].start	long	The start time in milliseconds, relative to the beginning of the audio.
meetingAssistance.keySentences[i].end	long	The end time in milliseconds, relative to the beginning of the audio.
meetingAssistance.keySentences[i].text	string	The key sentence text.
meetingAssistance.actions	list[]	A list of action items and action item summaries.
meetingAssistance.actions[i].id	long	The ID of the action item.
meetingAssistance.actions[i].sentenceId	long	The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.actions[i].start	long	The start time in milliseconds, relative to the beginning of the audio.
meetingAssistance.actions[i].end	long	The end time in milliseconds, relative to the beginning of the audio.
meetingAssistance.actions[i].text	string	The action item text.
meetingAssistance.classifications	object	The scene classification results. Currently, three scene classes are supported.
meetingAssistance.classifications.interview	float	The confidence score for the interview class.
meetingAssistance.classifications.lecture	float	The confidence score for the lecture class.
meetingAssistance.classifications.meeting	float	The confidence score for the meeting class.
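
The `classifications` object holds one confidence score per scene class. A minimal sketch of picking the most likely class follows. The function name and the sample scores are made up for the example.

```python
def top_scene(meeting_assistance):
    """Return (class_name, score) for the highest-scoring scene, or None."""
    scores = meeting_assistance.get("classifications") or {}
    if not scores:
        return None
    label = max(scores, key=scores.get)
    return label, scores[label]


sample_meeting_assistance = {
    "classifications": {"interview": 0.1, "lecture": 0.2, "meeting": 0.7}
}
best = top_scene(sample_meeting_assistance)
print(best)
```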

Examples

Success response

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "result": [
      {
        "Type": "ASR",
        "Data": "{\"title\":\"example-title-****\"}"
      }
    ]
  },
  "TemplateId": "",
  "Params": "",
  "Input": {
    "Type": "",
    "Media": ""
  }
}
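
Because each `Data` value is a JSON string rather than a nested object, it has to be decoded separately from the response itself. The sketch below shows that second decoding step on a trimmed copy of the sample response. The function name is hypothetical, and keying the output by `Type` assumes each type appears at most once in `result`.

```python
import json


def decode_results(response):
    """Return {Type: parsed Data} for a successful response."""
    status = response.get("JobStatus")
    if status != "Success":
        raise RuntimeError(f"job not successful: {status}")
    decoded = {}
    for item in response.get("Results", {}).get("result", []):
        # Data is a JSON string; strip stray whitespace before parsing.
        # A repeated Type would overwrite the earlier entry.
        decoded[item["Type"]] = json.loads(item["Data"].strip())
    return decoded


sample_response = {
    "JobStatus": "Success",
    "Results": {
        "result": [
            {"Type": "ASR", "Data": "{\"title\":\"example-title-****\"}"}
        ]
    },
}
decoded = decode_results(sample_response)
print(decoded["ASR"]["title"])
```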

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.