QueryVideoCognitionJob

Queries the results of a video understanding job.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action | Access level | Resource type | Condition key | Dependent action
ice:QueryVideoCognitionJob | get | All Resources (*) | None | None
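
As an illustration of the authorization row above, the following sketch builds a RAM policy document that allows only this action. The helper name build_ram_policy is hypothetical; the Resource value "*" follows the All Resources entry in the table.

```python
import json


def build_ram_policy(action: str = "ice:QueryVideoCognitionJob") -> dict:
    """Return a RAM policy document that allows one action on all resources."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": action,
                # The table lists All Resources, so the Resource element is "*".
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_ram_policy(), indent=2))
```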

Request parameters

Parameter | Type | Required | Description | Example
JobId | string | Yes | The ID of the intelligent tagging job. You can obtain this ID from the response of the SubmitIntelligentTaggingJob operation. | ****20b48fb04483915d4f2cd8ac****
Params | string | No | Additional request parameters, specified as a JSON string. | {}
IncludeResults | object | No | A container for parameters that determine which algorithm results to include in the response. | -
IncludeResults.NeedAsr | boolean | No | Specifies whether to return the ASR results. | true
IncludeResults.NeedOcr | boolean | No | Specifies whether to return the OCR results. | true
IncludeResults.NeedProcess | boolean | No | Specifies whether to return a link to the raw operator results. | true
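
The request parameters above can be assembled as a plain dictionary before they are handed to whatever client you use. This is a minimal sketch: build_query_params is a hypothetical helper, and how IncludeResults is serialized on the wire depends on the SDK, so treat the nested dictionary as an assumption.

```python
import json


def build_query_params(job_id, params=None, need_asr=None,
                       need_ocr=None, need_process=None):
    """Assemble the request parameters listed in the table above."""
    if not job_id:
        raise ValueError("JobId is required")
    request = {"JobId": job_id}
    if params is not None:
        # Params is documented as a JSON string.
        request["Params"] = json.dumps(params)
    flags = {"NeedAsr": need_asr, "NeedOcr": need_ocr, "NeedProcess": need_process}
    include = {name: value for name, value in flags.items() if value is not None}
    if include:
        request["IncludeResults"] = include
    return request


if __name__ == "__main__":
    print(build_query_params("****20b48fb04483915d4f2cd8ac****", need_asr=True))
```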

Response elements

Element | Type | Description | Example
JobStatus | string | The job status. Valid values: Success (the job succeeded), Fail (the job failed), Processing (the job is in progress), Submitted (the job has been submitted and is awaiting processing). | Success
RequestId | string | The request ID. | ******11-DB8D-4A9A-875B-275798******
UserData | string | The user data. | {"userId":"123432412831"}
Results | object | - | -
Results.result | array<object> | An array of analysis result objects. | -
Results.result[].Type | string | The type of the analysis result. Valid values: TextLabel (text label), VideoLabel (video label), ASR (raw ASR result, not returned by default), OCR (raw OCR result, not returned by default), NLP (raw NLP result, not returned by default), Process (URL to the raw operator result, not returned by default). | ASR
Results.result[].Data | string | The analysis result data, formatted as a JSON string. The data structure varies based on the Type value. For more information, see the descriptions of the Result parameter. | {"title":"example-title-****"}
TemplateId | string | The template ID. | -
Params | string | The request parameters. | -
Input | object | The input file. | -
Input.Type | string | The type of the input file. Valid value: OSS. | -
Input.Media | string | The URL of the input file. | -
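
Because JobStatus moves from Submitted or Processing to Success or Fail, a caller typically polls until a terminal value appears. The sketch below shows one way to do that; fetch_status is a hypothetical callable that you supply (for example, a wrapper that calls this API and returns JobStatus), and the interval and attempt limit are illustrative defaults, not documented values.

```python
import time

TERMINAL_STATUSES = {"Success", "Fail"}
PENDING_STATUSES = {"Submitted", "Processing"}


def wait_for_job(fetch_status, interval_seconds=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Call fetch_status() until it returns a terminal JobStatus."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if status not in PENDING_STATUSES:
            raise ValueError(f"Unexpected JobStatus: {status!r}")
        sleep(interval_seconds)
    raise TimeoutError("Job did not reach a terminal status in time")


if __name__ == "__main__":
    # Simulated status sequence standing in for real API calls.
    statuses = iter(["Submitted", "Processing", "Success"])
    print(wait_for_job(lambda: next(statuses), interval_seconds=0))
```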

Result parameter

VideoLabel data structure

Parameter | Type | Description
persons | JSONArray | A list of detected persons.
persons.name | String | The name of the recognized person.
persons.category | String | The category of the person. Possible values: celebrity, politician, sensitive, and unknown. If recognized from a custom figure library, this field returns the library's ID.
persons.ratio | double | The occurrence rate of the person, ranging from 0 to 1.
persons.occurrences | JSONArray | A list of the person's occurrences.
persons.occurrences.score | double | The confidence score.
persons.occurrences.from | double | The start time of the person's occurrence, in seconds.
persons.occurrences.to | double | The end time of the person's occurrence, in seconds.
persons.occurrences.position | JSONObject | The face coordinates.
persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner.
persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner.
persons.occurrences.timestamp | double | The timestamp of the face coordinates, in seconds.
persons.occurrences.scene | String | The camera shot type. Possible values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (medium-long shot).
tags | JSONArray | A list of tags for detected elements, such as objects and scenes.
tags.mainTagName | String | The main tag.
tags.subTagName | String | The subtag.
tags.ratio | double | The occurrence rate of the tag, ranging from 0 to 1.
tags.occurrences | JSONArray | A list of the tag's occurrences.
tags.occurrences.score | double | The confidence score.
tags.occurrences.from | double | The start time of the occurrence, in seconds.
tags.occurrences.to | double | The end time of the occurrence, in seconds.
classifications | JSONArray | A list of video classifications.
classifications.score | double | The confidence score for the classification.
classifications.category1 | String | The primary category. For example: Lifestyle, Animation, or Automotive.
classifications.category2 | String | The secondary category. For example, a video with the Lifestyle primary category might have a secondary category of Health or Home.
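
To illustrate how the VideoLabel fields fit together, the sketch below walks persons and their occurrences and returns time segments above a confidence threshold. The function name, the threshold, and the sample data are illustrative assumptions; only the field names come from the table.

```python
def person_segments(video_label, min_score=0.5):
    """Return (name, from, to) tuples for confident person occurrences."""
    segments = []
    for person in video_label.get("persons", []):
        for occurrence in person.get("occurrences", []):
            if occurrence.get("score", 0.0) >= min_score:
                segments.append(
                    (person.get("name"), occurrence["from"], occurrence["to"])
                )
    # Order by start time so the segments read as a timeline.
    return sorted(segments, key=lambda segment: segment[1])


if __name__ == "__main__":
    sample = {  # made-up data shaped like the table above
        "persons": [
            {"name": "person-a", "occurrences": [
                {"score": 0.9, "from": 12.0, "to": 15.5},
                {"score": 0.3, "from": 40.0, "to": 41.0},
            ]},
            {"name": "person-b", "occurrences": [
                {"score": 0.8, "from": 2.0, "to": 4.0},
            ]},
        ]
    }
    print(person_segments(sample))
```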

Video Tag Examples

Category | Example
program | e.g., The Amazing Race or America's Got Talent
role | e.g., doctor, nurse, or teacher
object | e.g., piano, cup, table, car, cosmetics, or food
logo | e.g., CNN, BBC, YouTube, or Netflix
action | e.g., dancing, kissing, hugging, meeting, singing, making a phone call, riding a horse, or fighting
location | e.g., Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States
scene | e.g., a bedroom, subway station, terraced fields, beach, or desert

ImageLabel Data Structure

Parameter | Type | Description
persons | JSONArray | The detected persons.
persons.name | String | The name of the recognized person.
persons.category | String | The person category. Valid values: celebrity, politician, and sensitive person.
persons.score | double | The confidence score for the person recognition.
persons.position | JSONObject | The bounding box of the detected person.
persons.position.leftTop | int[] | The x and y coordinates of the top-left corner of the bounding box.
persons.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner of the bounding box.
persons.scene | String | The shot type. Valid values: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (long shot).
tags | JSONArray | The detected tags for elements such as objects and scenes. See the table below for examples.
tags.mainTagName | String | The main tag.
tags.subTagName | String | The sub-tag.
tags.score | double | The confidence score.
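
The bounding box fields can be turned into a width and height with simple subtraction. This sketch assumes that leftTop and rightBottom are two-element [x, y] arrays, as the descriptions suggest; the function name and threshold are hypothetical.

```python
def person_boxes(image_label, min_score=0.5):
    """Return name, width, and height for each confidently detected person."""
    boxes = []
    for person in image_label.get("persons", []):
        if person.get("score", 0.0) < min_score:
            continue
        left, top = person["position"]["leftTop"]
        right, bottom = person["position"]["rightBottom"]
        boxes.append({
            "name": person.get("name"),
            "width": right - left,
            "height": bottom - top,
        })
    return boxes
```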

Examples of image tags

Main tag name | Sub tag name
character | such as doctor, nurse, or teacher
location | such as Tiananmen Square, the Statue of Liberty, the Leshan Giant Buddha, China, or the United States
action event | such as speaking
logo | such as CCTV-1, CCTV-2, Youku, or Dragon TV
action event | such as dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, or fighting
object | such as a piano, cup, table, scrambled eggs with tomatoes, car, or cosmetics
scene | such as a bedroom, subway station, terraced fields, beach, or desert

The TextLabel data structure (output from ASR and OCR)

Parameter | Type | Description
tags | JSONArray | An array of tags.
tags.name | String | The tag key.
tags.value | String | The tag value. Separate multiple values with a comma (,).
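
Since tags.value may hold several comma-separated values, consumers usually split it before use. A minimal sketch, with a hypothetical function name and made-up sample data:

```python
def text_tags_to_dict(text_label):
    """Map each tag name to the list of values found in its value string."""
    result = {}
    for tag in text_label.get("tags", []):
        values = [v.strip() for v in tag.get("value", "").split(",") if v.strip()]
        # A name can repeat across tags, so extend rather than overwrite.
        result.setdefault(tag["name"], []).extend(values)
    return result


if __name__ == "__main__":
    sample = {"tags": [{"name": "keyword", "value": "database, billing,error"}]}
    print(text_tags_to_dict(sample))
```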

Examples of text tags

Parameter | Value
region | e.g., us-east-1, eu-west-2, or ap-southeast-1
organization | e.g., Marketing Department, Development Team, or Acme Corp
identifier | e.g., user-12345, prod-db-01, or f7b4c278-6e54-4a8f-8f9d-3e58a9c8b4e7
keyword | e.g., database, billing, or error

CPVLabel data structure

  • cates: A list of categories (primary, secondary, or tertiary).

  • entities: A list of category attributes containing information from a knowledge graph.

  • hotwords: A list of popular terms.

  • freeTags: A list of free tags (keywords).

Parameter | Type | Example value | Description
type | String | hmi | The result type. Valid values: hmi (human-machine collaboration result) and auto (automated tagging result).
cates | JSONArray | - | An array of categorization results.
cates.labelLevel1 | String | Travel | The level-1 tag.
cates.labelLevel2 | String | Scenic Travel | The level-2 tag.
cates.label | String | "" | The tag name. This field may be empty if the algorithm does not return a value.
cates.appearanceProbability | double | 0.96 | The occurrence probability.
cates.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the category.
cates.detailInfo.score | double | 0.9 | The confidence score.
cates.detailInfo.startTime | double | 0.021 | The start time.
cates.detailInfo.endTime | double | 29.021 | The end time.
entities | JSONArray | - | An array of detected entities.
entities.labelLevel1 | String | location | The level-1 tag.
entities.labelLevel2 | String | landmark | The level-2 tag.
entities.label | String | Huangguoshu Waterfall | The tag name.
entities.appearanceProbability | double | 0.067 | The occurrence probability.
entities.knowledgeInfo | String | {"name": "Huangguoshu Waterfall", "nameEn": "Huangguoshu Waterfall", "description": "One of the four major waterfalls in Asia"} | The knowledge graph information. For a complete list of fields, refer to the corresponding tables for Entertainment IPs, Music, Persons, Landmarks, and Objects.
entities.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the entity.
entities.detailInfo.score | double | 0.33292606472969055 | The confidence score.
entities.detailInfo.startTime | double | 6.021 | The start time.
entities.detailInfo.endTime | double | 8.021 | The end time.
entities.detailInfo.trackData | JSONArray | - | An array of objects that contains structured tracking data for the entity.
entities.detailInfo.trackData.score | double | 0.32 | The confidence score.
entities.detailInfo.trackData.bbox | integer[] | 23, 43, 45, 67 | The bounding box.
entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp.
hotwords | JSONArray | - | An array of detected hotwords.
hotwords.labelLevel1 | String | keyword | The level-1 tag.
hotwords.labelLevel2 | String | "" | The level-2 tag.
hotwords.label | String | China Meteorological Administration | The hotword.
hotwords.appearanceProbability | double | 0.96 | The occurrence probability.
hotwords.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the hotword.
hotwords.detailInfo.score | double | 1.0 | The confidence score.
hotwords.detailInfo.startTime | double | 0.021 | The start time.
hotwords.detailInfo.endTime | double | 29.021 | The end time.
freeTags | JSONArray | - | An array of detected free tags.
freeTags.labelLevel1 | String | keyword | The level-1 tag.
freeTags.labelLevel2 | String | "" | The level-2 tag.
freeTags.label | String | National Meteorological Center | The tag name.
freeTags.appearanceProbability | double | 0.96 | The occurrence probability.
freeTags.detailInfo | JSONArray | - | An array of objects, each detailing an occurrence of the free tag.
freeTags.detailInfo.score | double | 0.9 | The confidence score.
freeTags.detailInfo.startTime | double | 0.021 | The start time.
freeTags.detailInfo.endTime | double | 29.021 | The end time.
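
The four CPVLabel lists share the label and appearanceProbability fields, so they can be scanned with one loop. The sketch below collects non-empty labels above a probability threshold and decodes knowledgeInfo, assuming it is a JSON-encoded string as the example value suggests. Function names and the threshold are hypothetical.

```python
import json

SECTIONS = ("cates", "entities", "hotwords", "freeTags")


def collect_labels(cpv_label, min_probability=0.5):
    """Return (section, label) pairs whose probability meets the threshold."""
    labels = []
    for section in SECTIONS:
        for item in cpv_label.get(section, []):
            probable = item.get("appearanceProbability", 0.0) >= min_probability
            # cates.label may be empty, so skip blank labels.
            if probable and item.get("label"):
                labels.append((section, item["label"]))
    return labels


def knowledge_of(entity):
    """Decode entities.knowledgeInfo; return {} when it is missing or empty."""
    raw = entity.get("knowledgeInfo")
    return json.loads(raw) if raw else {}
```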

Automatic speech recognition (ASR) results

Parameter | Type | Description
details | JSONArray | Detailed results of the task.
details.from | double | Start timestamp, in seconds.
details.to | double | End timestamp, in seconds.
details.content | String | The recognized text.
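
As a small illustration of the ASR structure, the following sketch sorts the segments by start time and renders one line per segment. The output format and function name are illustrative choices, not part of the API.

```python
def asr_to_lines(asr_result):
    """Render ASR segments as '[from-to] content' lines in time order."""
    segments = sorted(asr_result.get("details", []), key=lambda d: d["from"])
    return [
        f"[{seg['from']:.2f}-{seg['to']:.2f}] {seg['content']}"
        for seg in segments
    ]


if __name__ == "__main__":
    sample = {"details": [
        {"from": 1.5, "to": 3.0, "content": "world"},
        {"from": 0.0, "to": 1.5, "content": "hello"},
    ]}
    print("\n".join(asr_to_lines(sample)))
```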

Text recognition (OCR) results

Parameter | Type | Description
details | JSONArray | Detailed task results.
details.timestamp | double | The timestamp, in seconds.
details.info | JSONArray | An array of recognition results for the corresponding timestamp.
details.info.score | double | The confidence score for the recognition.
details.info.position | JSONObject | Coordinates of the recognized text.
details.info.position.leftTop | int[] | The x and y coordinates of the top-left corner.
details.info.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner.
details.info.content | String | The recognized text.
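
OCR results are grouped per timestamp, with several recognized strings possible in each frame. A minimal sketch that keeps only confident text, using a hypothetical function name and threshold:

```python
def ocr_text_by_timestamp(ocr_result, min_score=0.8):
    """Map each timestamp to the recognized strings that meet the threshold."""
    text_by_time = {}
    for frame in ocr_result.get("details", []):
        texts = [
            item["content"]
            for item in frame.get("info", [])
            if item.get("score", 0.0) >= min_score
        ]
        if texts:  # skip frames where nothing passed the threshold
            text_by_time[frame["timestamp"]] = texts
    return text_by_time
```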

Metadata Annotation Results

Note

If needMetaData is specified in the SubmitSmarttagJob call and manual labeling is not used, QuerySmarttagJob returns the original title.

Parameter | Type | Description
title | String | The title.

Subtitle

Parameter | Type | Description
details | JSONArray | Detailed task results.
details.allResultUrl | String | URL for all subtitle results. The URL is valid for six months after task completion.
details.chResultUrl | String | URL for the Chinese subtitle results. The URL is valid for six months after task completion.
details.engResultUrl | String | URL for the English subtitle results. The URL is valid for six months after task completion.

Note

The content at the subtitle results URL uses the following format: sequence number + time period + subtitle content (one subtitle per line).

NLP results

Parameter | Type | Description
transcription | object | The generated transcription.
autoChapters | object | The generated chapter overview.
summarization | object | The generated large model summary.
meetingAssistance | object | The generated intelligent meeting minutes.
translation | object | The generated text translation.

Transcription

Parameter | Type | Description
transcription | object | The transcription result object.
transcription.paragraphs | list[] | A list of paragraph objects that make up the transcription.
transcription.paragraphs[i].paragraphId | string | The paragraph ID.
transcription.paragraphs[i].speakerId | string | The speaker ID.
transcription.paragraphs[i].words | list[] | A list of word objects in the paragraph.
transcription.paragraphs[i].words[i].id | int | The sequence number of the word. This field can typically be ignored.
transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. Words with the same sentenceId form a sentence.
transcription.paragraphs[i].words[i].start | long | The start time of the word as a relative timestamp in milliseconds.
transcription.paragraphs[i].words[i].end | long | The end time of the word as a relative timestamp in milliseconds.
transcription.paragraphs[i].words[i].text | string | The text of the word.
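
Because words that share a sentenceId form a sentence, sentences can be rebuilt by grouping on that field. The sketch below does so; the function name is hypothetical, and the separator used to join word texts is an assumption (some languages need none, others need a space).

```python
def sentences_from_words(transcription, separator=""):
    """Group words by sentenceId and return sentences ordered by ID."""
    sentences = {}
    for paragraph in transcription.get("paragraphs", []):
        for word in paragraph.get("words", []):
            sentence_id = word["sentenceId"]
            if sentence_id not in sentences:
                sentences[sentence_id] = {
                    "sentenceId": sentence_id,
                    "speakerId": paragraph.get("speakerId"),
                    "start": word["start"],
                    "end": word["end"],
                    "text": word["text"],
                }
            else:
                sentence = sentences[sentence_id]
                sentence["start"] = min(sentence["start"], word["start"])
                sentence["end"] = max(sentence["end"], word["end"])
                sentence["text"] += separator + word["text"]
    return [sentences[sentence_id] for sentence_id in sorted(sentences)]
```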

Summarization (full-text, speaker-based, and question-based)

Parameter | Type | Description
summarization | object | The summary result object, which contains results for zero or more summary types.
summarization.paragraphSummary | string | The full-text summary.
summarization.conversationalSummary | list[] | A list of conversational summaries.
summarization.conversationalSummary[i].speakerId | string | The speaker ID.
summarization.conversationalSummary[i].speakerName | string | The speaker name.
summarization.conversationalSummary[i].summary | string | The summary for this speaker.
summarization.questionsAnsweringSummary | list[] | A list of Q&A summaries.
summarization.questionsAnsweringSummary[i].question | string | The question.
summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | A list of SentenceId values from the original transcription corresponding to the question.
summarization.questionsAnsweringSummary[i].answer | string | The answer to the question.
summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | A list of SentenceId values from the original transcription corresponding to the answer.
summarization.mindMapSummary | list[object] | A list of mind map summaries, which represent topic hierarchies.
summarization.mindMapSummary[i].title | string | The title of the mind map.
summarization.mindMapSummary[i].topic | list[object] | An array of top-level topics, each of which can contain subtopics.
summarization.mindMapSummary[i].topic[i].title | string | The title of the topic.
summarization.mindMapSummary[i].topic[i].topic | list[object] | An array of subtopics for the parent topic. This array can be empty.
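
The mind map is a recursive structure: each topic has a title and a topic array of subtopics. A short recursive walk, sketched below with hypothetical function names, flattens it into an indented outline.

```python
def _outline(topics, depth):
    """Recursively render topics as indented bullet lines."""
    lines = []
    for topic in topics or []:
        lines.append("  " * depth + "- " + topic.get("title", ""))
        lines.extend(_outline(topic.get("topic"), depth + 1))
    return lines


def mind_map_outline(summarization):
    """Return outline lines for every mind map in the summarization object."""
    lines = []
    for mind_map in summarization.get("mindMapSummary", []):
        lines.append(mind_map.get("title", ""))
        lines.extend(_outline(mind_map.get("topic"), 1))
    return lines
```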

Translation

Parameter | Type | Description
translation | object | The translation result object.
translation.paragraphs | list[] | A list of translated paragraphs corresponding to the speech recognition result.
translation.paragraphs.paragraphId | string | The paragraph ID, which corresponds to the ParagraphId in the speech recognition result.
translation.paragraphs.sentences | list[] | A list of sentence objects.
translation.paragraphs.sentences[i].sentenceId | long | The sentence ID.
translation.paragraphs.sentences[i].start | long | The start time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs.sentences[i].end | long | The end time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs.sentences[i].text | string | The translated text, which corresponds to the sentence text in the speech recognition result.
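
To line translated text up with the source transcription, it can help to index the translation by sentenceId. A minimal sketch with a hypothetical function name:

```python
def translation_by_sentence(translation):
    """Map each sentenceId to its translated text."""
    return {
        sentence["sentenceId"]: sentence["text"]
        for paragraph in translation.get("paragraphs", [])
        for sentence in paragraph.get("sentences", [])
    }
```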

autoChapters

Parameter | Type | Description
autoChapters | list[] | A list of chapter overview objects.
autoChapters[i].id | int | The chapter ID.
autoChapters[i].start | long | The chapter's start time, as a relative timestamp in milliseconds from the beginning of the audio.
autoChapters[i].end | long | The chapter's end time, as a relative timestamp in milliseconds from the beginning of the audio.
autoChapters[i].headline | string | The chapter's headline.
autoChapters[i].summary | string | The chapter's summary.
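
Chapter times are in milliseconds, so displaying them usually means converting to a clock format. The sketch below builds a simple chapter index; the HH:MM:SS format and the function names are illustrative choices.

```python
def format_ms(milliseconds):
    """Convert a millisecond offset to HH:MM:SS, dropping the fraction."""
    total_seconds = milliseconds // 1000
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_index(auto_chapters):
    """Return 'HH:MM:SS headline' lines ordered by chapter start time."""
    ordered = sorted(auto_chapters, key=lambda chapter: chapter["start"])
    return [f"{format_ms(c['start'])} {c['headline']}" for c in ordered]
```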

meetingAssistance (intelligent meeting summary extraction: keywords, key sentences, and action items)

Parameter | Type | Description
meetingAssistance | object | A container for the meeting assistance results. This object can hold multiple result types or be empty.
meetingAssistance.keywords | list[] | The keyword extraction results.
meetingAssistance.keySentences | list[] | The key sentence extraction results, also known as key content.
meetingAssistance.keySentences[i].id | long | The ID of the key sentence.
meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.keySentences[i].start | long | The start time in milliseconds, relative to the beginning of the audio.
meetingAssistance.keySentences[i].end | long | The end time in milliseconds, relative to the beginning of the audio.
meetingAssistance.keySentences[i].text | string | The key sentence text.
meetingAssistance.actions | list[] | A list of action items and action item summaries.
meetingAssistance.actions[i].id | long | The ID of the action item.
meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.actions[i].start | long | The start time in milliseconds, relative to the beginning of the audio.
meetingAssistance.actions[i].end | long | The end time in milliseconds, relative to the beginning of the audio.
meetingAssistance.actions[i].text | string | The action item text.
meetingAssistance.classifications | object | The scene classification results. Currently, three scene classes are supported.
meetingAssistance.classifications.interview | float | The confidence score for the interview class.
meetingAssistance.classifications.lecture | float | The confidence score for the lecture class.
meetingAssistance.classifications.meeting | float | The confidence score for the meeting class.
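
The classifications object holds one confidence score per scene class, so the most likely scene is the key with the highest score. A minimal sketch with a hypothetical function name:

```python
def top_scene(meeting_assistance):
    """Return (scene, score) for the highest-scoring class, or None."""
    scores = meeting_assistance.get("classifications") or {}
    if not scores:
        return None
    return max(scores.items(), key=lambda pair: pair[1])
```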

Examples

Success response

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "result": [
      {
        "Type": "ASR",
        "Data": "{\"title\":\"example-title-****\"}"
      }
    ]
  },
  "TemplateId": "",
  "Params": "",
  "Input": {
    "Type": "",
    "Media": ""
  }
}
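
Each Data value in the response is itself a JSON string, so it needs a second decoding step after the response body is parsed. The sketch below indexes the decoded results by Type; the function name is hypothetical, and it assumes each Type appears at most once in the result array.

```python
import json


def results_by_type(response):
    """Decode each result's Data string and index the results by Type."""
    decoded = {}
    for item in response.get("Results", {}).get("result", []):
        # Data is a JSON string whose structure depends on Type.
        decoded[item["Type"]] = json.loads(item["Data"])
    return decoded


if __name__ == "__main__":
    body = {
        "JobStatus": "Success",
        "Results": {"result": [
            {"Type": "ASR", "Data": "{\"title\":\"example-title-****\"}"},
        ]},
    }
    print(results_by_type(body))
```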

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.