QuerySmarttagJob

Queries the status and results of a smart tagging job.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

| Action | Access level | Resource type | Condition key | Dependent action |
| --- | --- | --- | --- | --- |
| ice:QuerySmarttagJob | get | *All Resources (*) | None | None |

Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| JobId | string | Yes | The ID of the smart tagging job. You can obtain this ID from the response to the SubmitSmarttagJob call. | 88c6ca184c0e47098a5b665e2**** |
| Params | string | No | Additional request parameters, formatted as a JSON string. For example: {"labelResultType":"auto"}. The labelResultType parameter supports the following values: auto (machine-generated tagging results) and hmi (human-in-the-loop tagging results). | {"labelResultType":"auto"} |
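A minimal Python sketch of assembling the two request parameters described above. It only builds the parameter dictionary; signing and sending the request depend on the SDK you use and are not shown. The helper name `build_query_params` is hypothetical.

```python
import json


def build_query_params(job_id, label_result_type=None):
    """Build the QuerySmarttagJob request parameters (hypothetical helper)."""
    if label_result_type not in (None, "auto", "hmi"):
        raise ValueError("labelResultType must be 'auto' or 'hmi'")
    params = {"JobId": job_id}
    if label_result_type is not None:
        # Params is a JSON string, not a nested object.
        params["Params"] = json.dumps(
            {"labelResultType": label_result_type}, separators=(",", ":")
        )
    return params


request = build_query_params("88c6ca184c0e47098a5b665e2****", "auto")
print(request["Params"])
```

Serializing with `json.dumps` rather than hand-writing the string avoids quoting mistakes when more keys are added to Params.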

Response elements

| Element | Type | Description | Example |
| --- | --- | --- | --- |
| (root) | object | | |
| JobStatus | string | The job status. Valid values: Success (the job was successful), Fail (the job failed), Processing (the job is in progress), and Submitted (the job is queued for processing). | Success |
| RequestId | string | The request ID. | ******11-DB8D-4A9A-875B-275798****** |
| UserData | string | The custom data passed through the MNS callback. For details on the message format, see the callback message format definitions below. | {"userId":"123432412831"} |
| Results | object | | |
| Results.Result | array<object> | An array of analysis result objects. | |
| Results.Result[].Type | string | The type of the analysis result. The valid values depend on the tagging version and are listed after this table. | Meta |
| Results.Result[].Data | string | The analysis result data, formatted as a JSON string. The data structure depends on the Type value. For more information, see the Result parameter descriptions below. | {"title":"example-title-****"} |
| Usages | object | | |
| Usages.Usage | array<object> | | |
| Usages.Usage[].Type | string | | |
| Usages.Usage[].Quota | integer | | |

Valid values of Results.Result[].Type:

  • Analysis result types for Tagging v1.0:

    1. TextLabel: text labels

    2. VideoLabel: video labels

    3. ASR: Raw results from automatic speech recognition. This is not returned by default.

    4. OCR: Raw results from optical character recognition. This is not returned by default.

    5. NLP: Results from natural language processing. This is not returned by default.

  • Analysis result types for Tagging v2.0:

    1. CPVLabel

    2. Meta: Metadata such as the video title. This is not returned by default.

  • Analysis result types for Tagging v2.0-custom:

    1. CPVLabel

    2. Meta: Metadata such as the video title. This is not returned by default.
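Because JobStatus moves from Submitted through Processing to either Success or Fail, a caller that does not rely on callbacks typically polls. The sketch below shows one way to do that in Python. The `query_job` argument is an assumption of this example: a caller-supplied function that takes a job ID and returns the QuerySmarttagJob response as a dictionary. The interval and attempt limits are illustrative values, not documented limits.

```python
import time

TERMINAL_STATES = {"Success", "Fail"}


def wait_for_job(query_job, job_id, interval_seconds=5.0, max_attempts=60,
                 sleep=time.sleep):
    """Poll until JobStatus is Success or Fail, then return the response."""
    for _ in range(max_attempts):
        response = query_job(job_id)
        if response["JobStatus"] in TERMINAL_STATES:
            return response
        sleep(interval_seconds)  # Submitted or Processing: try again later
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} attempts")


# Stand-in for a real API call, used only to demonstrate the loop.
_fake_statuses = iter(["Submitted", "Processing", "Success"])


def fake_query(job_id):
    return {"JobStatus": next(_fake_statuses)}


final = wait_for_job(fake_query, "example-job-id", interval_seconds=0,
                     sleep=lambda seconds: None)
print(final["JobStatus"])
```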

Callback message format

When the execution status of a smart tag job changes, MPS sends a callback message to the queue that you specify by calling the MPS UpdatePipeline operation. The message body is a JSON string that contains the following fields:

| Parameter | Type | Description |
| --- | --- | --- |
| Type | string | A fixed string with the value smarttag, indicating a smart tag job. |
| JobId | string | The unique ID of the job. |
| State | string | The current state of the job. This value matches the JobStatus returned by the QuerySmarttagJob operation. |
| UserData | string | The user data passed to the SubmitSmarttagJob operation. |
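A minimal sketch of handling the callback message body in Python, assuming the four fields listed above. The function name `parse_callback` and the sample message are made up for illustration; receiving the message from the queue is not shown.

```python
import json


def parse_callback(message_body):
    """Decode a smart tag callback message body and return its fields."""
    message = json.loads(message_body)
    if message.get("Type") != "smarttag":
        raise ValueError("not a smart tag callback")
    return {
        "job_id": message["JobId"],
        "state": message["State"],
        "user_data": message.get("UserData"),
    }


# Illustrative message; real values come from the queue.
sample = json.dumps({
    "Type": "smarttag",
    "JobId": "example-job-id",
    "State": "Success",
    "UserData": '{"userId":"123"}',
})
event = parse_callback(sample)
print(event["job_id"], event["state"])
```

Checking the Type field first lets one consumer safely ignore messages that other job types may send to the same queue.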

Result parameters

VideoLabel data structure

| Parameter | Type | Description |
| --- | --- | --- |
| persons | JSONArray | An array of objects, each representing a detected person. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The person's category. Possible values are: celebrity, politician, sensitive (sensitive person), and unknown (unknown person). For a person from a custom person library, this field contains the library's ID. |
| persons.ratio | double | The person's appearance ratio. The value ranges from 0 to 1. |
| persons.occurrences | JSONArray | An array of objects, each detailing an occurrence of the person. |
| persons.occurrences.score | double | The confidence score for the recognition. |
| persons.occurrences.from | double | The start time of the person's appearance, in seconds. |
| persons.occurrences.to | double | The end time of the person's appearance, in seconds. |
| persons.occurrences.position | JSONObject | The face coordinates. |
| persons.occurrences.position.leftTop | int[] | The x and y coordinates of the top-left corner of the face's bounding box. |
| persons.occurrences.position.rightBottom | int[] | The x and y coordinates of the bottom-right corner of the face's bounding box. |
| persons.occurrences.timestamp | double | The timestamp, in seconds, corresponding to the face coordinates. |
| persons.occurrences.scene | String | The shot type. Possible values are: closeUp (close-up shot), medium-closeUp (medium close-up shot), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | An array of objects representing detected tags, such as objects and scenes. See the table below for examples. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The sub-tag. |
| tags.ratio | double | The tag's appearance ratio. The value ranges from 0 to 1. |
| tags.occurrences | JSONArray | An array of objects, each detailing an occurrence of the tag. |
| tags.occurrences.score | double | The confidence score for the tag. |
| tags.occurrences.from | double | The start time of the occurrence, in seconds. |
| tags.occurrences.to | double | The end time of the occurrence, in seconds. |
| classifications | JSONArray | An array of objects, each representing a video classification. |
| classifications.score | double | The confidence score for the classification. |
| classifications.category1 | String | The level-1 category, such as Lifestyle, Anime, or Automotive. |
| classifications.category2 | String | The level-2 category, a sub-category of the level-1 category. For example, the Lifestyle category includes sub-categories like Health or Home. |
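As an example of consuming the VideoLabel structure, the Python sketch below sums each person's on-screen time from the from and to fields of their occurrences. It assumes the Data string decodes to an object whose top-level keys are the ones in the table above; the sample payload and the function name are invented for illustration.

```python
import json


def screen_time_by_person(video_label_data):
    """Sum (to - from) over each person's occurrences, in seconds."""
    label = json.loads(video_label_data)
    totals = {}
    for person in label.get("persons", []):
        seconds = sum(o["to"] - o["from"] for o in person.get("occurrences", []))
        totals[person["name"]] = totals.get(person["name"], 0.0) + seconds
    return totals


# Illustrative payload, not real API output.
sample = json.dumps({
    "persons": [{
        "name": "example-person",
        "category": "celebrity",
        "ratio": 0.5,
        "occurrences": [
            {"score": 0.9, "from": 1.0, "to": 3.5},
            {"score": 0.8, "from": 10.0, "to": 12.0},
        ],
    }]
})
totals = screen_time_by_person(sample)
print(totals)
```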

Video tags: examples

| Tag | Examples |
| --- | --- |
| program | "Where Are We Going, Dad?", "Top Funny Comedian" |
| character | doctor, nurse, teacher |
| object | piano, cup, table, scrambled eggs with tomatoes, car, cosmetics |
| logo | CCTV-1, CCTV-2, Youku, Dragon TV |
| action | dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting |
| location | Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, the United States |
| scene | bedroom, subway station, terraced fields, beach, desert |

ImageLabel data structure

| Parameter | Type | Description |
| --- | --- | --- |
| persons | JSONArray | An array of objects, each representing a detected person. |
| persons.name | String | The name of the recognized person. |
| persons.category | String | The person category. Valid values are celebrity, politician, and sensitive. |
| persons.score | double | The confidence score for the detected person. |
| persons.position | JSONObject | The face coordinates. |
| persons.position.leftTop | int[] | The x and y coordinates for the top-left corner of the bounding box. |
| persons.position.rightBottom | int[] | The x and y coordinates for the bottom-right corner of the bounding box. |
| persons.scene | String | The camera shot. Valid values include: closeUp (close-up), medium-closeUp (medium close-up), medium (medium shot), and medium-long (medium-long shot). |
| tags | JSONArray | Tags for detected objects and scenes. For more information, see the examples in the table below. |
| tags.mainTagName | String | The main tag. |
| tags.subTagName | String | The sub-tag. |
| tags.score | double | The confidence score for the tag. |
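The leftTop and rightBottom arrays describe a rectangle, so the size of a face box follows by subtraction. A small Python sketch, assuming each array is ordered [x, y] as the descriptions above state; the sample coordinates are invented and the unit of the coordinates is not specified here.

```python
def box_size(position):
    """Return (width, height) of a bounding box given its two corners."""
    left, top = position["leftTop"]
    right, bottom = position["rightBottom"]
    return right - left, bottom - top


# Illustrative coordinates only.
position = {"leftTop": [120, 80], "rightBottom": [220, 230]}
print(box_size(position))
```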

Image tag examples

| Main tag | Sub-tag |
| --- | --- |
| character | e.g., doctor, nurse, teacher |
| location | e.g., Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States |
| action | e.g., speaking |
| logo | e.g., CCTV1, CCTV2, Youku, Dragon TV |
| action | e.g., dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting |
| object | e.g., piano, cup, table, scrambled eggs with tomatoes, car, cosmetics |
| scene | e.g., bedroom, subway station, terraced fields, beach, desert |

TextLabel data structure (source: ASR and OCR)

| Parameter | Type | Description |
| --- | --- | --- |
| tags | JSONArray | A list of tags for the resource. |
| tags.name | String | The key of the tag. |
| tags.value | String | The value of the tag. Use a comma to separate multiple values. |

Text tag examples

| Parameter | Value |
| --- | --- |
| region | Examples: Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States |
| organization | Examples: China Wildlife Conservation Association, China Media Group |
| logo | Examples: Nike, Li-Ning |
| keyword | Example: core strength |
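Because tags.value packs multiple values into one comma-separated string, a consumer usually splits it before use. The Python sketch below does that, assuming the TextLabel Data string decodes to an object with a top-level tags array; the sample payload and the function name are invented for illustration.

```python
import json


def text_tags(text_label_data):
    """Return a dict that maps each tag name to its list of values."""
    label = json.loads(text_label_data)
    result = {}
    for tag in label.get("tags", []):
        values = [v.strip() for v in tag["value"].split(",") if v.strip()]
        result.setdefault(tag["name"], []).extend(values)
    return result


# Illustrative payload, not real API output.
sample = json.dumps({
    "tags": [
        {"name": "logo", "value": "Nike,Li-Ning"},
        {"name": "keyword", "value": "core strength"},
    ]
})
print(text_tags(sample))
```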

CPVLabel data structure

  • cates: Hierarchical categories (e.g., level-1, level-2, and level-3).

  • entities: Category attributes (enriched with knowledge graph information).

  • hotwords: Popular keywords (terms reflecting user interest).

  • freeTags: Free-form tags (user-defined keywords).

| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| type | String | hmi | The result type. Valid values: hmi (human-machine interaction result) and auto (automated tagging result). |
| cates | JSONArray | - | The category classification results. |
| cates.labelLevel1 | String | Travel | The level-1 label. |
| cates.labelLevel2 | String | Travel Scenery | The level-2 label. |
| cates.label | String | "" | The label name. The algorithm may return an empty string. |
| cates.appearanceProbability | double | 0.96 | The appearance probability. |
| cates.detailInfo | JSONArray | - | Detailed information about the category. |
| cates.detailInfo.score | double | 0.9 | The confidence score. |
| cates.detailInfo.startTime | double | 0.021 | The start time. |
| cates.detailInfo.endTime | double | 29.021 | The end time. |
| entities | JSONArray | - | Detected entities. |
| entities.labelLevel1 | String | Region | The level-1 label. |
| entities.labelLevel2 | String | Landmark | The level-2 label. |
| entities.label | String | Huangguoshu Waterfall | The label name. |
| entities.appearanceProbability | double | 0.067 | The appearance probability. |
| entities.knowledgeInfo | String | {"name": "黄果树瀑布", "nameEn": "Huangguoshu Waterfall", "description": "亚洲四大瀑布之一"} | Information from the knowledge graph. For a complete list of fields, refer to the tables for the following knowledge graph types: film and television IP, music, person, landmark, and object. |
| entities.detailInfo | JSONArray | - | Detailed information about the entity. |
| entities.detailInfo.score | double | 0.33292606472969055 | The confidence score. |
| entities.detailInfo.startTime | double | 6.021 | The start time. |
| entities.detailInfo.endTime | double | 8.021 | The end time. |
| entities.detailInfo.trackData | JSONArray | - | The structured information of the entity label. |
| entities.detailInfo.trackData.score | double | 0.32 | The confidence score. |
| entities.detailInfo.trackData.bbox | integer[] | 23,43,45,67 | The bounding box coordinates. |
| entities.detailInfo.trackData.timestamp | double | 7.9 | The timestamp. |
| hotwords | JSONArray | - | Detected hotwords. |
| hotwords.labelLevel1 | String | keyword | The level-1 label. |
| hotwords.labelLevel2 | String | "" | The level-2 label. |
| hotwords.label | String | 中国气象局 | The hotword content. |
| hotwords.appearanceProbability | double | 0.96 | The appearance probability. |
| hotwords.detailInfo | JSONArray | - | Detailed information about the hotword. |
| hotwords.detailInfo.score | double | 1.0 | The confidence score. |
| hotwords.detailInfo.startTime | double | 0.021 | The start time. |
| hotwords.detailInfo.endTime | double | 29.021 | The end time. |
| freeTags | JSONArray | - | The free-form tags. |
| freeTags.labelLevel1 | String | keyword | The level-1 label. |
| freeTags.labelLevel2 | String | "" | The level-2 label. |
| freeTags.label | String | 中央气象台 | The tag content. |
| freeTags.appearanceProbability | double | 0.96 | The appearance probability. |
| freeTags.detailInfo | JSONArray | - | Detailed information about the free-form tag. |
| freeTags.detailInfo.score | double | 0.9 | The confidence score. |
| freeTags.detailInfo.startTime | double | 0.021 | The start time. |
| freeTags.detailInfo.endTime | double | 29.021 | The end time. |
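One common use of the CPVLabel structure is to keep only entities above a probability threshold and read their knowledge graph details. The Python sketch below illustrates this under two assumptions: the input is the already-decoded CPVLabel object, and knowledgeInfo is itself a JSON-encoded string, as its String type and example value suggest. The threshold, function name, and sample entities are illustrative.

```python
import json


def confident_entities(cpv_label, min_probability=0.5):
    """Return (label, knowledge) pairs for entities at or above the threshold."""
    found = []
    for entity in cpv_label.get("entities", []):
        if entity.get("appearanceProbability", 0.0) < min_probability:
            continue
        raw = entity.get("knowledgeInfo")
        # Assumption: knowledgeInfo is a JSON string that needs a second decode.
        knowledge = json.loads(raw) if raw else {}
        found.append((entity["label"], knowledge))
    return found


# Illustrative payload, not real API output.
sample = {
    "type": "auto",
    "entities": [
        {"label": "Huangguoshu Waterfall", "appearanceProbability": 0.067,
         "knowledgeInfo": '{"nameEn": "Huangguoshu Waterfall"}'},
        {"label": "example-landmark", "appearanceProbability": 0.9,
         "knowledgeInfo": '{"nameEn": "Example Landmark"}'},
    ],
}
print(confident_entities(sample))
```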

ASR results

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | Detailed task results. |
| details.from | double | Start timestamp of the speech segment, in seconds. |
| details.to | double | End timestamp of the speech segment, in seconds. |
| details.content | String | Transcribed text from the speech segment. |
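The ASR details array can be turned into a readable, time-ordered transcript. A short Python sketch, taking the decoded details array as input; the line format, function name, and sample segments are choices made for this example.

```python
def format_asr(details):
    """Return transcript lines ordered by segment start time."""
    lines = []
    for segment in sorted(details, key=lambda s: s["from"]):
        lines.append(
            f"[{segment['from']:.2f}-{segment['to']:.2f}] {segment['content']}"
        )
    return lines


# Illustrative segments, deliberately out of order.
details = [
    {"from": 3.2, "to": 5.0, "content": "second segment"},
    {"from": 0.0, "to": 3.1, "content": "first segment"},
]
for line in format_asr(details):
    print(line)
```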

OCR results

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | An array of objects containing the task details. |
| details.timestamp | double | The timestamp, in seconds. |
| details.info | JSONArray | An array of objects, each containing recognized information for the corresponding timestamp. |
| details.info.score | double | The confidence score. |
| details.info.position | JSONObject | The text coordinates. |
| details.info.position.leftTop | int[] | The coordinates of the top-left corner. |
| details.info.position.rightBottom | int[] | The coordinates of the bottom-right corner. |
| details.info.content | String | The recognized text content. |

Meta annotation results

Note

If you do not use human annotation and you specify the needMetaData parameter when you submit a job by calling SubmitSmarttagJob, QuerySmarttagJob returns the original title that you entered.

| Parameter | Type | Description |
| --- | --- | --- |
| title | String | The title. |

Subtitle extraction results

| Parameter | Type | Description |
| --- | --- | --- |
| details | JSONArray | Detailed results of the task. |
| details.allResultUrl | String | URL for all subtitle results. The URL is valid for six months after the task is completed. |
| details.chResultUrl | String | URL for the Chinese subtitle results. The URL is valid for six months after the task is completed. |
| details.engResultUrl | String | URL for the English subtitle results. The URL is valid for six months after the task is completed. |

Note

The subtitle result URL returns content in the following format: sequence number + time range + subtitle content (one subtitle per line).

NLP results

| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The speech-to-text output. |
| autoChapters | object | The automatically generated chapters. |
| summarization | object | The summary generated by the large model. |
| meetingAssistance | object | The intelligent meeting minutes. |
| translation | object | The translated text. |

Transcription

| Parameter | Type | Description |
| --- | --- | --- |
| transcription | object | The speech transcription result. |
| transcription.paragraphs | list[] | A list of paragraphs in the speech transcription. |
| transcription.paragraphs[i].paragraphId | string | The paragraph-level ID. |
| transcription.paragraphs[i].speakerId | string | The speaker ID. |
| transcription.paragraphs[i].words | list[] | A list of words in the paragraph. |
| transcription.paragraphs[i].words[i].id | int | The sequence number of the word. This field can typically be ignored. |
| transcription.paragraphs[i].words[i].sentenceId | int | The sentence ID. Words with the same sentenceId form a sentence. |
| transcription.paragraphs[i].words[i].start | long | The start time of the word, in milliseconds, relative to the beginning of the audio. |
| transcription.paragraphs[i].words[i].end | long | The end time of the word, in milliseconds, relative to the beginning of the audio. |
| transcription.paragraphs[i].words[i].text | string | The text of the word. |
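Since words with the same sentenceId form a sentence, sentences can be rebuilt by grouping consecutive words. The Python sketch below does this for one paragraph. It assumes the words array is already in spoken order, and it makes the join separator a parameter because whether word text carries its own spacing is not stated here. The sample paragraph is invented.

```python
from itertools import groupby


def sentences(paragraph, separator=""):
    """Join consecutive words that share a sentenceId into sentence records."""
    result = []
    grouped = groupby(paragraph["words"], key=lambda w: w["sentenceId"])
    for sentence_id, words in grouped:
        words = list(words)
        result.append({
            "sentenceId": sentence_id,
            "start": words[0]["start"],  # milliseconds from start of audio
            "end": words[-1]["end"],
            "text": separator.join(w["text"] for w in words),
        })
    return result


# Illustrative paragraph, not real API output.
paragraph = {
    "paragraphId": "p1",
    "speakerId": "1",
    "words": [
        {"id": 1, "sentenceId": 1, "start": 0, "end": 400, "text": "Hello"},
        {"id": 2, "sentenceId": 1, "start": 400, "end": 900, "text": "world"},
        {"id": 3, "sentenceId": 2, "start": 1000, "end": 1500, "text": "Bye"},
    ],
}
rebuilt = sentences(paragraph, separator=" ")
print(rebuilt)
```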

Summarization: full-text summary, speaker summary, and question summary

| Parameter | Type | Description |
| --- | --- | --- |
| summarization | object | The summary result object, which can contain results for multiple summary types. |
| summarization.paragraphSummary | string | The paragraph summary result. |
| summarization.conversationalSummary | list[] | A list of conversational summary results. |
| summarization.conversationalSummary[i].speakerId | string | The ID of the speaker. |
| summarization.conversationalSummary[i].speakerName | string | The name of the speaker. |
| summarization.conversationalSummary[i].summary | string | The summary for the speaker. |
| summarization.questionsAnsweringSummary | list[] | A list of question-answering summary results. |
| summarization.questionsAnsweringSummary[i].question | string | The question extracted from the speech transcription. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion | list[] | IDs of sentences from the original speech transcription that form the question. |
| summarization.questionsAnsweringSummary[i].answer | string | The answer to the question. |
| summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer | list[] | The IDs of sentences from the original speech transcription that are summarized in the answer. |
| summarization.mindMapSummary | list[object] | A list of mind map summary results, which may include topic summaries and their relationships. |
| summarization.mindMapSummary[i].title | string | The title of the mind map. |
| summarization.mindMapSummary[i].topic | list[object] | A list of the main topics, each with their respective sub-topics. |
| summarization.mindMapSummary[i].topic[i].title | string | The topic title. |
| summarization.mindMapSummary[i].topic[i].topic | list[object] | A list of sub-topics for the current topic, which can be empty. |
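The mind map is a recursive structure: each node has a title and a topic list of child nodes. A small recursive walk in Python can flatten it into an indented outline. The sketch assumes every node uses the title and topic keys shown above; the sample mind map and the function name are invented.

```python
def outline(node, depth=0):
    """Return the mind map as a list of lines, indented two spaces per level."""
    lines = ["  " * depth + node["title"]]
    for child in node.get("topic") or []:  # topic may be empty or missing
        lines.extend(outline(child, depth + 1))
    return lines


# Illustrative mind map, not real API output.
mind_map = {
    "title": "Weekly sync",
    "topic": [
        {"title": "Roadmap", "topic": [{"title": "Q3 goals", "topic": []}]},
        {"title": "Hiring", "topic": []},
    ],
}
print("\n".join(outline(mind_map)))
```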

Full-text translation

| Parameter | Type | Description |
| --- | --- | --- |
| translation | object | The translation object. |
| translation.paragraphs | list[] | A list of translated paragraphs, corresponding to the speech recognition result. |
| translation.paragraphs.paragraphId | string | The paragraph ID, which corresponds to the ParagraphId from the speech recognition result. |
| translation.paragraphs.sentences | list[] | The sentences that make up the paragraph. |
| translation.paragraphs.sentences[i].sentenceId | long | The unique ID for the sentence. |
| translation.paragraphs.sentences[i].start | long | The start time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs.sentences[i].end | long | The end time of the sentence in milliseconds, relative to the beginning of the audio. |
| translation.paragraphs.sentences[i].text | string | The translated text of the sentence. |

autoChapters (chapter recognition)

| Parameter | Type | Description |
| --- | --- | --- |
| autoChapters | list[] | A list of chapter overview objects. |
| autoChapters[i].id | int | The chapter ID. |
| autoChapters[i].start | long | The start time of the chapter, as a relative timestamp in milliseconds from the start of the audio. |
| autoChapters[i].end | long | The end time of the chapter, as a relative timestamp in milliseconds from the start of the audio. |
| autoChapters[i].headline | string | The chapter headline. |
| autoChapters[i].summary | string | The chapter summary. |
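Chapter start and end values are in milliseconds, so a display layer usually converts them to a clock format. The Python sketch below builds a simple chapter list from the decoded autoChapters array. The hh:mm:ss format, function names, and sample chapters are choices made for this example.

```python
def format_ms(ms):
    """Convert milliseconds to hh:mm:ss, dropping the fractional second."""
    seconds = ms // 1000
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def chapter_list(auto_chapters):
    """Return one display line per chapter, ordered by start time."""
    ordered = sorted(auto_chapters, key=lambda c: c["start"])
    return [
        f"{format_ms(c['start'])}-{format_ms(c['end'])} {c['headline']}"
        for c in ordered
    ]


# Illustrative chapters, not real API output.
chapters = [
    {"id": 1, "start": 0, "end": 65000, "headline": "Introduction", "summary": ""},
    {"id": 2, "start": 65000, "end": 3725000, "headline": "Main topic", "summary": ""},
]
for line in chapter_list(chapters):
    print(line)
```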

meetingAssistance (summary, keyword, key sentence, and action item extraction)

| Parameter | Type | Description |
| --- | --- | --- |
| meetingAssistance | object | The intelligent minutes results object, which may contain zero or more results of various types. |
| meetingAssistance.keywords | list[] | The keyword extraction results. |
| meetingAssistance.keySentences | list[] | The key sentence extraction results, also known as key content. |
| meetingAssistance.keySentences[i].id | long | The ID of the key sentence. |
| meetingAssistance.keySentences[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.keySentences[i].start | long | The start time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.keySentences[i].end | long | The end time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.keySentences[i].text | string | The text of the key sentence. |
| meetingAssistance.actions | list[] | A list of action items. |
| meetingAssistance.actions[i].id | long | The ID of the action item. |
| meetingAssistance.actions[i].sentenceId | long | The ID of the corresponding sentence in the original ASR transcription. |
| meetingAssistance.actions[i].start | long | The start time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.actions[i].end | long | The end time relative to the beginning of the audio, in milliseconds. |
| meetingAssistance.actions[i].text | string | The text of the action item. |
| meetingAssistance.classifications | object | The scene classification results. Currently, only three scene types are supported. |
| meetingAssistance.classifications.interview | float | The confidence score for the interview scene. |
| meetingAssistance.classifications.lecture | float | The confidence score for the lecture scene. |
| meetingAssistance.classifications.meeting | float | The confidence score for the meeting scene. |
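The classifications object holds one confidence score per scene type, so the most likely scene is the key with the highest score. A brief Python sketch; the function name and sample scores are invented for illustration.

```python
def top_scene(classifications):
    """Return the (scene, score) pair with the highest confidence score."""
    return max(classifications.items(), key=lambda item: item[1])


# Illustrative scores, not real API output.
scene, score = top_scene({"interview": 0.12, "lecture": 0.81, "meeting": 0.07})
print(scene, score)
```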

Examples

Success response

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "Result": [
      {
        "Type": "Meta",
        "Data": "{\"title\":\"example-title-****\"}\t\n"
      }
    ]
  },
  "Usages": {
    "Usage": [
      {
        "Type": "",
        "Quota": 0
      }
    ]
  }
}
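Each Data value in the response is a JSON string inside the JSON response, so it needs a second decoding step. The Python sketch below applies that to a trimmed copy of the sample response above. The function name `decode_results` is hypothetical, and the sketch assumes each result type appears at most once.

```python
import json


def decode_results(response):
    """Map each result Type to its decoded Data payload."""
    decoded = {}
    for item in response.get("Results", {}).get("Result", []):
        # Data is a JSON string; json.loads tolerates the trailing whitespace.
        decoded[item["Type"]] = json.loads(item["Data"])
    return decoded


response = {
    "JobStatus": "Success",
    "Results": {
        "Result": [
            {"Type": "Meta", "Data": "{\"title\":\"example-title-****\"}\t\n"}
        ]
    },
}
results = decode_results(response)
print(results["Meta"]["title"])
```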

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.