QuerySmarttagJob

Queries a smart tagging job.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.
API: The API that you can call to perform the action.
Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.
Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.
- For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.
- For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.
Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.
Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action	Access level	Resource type	Condition key	Dependent action
ice:QuerySmarttagJob	get	*All Resource	*	None

Request parameters

Parameter	Type	Required	Description	Example
JobId	string	Yes	The ID of the smart tagging job. You can obtain this ID from the response to the SubmitSmarttagJob call.	88c6ca184c0e47098a5b665e2****
Params	string	No	Additional request parameters, formatted as a JSON string, for example `{"labelResultType":"auto"}`. Valid values for `labelResultType`: `auto` (machine-generated tagging results) and `hmi` (human-in-the-loop tagging results).	{"labelResultType":"auto"}
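
As a minimal sketch of assembling the two request parameters above, the snippet below serializes `Params` to a JSON string as the table describes. The helper name `build_query_params` is hypothetical and not part of any SDK; it only prepares a plain dictionary and sends nothing.

```python
import json


def build_query_params(job_id, label_result_type=None):
    """Return the request parameters as a plain dict (hypothetical helper)."""
    if not job_id:
        raise ValueError("JobId is required")
    params = {"JobId": job_id}
    if label_result_type is not None:
        if label_result_type not in ("auto", "hmi"):
            raise ValueError("labelResultType must be 'auto' or 'hmi'")
        # Params is a JSON string, not a nested object.
        params["Params"] = json.dumps(
            {"labelResultType": label_result_type}, separators=(",", ":")
        )
    return params
```

The resulting dictionary can then be passed to whichever client or SDK request object you use to call the operation.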

Response elements

Element	Type	Description	Example
	object
JobStatus	string	The job status. Valid values: Success (the job succeeded), Fail (the job failed), Processing (the job is in progress), and Submitted (the job is queued for processing).	Success
RequestId	string	The request ID.	****11-DB8D-4A9A-875B-275798****
UserData	string	The custom data passed through the MNS callback. For details on the message format, see the callback message format definitions below.	{"userId":"123432412831"}
Results	object
Result	array<object>	An array of analysis result objects.
	object
Type	string	The type of the analysis result. Tagging v1.0 returns TextLabel (text labels), VideoLabel (video labels), ASR (raw automatic speech recognition results), OCR (raw optical character recognition results), and NLP (natural language processing results); ASR, OCR, and NLP are not returned by default. Tagging v2.0 and Tagging v2.0-custom each return CPVLabel and Meta (metadata such as the video title); Meta is not returned by default.	Meta
Data	string	The analysis result data, formatted as a JSON string. The data structure depends on the `Type` value. For more information, see the `Result` parameter descriptions below.	{"title":"example-title-****"}
Usages	object
Usage	array<object>
	object
Type	string
Quota	integer
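
The response elements above can be consumed roughly as follows. This is a sketch that assumes the response has already been parsed into a Python dict with the element names from the table; `is_finished` and `decode_results` are hypothetical helper names. Because `Data` is itself a JSON string, it needs a second decoding step.

```python
import json

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"Success", "Fail"}


def is_finished(response):
    """True when JobStatus is Success or Fail, so polling can stop."""
    return response.get("JobStatus") in TERMINAL_STATUSES


def decode_results(response):
    """Return a list of (Type, decoded Data) pairs from Results.Result."""
    items = (response.get("Results") or {}).get("Result") or []
    # Each Data value is a JSON string whose structure depends on Type.
    return [(item["Type"], json.loads(item["Data"])) for item in items]
```

Returning pairs rather than a dict keyed by `Type` avoids silently dropping entries if a response ever contains more than one result of the same type.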

Callback message format

When the execution status of a smart tag job changes, MPS sends a callback message to the queue that you specify by using the MPS UpdatePipeline API. The message body is a JSON string that contains the following fields:

Parameter	Type	Description
Type	string	A fixed string with the value `smarttag`, indicating a smart tag job.
JobId	string	The unique ID of the job.
State	string	The current state of the job. This value matches the `JobStatus` returned by the `QuerySmarttagJob` operation.
UserData	string	The user data passed to the `SubmitSmarttagJob` operation.
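
A callback message with the fields above might be handled as in the sketch below. It assumes the message body has already been read from the queue as a string; `parse_callback` is a hypothetical name, and receiving messages from the queue is out of scope here.

```python
import json


def parse_callback(body):
    """Decode a callback body; return None if it is not a smart tag message."""
    message = json.loads(body)
    if message.get("Type") != "smarttag":
        return None
    return {
        "job_id": message.get("JobId"),
        # State carries the same values as JobStatus in QuerySmarttagJob.
        "state": message.get("State"),
        "user_data": message.get("UserData"),
    }
```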

Result parameters

VideoLabel data structure

Parameter	Type	Description
persons	JSONArray	An array of objects, each representing a detected person.
persons.name	String	The name of the recognized person.
persons.category	String	The person's category. Possible values are: `celebrity`, `politician`, `sensitive` (sensitive person), and `unknown` (unknown person). For a person from a custom person library, this field contains the library's ID.
persons.ratio	double	The person's appearance ratio. The value ranges from 0 to 1.
persons.occurrences	JSONArray	An array of objects, each detailing an occurrence of the person.
persons.occurrences.score	double	The confidence score for the recognition.
persons.occurrences.from	double	The start time of the person's appearance, in seconds.
persons.occurrences.to	double	The end time of the person's appearance, in seconds.
persons.occurrences.position	JSONObject	The face coordinates.
persons.occurrences.position.leftTop	int[]	The x and y coordinates of the top-left corner of the face's bounding box.
persons.occurrences.position.rightBottom	int[]	The x and y coordinates of the bottom-right corner of the face's bounding box.
persons.occurrences.timestamp	double	The timestamp, in seconds, corresponding to the face coordinates.
persons.occurrences.scene	String	The shot type. Possible values are: `closeUp` (close-up shot), `medium-closeUp` (medium close-up shot), `medium` (medium shot), and `medium-long` (medium-long shot).
tags	JSONArray	An array of objects representing detected tags, such as objects and scenes. See the table below for examples.
tags.mainTagName	String	The main tag.
tags.subTagName	String	The sub-tag.
tags.ratio	double	The tag's appearance ratio. The value ranges from 0 to 1.
tags.occurrences	JSONArray	An array of objects, each detailing an occurrence of the tag.
tags.occurrences.score	double	The confidence score for the tag.
tags.occurrences.from	double	The start time of the occurrence, in seconds.
tags.occurrences.to	double	The end time of the occurrence, in seconds.
classifications	JSONArray	An array of objects, each representing a video classification.
classifications.score	double	The confidence score for the classification.
classifications.category1	String	The level-1 category, such as Lifestyle, Anime, or Automotive.
classifications.category2	String	The level-2 category, a sub-category of the level-1 category. For example, the Lifestyle category includes sub-categories like Health or Home.
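
To illustrate how the `persons` fields fit together, the sketch below sums the on-screen time of each recognized person from the `from` and `to` values of their occurrences. It assumes the `Data` string of a VideoLabel result has already been decoded into a dict; `person_screen_time` is a hypothetical helper, and overlapping occurrences would be counted twice.

```python
def person_screen_time(video_label, min_score=0.0):
    """Map each person name to total seconds on screen (hypothetical helper)."""
    totals = {}
    for person in video_label.get("persons", []):
        seconds = sum(
            occurrence["to"] - occurrence["from"]
            for occurrence in person.get("occurrences", [])
            # Skip occurrences whose recognition confidence is too low.
            if occurrence.get("score", 0.0) >= min_score
        )
        name = person["name"]
        totals[name] = totals.get(name, 0.0) + seconds
    return totals
```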

Video tag examples

Tag	Examples
program	"Where Are We Going, Dad?", "Top Funny Comedian"
character	doctor, nurse, teacher
object	piano, cup, table, scrambled eggs with tomatoes, car, cosmetics
logo	CCTV-1, CCTV-2, Youku, Dragon TV
action	dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting
location	Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, the United States
scene	bedroom, subway station, terraced fields, beach, desert

ImageLabel data structure

Parameter	Type	Description
persons	JSONArray	An array of objects, each representing a detected person.
persons.name	String	The name of the recognized person.
persons.category	String	The person category. Valid values are `celebrity`, `politician`, and `sensitive`.
persons.score	double	The confidence score for the detected person.
persons.position	JSONObject	The face coordinates.
persons.position.leftTop	int[]	The x and y coordinates for the top-left corner of the bounding box.
persons.position.rightBottom	int[]	The x and y coordinates for the bottom-right corner of the bounding box.
persons.scene	String	The camera shot. Valid values include: `closeUp` (close-up), `medium-closeUp` (medium close-up), `medium` (medium shot), and `medium-long` (medium-long shot).
tags	JSONArray	Tags for detected objects and scenes. For more information, see the examples in the table below.
tags.mainTagName	String	The main tag.
tags.subTagName	String	The sub-tag.
tags.score	double	The confidence score for the tag.
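
As a small illustration of the `position` fields, the sketch below derives the width and height of each face bounding box. It assumes a decoded ImageLabel dict and that `leftTop` and `rightBottom` are ordered `[x, y]`, as the descriptions suggest; `face_boxes` is a hypothetical name.

```python
def face_boxes(image_label):
    """Return name, width, and height for each detected face (hypothetical helper)."""
    boxes = []
    for person in image_label.get("persons", []):
        position = person["position"]
        left, top = position["leftTop"]
        right, bottom = position["rightBottom"]
        boxes.append(
            {"name": person["name"], "width": right - left, "height": bottom - top}
        )
    return boxes
```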

Image tag examples

Main tag	Sub-tag
character	e.g., doctor, nurse, teacher
location	e.g., Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States
action	e.g., speaking
logo	e.g., CCTV1, CCTV2, Youku, Dragon TV
action	e.g., dancing, kissing, hugging, meeting, singing, making a phone call, horseback riding, fighting
object	e.g., piano, cup, table, scrambled eggs with tomatoes, car, cosmetics
scene	e.g., bedroom, subway station, terraced fields, beach, desert

TextLabel data structure (source: ASR and OCR)

Parameter	Type	Description
tags	JSONArray	A list of tags for the resource.
tags.name	String	The key of the tag.
tags.value	String	The value of the tag. Use a comma to separate multiple values.

Text tag examples

Parameter	Value
region	Examples: Tiananmen Square, the Statue of Liberty, Leshan Giant Buddha, China, United States
organization	Examples: China Wildlife Conservation Association, China Media Group
logo	Examples: Nike, Li-Ning
keyword	Example: core strength
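
Because `tags.value` packs multiple values into one comma-separated string, a consumer usually splits it before use. The sketch below assumes a decoded TextLabel dict and an ASCII comma as the separator; `text_tags_to_dict` is a hypothetical helper.

```python
def text_tags_to_dict(text_label):
    """Map each tag name to its list of values (hypothetical helper)."""
    result = {}
    for tag in text_label.get("tags", []):
        values = [
            value.strip()
            for value in tag.get("value", "").split(",")
            if value.strip()  # drop empty fragments such as a trailing comma
        ]
        result.setdefault(tag["name"], []).extend(values)
    return result
```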

CPVLabel data structure

cates: Hierarchical categories (e.g., level-1, level-2, and level-3).
entities: Category attributes (enriched with knowledge graph information).
hotwords: Popular keywords (terms reflecting user interest).
freeTags: Free-form tags (user-defined keywords).

Parameter	Type	Example value	Description
type	String	hmi	The result type. Valid values: `hmi` (human-machine interaction result) and `auto` (automated tagging result).
cates	JSONArray	-	The category classification results.
cates.labelLevel1	String	Travel	The level-1 label.
cates.labelLevel2	String	Travel Scenery	The level-2 label.
cates.label	String	""	The label name. The algorithm may return an empty string.
cates.appearanceProbability	double	0.96	The appearance probability.
cates.detailInfo	JSONArray	-	Detailed information about the category.
cates.detailInfo.score	double	0.9	The confidence score.
cates.detailInfo.startTime	double	0.021	The start time.
cates.detailInfo.endTime	double	29.021	The end time.
entities	JSONArray	-	Detected entities.
entities.labelLevel1	String	Region	The level-1 label.
entities.labelLevel2	String	Landmark	The level-2 label.
entities.label	String	Huangguoshu Waterfall	The label name.
entities.appearanceProbability	double	0.067	The appearance probability.
entities.knowledgeInfo	String	{"name": "黄果树瀑布", "nameEn": "Huangguoshu Waterfall", "description": "亚洲四大瀑布之一"}	Information from the knowledge graph. For a complete list of fields, refer to the tables for the following knowledge graph types: film and television IP, music, person, landmark, and object.
entities.detailInfo	JSONArray	-	Detailed information about the entity.
entities.detailInfo.score	double	0.33292606472969055	The confidence score.
entities.detailInfo.startTime	double	6.021	The start time.
entities.detailInfo.endTime	double	8.021	The end time.
entities.detailInfo.trackData	JSONArray	-	The structured information of the entity label.
entities.detailInfo.trackData.score	double	0.32	The confidence score.
entities.detailInfo.trackData.bbox	integer[]	[23, 43, 45, 67]	The bounding box coordinates.
entities.detailInfo.trackData.timestamp	double	7.9	The timestamp.
hotwords	JSONArray	-	Detected hotwords.
hotwords.labelLevel1	String	keyword	The level-1 label.
hotwords.labelLevel2	String	""	The level-2 label.
hotwords.label	String	中国气象局	The hotword content.
hotwords.appearanceProbability	double	0.96	The appearance probability.
hotwords.detailInfo	JSONArray	-	Detailed information about the hotword.
hotwords.detailInfo.score	double	1.0	The confidence score.
hotwords.detailInfo.startTime	double	0.021	The start time.
hotwords.detailInfo.endTime	double	29.021	The end time.
freeTags	JSONArray	-	The free-form tags.
freeTags.labelLevel1	String	keyword	The level-1 label.
freeTags.labelLevel2	String	""	The level-2 label.
freeTags.label	String	中央气象台	The tag content.
freeTags.appearanceProbability	double	0.96	The appearance probability.
freeTags.detailInfo	JSONArray	-	Detailed information about the free-form tag.
freeTags.detailInfo.score	double	0.9	The confidence score.
freeTags.detailInfo.startTime	double	0.021	The start time.
freeTags.detailInfo.endTime	double	29.021	The end time.
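
The sketch below shows one way to pick entities out of a decoded CPVLabel dict by `appearanceProbability`. It assumes `knowledgeInfo` is a JSON-encoded string, which matches its String type and the example value but is not stated outright; `select_entities` is a hypothetical helper.

```python
import json


def select_entities(cpv_label, min_probability=0.5):
    """Return entities at or above a probability threshold (hypothetical helper)."""
    selected = []
    for entity in cpv_label.get("entities", []):
        if entity.get("appearanceProbability", 0.0) < min_probability:
            continue
        info = entity.get("knowledgeInfo")
        selected.append(
            {
                "label": entity.get("label"),
                "levels": [entity.get("labelLevel1"), entity.get("labelLevel2")],
                # Assumption: knowledgeInfo is a JSON string; empty means no data.
                "knowledge": json.loads(info) if info else {},
            }
        )
    return selected
```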

ASR results

Parameter	Type	Description
details	JSONArray	Detailed task results.
details.from	double	Start timestamp of the speech segment, in seconds.
details.to	double	End timestamp of the speech segment, in seconds.
details.content	String	Transcribed text from the speech segment.
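
For illustration, the ASR segments can be rendered as timestamped transcript lines. The sketch assumes a decoded ASR dict; `format_seconds` and `asr_lines` are hypothetical helpers, and the `HH:MM:SS.mmm` layout is an arbitrary choice rather than a format defined by the service.

```python
def format_seconds(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, remainder = divmod(millis, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def asr_lines(asr_result):
    """Return one '[start - end] text' line per segment, ordered by start time."""
    segments = sorted(asr_result.get("details", []), key=lambda s: s["from"])
    return [
        f"[{format_seconds(s['from'])} - {format_seconds(s['to'])}] {s['content']}"
        for s in segments
    ]
```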

OCR results

Parameter	Type	Description
details	JSONArray	An array of objects containing the task details.
details.timestamp	double	The timestamp, in seconds.
details.info	JSONArray	An array of objects, each containing recognized information for the corresponding timestamp.
details.info.score	double	The confidence score.
details.info.position	JSONObject	The text coordinates.
details.info.position.leftTop	int[]	The coordinates of the top-left corner.
details.info.position.rightBottom	int[]	The coordinates of the bottom-right corner.
details.info.content	String	The recognized text content.
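
The nesting of `details` and `details.info` can be flattened into text per timestamp, as in the sketch below. It assumes a decoded OCR dict; `ocr_text_by_timestamp` is a hypothetical helper, and the score threshold is an illustrative filter.

```python
def ocr_text_by_timestamp(ocr_result, min_score=0.0):
    """Map each timestamp to the recognized text items found there."""
    frames = {}
    for detail in ocr_result.get("details", []):
        texts = [
            item["content"]
            for item in detail.get("info", [])
            if item.get("score", 0.0) >= min_score
        ]
        if texts:
            frames.setdefault(detail["timestamp"], []).extend(texts)
    return frames
```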

Meta annotation results

Note

If you do not use human annotation and you specify the needMetaData parameter when you submit a job with SubmitSmarttagJob, QuerySmarttagJob returns the original title that you entered.

Parameter	Type	Description
title	String	The title.

Subtitle extraction results

Parameter	Type	Description
details	JSONArray	Detailed results of the task.
details.allResultUrl	String	URL for all subtitle results. The URL is valid for six months after the task is completed.
details.chResultUrl	String	URL for the Chinese subtitle results. The URL is valid for six months after the task is completed.
details.engResultUrl	String	URL for the English subtitle results. The URL is valid for six months after the task is completed.

Note

The subtitle result URL returns content in the following format: sequence number + time range + subtitle content (one subtitle per line).

NLP results

Parameter	Type	Description
transcription	object	The speech-to-text output.
autoChapters	object	The automatically generated chapters.
summarization	object	The summary generated by the large model.
meetingAssistance	object	The intelligent meeting minutes.
translation	object	The translated text.

Transcription

Parameter	Type	Description
transcription	object	The speech transcription result.
transcription.paragraphs	list[]	A list of paragraphs in the speech transcription.
transcription.paragraphs[i].paragraphId	string	The paragraph-level ID.
transcription.paragraphs[i].speakerId	string	The speaker ID.
transcription.paragraphs[i].words	list[]	A list of words in the paragraph.
transcription.paragraphs[i].words[i].id	int	The sequence number of the word. This field can typically be ignored.
transcription.paragraphs[i].words[i].sentenceId	int	The sentence ID. Words with the same sentenceId form a sentence.
transcription.paragraphs[i].words[i].start	long	The start time of the word, in milliseconds, relative to the beginning of the audio.
transcription.paragraphs[i].words[i].end	long	The end time of the word, in milliseconds, relative to the beginning of the audio.
transcription.paragraphs[i].words[i].text	string	The text of the word.
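
Since words that share a `sentenceId` form a sentence, sentences can be rebuilt from one paragraph roughly as follows. The sketch assumes a decoded paragraph dict whose words are already in spoken order; `group_words_into_sentences` is a hypothetical helper, and the separator is left to the caller because the joining rule depends on the language.

```python
def group_words_into_sentences(paragraph, sep=""):
    """Rebuild sentences from a paragraph's words, keyed by sentenceId."""
    grouped = {}
    for word in paragraph.get("words", []):
        entry = grouped.setdefault(
            word["sentenceId"],
            {"start": word["start"], "end": word["end"], "words": []},
        )
        # A sentence spans from its earliest word start to its latest word end.
        entry["start"] = min(entry["start"], word["start"])
        entry["end"] = max(entry["end"], word["end"])
        entry["words"].append(word["text"])
    return [
        {
            "sentenceId": sentence_id,
            "start": entry["start"],
            "end": entry["end"],
            "text": sep.join(entry["words"]),
        }
        for sentence_id, entry in sorted(grouped.items())
    ]
```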

Summarization: full-text summary, speaker summary, and question summary

Parameter	Type	Description
summarization	object	The summary result object, which can contain results for multiple summary types.
summarization.paragraphSummary	string	The paragraph summary result.
summarization.conversationalSummary	list[]	A list of conversational summary results.
summarization.conversationalSummary[i].speakerId	string	The ID of the speaker.
summarization.conversationalSummary[i].speakerName	string	The name of the speaker.
summarization.conversationalSummary[i].summary	string	The summary for the speaker.
summarization.questionsAnsweringSummary	list[]	A list of question-answering summary results.
summarization.questionsAnsweringSummary[i].question	string	The question extracted from the speech transcription.
summarization.questionsAnsweringSummary[i].sentenceIdsOfQuestion	list[]	IDs of sentences from the original speech transcription that form the question.
summarization.questionsAnsweringSummary[i].answer	string	The answer to the question.
summarization.questionsAnsweringSummary[i].sentenceIdsOfAnswer	list[]	The IDs of sentences from the original speech transcription that are summarized in the answer.
summarization.mindMapSummary	list[object]	A list of mind map summary results, which may include topic summaries and their relationships.
summarization.mindMapSummary[i].title	string	The title of the mind map.
summarization.mindMapSummary[i].topic	list[object]	A list of the main topics, each with their respective sub-topics.
summarization.mindMapSummary[i].topic[i].title	string	The topic title.
summarization.mindMapSummary[i].topic[i].topic	list[object]	A list of sub-topics for the current topic, which can be empty.
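
Because each mind map topic can contain further topics, a recursive walk is a natural way to read it. The sketch below turns one `mindMapSummary` item into indented outline lines; `mind_map_outline` is a hypothetical helper that assumes every node carries `title` and an optional `topic` list, as in the table.

```python
def mind_map_outline(node, depth=0):
    """Return indented title lines for a mind map node and its sub-topics."""
    lines = ["  " * depth + node.get("title", "")]
    # topic may be missing or empty for leaf nodes.
    for child in node.get("topic") or []:
        lines.extend(mind_map_outline(child, depth + 1))
    return lines
```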

Full-text translation

Parameter	Type	Description
translation	object	The translation object.
translation.paragraphs	list[]	A list of translated paragraphs, corresponding to the speech recognition result.
translation.paragraphs.paragraphId	string	The paragraph ID, which corresponds to the `paragraphId` from the speech recognition result.
translation.paragraphs.sentences	list[]	The sentences that make up the paragraph.
translation.paragraphs.sentences[i].sentenceId	long	The unique ID for the sentence.
translation.paragraphs.sentences[i].start	long	The start time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs.sentences[i].end	long	The end time of the sentence in milliseconds, relative to the beginning of the audio.
translation.paragraphs.sentences[i].text	string	The translated text of the sentence.

autoChapters (chapter recognition)

Parameter	Type	Description
autoChapters	list[]	A list of chapter overview objects.
autoChapters[i].id	int	Chapter ID.
autoChapters[i].start	long	The start time of the chapter, as a relative timestamp in milliseconds from the start of the audio.
autoChapters[i].end	long	The end time of the chapter, as a relative timestamp in milliseconds from the start of the audio.
autoChapters[i].headline	string	Chapter headline.
autoChapters[i].summary	string	Chapter summary.
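
With `start` and `end` expressed in milliseconds, finding the chapter for a playback position is a simple range check. The sketch assumes a decoded `autoChapters` list and treats each chapter as the half-open range from `start` up to but excluding `end`, which is an assumption; `chapter_at` is a hypothetical helper.

```python
def chapter_at(auto_chapters, position_ms):
    """Return the chapter containing position_ms, or None if there is none."""
    for chapter in auto_chapters:
        if chapter["start"] <= position_ms < chapter["end"]:
            return chapter
    return None
```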

meetingAssistance (summary, keyword, key sentence, and action item extraction)

Parameter	Type	Description
meetingAssistance	object	The intelligent minutes results object, which may contain zero or more results of various types.
meetingAssistance.keywords	list[]	The keyword extraction results.
meetingAssistance.keySentences	list[]	The key sentence extraction results, also known as key content.
meetingAssistance.keySentences[i].id	long	The ID of the key sentence.
meetingAssistance.keySentences[i].sentenceId	long	The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.keySentences[i].start	long	The start time relative to the beginning of the audio, in milliseconds.
meetingAssistance.keySentences[i].end	long	The end time relative to the beginning of the audio, in milliseconds.
meetingAssistance.keySentences[i].text	string	The text of the key sentence.
meetingAssistance.actions	list[]	A list of action items.
meetingAssistance.actions[i].id	long	The ID of the action item.
meetingAssistance.actions[i].sentenceId	long	The ID of the corresponding sentence in the original ASR transcription.
meetingAssistance.actions[i].start	long	The start time relative to the beginning of the audio, in milliseconds.
meetingAssistance.actions[i].end	long	The end time relative to the beginning of the audio, in milliseconds.
meetingAssistance.actions[i].text	string	The text of the action item.
meetingAssistance.classifications	object	The scene classification results. Currently, only three scene types are supported.
meetingAssistance.classifications.interview	float	The confidence score for the interview scene.
meetingAssistance.classifications.lecture	float	The confidence score for the lecture scene.
meetingAssistance.classifications.meeting	float	The confidence score for the meeting scene.
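
The three scene scores can be reduced to a single label by taking the highest one, as sketched below. `dominant_scene` is a hypothetical helper that assumes a decoded `meetingAssistance` dict; whether the top score is high enough to trust is left to the caller.

```python
def dominant_scene(meeting_assistance):
    """Return the scene name with the highest confidence score, or None."""
    scores = meeting_assistance.get("classifications") or {}
    if not scores:
        return None
    return max(scores, key=scores.get)
```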

Examples

Success response

JSON format

{
  "JobStatus": "Success",
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "UserData": "{\"userId\":\"123432412831\"}",
  "Results": {
    "Result": [
      {
        "Type": "Meta",
        "Data": "{\"title\":\"example-title-****\"}\t\n"
      }
    ]
  },
  "Usages": {
    "Usage": [
      {
        "Type": "",
        "Quota": 0
      }
    ]
  }
}

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.