Workflows

Last updated: 2026-05-20 05:50:18

Workflows are the core feature for automating unstructured data processing in your datasets. By orchestrating multiple operators in a workflow, you can build a complete data processing pipeline that covers data ingestion, processing, and output. A workflow can process many types of data in batches, including audio, video, images, text, and documents.

Overview

A workflow consists of one or more operators arranged in sequence. The output of one operator serves as the input for the next, creating a data processing pipeline. The typical process is to create a workflow, add and configure operators, set the workflow's properties, run the workflow, and then submit it.
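The chaining described above can be illustrated with a minimal Python sketch. It is not the product's implementation or API: the operator functions below are hypothetical stand-ins that only mimic the idea of deduplication and special character removal on a small batch of text records.

```python
from functools import reduce
from typing import Callable, Iterable, List

# Assumption: an operator is modeled as a function from a batch to a batch.
Operator = Callable[[List[str]], List[str]]


def deduplicate(records: List[str]) -> List[str]:
    """Hypothetical stand-in for a deduplication operator (keeps first occurrences)."""
    return list(dict.fromkeys(records))


def strip_special_characters(records: List[str]) -> List[str]:
    """Hypothetical stand-in for a special character removal operator."""
    return [
        "".join(ch for ch in record if ch.isalnum() or ch.isspace())
        for record in records
    ]


def run_workflow(operators: Iterable[Operator], records: List[str]) -> List[str]:
    """Pass the output of each operator to the next operator, in order."""
    return reduce(lambda data, operator: operator(data), operators, records)


result = run_workflow(
    [deduplicate, strip_special_characters],
    ["a#b", "a#b", "c d!"],
)
print(result)  # ['ab', 'c d']
```

Modeling each operator as a batch-in, batch-out function keeps the order of the list equal to the order of execution, which mirrors how operators are arranged in sequence in a workflow.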

The workflow page consists of five main sections: the top navigation bar, the operator library, the canvas, the information panel on the right, and the bottom status bar.

Section

Description

Top navigation bar

  • Save: Saves the workflow.

  • Run: Executes the entire workflow. Running a workflow consumes Credit. If your Credit balance is 0 or less when you click Run, the workflow fails. If the balance is above 0 but no more than 100, the workflow may fail; purchase a resource plan before retrying. If the balance is depleted during a run, the workflow terminates immediately, and you must purchase a resource plan and retry. Credit is still deducted for any operators that completed successfully.

  • Submit: Submits the workflow to the production environment after it passes validation checks. For more information, see workflow submission overview.
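The Credit rules in the Run description can be summarized as a small decision sketch. This is only an illustration of the stated thresholds, not the platform's billing logic: the function names, the per-operator costs, and the assumption that depletion is checked before each operator starts are all hypothetical.

```python
from typing import List, Tuple

LOW_BALANCE_THRESHOLD = 100  # from the text: at or below this, a run may fail


def precheck(balance: int) -> str:
    """Classify a Credit balance at the moment Run is clicked."""
    if balance <= 0:
        return "will_fail"
    if balance <= LOW_BALANCE_THRESHOLD:
        return "may_fail"
    return "ok"


def simulate_run(
    balance: int, operator_costs: List[Tuple[str, int]]
) -> Tuple[str, int, List[str]]:
    """Simulate a run with hypothetical per-operator costs.

    Assumption: the balance is checked before each operator starts, and
    Credit is deducted only for operators that completed.
    """
    completed: List[str] = []
    for name, cost in operator_costs:
        if balance <= 0:
            # Balance depleted during the run: terminate immediately.
            return "terminated", balance, completed
        balance -= cost
        completed.append(name)
    return "succeeded", balance, completed


print(precheck(80))  # may_fail
print(simulate_run(150, [("a", 100), ("b", 100), ("c", 100)]))
# ('terminated', -50, ['a', 'b'])
```

In the second call, operators a and b complete and are charged, and the run stops before c because the balance is no longer above 0.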

Operator library

The operator library provides all available operators. You can search for operators by keyword or filter them by category. The library can be expanded or collapsed. Operators are categorized by function as follows:

  • General: MD5-based deduplication, basic file information, basic audio information, and basic video information.

  • Text: Special character removal, non-compliant content replacement, sensitive information masking, SimHash value calculation, text inference (LLM), multi-language text quality scoring, simplified-to-traditional Chinese conversion, HTML body extraction, and text chunking.

  • Document: PDF parsing.

  • Image: Image content moderation (NSFW), image aesthetic scoring, image OCR, image quality scoring, and image understanding.

  • Audio: Audio slicing, audio-to-text (ASR), audio timestamping, audio language detection, voice activity detection (VAD), audio transcoding, audio enhancement, audio quality scoring, and audio speaker diarization (DIA).

  • Video: Audio extraction from video and video quality scoring.

  • Vector: Image embedding and text embedding.

Drag operators from the operator library onto the canvas.
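Searching by keyword and filtering by category can be pictured with the sketch below. The dictionary holds only a subset of the categories listed above, and `find_operators` is a hypothetical helper for illustration, not a function the product exposes.

```python
from typing import Dict, List, Optional

# Subset of the operator library, copied from the category list above.
OPERATOR_LIBRARY: Dict[str, List[str]] = {
    "Document": ["PDF parsing"],
    "Video": ["Audio extraction from video", "Video quality scoring"],
    "Vector": ["Image embedding", "Text embedding"],
}


def find_operators(keyword: str = "", category: Optional[str] = None) -> List[str]:
    """Return operator names that match a keyword, optionally within one category."""
    matches: List[str] = []
    for name_of_category, operators in OPERATOR_LIBRARY.items():
        if category is not None and name_of_category != category:
            continue
        # Case-insensitive substring match; an empty keyword matches everything.
        matches.extend(op for op in operators if keyword.lower() in op.lower())
    return matches


print(find_operators("embedding"))       # ['Image embedding', 'Text embedding']
print(find_operators(category="Video"))  # ['Audio extraction from video', 'Video quality scoring']
```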

Canvas

The canvas is the main editing area for your workflow. To build a complete data processing pipeline, drag operators from the operator library on the left and connect them on the canvas. The bottom of the canvas provides zoom controls to adjust the view. You can run, copy, or delete operators directly on the canvas.

  • Run: Runs the selected operator or all operators from the root node up to the selected one.

  • Copy: Creates a duplicate of the operator with the same configuration.

  • Delete: Removes the selected operator.
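The second Run behavior, running all operators from the root node up to the selected one, amounts to walking the chain of upstream connections. The sketch below assumes each operator has at most one upstream operator and that the selected operator is included in the run; the `upstream` mapping and the function name are hypothetical.

```python
from typing import Dict, List, Optional


def operators_to_run(upstream: Dict[str, Optional[str]], selected: str) -> List[str]:
    """Return the operators from the root node to the selected one, in run order.

    `upstream` maps each operator to the operator feeding it, or None for the root.
    """
    chain: List[str] = []
    seen = set()
    node: Optional[str] = selected
    while node is not None:
        if node in seen:
            raise ValueError("cycle detected in operator connections")
        seen.add(node)
        chain.append(node)
        node = upstream[node]
    chain.reverse()  # collected from selected back to root, so flip the order
    return chain


canvas = {
    "PDF parsing": None,
    "Text chunking": "PDF parsing",
    "Text embedding": "Text chunking",
}
print(operators_to_run(canvas, "Text chunking"))  # ['PDF parsing', 'Text chunking']
```

Operators downstream of the selected one, such as Text embedding in this example, are left out of the run.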

Information panel

Bottom status bar

  • View console: After a workflow runs, click to expand the console and view runtime information and raw logs. Runtime information includes the total Credit consumed and a per-operator breakdown.

  • Status: Displays the workflow status: Draft, In development, Submitting, or Submitted.

  • Submitted details: For workflows in Submitting or Submitted status, click to view submission records. For more information, see workflow submission overview.

  • Last saved: The time the workflow was last saved.
