ETL workflow quick start

DataWorks provides a collection of ETL workflow templates to help you quickly learn product best practices. You can import these templates into your workspace with a single click to replicate a full use case and explore product features.

Tutorials

The following table lists the ETL workflow templates that are currently available in DataWorks.

Note

Once an ETL workflow template is imported into the DataStudio module, you can view the use case details on the Zero-Load Node, which is the first node in the workflow.

Tutorial

Products involved

Modules involved

DataWorks edition

Description

Analyze website user behavior

  • DataWorks

  • MaxCompute

  • MySQL (no activation required)

  • OSS (no activation required)

  • Data Integration

  • Data Development

Basic Edition

(Activate for free to get started)

This use case demonstrates how to analyze user access logs. By combining logs with user information, you can generate and automatically update user profiles to enable more targeted website operations.

Related documentation: Simple user profile analysis (MaxCompute).

Analyze an e-commerce funnel

  • DataWorks

  • MaxCompute

Data Development

Using a funnel model, this tutorial shows how to use the DataStudio module to extract user purchase paths from raw data and calculate traffic conversion rates across the browsing, clicking, and purchasing stages.

Analyze population and property data for a smart city

  • DataWorks

  • MaxCompute

Data Development

This tutorial uses the integration of population and real estate data as an example to demonstrate how to use the DataStudio module. You will learn to process data, configure scheduling policies, and automate the data processing workflow for a smart city project.

Find the top 10 programming languages on GitHub

  • DataWorks

  • MaxCompute

  • Function Compute

  • OSS

  • Data Integration

  • Data Development

Based on the public GitHub Archive dataset, this use case uses the DataWorks Data Integration module to obtain data about the programming languages with the most commits on GitHub in the last hour. In the DataStudio module, a Function Compute node is periodically scheduled to send the processed data to a specified email address.

Note

This tutorial uses real data that is updated every hour.

Related documentation (for use with Function Compute): Use a Function Compute node to implement real-time data analysis and result delivery for GitHub.

Analyze retail and e-commerce GMV

  • DataWorks

  • MaxCompute

  • Data Integration

  • Data Development

This use case is based on the Data Modeling feature of DataWorks. It uses a built-in retail data warehouse model to demonstrate the technology and process for building a data warehouse in DataWorks.

Related documentation: Retail and e-commerce data modeling.

Personalized video recommendations (collaborative filtering)

  • DataWorks

  • MaxCompute

  • PAI

Data Development

Using features such as "You may also like" and "Related recommendations" on social media platforms as examples, this use case demonstrates how to implement personalized video recommendations. It shows how to call the etrec collaborative filtering algorithm from PAI within a node in the DataStudio module.

Related documentation (for use with PAI): Personalized video recommendations (collaborative filtering).

Note

You can modify the sample data to generate your own item recommendation list.

Implement a zipper table

  • DataWorks

  • MaxCompute

  • Data Development

  • Operation Center

This use case shows how to implement a zipper table using DataWorks and MaxCompute. It uses the DataStudio and Operation Center modules to load data into a zipper table. The table records all changes in e-commerce orders, from creation to the current status (created, paid, or completed).

Related documentation: Implement a zipper table based on MaxCompute.

Use common scheduling parameters

  • DataWorks

  • MaxCompute

Data Development

This use case demonstrates how to use scheduling parameters, which are automatically replaced with specific values at runtime. You will learn how to dynamically pass time-based values to your tasks.

Related documentation: Configure and use scheduling parameters.

Use a merge node

  • DataWorks

  • MaxCompute

Data Development

Standard Edition or later

A merge node is a logical control node in DataStudio that consolidates the statuses of multiple upstream nodes. This use case shows how to use a merge node to ensure that a downstream task runs even if one of its upstream branch tasks fails.

Related documentation: Merge Node.

Use an assignment node

  • DataWorks

  • MaxCompute

Data Development

The assignment node in the DataStudio module supports three assignment languages (ODPS SQL, Shell, and Python) to pass the query or output results from an upstream node to a downstream node.

Related documentation: Assignment Node.
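
For readers new to the zipper table tutorial listed above, the following is a minimal Python sketch of the idea: each order keeps one row per status, with a validity range, and a daily change set closes the current row and opens a new one. It is not the template's actual MaxCompute SQL. The field names (order_id, status, start_date, end_date), the OPEN_END sentinel, and the convention that a row's end_date is the day the next status arrives are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

OPEN_END = "9999-12-31"  # assumed sentinel marking the currently valid row


@dataclass(frozen=True)
class OrderRow:
    order_id: str
    status: str
    start_date: str
    end_date: str = OPEN_END


def apply_daily_changes(
    history: List[OrderRow], changes: Dict[str, str], day: str
) -> List[OrderRow]:
    """Return a new zipper table after applying one day's status changes.

    `changes` maps order_id -> status observed on `day` (hypothetical input shape).
    """
    result: List[OrderRow] = []
    for row in history:
        new_status = changes.get(row.order_id)
        is_open = row.end_date == OPEN_END
        if is_open and new_status is not None and new_status != row.status:
            # Close the previously current row on the day the change arrives.
            result.append(replace(row, end_date=day))
        else:
            result.append(row)

    # Status of each order's currently open row before today's changes.
    open_status = {r.order_id: r.status for r in history if r.end_date == OPEN_END}
    for order_id, status in changes.items():
        if open_status.get(order_id) != status:
            # New order, or an existing order whose status changed: open a new row.
            result.append(OrderRow(order_id, status, day))
    return result
```

Calling apply_daily_changes once per scheduled run, with that day's changed orders, keeps the full status history of every order while leaving exactly one open row per order.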

Notes

  • Importing a template may incur minor fees. For details, refer to the billing information for the specific use case.

  • The data provided in the templates is for trial purposes on the DataWorks platform only.

  • Only users with the workspace administrator role can import ETL templates into a target workspace. To grant this role to an account, see Manage permissions on modules at the workspace level.

  • If you select a serverless resource group when you import an ETL workflow template, you must enable a NAT gateway for the VPC that is bound to the resource group. For more information, see Network connection solutions.

  • All ETL workflows in this topic require a workspace that uses the legacy version of DataStudio. When you create the workspace, do not select Use Data Studio (New Version).

Import an ETL workflow template

You can import a DataWorks ETL workflow template directly into a workspace. Follow these steps:

  1. Log on to the DataWorks console. In the left-side navigation pane, click Quick Start > DataWorks Gallery, and then select Workflow from the categories.

  2. View the use case details.

    Click a use case card to go to the details page. For a list of supported use cases, see Tutorials.

  3. Import the use case into a workspace.

    On the use case details page, click Load Template. In the Load Template dialog box, configure the parameters and click Confirm.

    Note

    Different tutorials have different product and import validation requirements. Follow the instructions in the Load Template dialog box to configure and import the template.

Clean up resources

After you finish a tutorial, you can delete the created resources using the linked instructions.

Related documents