Use ETL workflow templates

Tutorials

The following table lists the ETL workflow templates that are currently available in DataWorks.

Note

Once an ETL workflow template is imported into the DataStudio module, you can view the use case details on the Zero-Load Node, which is the first node in the workflow.

Tutorial	Products involved	Modules involved	DataWorks edition	Description
Analyze website user behavior	DataWorks, MaxCompute, MySQL (no activation required), OSS (no activation required)	Data Integration, Data Development	Basic Edition (activate for free to get started)	This use case demonstrates how to analyze user access logs. By combining the logs with user information, you can generate and automatically update user profiles to support more targeted website operations. Related documentation: Simple user profile analysis (MaxCompute).
Analyze an e-commerce funnel	DataWorks, MaxCompute	Data Development		Using a funnel model, this tutorial shows how to use the DataStudio module to extract user purchase paths from raw data and calculate the conversion rates between the browsing, clicking, and purchasing stages.
Analyze population and property data for a smart city	DataWorks, MaxCompute	Data Development		This tutorial uses the integration of population and real estate data as an example to demonstrate how to use the DataStudio module. You will learn how to process data, configure scheduling policies, and automate the data processing workflow for a smart city project.
Find the top 10 programming languages on GitHub	DataWorks, MaxCompute, Function Compute, OSS	Data Integration, Data Development		Based on the public GitHub Archive dataset, this use case uses the Data Integration module to obtain data about the programming languages with the most commits on GitHub in the last hour. In the DataStudio module, a Function Compute node is scheduled periodically to send the processed data to a specified email address. Note: This tutorial uses real data that is updated every hour. Related documentation (for use with Function Compute): Use a Function Compute node to implement real-time data analysis and result delivery for GitHub.
Analyze retail and e-commerce GMV	DataWorks, MaxCompute	Data Integration, Data Development		This use case is based on the Data Modeling feature of DataWorks. It uses a built-in retail data warehouse model to demonstrate the technology and process for building a data warehouse in DataWorks. Related documentation: Retail and e-commerce data modeling.
Personalized video recommendations (collaborative filtering)	DataWorks, MaxCompute, PAI	Data Development		Using features such as "You may also like" and "Related recommendations" on social media platforms as examples, this use case demonstrates how to implement personalized video recommendations by calling the etrec collaborative filtering algorithm from PAI within a node in the DataStudio module. Related documentation (for use with PAI): Personalized video recommendations (collaborative filtering). Note: You can modify the sample data to generate your own item recommendation list.
Implement a zipper table	DataWorks, MaxCompute	Data Development, Operation Center		This use case shows how to implement a zipper table by using DataWorks and MaxCompute. It uses the DataStudio and Operation Center modules to load data into a zipper table that records every change to an e-commerce order, from creation to its current status (created, paid, or completed). Related documentation: Implement a zipper table based on MaxCompute.
Use common scheduling parameters	DataWorks, MaxCompute	Data Development		This use case demonstrates how to use scheduling parameters, which are automatically replaced with specific values at runtime. You will learn how to dynamically pass time-based values to your tasks. Related documentation: Configure and use scheduling parameters.
Use a merge node	DataWorks, MaxCompute	Data Development	Standard Edition or later	A merge node is a logical control node in DataStudio that consolidates the statuses of multiple upstream nodes. This use case shows how to use a merge node to ensure that a downstream task runs even if one of its upstream branch tasks fails. Related documentation: Merge Node.
Use an assignment node	DataWorks, MaxCompute	Data Development	Standard Edition or later	The assignment node in the DataStudio module supports three assignment languages (ODPS SQL, Shell, and Python) for passing the query or output results of an upstream node to a downstream node. Related documentation: Assignment Node.
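The "Analyze an e-commerce funnel" tutorial computes conversion rates in DataStudio. The arithmetic behind a funnel can be sketched in plain Python as follows. This is not the tutorial's code: the `(user_id, action)` event shape and the stage names are assumptions, and the sketch ignores the time order of events.

```python
from collections import defaultdict

# Funnel stages in order (hypothetical names mirroring browse -> click -> purchase).
STAGES = ["browse", "click", "purchase"]

def funnel_conversion(events):
    """Count distinct users per stage and compute stage-to-stage conversion rates.

    events: iterable of (user_id, action) tuples (an assumed input shape).
    A user counts toward a stage only if they also reached every earlier stage.
    """
    actions_by_user = defaultdict(set)
    for user_id, action in events:
        actions_by_user[user_id].add(action)

    counts = []
    for i in range(len(STAGES)):
        required = set(STAGES[: i + 1])
        counts.append(sum(1 for acts in actions_by_user.values() if required <= acts))

    # Conversion rate from each stage to the next; 0.0 if the earlier stage is empty.
    rates = [cur / prev if prev else 0.0 for prev, cur in zip(counts, counts[1:])]
    return dict(zip(STAGES, counts)), rates
```

With three users who browse, two of whom also click, and one of whom purchases, the function reports stage counts of 3, 2, and 1 and conversion rates of about 0.67 and 0.5.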
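The ranking step of the "Find the top 10 programming languages on GitHub" use case reduces to summing commits per language and taking the top N. The following is a minimal sketch. It assumes that each GitHub Archive event has already been mapped to a `(language, commit_count)` pair. That input shape is hypothetical, because the raw archive records are JSON events.

```python
from collections import Counter

def top_languages(push_events, n=10):
    """Rank languages by total commit count.

    push_events: iterable of (language, commit_count) pairs (assumed pre-processed input).
    Returns up to n (language, total_commits) tuples, most commits first.
    """
    totals = Counter()
    for language, commits in push_events:
        totals[language] += commits
    return totals.most_common(n)
```

`Counter.most_common` breaks ties by insertion order, so languages with equal totals keep the order in which they were first seen.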
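The video recommendation use case calls PAI's etrec algorithm, whose internals and parameters are not described here. To illustrate what item-based collaborative filtering computes, the following sketch substitutes a textbook version: cosine similarity over item co-occurrence, followed by scoring of unseen items. It is not etrec, and the `user_items` input format is an assumption.

```python
import math
from collections import defaultdict
from itertools import combinations

def item_similarity(user_items):
    """Item-item cosine similarity from implicit feedback.

    user_items: dict mapping user_id -> set of item_ids the user watched.
    sim(i, j) = |users(i) & users(j)| / sqrt(|users(i)| * |users(j)|)
    """
    item_count = defaultdict(int)
    co_count = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_count[item] += 1
        # Sorting gives each unordered pair one canonical key (i, j) with i < j.
        for i, j in combinations(sorted(items), 2):
            co_count[(i, j)] += 1
    sim = defaultdict(dict)
    for (i, j), c in co_count.items():
        s = c / math.sqrt(item_count[i] * item_count[j])
        sim[i][j] = s
        sim[j][i] = s
    return sim

def recommend(user_items, user, sim, k=3):
    """Score items the user has not seen by summed similarity to items they have seen."""
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                scores[other] += s
    # Highest score first; ties broken by item id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Editing the sample data, as the tutorial suggests, corresponds to changing `user_items` here. The recommendation list changes with it.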
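A zipper table (a slowly changing dimension of type 2) keeps one row per version of an order, each with a validity range. The tutorial builds the table with MaxCompute SQL. The following in-memory Python sketch shows only the daily merge logic. The column names, the inclusive date ranges, and the `9999-12-31` "still current" sentinel are assumptions rather than the tutorial's schema.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "this version is still current"

def merge_zipper(history, changes, biz_date):
    """Merge one day's order changes into a zipper table.

    history: list of dicts with keys order_id, status, start, end (inclusive dates).
    changes: dict order_id -> status as of biz_date (new or updated orders only).
    Returns a new list. Input rows are not mutated; closed versions are copied.
    """
    merged = []
    for row in history:
        new_status = changes.get(row["order_id"])
        if row["end"] == OPEN_END and new_status is not None and new_status != row["status"]:
            # Close the current version on the day before the change took effect.
            merged.append({**row, "end": biz_date - timedelta(days=1)})
        else:
            merged.append(row)
    current = {r["order_id"]: r["status"] for r in history if r["end"] == OPEN_END}
    for order_id, status in changes.items():
        if current.get(order_id) != status:
            # Open a new version for new orders and for orders whose status changed.
            merged.append({"order_id": order_id, "status": status,
                           "start": biz_date, "end": OPEN_END})
    return merged

def status_as_of(table, order_id, day):
    """Point-in-time lookup: the status of the version whose [start, end] contains day."""
    for row in table:
        if row["order_id"] == order_id and row["start"] <= day <= row["end"]:
            return row["status"]
    return None
```

An order that is created on one day and paid on the next ends up with two rows. One row covers the "created" period and the other is the open "paid" version. Because of this, `status_as_of` can answer historical questions without any snapshots.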
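Scheduling parameters are placeholders that DataWorks replaces with concrete values when a node runs. The following sketch mimics only that idea. It is a hypothetical helper, not how DataWorks resolves parameters. It assumes that a "bizdate" value is the day before the scheduled run date (yyyymmdd) and that a "cyctime" value is the scheduled time itself (yyyymmddhhmmss). These values are modeled loosely on DataWorks' `$bizdate` and `$cyctime`. For the exact semantics, see Configure and use scheduling parameters.

```python
from datetime import datetime, timedelta
from string import Template

def resolve_params(code, scheduled_time, assignments):
    """Replace ${name} placeholders in node code, as if at run time.

    assignments: dict mapping a placeholder name to a hypothetical value kind,
    either "bizdate" (scheduled date minus one day, yyyymmdd, an assumption)
    or "cyctime" (the scheduled time, yyyymmddhhmmss, an assumption).
    """
    builtins = {
        "bizdate": (scheduled_time - timedelta(days=1)).strftime("%Y%m%d"),
        "cyctime": scheduled_time.strftime("%Y%m%d%H%M%S"),
    }
    values = {name: builtins[kind] for name, kind in assignments.items()}
    # string.Template uses the same ${name} syntax and raises KeyError if a name is unassigned.
    return Template(code).substitute(values)
```

For example, resolving `"WHERE ds='${ds}'"` with `{"ds": "bizdate"}` for a run scheduled at 00:30 on 2024-03-01 yields `"WHERE ds='20240229'"`. The same code therefore processes a different partition on each day's run.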

Notes

Importing a template may incur minor fees. For details, refer to the billing information for the specific use case.
The data provided in the templates is for trial purposes on the DataWorks platform only.
Only users with the workspace administrator role can import ETL templates into a target workspace. To grant this role to an account, see Manage permissions on modules at the workspace level.
If you select a serverless resource group when you import an ETL workflow template, you must enable a NAT gateway for the VPC that is bound to the resource group. For more information, see Network connection solutions.
All ETL workflows in this topic require a workspace that uses the legacy version of DataStudio. When you create the workspace, do not select Use Data Studio (New Version).

Import an ETL workflow template

You can import a DataWorks ETL workflow template directly into a workspace. Follow these steps:

1. Log on to the DataWorks console. In the left-side navigation pane, choose Quick Start > DataWorks Gallery, and then select Workflow from the categories.
2. View the use case details.

   Click a use case card to go to its details page. For a list of supported use cases, see Tutorials.
3. Import the use case into a workspace.

   On the use case details page, click Load Template. In the Load Template dialog box, configure the parameters and click Confirm.

Note
Different tutorials have different product and import validation requirements. Follow the instructions in the Load Template dialog box to configure and import the template.

Clean up resources

After you finish a tutorial, you can delete the resources it created by following these instructions:

Delete tables: Batch delete MaxCompute tables.
Take tasks offline: Take a task offline.
