DataWorks provides a collection of ETL workflow templates to help you quickly learn product best practices. You can import these templates into your workspace with a single click to replicate a full use case and explore product features.
Tutorials
The following table lists the ETL workflow templates that are currently available in DataWorks.
Once an ETL workflow template is imported into the DataStudio module, you can view the use case details on the Zero-Load Node, which is the first node in the workflow.
|
Tutorial |
Products involved |
Modules involved |
DataWorks edition |
Description |
|
|
Basic Edition (Activate for free to get started) |
This use case demonstrates how to analyze user access logs. By combining logs with user information, you can generate and automatically update user profiles to enable more targeted website operations. Related documentation: Simple user profile analysis (MaxCompute). |
|
|
Data Development |
Using a funnel model, this tutorial shows how to use the DataStudio module to extract user purchase paths from raw data and calculate traffic conversion rates across the browsing, clicking, and purchasing stages. |
||
|
Data Development |
This tutorial uses the integration of population and real estate data as an example to demonstrate how to use the DataStudio module. You will learn to process data, configure scheduling policies, and automate the data processing workflow for a smart city project. |
||
|
|
Based on the public GitHub Archive dataset, this use case uses the DataWorks Data Integration module to obtain data about the programming languages with the most commits on GitHub in the last hour. In the DataStudio module, a Function Compute node is periodically scheduled to send the processed data to a specified email address.
Note: This tutorial uses real data that is updated every hour. Related documentation (for use with Function Compute): Use a Function Compute node to implement real-time data analysis and result delivery for GitHub. |
||
|
|
This use case is based on the Data Modeling feature of DataWorks. It uses a built-in retail data warehouse model to demonstrate the technology and process for building a data warehouse in DataWorks. Related documentation: Retail and e-commerce data modeling. |
||
|
Personalized video recommendations (collaborative filtering) |
|
Data Development |
Taking features like "You may also like" and "Related recommendations" on social media platforms as examples, this use case demonstrates how to implement personalized video recommendations. It shows how to call the etrec collaborative filtering algorithm from PAI within a node in the DataStudio module. Related documentation (for use with PAI): Personalized video recommendations (collaborative filtering).
Note: You can modify the sample data to generate your own item recommendation list. |
|
|
|
This use case shows how to implement a zipper table using DataWorks and MaxCompute. It uses the DataStudio and Operation Center modules to load data into a zipper table. The table records all changes in e-commerce orders, from creation to the current status (created, paid, or completed). Related documentation: Implement a zipper table based on MaxCompute. |
||
|
Data Development |
This use case demonstrates how to use scheduling parameters, which are automatically replaced with specific values at runtime. You will learn how to dynamically pass time-based values to your tasks. Related documentation: Configure and use scheduling parameters. |
||
|
Data Development |
Standard Edition or later |
A merge node is a logical control node in DataStudio that consolidates the statuses of multiple upstream nodes. This use case shows how to use a merge node to ensure that a downstream task runs even if one of its upstream branch tasks fails. Related documentation: Merge Node. |
|
|
Data Development |
The assignment node in the DataStudio module supports three assignment languages (ODPS SQL, Shell, and Python) to pass the query or output results from an upstream node to a downstream node. Related documentation: Assignment Node. |
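To make the zipper table use case listed above more concrete, the following is a minimal Python sketch of the general technique, not the code that ships with the template. It keeps one row per order status with a validity range and closes the previous row when the status changes. The names `OrderVersion` and `apply_daily_changes` and the `9999-12-31` open-ended sentinel are assumptions made for illustration; the actual template works on MaxCompute tables with its own schema.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Dict, List

# Sentinel end date marking the currently valid version (illustrative assumption).
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class OrderVersion:
    """One row of the zipper table: a status and the date range it was valid."""
    order_id: str
    status: str
    start_date: date
    end_date: date


def apply_daily_changes(
    history: List[OrderVersion], changes: Dict[str, str], day: date
) -> List[OrderVersion]:
    """Return a new zipper table after applying one day's status changes."""
    closed_on = day - timedelta(days=1)
    result: List[OrderVersion] = []
    for row in history:
        new_status = changes.get(row.order_id)
        changed = new_status is not None and new_status != row.status
        if row.end_date == OPEN_END and changed:
            # Close the open version on the day before the change.
            result.append(replace(row, end_date=closed_on))
        else:
            result.append(row)
    # Orders that still have an open version need no new row.
    still_open = {r.order_id for r in result if r.end_date == OPEN_END}
    for order_id, status in changes.items():
        if order_id not in still_open:
            result.append(OrderVersion(order_id, status, day, OPEN_END))
    return result


# Three days of hypothetical order changes: created, paid, completed.
history: List[OrderVersion] = []
history = apply_daily_changes(history, {"o1": "created", "o2": "created"}, date(2024, 1, 1))
history = apply_daily_changes(history, {"o1": "paid"}, date(2024, 1, 2))
history = apply_daily_changes(history, {"o1": "completed", "o2": "paid"}, date(2024, 1, 3))

# The current status of each order is the row whose end date is still open.
current = {r.order_id: r.status for r in history if r.end_date == OPEN_END}
```

In this sketch, filtering on the sentinel end date returns the latest status of every order, while filtering on a date that falls between `start_date` and `end_date` reconstructs the table as it looked on that day.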
Notes
- Importing a template may incur minor fees. For details, refer to the billing information for the specific use case.
- The data provided in the templates is for trial purposes on the DataWorks platform only.
- Only users with the workspace administrator role can import ETL templates into a target workspace. To grant this role to an account, see Manage permissions on modules at the workspace level.
- If you select a serverless resource group when you import an ETL workflow template, you must enable a NAT gateway for the VPC that is bound to the resource group. For more information, see Network connection solutions.
- All ETL workflows in this topic require a workspace that uses the legacy version of DataStudio. When you create the workspace, do not select Use Data Studio (New Version).
Import an ETL workflow template
You can import a DataWorks ETL workflow template directly into a workspace. Follow these steps:
- Log on to the DataWorks console. In the left-side navigation pane, click , and then select Workflow from the categories.
- View the use case details.
  Click a use case card to go to the details page. For a list of supported use cases, see Tutorials.
- Import the use case into a workspace.
  On the use case details page, click Load Template. In the Load Template dialog box, configure the parameters and click Confirm.
  Note: Different tutorials have different product and import validation requirements. Follow the instructions in the Load Template dialog box to configure and import the template.
Clean up resources
After you finish a tutorial, you can delete the created resources using the linked instructions.
- Delete tables: Batch delete MaxCompute tables.
- Take tasks offline: Take a task offline.
Related documents
- To learn how to use each module, see the DataWorks module usage guide.
- To learn about DataWorks editions and billing, see Select and pay for a software edition.