DataWorks provides several tutorials that guide you through the entire workflow. They demonstrate how to prepare your environment, collect and process data, and display results, so that you understand the end-to-end process and learn the core features of DataWorks.
Comprehensive example: User profile analysis
This tutorial uses a case study of website user profile analysis to demonstrate the end-to-end process, which includes data integration, data warehouse construction in Data Studio, and data governance. With DataWorks, you can sync website user information and behavioral logs, clean them at a fine granularity, and build a comprehensive user profile model.
Tutorial link: Comprehensive: Website user profile analysis
Compute engines used in the tutorial: You can choose MaxCompute, StarRocks, or E-MapReduce (EMR).
Modules involved: Data Integration, Data Studio, Operation Center, Data Quality, Data Map, DataService Studio, and DataAnalysis.
Tutorials for different modules
Data development and scheduling
Related tutorials | Description | Compute engines used in the tutorial | Modules involved |
DataWorks provides a collection of extract, transform, and load (ETL) workflow templates that illustrate the product's best practices. You can import a template into your workspace with one click to replicate the example and explore the product's capabilities. | Different ETL templates use different compute engines, including MaxCompute, Function Compute, and PAI. | Data Integration, Data Development |
Data analysis and visualization
Related tutorials | Description | Compute engines used in the tutorial | Modules involved |
DataWorks provides a rich collection of official, real-world datasets with desensitized data. Each dataset includes SQL queries for specific business scenarios. Select a public dataset of interest and run the sample SQL. Then, generate visual charts and reports from the analysis results to quickly explore DataWorks features. | | DataAnalysis | |
This tutorial shows you how to use DataWorks with MaxCompute, a cloud-native big data computing service, to analyze public big data and AI datasets, such as data from Taobao, Fliggy, AliMusic, GitHub, and TPC. It helps you quickly perform big data analysis and become familiar with the DataWorks interface and basic data analysis capabilities. | MaxCompute | DataAnalysis | |
In Expenses and Costs, you can subscribe to different types of bill data, such as detailed billing item bills and daily summary bills. After you subscribe, the bill data is regularly synchronized to MaxCompute. You can then use the data analysis feature of DataWorks to query and analyze the bill data, generate visual charts and reports from the results, and share your Alibaba Cloud consumption analysis report with other users. | MaxCompute | DataAnalysis | |
This tutorial is based on the GitHub Archive public dataset. It shows how to use DataWorks to collect more than 20 types of event data, such as projects and behaviors, from GitHub to Hologres in real time for analysis. It also uses built-in DataV templates to quickly build a real-time data dashboard. This lets you understand real-time changes in GitHub data from multiple dimensions, such as developers, projects, and programming languages. | Hologres | Data Integration, Data Development |
Data warehouse modeling and configuration
Related tutorials | Description | Compute engines used in the tutorial | Modules involved |
Many small and medium-sized enterprises (SMEs) find building data warehouse models challenging. The process requires specialized talent, long development cycles, and high costs. To solve these problems, the DataWorks intelligent data modeling team worked with experienced data architects. They drew on a decade of experience from millions of Alibaba Cloud users across many business scenarios. By combining this experience with Alibaba Group's technology, they provide best practices for industry models. These models cover retail, e-commerce, finance, manufacturing, and other fields. | Applicable to all engines | Data modeling | |
The DataWorks intelligent data modeling product includes a data warehouse industry model template for retail and e-commerce. You can import the template with one click. This tutorial uses a retail and e-commerce business background and core model building steps to help you understand dimensional modeling theory and the intelligent data modeling product. | Applicable to all engines | Data modeling | |
This tutorial combines products such as DataWorks, MaxCompute, and Hologres. It provides a detailed explanation of data warehouse capabilities, including offline and real-time processing, analysis services, data modeling, data governance, and the data lakehouse architecture. | DataWorks, MaxCompute, Hologres | Data Integration, DataAnalysis, Data modeling, Data Governance Center | |
This tutorial is based on MaxCompute. It describes how to optimize a data warehouse through several modules, including data research, data domain partitioning, building a bus matrix, and defining statistical metrics. | MaxCompute | Data modeling | |
This tutorial describes how to build an enterprise data warehouse based on AnalyticDB and perform operations such as O&M and metadata management. | AnalyticDB for MySQL | Data Integration, Data Development, Operation Center, Data Map | |
This tutorial uses an experiment for building a data warehouse in the retail and e-commerce industry as an example. It describes the technology selection, technical flow, and process implementation of DataWorks in data warehouse construction. This helps you gain a deeper understanding of Alibaba Cloud DataWorks. | Applicable to all engines. This tutorial uses MaxCompute as an example. | Data Integration, Data modeling, Data Development, Operation Center, Data governance, DataService Studio |
Data governance
Related tutorials | Description | Compute engines used in the tutorial | Modules involved |
This tutorial walks you through the operational process of a governance owner. It shows you how to use the data governance planning feature to efficiently set and achieve data governance goals. | MaxCompute, E-MapReduce | Data Governance Center |