This tutorial shows how to use the DataWorks and MaxCompute product portfolio for big data development and analysis. A user profile analysis case study walks you through the capabilities of the Data Integration, Data Studio, and Operation Center modules in DataWorks.
Tutorial overview
To create better business strategies, you can derive basic profile data about your website users, such as geographical and social attributes, from their online behavior. You can then run profile analysis on a schedule to support fine-grained management of website traffic. The DataWorks and MaxCompute product portfolio covers the full process: data synchronization, data transformation, data management, and data consumption.
Before you start, read Tutorial objectives and design to understand the overall flow of the user profile analysis.
Data development platform
This tutorial uses the DataWorks Data Studio (legacy version) platform. Ensure that the Use Data Studio (New Version) option is not enabled for your workspace.
When you create a workspace, do not select the Use Data Studio (New Version) option.
After February 18, 2025, Data Studio (new version) is enabled by default when you use an Alibaba Cloud account to activate DataWorks and create a workspace for the first time in any of the following regions:
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
If Data Studio (new version) is already enabled by default for your account, see Use Data Studio (new version).
Procedure
Create the MaxCompute project and DataWorks workspace required for this tutorial. Then, complete the network configurations for the resource group.
In DataWorks, configure a data synchronization task to synchronize the user information and website log data provided in this tutorial to MaxCompute. Then, query the synchronized data.
Use a MaxCompute SQL node in DataWorks to transform the data in the user information table and the access log table that were synchronized to MaxCompute. This process produces the target user profile data.
Configure data quality monitoring rules for the tables generated from data transformation. This helps you identify and block dirty data early to prevent its impact from spreading.
After the user profile analysis task flow is complete, the corresponding data tables are created in MaxCompute. You can view the generated tables in the Data Map module and check their data lineage to see the relationships between them.
Consume data
After the user profile analysis is complete, you can use the DataAnalysis module to visualize the transformed data. This helps you quickly extract key information and gain insights into business trends from the data.
After you obtain the final transformed data, you can use the DataService Studio module to publish it as standard API data services, so that other business modules can consume the data through APIs.
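To make the data transformation and data quality steps in the procedure more concrete, the following Python sketch mimics them on a few in-memory rows. It is only an illustration under stated assumptions: in the tutorial this logic runs as MaxCompute SQL nodes and data quality monitoring rules in DataWorks, and the sample rows, the column names (`uid`, `gender`, `age_range`, `region`, `pv`), and the helper functions `build_profiles` and `check_quality` are all hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the real tutorial tables and columns may differ.
user_info = [
    {"uid": "u001", "gender": "female", "age_range": "20-30"},
    {"uid": "u002", "gender": "male", "age_range": "30-40"},
]
access_log = [
    {"uid": "u001", "region": "Hangzhou"},
    {"uid": "u001", "region": "Hangzhou"},
    {"uid": "u002", "region": "Beijing"},
    {"uid": None, "region": "Shanghai"},  # dirty record: missing user ID
]


def build_profiles(users, logs):
    """Attach a page-view count and a region to each user's attributes."""
    # Drop log rows that cannot be matched to a user.
    valid_logs = [row for row in logs if row["uid"] is not None]
    page_views = Counter(row["uid"] for row in valid_logs)
    # Later log rows overwrite earlier ones, so each user keeps
    # the region of their last log row.
    region_by_uid = {row["uid"]: row["region"] for row in valid_logs}
    return [
        {
            **user,
            "region": region_by_uid.get(user["uid"]),
            "pv": page_views[user["uid"]],
        }
        for user in users
    ]


def check_quality(rows):
    """Return a list of rule violations; an empty list means the table passes."""
    problems = []
    if not rows:
        problems.append("table is empty")
    if any(row.get("uid") is None for row in rows):
        problems.append("uid contains null values")
    return problems


profiles = build_profiles(user_info, access_log)
violations = check_quality(profiles)
if violations:
    # Stop here so that dirty data does not reach downstream consumers.
    raise ValueError(f"quality check failed: {violations}")
print(profiles)
```

The sketch filters the dirty log row before aggregating and validates the output table before anything consumes it, which mirrors the idea of identifying and blocking dirty data early.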