Use DataWorks and StarRocks to build a user profile analysis pipeline. The tutorial covers Data Integration, Data Studio, and Operation Center capabilities for data integration, development, quality monitoring, and data consumption.
Use case
Extract user profile data — geographical and social attributes — from website behavior to drive targeted traffic operations. DataWorks and StarRocks handle data synchronization, transformation, management, and consumption.
Before you begin, read Tutorial objectives and design for a workflow overview.
Data Studio
This tutorial requires the new Data Studio in DataWorks. To enable it:
- When you create a workspace, select Use Data Studio (New Version).
- To upgrade from an older version, click Upgrading at the top of the page.
- After February 18, 2025, the new Data Studio is enabled by default for accounts that create their first workspace in these regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
Procedure
- Create the StarRocks instance and DataWorks workspace required for this tutorial, and configure the resource group and network settings.
- Configure a data synchronization task in DataWorks to sync user information and website log data to StarRocks, then query the synchronized data.
- Use StarRocks nodes in DataWorks to process the synchronized user information and access log tables into target user profile data.
- Configure data quality monitoring rules for the transformed tables to detect and block dirty data early, before it spreads to downstream tasks.
- After the analysis tasks complete, view the data lineage between the resulting StarRocks tables in Data Map.
- Consume data
  - Use the DataAnalysis module to visualize the transformed data and gain insights into business trends.
  - Use DataService Studio to expose the transformed data as standardized data service APIs for downstream business modules.
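The processing and quality-monitoring steps above can be pictured with a small Python sketch. It is only an illustration of the logic. In the tutorial itself this work is done by StarRocks nodes and data quality rules in DataWorks, not by Python. The record types, field names (`uid`, `region`, `gender`, `age_range`, `pv`), and the functions `build_profiles` and `check_quality` are hypothetical and do not come from the tutorial.

```python
from dataclasses import dataclass

# All record shapes and field names here are hypothetical, for illustration only.


@dataclass(frozen=True)
class UserInfo:
    uid: str
    gender: str
    age_range: str


@dataclass(frozen=True)
class AccessLog:
    uid: str
    region: str
    url: str


def build_profiles(users, logs):
    """Join access logs with user attributes and count page views per user and region."""
    users_by_id = {u.uid: u for u in users}
    counts = {}
    for log in logs:
        if log.uid not in users_by_id:
            continue  # inner-join semantics: drop logs with no matching user
        key = (log.uid, log.region)
        counts[key] = counts.get(key, 0) + 1
    return [
        {
            "uid": uid,
            "region": region,
            "gender": users_by_id[uid].gender,
            "age_range": users_by_id[uid].age_range,
            "pv": pv,
        }
        for (uid, region), pv in sorted(counts.items())
    ]


def check_quality(rows):
    """Blocking-rule sketch: raise if the result is empty or a key field is blank."""
    if not rows:
        raise ValueError("quality check failed: no rows produced")
    bad = [r for r in rows if not r["uid"] or not r["region"]]
    if bad:
        raise ValueError(f"quality check failed: {len(bad)} rows with blank uid or region")
    return rows
```

Raising an exception in `check_quality` stands in for a blocking rule: a failed check stops the run so that downstream consumers never read the dirty rows.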