User persona analysis with DataWorks Data Studio (legacy)

Build a user persona analysis pipeline with DataWorks and EMR Serverless Spark, covering data integration, transformation, quality monitoring, and visualization in Data Studio (legacy version).

Overview

Extract user profile data, such as geographic and social attributes, from website behavior logs to inform business strategy. Use DataWorks and EMR Serverless Spark to synchronize, transform, manage, and consume the data on a recurring schedule.

Note

Before you begin, read Tutorial objectives and design for a workflow overview.

Data development platform

This tutorial uses DataWorks Data Studio (legacy version). Make sure your workspace is not set to Use Data Studio (New Version).

When you create a workspace, do not select the Use Data Studio (New Version) option.
After February 18, 2025, Data Studio (new version) is enabled by default for new workspaces in the regions listed below. If your account already uses the new version, follow Use Data Studio (new version) instead.

China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)

Procedure

Prepare the environment

Create the Spark project and DataWorks workspace, then configure the network for the resource group.

Synchronize data

Configure a DataWorks synchronization pipeline to load the sample user and website log data into Spark, then verify the results.

Transform data

Use an EMR Spark SQL node in DataWorks to transform the user information and access log data and produce the target persona dataset.

Monitor data quality

Set up Data Quality monitoring rules on the transformed tables to detect and block dirty data early.

Manage data

After the analysis tasks complete, view the generated tables and their lineage in Data Map.

Consume data

Visualize the transformed data in DataAnalysis to extract key insights and identify business trends.
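
To make the transform and quality-monitoring steps more concrete, the following is a minimal conceptual sketch in plain Python. It is not the tutorial's Spark SQL and does not call any DataWorks API. The sample rows, the column names (`uid`, `gender`, `age_range`, `region`, `pv`), and the functions `build_persona` and `check_quality` are all hypothetical, chosen only to illustrate joining user information with access logs and then blocking on dirty data.

```python
from collections import Counter

# Hypothetical sample rows; the tutorial's real tables and columns may differ.
users = [
    {"uid": "u1", "gender": "F", "age_range": "20-30"},
    {"uid": "u2", "gender": "M", "age_range": "30-40"},
]
access_log = [
    {"uid": "u1", "region": "Hangzhou"},
    {"uid": "u1", "region": "Hangzhou"},
    {"uid": "u2", "region": "Singapore"},
    {"uid": "u3", "region": "Tokyo"},  # no matching user row, so it is dropped
]


def build_persona(users, access_log):
    """Left-join users to their log rows on uid and count page views (pv)."""
    views = Counter(row["uid"] for row in access_log)
    regions = {}
    for row in access_log:
        # Keep the first region seen for each uid.
        regions.setdefault(row["uid"], row["region"])
    return [
        {**u, "region": regions.get(u["uid"]), "pv": views.get(u["uid"], 0)}
        for u in users
    ]


def check_quality(rows):
    """Return a list of rule violations; an empty list means the data passes."""
    problems = []
    if not rows:
        problems.append("table is empty")
    if any(r.get("uid") in (None, "") for r in rows):
        problems.append("null uid found")
    return problems


persona = build_persona(users, access_log)
violations = check_quality(persona)
if violations:
    # A blocking rule would stop downstream tasks at this point.
    raise ValueError(f"quality check failed: {violations}")
print(persona)
```

In the actual pipeline, the join would run in an EMR Spark SQL node and the checks would be configured as Data Quality rules on the output tables. The sketch only mirrors the order of operations: transform first, then validate before anything downstream consumes the result.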