Data warehouse summary (DWS)-Dataphin(Dataphin)-阿里云帮助中心

The data warehouse summary (DWS) layer is modeled based on the subject of analysis. In this layer, aggregate tables are built with a common granularity based on the metric requirements of upstream applications and products. A table in the DWS layer typically corresponds to a statistic granularity, which is a dimension or a combination of dimensions, and includes several derived metrics at that granularity.

Aggregate table design principles

Aggregation is the process of summarizing raw data from a detailed granularity. The DWS layer is modeled by aggregating data based on the subject of analysis. For example, if your final analysis goals are to calculate the total sales of products in a category (such as kitchenware) by province for the last day, find the top 10 products by sales in that category, and determine the distribution of user purchasing power by province, you can summarize the data from the last day from the perspectives of successfully transacted products, categories, and buyers. Keep the following points in mind about data aggregation:

Aggregation does not cross facts. Aggregation summarizes data from the original star schema. To obtain query results that are consistent with the original model, the dimensions and measures used for aggregation must also be consistent with the original model. An atomic metric can only be defined based on a single fact table. However, you can combine atomic metrics to create composite metrics.
Aggregation can improve query performance, but it also makes extract, transform, and load (ETL) maintenance more difficult. For example, if the primary category of a subcategory changes, existing data that is already summarized in the aggregate table must be readjusted.

In addition, you should design the DWS layer with data reusability in mind. This involves considering whether the aggregated summary can be used by other applications or teams. Consider whether an aggregation based on a specific dimension is frequently used in data analytics. If it is, you should summarize the detailed data and store it in an aggregate table.

Aggregate table specifications

The naming convention for public aggregate tables is: dws_statistic_granularity. For example:

dws_report (report aggregate table)
dws_user (user aggregate table)

Create a logical aggregate table

Statistical metrics for an aggregate table can come from two sources:

The system performs automatic aggregation based on a consistent statistic granularity. After you submit a derived metric, the system automatically generates a new aggregate table. The following figure shows the components of a derived metric.
You can create a logical aggregate table that uses non-derived metrics. For more information, see Add derived and composite metrics.

Query a logical aggregate table

Modeling TaskLogical Table Task InstanceLogical table operations and maintenance (O&M) includes the and pages:

Logical Table Task lets you view the relationships between tasks for a logical table. For more information, see View and manage detailed aggregate table tasks.
Logical Table Task InstancesUse the page to view the node instances that are included in a running logical table task and their statuses. For more information, see Overview of general O&M for recurring instances.