Data warehouse layering is an architectural practice that organizes data models into a hierarchical structure. Based on a comprehensive analysis of business scenarios, data, and system usage, this approach classifies data with different purposes into distinct layers. A layered architecture improves data organization, management, and maintenance.
Layer categories
DataWorks provides layer categories such as Data Import Layer, Common Layer, and Application Layer. You can assign layers with different functions to the appropriate categories to present your data in a hierarchical structure. The following figure shows a default five-layer data architecture as an example. After you plan the data layers, you can store table data in the corresponding layers.
-
Data Import Layer: This layer is used to import source data, such as databases, logs, and messages. After undergoing a series of ETL operations, the source data is loaded into this layer. Only source tables can be attached to this layer.
-
Common Layer: This layer is used to process and integrate common data from the Data Import Layer, establish unified metrics and dimensions, and build reusable detailed fact data and summary data for analysis and statistics. This layer can contain detail tables, dimension tables, and aggregate tables. Data Domain/Business Process are part of this layer.
-
Application Layer: Generates customized data statistics for specific application scenarios or designated products based on the processed and integrated data from the Common Layer. This layer can contain application tables and dimension tables. Data Mart/Subject Area are part of this layer.
Define data warehouse layers
Built-in layers
DataWorks provides five built-in layers by default: Operational Data Store (ODS) , Dimension (DIM) , Data Warehouse Detail (DWD) , Data Warehouse Summary (DWS) , and Application Data Service (ADS) .
Each layer serves a specific function:
Operational Data Store (ODS)
The ODS layer receives and stages raw data for the data warehouse system. Its table structure matches that of the source system, acting as a staging area for the data warehouse. The ODS layer performs the following operations on raw data:
-
Synchronizes raw structured data to the data warehouse, either incrementally or in full.
-
Structures raw unstructured data, such as log information, and stores it in the data warehouse.
-
Table names in the ODS layer must start with
ods.
Data Warehouse Detail (DWD)
The DWD layer builds data models based on business events. It uses the most granular data to create detailed fact tables. Based on your data usage patterns, you can denormalize important dimension attributes into a wide table format. This practice improves query performance and usability by reducing the need for joins between fact tables and dimension tables.
Data Warehouse Summary (DWS)
The DWS layer builds data models based on analytical subjects. Based on metric requirements from upstream applications and products, it creates fact tables at a common granularity for aggregated metrics.
For example, user behavior from the ODS layer can be pre-aggregated to derive common dimensions, such as time, IP address, and ID. These dimensions can be used to calculate statistics, such as the number of products a user purchased from different IP addresses over various time periods. The DWS layer can then perform further lightweight aggregations to improve calculation efficiency. For example, calculating 7-day, 30-day, and 90-day behavior metrics from these summaries can save significant processing time.
Application Data Service (ADS)
The ADS layer stores custom statistical metrics for data products and is used to generate various reports. For example, an e-commerce company could report on the sales volumes and rankings of various sports balls sold in a specific region between June 9 and June 19.
Dimension (DIM)
The DIM layer builds data models by using dimensions. Based on your business requirements, this layer can store dimension tables from logical models or dimension definitions from conceptual models. By defining dimensions, specifying primary keys, adding dimension attributes, and associating different dimensions, you can build consistent, enterprise-wide analytical dimension tables. This practice reduces inconsistencies in calculation logic and algorithms.
Custom layers
Go to the Data Layer page to view the built-in data warehouse layers. You can switch between Tiled Display and Hierarchy Display to adjust the page layout.
If the built-in layers do not meet your requirements, you can create custom layers:
-
Click Create. In the Create Data Layer panel, configure the basic information for the layer. The following table describes the key parameters.
Parameter
Description
Category
This parameter is used with Model Type to assign the created layer to a specific category and associate it with corresponding model types.
NoteThe Category cannot be changed after it is set. Plan your layer categories carefully.
Model Type
This parameter is used with Category to attach the corresponding model types to different layer categories.
-
Data Import Layer: You can attach only ODS Table.
-
Common Layer: You can attach Fact Table, Aggregate Table, Dimension Table, and Conceptual Dimensions. You can attach Dimension Table and Conceptual Dimensions separately or together.
-
Application Layer: You can attach Application Table, Dimension Table, and Conceptual Dimensions. You can attach Dimension Table and Conceptual Dimensions separately or together.
NoteThe Model Type cannot be changed after it is set. Plan your model types carefully.
Description
Provide a functional description of the layer so that users can quickly understand its purpose and store data in the correct layer.
-
-
To delete a layer, you must first delete all the data tables it contains.
Configure data warehouse layer checkers
A data warehouse layer checker enforces naming conventions for models and metrics. When you create an object, the checker can automatically generate its name, which makes the published physical table name intuitive and easy to understand.
-
Click a layer to go to its details page.
-
In the Checker Management section, click Create Rule and select Create Rule for Model or Create Rule for Metric .
-
The following tables describe the checker parameters.
Model rule
Parameter
Description
Rule Name
The name of the checker rule.
Rule Type
-
Model rule: Defaults to Table Name (the English table name). When you create a model in Dimensional Modeling , you can select the corresponding checker to automatically generate a table name.
Rule Definition
Defines the naming convention for the data model. This rule is used to check model names within the layer in Dimensional Modeling.
-
Model rule:
-
Enter a prefix for the table to validate against. For example, if the data warehouse layer requires table names to start with
dim, you can enterdim. -
You can also click the
icon to define the rule by using elements such as business category abbreviation, data domain abbreviation, data mart abbreviation, or subject area abbreviation. You can freely combine these elements. -
The following are recommended naming conventions:
-
Dimension (DIM): dim_{business_category}_{data_domain}_{custom_content}_{storage_policy}
-
Data Warehouse Detail (DWD): dwd_{business_category}_{data_domain}_{business_process}_{custom_content}_{storage_policy}
-
Data Warehouse Summary (DWS): dws_{business_category}_{data_domain}_{custom_content}_{time_period}
-
Application Data Service (ADS): ads_{business_category}_{data_mart}_{subject_area}_{custom_content}_{time_period}
-
-
Description
A description of the checker.
Metric rule
Parameter
Description
Rule Name
The name of the checker rule.
Rule Type
-
Metric rule: Can be configured for Display Name or Abbreviation. When you create a metric, the system automatically generates a name for the derived metric based on the specified default rule.
Rule Definition
Defines the naming convention for metrics. This rule is used to check metric names within the layer in Data Metrics.
-
Metric rule: Define the rule by selecting modifiers, an atomic metric, and a statistical period. You can combine these elements in any order.
Description
A description of the checker.
-
-
Set a default checker.
In the Default Rule section above the checker list, you can set a default checker for models or metrics. If you do not explicitly specify a rule when creating a model or metric, the system uses the default checker to automatically generate its name.
-
Understand strong and weak rules.
-
Weak Rule: When a naming checker is set as a Weak Rule, its naming convention is treated as a recommendation. You can ignore the rule and still create the model or metric successfully with a custom name.
-
Strong Rule: When a naming checker is set as a Strong Rule, you must follow its naming convention to create a model or metric. Otherwise, you cannot save the model or metric.
-
Use data warehouse layers
You can then go to Dimensional Modeling or Data Metrics, select the appropriate data warehouse layer, and create logical models or various metrics.
In Dimensional Modeling, after you select the Common Layer tab and create a fact table, the Data Warehouse Layering field automatically displays Common Layer - Data Warehouse Detail.
Use checkers
-
Use checkers in Dimensional Modeling or Data Metrics.
-
Dimensional Modeling:
-
Auto-generate table name: When you create a logical model, you can select a checker in the Naming Rule field. The system automatically suggests a Table Name for the table based on the checker rule.
-
Validate table name: When you create or edit a table name without selecting a checker, the system validates the name that you entered against the checker rules when you save the logical model.
-
If multiple checkers exist, the validation passes as long as the name conforms to any one of them.
-
-
Data Metric:
-
Auto-generate metric name: When you create a Derived Metric, select a Period, Modifier, and an Atomic Metric, and then click Intelligent Recommendation. The system automatically generates both a display name and an abbreviation based on the Default Rule.
-
Validate metric name: When you create or edit the Abbreviation and Display Name of a metric, the system automatically checks whether the names conform to the checker rules when you save the metric.
-
If multiple checkers exist, the validation passes as long as the name conforms to any one of them.
-
-
-
Manually Start Check and View Results .
After you configure a checker, click Start Check to verify that the data model names in the current layer conform to the naming conventions. You can then use the results to correct any non-compliant models.
Note-
The Start Check function is not supported for metric rules.
-
The check operation runs only for enabled checkers.
-
A new checker is enabled by default. If you do not need to use a checker, you can disable it to prevent it from running checks.
After the check is complete, click View Results to review the results.
-
Next steps
After you complete these steps, you can go to Data Standard to define standards, or go to Dimensional Modeling and Data Metrics to develop models and metrics.
If your data warehouse is large and involves collaboration across multiple teams, you can share your data warehouse structure with other workspaces. For configuration details, see Modeling workspace.