This topic describes the design principles of the dimension layer, the standards for dimension tables, and how to create dimensions and query logical dimension tables.
Introduction to the dimension layer
The dimension layer provides consistent dimension tables for data analytics. This reduces the risk of using inconsistent calculation methods and algorithms. Dimensions drive the modeling process. Based on the business meaning of each dimension, you can define the calculation logic and a snowflake schema by defining dimensions and their primary keys, adding dimension properties, and associating dimensions. This process completes the property definitions and creates consistent dimension tables for data analytics. You can also define parent-child relationships between dimensions. The properties of a child dimension are merged into its parent dimension. This further ensures consistency and makes dimensions easy to use.
Design principles for dimension tables
Generate rich dimension properties.
For example, the product dimension for an E-commerce company might have nearly 100 dimension properties. This provides a solid foundation for downstream data statistics, analysis, and exploration.
Provide meaningful text descriptions.
Properties should be actual text, not just codes. In Alibaba's dimensional modeling, codes and text descriptions usually coexist. Examples include product IDs and product titles, or category IDs and category names in a product dimension. IDs are typically used for table joins, while names are used for report labels.
Distinguish between numeric properties and facts.
Whether a numeric field is a fact or a dimension property depends on its common use. For example, if a field is used for query constraints or group statistics, it is a dimension property. If it is used in measure calculations, it is a fact.
Extract common dimension properties.
Derive dimension properties through logical processing.
Obtain dimension properties from joins across multiple tables.
Create dimension properties by processing different fields from a single table.
Generate dimension properties by parsing a specific field in a single table.
Dimension table standards
When you submit a common dimension or a common dimension (hierarchy), a corresponding logical dimension table is automatically generated. You cannot create logical dimension tables manually. In addition, Dataphin supports defining enumeration dimensions and virtual dimensions. Submitting an enumeration dimension or a virtual dimension does not generate a logical dimension table.
An enumeration dimension is a dimension whose values can be enumerated. This helps standardize the dimension values. When the dimension is used as the statistic granularity for a derived metric, it enables normalized data aggregation.
A virtual dimension is associated with a field and then used as the statistic granularity to define derived metrics. A URL is an example of a virtual dimension.
When you create a dimension, a logical dimension table is automatically generated. In Dataphin, dimension table names have a default prefix of dim_, and hierarchy dimension tables have a default suffix of _lvl{n}. For example:
dim_c1 (logical table for a common dimension).
dim_c1_lvl1 (logical table for a hierarchy dimension).