According to Alibaba's best practices, the fact table is the core of dimensional modeling in a data warehouse. You should design fact tables around business processes. A business process is defined by the measures, associated dimensions, and related attributes within its fact table. A fact table is also known as a fact-based model or a logical fact table.
Dataphin supports two types of fact tables:
Transaction fact table: Describes a business process and tracks measurable events at a specific point in space or time. It stores the most atomic data and is also called an atomic fact table. The table name suffix is typically
di.Periodic snapshot fact table: Records facts at regular, predictable time intervals, such as daily, monthly, or yearly. The table name suffix is typically
df.
Fact table design principles
Include all facts related to the business process.
The purpose of a fact table is to measure a business process. Identifying the facts that are relevant to the business process is a critical part of fact table design. You should include all related facts in the fact table. Because facts are usually numeric, the storage overhead from any redundancy is minimal.
Select only facts related to the business process.
When you select facts, choose only those that are related to the business process. For example, in Company A's order transaction flow, the fact table for the "create order" business process must not include the payment amount. The payment amount is a fact that belongs to the "payment" business process.
Declare the granularity before selecting dimensions and facts.
Declaring the granularity is an essential step in fact table design. Granularity is the smallest unit of a data row, not statistical granularity. It determines the level of detail that a single row in the fact table represents. It also determines the extensibility of the dimension model. You must declare the granularity before you select dimensions and facts. Each dimension and fact must be consistent with the defined granularity. In a fact table, the granularity is usually expressed through a business description and defines the primary key of the table. However, for an aggregate fact table that involves multiple transactions, such as placing an order and making a payment, you can describe the granularity in other ways. You can concatenate multiple fields to form a new field as the primary key. You can also choose not to define a primary key. In this case, a single record represents the minimum granularity.
Do not include facts with different granularities in the same fact table.
The granularity of all facts in a fact table must be consistent with the table's declared granularity. A single fact table cannot contain facts with different granularities.
Keep the units of facts consistent.
In the same fact table, the units for all facts must be consistent. For example, the three facts of original order amount, order discount amount, and order shipping fee should use a consistent unit of measurement, such as CNY, for ease of use.
Fact table design method
Any type of event can be considered a transaction. Examples include creating an order, a buyer making a payment, package collection, shipping, and delivery confirmation. Applying for a refund or requesting customer service intervention are also transactions. A transaction fact table is built for these processes. It tracks the individual behaviors that define a business process, provides rich analytical capabilities, and serves as the detailed data in the common data model (CDM) layer of the data warehouse.
Fact table design example
The following example uses Company A's order transaction fact table to explain the general design process for a transaction fact table.
Select a business process.
Based on the business flow analysis, Company A's transaction order process includes four key business processes: creating an order, buyer payment, seller shipment, and confirming receipt. These four processes are important milestones in the transaction and are the focus of downstream statistical analysis. Therefore, the design of Company A's transaction fact table focuses on these four business processes.
To facilitate independent analysis, you should create a separate fact table for each business process. This tutorial uses the "successful transaction" business process to demonstrate how to build a transaction fact table.
Determine the granularity.
The level of business detail that is expressed by a single record in a fact table is called granularity. Granularity can be expressed in two ways: by the level of detail that is represented by a combination of dimension attributes, or by the specific business meaning it represents, such as a product.
After you select a business process, you must determine its granularity. This sets the level of detail for each row in the transaction fact table. A clear granularity ensures that the meaning of a row in the fact table is unambiguous and that all facts are recorded at the same level of detail. If a field can express this granularity, you can define it as the primary key of the fact table.
You should choose the finest level of granularity possible to ensure maximum flexibility for the fact table's applications. For the order process, an order is generated after each product is settled. The granularity for the "successful transaction" business process can be set to the individual order level. If the order ID is unique, it can be used as the primary key of the fact table to describe the granularity.
Determine the dimensions.
After you select the business process and determine the granularity, you can determine the dimensions. You should select dimensions that clearly describe the context of the business process. For example, in the design of Company A's transaction order fact table, the granularity is at the individual order level. The dimensions include buyer, seller, product name, product category, shipping region, receiving region, and order time.
Determine the facts.
Facts are the core measures of a business process. They are usually integer or floating-point decimal values. The fact table should include all facts that describe the business process. The granularity of the facts must be consistent with the determined granularity of the fact table. For example, the "place order" business process needs to include product ID, product price, and quantity. The "payment" business process needs to include payment amount, red envelope amount, and points amount. The "receipt" business process needs to include the confirmed receipt amount.
Associated dimensions
When you determine the dimensions, you might include buyer and seller dimensions, product dimensions, category dimensions, and shipping and receiving dimensions. Dimensional modeling theory suggests storing only the foreign keys of these dimension tables in the fact table. However, Company A's e-commerce transaction fact table is further optimized. It associates dimensional attributes such as buyer and seller star ratings, tags, store names, product types, product features, product properties, and category levels directly into the fact table. This practice, known as denormalization, improves the efficiency of filtering, querying, and aggregating statistics on the fact table.
Data warehouse detail (DWD) layer specifications
The default naming convention for fact tables in Dataphin is: fct_{business_process_abbreviation}[_{custom_table_name_label_abbreviation}]_{di/df, single-partition_incremental/full_identifier}. The suffix indicates the refresh method. For example, di indicates a daily incremental load, and df indicates a daily full load. For example, fct_ordcrt_trip_di is the fact table for flight ticket orders from Company A's e-commerce platform, with a daily incremental refresh. fct_asale_itm_df is the snapshot fact table for products from Company A's e-commerce platform, with a daily full refresh.