Create a standard dimension table
A standard dimension table describes a business entity and its attributes. For example, a member dimension table can include data such as member names, member IDs, and member emails.
Limitations
-
If you have not subscribed to the Data Standard module, you cannot set the field standard for fields in the table.
-
If you have not subscribed to the Asset Security module, you cannot set the data security level or data classification for fields in the table.
-
If you have not subscribed to the Data Quality module, the Unique and Not Null constraints for primary key fields are not enforced.
Prerequisites
A business entity must be created. For more information, see Create and manage business entities.
Procedure
Step 1: Create a standard dimension table
-
In the top navigation bar of the Dataphin homepage, choose Develop > data development.
-
In the top navigation bar, select a Project. If you are using the Dev-Prod mode, you must also select an Environment.
-
In the left-side navigation pane, choose standard modeling > dimension table.
-
In the dimension table list on the right, click the
New icon. -
In the Create dimension table dialog box, configure the following parameters.
Parameter
Description
Business object
Select Standard Object.
Table type
If you select Standard Object for Business object, the Table type is automatically set to standard dimension table and cannot be changed.
Data segment
Defaults to the data segment associated with the project. This setting cannot be changed.
Subject area
Defaults to the subject area of the business object. This setting cannot be changed.
Data timeliness
The data timeliness of the dimension table. Options include T+1 (daily), T+h (hourly), and T+m (minutely).
Logical table name
Enter a name for the logical table. The name can be up to 100 characters long. After you select a business object, the system automatically generates a name in the format
{data_segment_name}.dim_{business_object_code}_{data_timeliness}.Important-
The name can contain only letters, digits, and underscores (_). It must start with a letter. The name is not case-sensitive, and any uppercase letters are automatically converted to lowercase.
-
label_is a system-reserved prefix. Do not start names withlabel_.
The
{data_timeliness}suffix depends on the selected data timeliness:-
df: Daily full snapshot (T+1). Stores all historical data up to the current day.
-
hf: Hourly full snapshot (T+h). Stores all historical data up to the current hour.
-
mf: Minutely full snapshot (T+m). Stores all historical data up to the last 15-minute interval every 15 minutes.
Display name
The naming rules are as follows:
-
Must be 128 characters or less.
-
Can contain any character.
Description
Enter a description for the dimension table, up to 1,000 characters.
-
-
Click OK to create the standard dimension table.
Step 2: Configure fields
-
On the Table Structure configuration page, define the table structure, including fields, data types, and field categories.
NoteClick the
icon next to the logical table name to view its basic information.Area
Description
① Locate table fields
Click to navigate to the Basic Configuration or data governance section of the field list.
② Search and filter
You can search for fields by name.
Click
to filter fields by data type, field category, has associated dimension, associated dimension, field constraint, and data security level.③ Field list operations
-
Adjust field order: To prevent data errors, ensure that no downstream jobs that depend on this logical table use a
select *query before you reorder fields. -
Import fields: You can add fields by using Import from DDL statement or Import from Table. For more information, see Import fields for a dimension table.
-
Add field: Add a data field or partition field. You can edit the name, description, data type, field category, associated dimension, field standard, field constraint, data classification, data security level, and Remarks.
NoteFor the MaxCompute engine, you can create up to six levels of partition fields.
④ Field list
The field list displays detailed information for each field, including sequence number, field name, description, data type, field category, associated dimension, field standard, field constraint, data classification, data security level, and Remarks.
-
Sequence number: The sequence number of the field. It auto-increments by one for each new field.
-
Field name: The name of the field. You can enter a field name or a keyword, and the system automatically suggests matching standard field names.
-
Description: The field description, limited to 512 characters.
-
Data type: Supported data types include string, bigint, double, timestamp, decimal, Text, Numeric, Date/Time, and Other.
-
Field category: Can be set to Primary key, Partition, or Attribute.
Note-
Only one primary key field is allowed.
-
Only fields of type string, varchar, bigint, int, tinyint, or smallint can be used as partition fields.
-
-
Associated dimension: For details, see Add an associated dimension.
-
Field standard: Select a field standard for the field. To create a standard, see Create and manage data standards.
-
Field constraint: Select a constraint for the field. Supported constraints are Unique and Not Null.
-
Data classification: Select a data classification for the field. To create a data classification, see Create a data classification.
-
Data security level: The system automatically determines the data security level after you select a data classification.
-
Remarks: Enter remarks for the field, limited to 2,048 characters.
You can also Delete fields in the Actions column.
Note-
Deleting a field cannot be undone.
-
The primary key and system partition fields of a dimension table cannot be deleted.
⑤ Batch operations
You can perform the following batch operations on selected fields.
-
Delete: Click the
icon to delete the selected data fields in bulk. -
Data type: Click the
icon to modify the data type for the selected fields in bulk. -
Field category: Click the
icon to modify the field category for the selected fields in bulk. -
Radix-based Naming: Click the
icon. The system tokenizes the description content, matches it with existing radices, and recommends field names. In the Radix-based Naming dialog box, you can replace the names of selected fields with the suggested values.Note-
If the recommended field names do not meet your needs, you can edit them in the Corrected field name input box.
-
Click Reset to revert the Modified field name to the name suggested by the system's radix match.
-
-
Field standard: Click the
icon. The system recommends a field standard based on the field name. In the Field Standard dialog box, you can apply the recommended standard to the field. -
Field constraint: Click the
icon to set constraints for the selected fields in bulk.ImportantYou cannot set field constraints for child dimension tables.
-
-
Click Save and Next.
Import fields
-
Import from Table
-
On the configuration page of the dimension table, click Import from Table.
-
In the Import from Table dialog box, select a source table, then select the fields you want to add.
Parameter
Description
Source table
You can select any physical table, logical table, or view (excluding parameterized views) within the current tenant for which you have read permissions. Physical tables automatically generated by Dataphin are excluded.
To learn how to obtain read permissions for a physical table, see Apply for, renew, and release table permissions.
Field list
Select the fields you want to add.
NoteYou can switch between different source tables to select fields from multiple tables.
Selected fields
The fields you add appear here. You can Delete fields from this list.
-
Click Add to import the fields from the physical table into the dimension table.
Once added, you can edit the new fields in the New Fields area, including names, data types, field categories, and associated dimensions.
-
-
Import from DDL statement
-
On the configuration page of the dimension table, click Import from DDL statement.
-
In the Import from DDL statement dialog box, enter the DDL statement and click Parse SQL.
-
From the field list, select the fields you need and click Add to create them in the dimension table.
-
Add an associated dimension
-
On the dimension table configuration page, click the
icon in the Associated Dimension column to open the Edit Model Relationship dialog box. -
In the Edit Model Relationship dialog box, configure the parameters.
Area
Parameter
Description
Null replacement value
If the main table (the current dimension table) cannot be joined with the associated dimension table, Dataphin automatically populates the join field with -110.
Dimension table
Associated entity, dimension table
Select an existing associated entity and dimension table.
Edit join logic
Join logic
Displays the join between the source field and the primary key of the associated dimension table. This field is read-only.
Dimension table version policy
Specifies which partition of the associated dimension table to use when joining with the main table. By default, the partition from the same scheduling cycle is used.
-
Use dimension of the same cycle: The main table and the associated dimension table use the same time partition for computation.
For example, if the business date is 20220101 and you need to query the
ds=20220101partition of both the main table and the associated dimension table, select this option. -
Use the latest dimension table: Uses the latest partition of the associated dimension table for computation.
For example, a product category is frequently changed. 10 days ago, it was the mobile phone category, and today it is the home appliances category. If you need to reprocess data from 10 days ago by using the current home appliances category, the dimension table versioning policy must be set to Use latest dimension table (Uses latest partition).
Join failure policy
Defines the computing logic for data that exists in the source table (left table) but not in the dimension table (right table). You can choose to Retain original non-joined data or Replace non-joined data with default values.
-
Retain original non-joined data: Keeps the original data from the left table when creating derived metrics.
-
Replace non-joined data with default values: If a field in the main table does not have a matching entry in the dimension table, it is replaced with the default value of -110.
Edit dimension role
Role English name, Role name
A dimension role is an alias for a dimension, allowing you to reference the same dimension multiple times with different names. You must define a Role English name and a Role name.
-
The Role English name must have the prefix
dim. The custom part must adhere to the following rules:-
Contain only letters, digits, or underscores (_).
-
Be 64 characters or less.
-
-
The Role name must adhere to the following rules:
-
Contain only Chinese characters, digits, letters, underscores (_), or hyphens (-).
-
Be 64 characters or less.
-
-
-
Click OK.
Step 3: Configure the computing logic
-
On the Computing logic configuration page, map source data to the fields of the dimension table.
-
Click Source Configuration, and in the Source Configuration dialog box, click +Add Source Object to configure the source parameters.
NoteWe recommend that you do not add extra event time filters in the filter conditions or custom SQL.
Parameter
Description
Source type
Supported source types include physical table, custom SQL, and logical table.
Source table type descriptions:
-
Source table with a primary key: A logical table with a primary key can have multiple sources. The first source is always the primary source, which determines the total number of rows in the logical table.
-
Source table without a primary key: A logical table without a primary key can have only one source. If the source involves multiple tables, you must join them in advance by using custom SQL.
NoteTo configure multiple source objects, click Add Source Object.
Source object
-
Select Physical table: You can select any physical table or physical view (excluding parameterized views) within the current tenant for which you have read permissions. Physical tables automatically generated by Dataphin are excluded.
To learn how to obtain read permissions for a physical table, see Apply for, renew, and release table permissions.
-
Select Custom SQL: Click the
icon and enter your SQL statement in the editor. Example:select id, name from project_name_dev.table_name1 t1 join project_name2_dev.table_name2 t2 on t1.id = t2.id -
Select Logical table: You can select any logical table within the current tenant for which you have read permissions.
To learn how to obtain read permissions for a table, see Apply for, renew, and release table permissions.
ImportantUsing a logical table as the data source for another increases the complexity of computing logic and maintenance.
Object alias
Define an alias for the source table. For example: t1, t2.
Object description
Enter a description for the object, up to 1,000 characters.
Filter condition
Define filter conditions for the custom SQL.
Click the
icon and enter the content in the editor. Example:ds=${bizdate} and condition1=value1Join field
The field in the source object that corresponds to the logical table's primary key, used for an equijoin.
Delete
The primary source cannot be deleted.
For a logical table without a primary key, deleting the source clears the computing logic for all fields.
-
-
Click OK to save the source configuration.
-
After configuring the source, drag a Source Field into the computing logic expression. You can also click Quickly Map Fields with the Same Name to automatically map source fields to logical table fields with the same name.
Click the
icon to edit the computing logic expression in the editor. Aggregate functions such as sum, count, and min are not supported. Examples:-
Example 1:
substr(t1.column2, 3, 10) -
Example 2:
case when t1.column2 != '1' then 'Y' else 'N' end -
Example 3:
t1.column2 + t2.column1
-
-
After configuring the computing logic, click the
icon at the bottom to validate the expressions.Click Preview SQL to view the generated SQL for the computing logic.
-
Click Save and Next.
Step 4: Configure constraints
-
Based on the field constraints, the system creates quality rules for the current logical table in the Data Quality module. Set the rule severity for each field, choosing between Strong rule and Weak rule. For more information, see Data table quality rules.
NoteThe rule severity for logical table field constraints can only be configured here and cannot be edited in the Data Quality module.
-
Click Save and Next.
Step 5: Configure scheduling
-
On the Scheduling and parameter configuration page, configure the data delay, scheduling properties, scheduling dependencies, scheduling parameters, and runtime configuration for the dimension table.
Parameter
Description
Data delay
If you enable data delay, the system automatically reruns all data for this logical table within the period specified by Maximum Delay Days. For details, see Configure data delay for a logical table.
Scheduling properties
Specifies how the dimension table is scheduled in the production environment. Configure properties such as scheduling type, cycle, and execution logic. For details, see Configure scheduling properties for a logical table.
Upstream dependencies
Defines the logical table's scheduling node. Dataphin uses these dependencies to run nodes in order, ensuring timely and valid data output. For details, see Configure upstream dependencies for a logical table.
Parameter configuration
Assigns values to variables in your code. These variables are automatically replaced with their corresponding values during node scheduling. On the scheduling parameter configuration page, you can choose to Ignore a parameter or Convert to Global Variable. For details, see Configure runtime parameters for a logical table.
Runtime configuration
Configure task-level runtime timeout periods and rerun policies for task failures. This prevents resource waste from long-running tasks and improves task execution reliability. For details, see Configure runtime settings for a logical table.
Resource configuration
Configure a scheduling resource group for the current logical table task. The task consumes the resource quota of this group during scheduling. For configuration details, see Configure resources for a logical table.
-
Click Save and Commit.
Step 6: Save and commit
-
After configuring the standard dimension table, click Save and Commit.
The system validates the settings for table structure, computing logic, scheduling dependencies, and runtime parameters. Review the Check Results and correct any failed configurations.
-
After all checks pass, enter a submission comment and click OK and Commit.
When you commit, Dataphin performs data lineage analysis and pre-commit checks. For more information, see Notes on committing standard modeling tasks.
Next steps
If your project uses Dev-Prod mode, publish the logical table to the production environment. For more information, see Manage publishing tasks.
After you publish the logical table to the production environment, you can view and manage its tasks in the O&M Center. For more information, see O&M Center.