Create a standard dimension table

更新时间: 2026-06-23 10:26:50

A standard dimension table describes a business entity and its attributes. For example, a member dimension table can include data such as member names, member IDs, and member emails.

Limitations

  • If you have not subscribed to the Data Standard module, you cannot set the field standard for fields in the table.

  • If you have not subscribed to the Asset Security module, you cannot set the data security level or data classification for fields in the table.

  • If you have not subscribed to the Data Quality module, the Unique and Not Null constraints for primary key fields are not enforced.

Prerequisites

A business entity must be created. For more information, see Create and manage business entities.

Procedure

Step 1: Create a standard dimension table

  1. In the top navigation bar of the Dataphin homepage, choose Develop > data development.

  2. In the top navigation bar, select a Project. If you are using the Dev-Prod mode, you must also select an Environment.

  3. In the left-side navigation pane, choose standard modeling > dimension table.

  4. In the dimension table list on the right, click the image New icon.

  5. In the Create dimension table dialog box, configure the following parameters.

    Parameter

    Description

    Business object

    Select Standard Object.

    Table type

    If you select Standard Object for Business object, the Table type is automatically set to standard dimension table and cannot be changed.

    Data segment

    Defaults to the data segment associated with the project. This setting cannot be changed.

    Subject area

    Defaults to the subject area of the business object. This setting cannot be changed.

    Data timeliness

    The data timeliness of the dimension table. Options include T+1 (daily), T+h (hourly), and T+m (minutely).

    Logical table name

    Enter a name for the logical table. The name can be up to 100 characters long. After you select a business object, the system automatically generates a name in the format {data_segment_name}.dim_{business_object_code}_{data_timeliness}.

    Important
    • The name can contain only letters, digits, and underscores (_). It must start with a letter. The name is not case-sensitive, and any uppercase letters are automatically converted to lowercase.

    • label_ is a system-reserved prefix. Do not start names with label_.

    The {data_timeliness} suffix depends on the selected data timeliness:

    • df: Daily full snapshot (T+1). Stores all historical data up to the current day.

    • hf: Hourly full snapshot (T+h). Stores all historical data up to the current hour.

    • mf: Minutely full snapshot (T+m). Stores all historical data up to the last 15-minute interval every 15 minutes.

    Display name

    The naming rules are as follows:

    • Must be 128 characters or less.

    • Can contain any character.

    Description

    Enter a description for the dimension table, up to 1,000 characters.

  6. Click OK to create the standard dimension table.

Step 2: Configure fields

  1. On the Table Structure configuration page, define the table structure, including fields, data types, and field categories.

    image..png

    Note

    Click the image icon next to the logical table name to view its basic information.

    Area

    Description

    Locate table fields

    Click to navigate to the Basic Configuration or data governance section of the field list.

    Search and filter

    You can search for fields by name.

    Click image..png to filter fields by data type, field category, has associated dimension, associated dimension, field constraint, and data security level.

    Field list operations

    • Adjust field order: To prevent data errors, ensure that no downstream jobs that depend on this logical table use a select * query before you reorder fields.

    • Import fields: You can add fields by using Import from DDL statement or Import from Table. For more information, see Import fields for a dimension table.

    • Add field: Add a data field or partition field. You can edit the name, description, data type, field category, associated dimension, field standard, field constraint, data classification, data security level, and Remarks.

      Note

      For the MaxCompute engine, you can create up to six levels of partition fields.

    Field list

    The field list displays detailed information for each field, including sequence number, field name, description, data type, field category, associated dimension, field standard, field constraint, data classification, data security level, and Remarks.

    • Sequence number: The sequence number of the field. It auto-increments by one for each new field.

    • Field name: The name of the field. You can enter a field name or a keyword, and the system automatically suggests matching standard field names.

    • Description: The field description, limited to 512 characters.

    • Data type: Supported data types include string, bigint, double, timestamp, decimal, Text, Numeric, Date/Time, and Other.

    • Field category: Can be set to Primary key, Partition, or Attribute.

      Note
      • Only one primary key field is allowed.

      • Only fields of type string, varchar, bigint, int, tinyint, or smallint can be used as partition fields.

    • Associated dimension: For details, see Add an associated dimension.

    • Field standard: Select a field standard for the field. To create a standard, see Create and manage data standards.

    • Field constraint: Select a constraint for the field. Supported constraints are Unique and Not Null.

    • Data classification: Select a data classification for the field. To create a data classification, see Create a data classification.

    • Data security level: The system automatically determines the data security level after you select a data classification.

    • Remarks: Enter remarks for the field, limited to 2,048 characters.

    You can also Delete fields in the Actions column.

    Note
    • Deleting a field cannot be undone.

    • The primary key and system partition fields of a dimension table cannot be deleted.

    Batch operations

    You can perform the following batch operations on selected fields.

    • Delete: Click the image icon to delete the selected data fields in bulk.

    • Data type: Click the image icon to modify the data type for the selected fields in bulk.

    • Field category: Click the image icon to modify the field category for the selected fields in bulk.

    • Radix-based Naming: Click the image icon. The system tokenizes the description content, matches it with existing radices, and recommends field names. In the Radix-based Naming dialog box, you can replace the names of selected fields with the suggested values.

      Note
      • If the recommended field names do not meet your needs, you can edit them in the Corrected field name input box.

      • Click Reset to revert the Modified field name to the name suggested by the system's radix match.

    • Field standard: Click the image icon. The system recommends a field standard based on the field name. In the Field Standard dialog box, you can apply the recommended standard to the field.

    • Field constraint: Click the image icon to set constraints for the selected fields in bulk.

      Important

      You cannot set field constraints for child dimension tables.

  2. Click Save and Next.

Import fields

  • Import from Table

    1. On the configuration page of the dimension table, click Import from Table.

    2. In the Import from Table dialog box, select a source table, then select the fields you want to add.

      Parameter

      Description

      Source table

      You can select any physical table, logical table, or view (excluding parameterized views) within the current tenant for which you have read permissions. Physical tables automatically generated by Dataphin are excluded.

      To learn how to obtain read permissions for a physical table, see Apply for, renew, and release table permissions.

      Field list

      Select the fields you want to add.

      Note

      You can switch between different source tables to select fields from multiple tables.

      Selected fields

      The fields you add appear here. You can Delete fields from this list.

    3. Click Add to import the fields from the physical table into the dimension table.

      Once added, you can edit the new fields in the New Fields area, including names, data types, field categories, and associated dimensions.

  • Import from DDL statement

    1. On the configuration page of the dimension table, click Import from DDL statement.

    2. In the Import from DDL statement dialog box, enter the DDL statement and click Parse SQL.

    3. From the field list, select the fields you need and click Add to create them in the dimension table.

Add an associated dimension

  1. On the dimension table configuration page, click the gagag icon in the Associated Dimension column to open the Edit Model Relationship dialog box.

  2. In the Edit Model Relationship dialog box, configure the parameters.

    Area

    Parameter

    Description

    Null replacement value

    If the main table (the current dimension table) cannot be joined with the associated dimension table, Dataphin automatically populates the join field with -110.

    Dimension table

    Associated entity, dimension table

    Select an existing associated entity and dimension table.

    Edit join logic

    Join logic

    Displays the join between the source field and the primary key of the associated dimension table. This field is read-only.

    Dimension table version policy

    Specifies which partition of the associated dimension table to use when joining with the main table. By default, the partition from the same scheduling cycle is used.

    • Use dimension of the same cycle: The main table and the associated dimension table use the same time partition for computation.

      For example, if the business date is 20220101 and you need to query the ds=20220101 partition of both the main table and the associated dimension table, select this option.

    • Use the latest dimension table: Uses the latest partition of the associated dimension table for computation.

      For example, a product category is frequently changed. 10 days ago, it was the mobile phone category, and today it is the home appliances category. If you need to reprocess data from 10 days ago by using the current home appliances category, the dimension table versioning policy must be set to Use latest dimension table (Uses latest partition).

    Join failure policy

    Defines the computing logic for data that exists in the source table (left table) but not in the dimension table (right table). You can choose to Retain original non-joined data or Replace non-joined data with default values.

    • Retain original non-joined data: Keeps the original data from the left table when creating derived metrics.

    • Replace non-joined data with default values: If a field in the main table does not have a matching entry in the dimension table, it is replaced with the default value of -110.

    Edit dimension role

    Role English name, Role name

    A dimension role is an alias for a dimension, allowing you to reference the same dimension multiple times with different names. You must define a Role English name and a Role name.

    • The Role English name must have the prefix dim. The custom part must adhere to the following rules:

      • Contain only letters, digits, or underscores (_).

      • Be 64 characters or less.

    • The Role name must adhere to the following rules:

      • Contain only Chinese characters, digits, letters, underscores (_), or hyphens (-).

      • Be 64 characters or less.

  3. Click OK.

Step 3: Configure the computing logic

  1. On the Computing logic configuration page, map source data to the fields of the dimension table.

  2. Click Source Configuration, and in the Source Configuration dialog box, click +Add Source Object to configure the source parameters.

    Note

    We recommend that you do not add extra event time filters in the filter conditions or custom SQL.

    Parameter

    Description

    Source type

    Supported source types include physical table, custom SQL, and logical table.

    Source table type descriptions:

    • Source table with a primary key: A logical table with a primary key can have multiple sources. The first source is always the primary source, which determines the total number of rows in the logical table.

    • Source table without a primary key: A logical table without a primary key can have only one source. If the source involves multiple tables, you must join them in advance by using custom SQL.

      Note

      To configure multiple source objects, click Add Source Object.

    Source object

    • Select Physical table: You can select any physical table or physical view (excluding parameterized views) within the current tenant for which you have read permissions. Physical tables automatically generated by Dataphin are excluded.

      To learn how to obtain read permissions for a physical table, see Apply for, renew, and release table permissions.

    • Select Custom SQL: Click the image icon and enter your SQL statement in the editor. Example:

      select id, name
      from project_name_dev.table_name1 t1
      join
      project_name2_dev.table_name2 t2
      on t1.id = t2.id
    • Select Logical table: You can select any logical table within the current tenant for which you have read permissions.

      To learn how to obtain read permissions for a table, see Apply for, renew, and release table permissions.

      Important

      Using a logical table as the data source for another increases the complexity of computing logic and maintenance.

    Object alias

    Define an alias for the source table. For example: t1, t2.

    Object description

    Enter a description for the object, up to 1,000 characters.

    Filter condition

    Define filter conditions for the custom SQL.

    Click the image icon and enter the content in the editor. Example:

    ds=${bizdate} and condition1=value1

    Join field

    The field in the source object that corresponds to the logical table's primary key, used for an equijoin.

    Delete

    The primary source cannot be deleted.

    For a logical table without a primary key, deleting the source clears the computing logic for all fields.

  3. Click OK to save the source configuration.

  4. After configuring the source, drag a Source Field into the computing logic expression. You can also click Quickly Map Fields with the Same Name to automatically map source fields to logical table fields with the same name.

    Click the image icon to edit the computing logic expression in the editor. Aggregate functions such as sum, count, and min are not supported. Examples:

    • Example 1: substr(t1.column2, 3, 10)

    • Example 2: case when t1.column2 != '1' then 'Y' else 'N' end

    • Example 3: t1.column2 + t2.column1

  5. After configuring the computing logic, click the image..png icon at the bottom to validate the expressions.

    Click Preview SQL to view the generated SQL for the computing logic.

  6. Click Save and Next.

Step 4: Configure constraints

  1. Based on the field constraints, the system creates quality rules for the current logical table in the Data Quality module. Set the rule severity for each field, choosing between Strong rule and Weak rule. For more information, see Data table quality rules.

    Note

    The rule severity for logical table field constraints can only be configured here and cannot be edited in the Data Quality module.

  2. Click Save and Next.

Step 5: Configure scheduling

  1. On the Scheduling and parameter configuration page, configure the data delay, scheduling properties, scheduling dependencies, scheduling parameters, and runtime configuration for the dimension table.

    Parameter

    Description

    Data delay

    If you enable data delay, the system automatically reruns all data for this logical table within the period specified by Maximum Delay Days. For details, see Configure data delay for a logical table.

    Scheduling properties

    Specifies how the dimension table is scheduled in the production environment. Configure properties such as scheduling type, cycle, and execution logic. For details, see Configure scheduling properties for a logical table.

    Upstream dependencies

    Defines the logical table's scheduling node. Dataphin uses these dependencies to run nodes in order, ensuring timely and valid data output. For details, see Configure upstream dependencies for a logical table.

    Parameter configuration

    Assigns values to variables in your code. These variables are automatically replaced with their corresponding values during node scheduling. On the scheduling parameter configuration page, you can choose to Ignore a parameter or Convert to Global Variable. For details, see Configure runtime parameters for a logical table.

    Runtime configuration

    Configure task-level runtime timeout periods and rerun policies for task failures. This prevents resource waste from long-running tasks and improves task execution reliability. For details, see Configure runtime settings for a logical table.

    Resource configuration

    Configure a scheduling resource group for the current logical table task. The task consumes the resource quota of this group during scheduling. For configuration details, see Configure resources for a logical table.

  2. Click Save and Commit.

Step 6: Save and commit

  1. After configuring the standard dimension table, click Save and Commit.

    The system validates the settings for table structure, computing logic, scheduling dependencies, and runtime parameters. Review the Check Results and correct any failed configurations.

  2. After all checks pass, enter a submission comment and click OK and Commit.

    When you commit, Dataphin performs data lineage analysis and pre-commit checks. For more information, see Notes on committing standard modeling tasks.

Next steps

If your project uses Dev-Prod mode, publish the logical table to the production environment. For more information, see Manage publishing tasks.

After you publish the logical table to the production environment, you can view and manage its tasks in the O&M Center. For more information, see O&M Center.

上一篇: Logical Dimension Table 下一篇: Create a hierarchy dimension logical table
阿里云首页 智能数据建设与治理 Dataphin 相关技术圈