Create an offline dataset using table mapping

Dataphin tags rely on an offline computing engine. With table mapping, you can create an offline dataset by mapping the fields of a physical table directly to metrics.

Prerequisites

Before you create an offline dataset, create a tag project for it. For more information, see Create a tag project.

Procedure

  1. On the Dataphin homepage, choose Tags > Tag Workbench in the top navigation bar.

  2. In the top navigation bar, select a project.

  3. In the left-side navigation pane, choose Data Preparation > Offline dataset.

  4. On the Offline dataset page, click Add dataset. In the Add offline dataset dialog box, select table mapping.

  5. On the Create table mapping page, configure the Basic information, Processing logic, and O&M settings for the dataset.

    • Basic information

      Parameter

      Description

      Dataset name

      Enter a name for the dataset. The name can contain Chinese characters, letters, digits, and underscores (_), up to 64 characters in length.

      Dataset code

      This code distinguishes the dataset from others that share the same name. The code must start with a letter, can contain lowercase letters, digits, and underscores (_), and can be up to 64 characters in length.

      Dataset update method

      The supported methods are Periodic Update and Manual Update:

      • Periodic Update: The dataset is updated automatically at a specified interval.

      • Manual Update: The dataset must be updated manually.

      Owner

      Select an owner for the offline dataset.

      Description

      Enter a brief description of the offline dataset, up to 1,000 characters in length.
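      The naming rules for Dataset name and Dataset code can be checked before you submit the form. The following Python sketch is a hypothetical pre-check, not Dataphin's own validator. It assumes that "Chinese characters" means the CJK Unified Ideographs range U+4E00 to U+9FFF and that the 64-character limit counts characters, not bytes. It also assumes that a code may not start with an uppercase letter, because uppercase letters are not allowed anywhere in the code.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs U+4E00..U+9FFF.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")

# Starts with a letter and contains only lowercase letters, digits, and
# underscores, so in practice the first character is a lowercase letter.
CODE_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,63}")


def is_valid_dataset_name(name: str) -> bool:
    """True if name has 1-64 allowed characters (CJK, letters, digits, _)."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_dataset_code(code: str) -> bool:
    """True if code starts with a letter and has 1-64 allowed characters."""
    return CODE_PATTERN.fullmatch(code) is not None


if __name__ == "__main__":
    print(is_valid_dataset_name("用户画像_daily"))   # True
    print(is_valid_dataset_code("user_profile_01"))  # True
    print(is_valid_dataset_code("1_user"))           # False: starts with a digit
```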

    • Processing logic

      Parameter

      Description

      Project/Data sector

      Select the project or data sector that the offline dataset references. The drop-down list includes all projects (bound to an offline computing source) and data sectors within the current tenant.

      Note

      If you have not purchased the Intelligent R&D edition, you can only select a project.

      Logical table/Source table

      Select the logical table or source table for the dataset.

      • Logical table: If you select a data sector for Project/Data sector, you can select a logical table. Only logical tables for which you have read synchronization permissions are available.

        To select a logical table, first select a Logical table type, then a subject domain, and finally the target logical table from within that domain. You can use keywords to search for subject domains and logical tables. Logical table types include fact logical table, dimension logical table, and summary logical table.

        Note

        By default, the logical table output does not include associations.

      • Source table: If you select a project for Project/Data sector, you can select a source table. Only tables that the project's production account has permission to query are available. If you do not have the required permissions, click Apply for Permissions to submit a request.

        Note

        Currently, only partitioned tables can be selected.

      Date partition

      Select a partition field from the source table.

      If the source table is a partitioned table, the system uses a default field name as the date partition. If that default field is not among the source table's partition fields, the system uses the table's first partition field as the date partition instead.

      Partition field format

      Enter a date format or select an existing one. You can select yyyymmdd, yyyy-mm-dd, yyyy/mm/dd, or yyyy.mm.dd.

      Entity ID-value type

      Select the entity ID field. This field is used for automatic entity ID mapping during tag processing.

      Note

      The system automatically generates the value type based on the type of the entity ID field.

      Metric configuration

      After you select a project, source table, and Entity ID-value type, you can select metrics for the dataset in the metric configuration list and configure the corresponding source field, code table, and description.

      Note
      • A metric name cannot be the same as the name of a level-1 partition field.

      • The user interface indicates which fields are selectable.

      • A field that is already used as an entity ID cannot be used as a metric.

      • Search for metrics: Enter a metric name or description to search for metrics.

      • Add metrics in batches: Click Add metrics in batches. In the Select source fields dialog box, you can select multiple source fields to add to the Metric configuration list.

      • Configure code table: You can configure a code table only for fields of the Integer, Decimal(M,0), Boolean, and String types.

        1. Click the code table configuration icon to open the Configure code table dialog box.

        2. In the Configure code table dialog box, configure the parameters.

          • Configure code table: By default, no code table is configured. You can select Code table to configure a code table for the metric.

          • Code table source: Currently, only Manual configuration is supported.

          • Code table name: Enter a name for the code table. The name can contain Chinese characters, letters, digits, and special characters, up to 128 characters in length.

          • Code table description: Enter a brief description of the code table, up to 1,000 characters in length.

          • Code information: You can add code information individually or in batches. Up to 500 pairs are supported.

            • Single entry: Click Add code value, and enter a Code value and a Code name. Both are required and must be unique. The type of the code value must match the value type of the metric. You can click the delete icon to delete the current row.

            • Batch entry: Click Batch entry. In the Batch enter code information dialog box, you can enter code values and code names in batches. Put each pair on a new line, and separate the code value and code name with a colon (:). When you click Recognize, the system automatically parses the information and adds it to the list.

            • Clear all: Click Clear all to clear the information list.

        3. Click OK to complete the code table configuration.

          Note

          If you enter duplicate code values or code names during batch entry, the system automatically highlights the first invalid row after you click OK.
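        The batch-entry rules above can be sketched in Python. The rules covered are: one pair per line, the value and name separated by a colon, both fields required and unique, at most 500 pairs, and the code value type matching the metric. This is a hypothetical helper, not a Dataphin API. It assumes that only the first colon splits a line (so a code name may itself contain a colon), that blank lines are skipped, and that Boolean values are written as "true" or "false". It reports the first invalid line, in the same way that the dialog box highlights the first invalid row.

```python
from typing import Callable, Dict, List, Tuple

MAX_PAIRS = 500


def _parse_bool(text: str) -> bool:
    # Assumption: Boolean code values are written as "true" or "false".
    lowered = text.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return lowered == "true"


# Hypothetical converters keyed by metric value type. "decimal" stands for
# Decimal(M,0), which has no fractional digits, so values parse as integers.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "integer": int,
    "decimal": int,
    "boolean": _parse_bool,
    "string": str,
}


def parse_code_pairs(text: str, value_type: str) -> List[Tuple[object, str]]:
    """Parse 'value:name' lines; raise ValueError naming the first bad line."""
    convert = CONVERTERS[value_type.lower()]
    pairs: List[Tuple[object, str]] = []
    seen_values: set = set()
    seen_names: set = set()
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        value_text, sep, name = line.partition(":")  # split on the first colon
        value_text, name = value_text.strip(), name.strip()
        if not sep or not value_text or not name:
            raise ValueError(f"line {line_no}: expected 'value:name'")
        try:
            value = convert(value_text)
        except ValueError:
            raise ValueError(f"line {line_no}: value is not {value_type}") from None
        if value in seen_values or name in seen_names:
            raise ValueError(f"line {line_no}: duplicate code value or name")
        seen_values.add(value)
        seen_names.add(name)
        pairs.append((value, name))
        if len(pairs) > MAX_PAIRS:
            raise ValueError(f"more than {MAX_PAIRS} code pairs")
    return pairs
```

        For example, parse_code_pairs("1:Male\n2:Female", "Integer") returns [(1, 'Male'), (2, 'Female')].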

      • Actions: Click the delete icon to delete the metric.

      • Delete in batches: You can delete the selected metrics in a batch.
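      The Metric configuration constraints can be expressed as a pre-submit check. This Python sketch is illustrative only. The MetricConfig class, its field names, and the way the level-1 partition field and entity ID field are passed in are assumptions, not Dataphin structures. It also assumes that names are compared exactly (case-sensitively). The sketch flags three problems:
      • a metric whose name equals the level-1 partition field
      • a metric that reuses the entity ID field
      • a code table on a field type other than Integer, Decimal(M,0), Boolean, or String

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Decimal(M,0): any precision M, scale exactly 0.
_DECIMAL_SCALE0 = re.compile(r"decimal\(\s*\d+\s*,\s*0\s*\)")


def supports_code_table(field_type: str) -> bool:
    """True for Integer, Decimal(M,0), Boolean, and String field types."""
    t = field_type.strip().lower()
    return t in {"integer", "boolean", "string"} or bool(_DECIMAL_SCALE0.fullmatch(t))


@dataclass
class MetricConfig:
    """Hypothetical record for one row of the Metric configuration list."""
    name: str
    source_field: str
    field_type: str                   # e.g. "String", "Integer", "Decimal(10,0)"
    code_table: Optional[str] = None  # code table name, if one is configured


def check_metrics(metrics: List[MetricConfig], level1_partition: str,
                  entity_id_field: str) -> List[str]:
    """Return rule violations; an empty list means every check passed."""
    problems: List[str] = []
    for m in metrics:
        # Assumption: names are compared exactly (case-sensitive).
        if m.name == level1_partition:
            problems.append(f"{m.name}: same name as the level-1 partition field")
        if m.source_field == entity_id_field:
            problems.append(f"{m.name}: source field is already the entity ID")
        if m.code_table and not supports_code_table(m.field_type):
            problems.append(f"{m.name}: code table not allowed for {m.field_type}")
    return problems
```

      Given a metric named ds on a table whose level-1 partition field is ds, the check reports "ds: same name as the level-1 partition field".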

    • O&M settings

      Note

      This section is not applicable if the Dataset update method is set to Manual Update.

      1. Scheduling cycle

        • Scheduled update time: You can schedule the task to run at a specific time of day. The task runs automatically once a day at the time you specify.

        • Conditional scheduling: After you enable conditional scheduling, select a Configuration method. You can choose Custom settings or Scheduling template.

          Important

          You can add up to 10 scheduling conditions. The system evaluates them sequentially from top to bottom. When a condition is met, the corresponding scheduling action is executed, and all subsequent conditions are ignored. If no conditions are met, the default scheduling configuration is used.

          Conditional scheduling takes effect only when the scheduling type is Normal scheduling.

          Scheduling conditions and start times are calculated based on the configured scheduling time zone.

          • Custom settings

            1. Click +Add scheduling condition.

            2. In the Edit conditional scheduling dialog box, configure the conditional scheduling settings.

              • Condition name: Can contain any characters, up to 32 characters in length.

              • Status: Enabled by default. If disabled, this condition is ignored during scheduling.

              • Meet the following conditions: The evaluation rules for the condition. When the condition evaluates to true, the task is scheduled according to the Run schedule settings. For more information about the configuration, see Conditional scheduling rules.

              • Run schedule: Supports the following options:

                • Custom: If the condition is true, the schedule runs according to the specified Scheduling type.

                • Follow scheduling properties: Uses the same policy defined in the main scheduling properties.

              • Scheduling type: For more information about the configuration, see Scheduling types.

            3. Click OK.

              After you configure the conditional scheduling settings, click Preview schedule plan to view the dates that meet the conditions on the calendar.

              Important
              • After you modify the conditional scheduling settings, submit them, and publish them to the production environment, the changes take effect immediately for instances that are in the Not run state. The changes do not affect instances that are already in the Waiting for run time state.

              • If a conditional scheduling rule uses a cross-node parameter check, you must provide possible parameter values for the preview.

          • Scheduling template

            If you select Scheduling template, you can choose from the Conditional scheduling templates configured in Plan > Common Definitions > Offline scheduling template. If no template meets your needs, click Create scheduling template to create one. After you select a template, you cannot add new scheduling conditions. Click the View details icon next to each condition to view its details.

            Note

            If the task's scheduling cycle is day, week, or month, the Start Time parameter in the referenced conditional scheduling template takes effect. If the task's scheduling cycle is hour or minute, the Start Time parameter is ignored.
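          The evaluation order in the Important note above can be sketched as follows. The note specifies at most 10 conditions, checked from top to bottom, with disabled conditions skipped, the first match winning, and the default configuration applying otherwise. The Condition class, its predicate function, and scheduling type strings such as "Dry run" are hypothetical stand-ins for the rules you configure in the Edit conditional scheduling dialog box. The sketch does not describe how Dataphin evaluates conditions internally. It models Follow scheduling properties as a missing scheduling type and assumes that the business date is already expressed in the scheduling time zone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional

MAX_CONDITIONS = 10


@dataclass
class Condition:
    """Hypothetical model of one scheduling condition."""
    name: str
    enabled: bool
    matches: Callable[[date], bool]  # stands in for "Meet the following conditions"
    scheduling_type: Optional[str]   # None = "Follow scheduling properties"


def resolve_scheduling_type(conditions: List[Condition], business_date: date,
                            default_type: str) -> str:
    """Return the scheduling type that applies on business_date.

    business_date is assumed to be in the configured scheduling time zone.
    """
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError(f"at most {MAX_CONDITIONS} conditions are allowed")
    for cond in conditions:           # evaluated from top to bottom
        if not cond.enabled:
            continue                  # disabled conditions are ignored
        if cond.matches(business_date):
            # First match wins; later conditions are never evaluated.
            return cond.scheduling_type or default_type
    return default_type               # no condition matched


if __name__ == "__main__":
    conditions = [
        Condition("Weekend dry run", True, lambda d: d.weekday() >= 5, "Dry run"),
        Condition("Month start pause", False, lambda d: d.day == 1, "Paused"),
    ]
    print(resolve_scheduling_type(conditions, date(2024, 6, 1), "Normal"))  # Dry run
    print(resolve_scheduling_type(conditions, date(2024, 6, 3), "Normal"))  # Normal
```

          June 1, 2024 is a Saturday, so the first condition matches. The disabled second condition is skipped even though it would also match.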

        • Schedule plan: Click Preview. The schedule plan displays all scheduled instances and their scheduling types for each day of a given month, based on the configured scheduling cycle and conditions. You can preview by Business date or Run date (scheduling date).

          If instances for a single day have multiple scheduling types, the calendar displays all of them in different colors and shows the name and instance count for each type. For example, the following figure shows that on the 4th of the month, the task has 44 normal scheduling instances, 2 paused instances, and 12 dry-run instances.

          Hover over a scheduling type block for a specific day to view a detailed list of instances for that day, including run time, scheduling type, and condition name.

      2. Scheduling dependencies

        Scheduling dependencies define the upstream and downstream relationships between nodes. A downstream task node runs only after its upstream dependencies complete successfully.

        • Automatic Parsing

          The system automatically parses the task's data lineage to identify and associate upstream dependencies. The task's update depends on the output from the upstream nodes.

          Note
          • If an auto-parsed result is not as expected, you can turn off the toggle for that node to disable the dependency on it.

          • By default, the dependency is set to the current cycle.

        • Add dependency

          If Automatic Parsing fails to identify the scheduling dependencies or the result does not match the actual requirements, you can manually add upstream dependencies for the node.

          Click Add dependency, choose to add a physical node or a logical table node, select one or more target nodes in the dialog box that appears, and then click OK.

          Note
          • If you have not purchased the Intelligent R&D edition, you can only add physical node dependencies.

          • If you click Automatic Parsing after adding a dependency manually, the system overwrites your manual entry if it parses the same node.

        • Edit dependency

          In the scheduling dependency list, click the edit icon in the Actions column of the target upstream dependency. In the dialog box that appears, you can modify the dependency period, dependency policy, and Dependency field (can be modified only for logical table nodes). For more information about dependency configurations, see Configure scheduling dependencies for an offline task and Rules and examples of scheduling dependency scenarios.

          To delete a dependency, click the delete icon in the Actions column for that dependency.
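        The dependency rules above can be modeled with a small graph sketch, under stated assumptions. Node names are plain strings, and re-running Automatic Parsing replaces a manual entry for the same upstream node while keeping the other manual entries. Nodes whose toggle is turned off are tracked in a separate set. The readiness check reflects the rule that a downstream node runs only after all of its active upstream dependencies succeed. The function names are hypothetical, and dependency periods and policies are not modeled.

```python
from typing import Dict, Iterable, Set


def merge_dependencies(manual: Iterable[str], parsed: Iterable[str]) -> Dict[str, str]:
    """Map each upstream node to the origin of its entry.

    Parsed entries overwrite manual entries for the same node, matching the
    note that Automatic Parsing replaces a manual entry it parses again.
    """
    merged = {node: "manual" for node in manual}
    for node in parsed:
        merged[node] = "parsed"
    return merged


def is_ready(upstream: Dict[str, str], disabled: Set[str],
             succeeded: Set[str]) -> bool:
    """True when every upstream node that is not switched off has succeeded."""
    return all(node in succeeded for node in upstream if node not in disabled)


if __name__ == "__main__":
    deps = merge_dependencies(manual=["dim_user", "ods_orders"],
                              parsed=["ods_orders", "ods_logs"])
    print(deps)  # {'dim_user': 'manual', 'ods_orders': 'parsed', 'ods_logs': 'parsed'}
    print(is_ready(deps, disabled={"ods_logs"},
                   succeeded={"dim_user", "ods_orders"}))  # True
```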

  6. Click Save and Publish to create the offline dataset.

    Note

    After saving the dataset, click Preview to verify the data generated based on your processing logic.

Next steps

After you create and configure the offline dataset, you can create offline tags for it. For more information, see Create an offline tag.