Manage problem data

更新时间:
复制 MD 格式

Data Quality lets you retain data that fails monitoring checks, helping you to quickly troubleshoot issues and identify their root cause. This topic describes how to manage this problem data.

Background information

  • Problem Data Table

    If some rules fail data quality checks, Data Quality automatically creates a Problem Data Table to store the records that failed the checks.

    • Naming rule: The problem data table is named in the format {source_table_name}_dirtydata_dw_system_dqc.

    • Workspace: The source table's environment determines the workspace of the problem data table. For example, if the source table is dev_project.table1, the problem data table is also stored in the dev_project workspace.

    • Owner: The problem data table shares the same owner as the source table.

    • Retention period: By default, problem data is retained for 15 days, so the lifecycle of the problem data table is also 15 days. If the table meets the lifecycle-based reclamation policy of MaxCompute, the data is deleted. For more information about MaxCompute table lifecycles, see Lifecycle.

    • Partition structure:

      Partition Level

      Partition Field Name

      Description

      Level-1 partition

      dqc_task_run_dt

      The collection time for the problem data. This value is the runtime of the monitoring rule, in yyyymmdd format.

      Level-2 partition

      dqc_task_id

      The ID of the DQC rule instance that collected the problem data.

      Note

      Executing a DQC rule generates a DQC instance.

  • Supported monitoring rules and problem data specifications

    For more information about the monitoring rules that support data retention and the definitions of problem data for each rule, see Appendix: Supported monitoring rules and problem data specifications.

Usage notes

  • You can enable the problem data retention feature only for MaxCompute tables.

  • Only some data quality monitoring rules support retaining problem data. For a list of supported rules, see Appendix: Supported monitoring rules and problem data specifications.

  • Problem data is collected by running rule-generated SQL statements on the MaxCompute compute engine. This process incurs MaxCompute computing fees.

  • The system stores the collected problem data in a MaxCompute temporary table. This process incurs MaxCompute storage fees.

Enable problem data retention

When you configure a monitoring rule for a table and add a template rule, you can enable problem data retention for specific field-level rules. If this feature is enabled, the system saves data that fails a rule check to an automatically generated Problem Data Table. For more information about how to configure monitoring rules for a table, see Configure rules for a single table.

Note

Only some rules support retaining problem data when a check fails. For a list of supported rules, see Appendix: Supported monitoring rules and problem data specifications.

View problem data

Go to the Quality O&M > Running Records page. Find a running record with a result of "Exception" or "Failed" and click Actions in the Details column. On the Monitor Running Details page, click the Problem Data Processing tab.

Note

If you do not have permission to preview the problem data or query the problem data table, go to Security Center to apply for query permissions on the table. For more information, see Access control for MaxCompute data.

View problem data collection logs

You can use the following methods to view logs for problem data collection, including the SQL code and execution process. These logs can help you quickly troubleshoot errors that occur during collection.

  • Method 1: View logs in the Data Quality module.

    On the Quality O&M > Running Records page, view the Raw Log of the target monitoring rule to check the collection process. For more information, see View monitor execution details.

  • Method 2: View logs from the scheduling task list.

    After you associate a data quality monitoring rule with a scheduling node, go to the Operation Details page of the node. On the Execution > DQC panel, click Actions in the View Details column for the target rule to view the collection process. For more information, see intelligent diagnosis.

Appendix: Supported monitoring rules and problem data specifications

  • Built-in templates

    Check type

    Check rule

    Rule description

    Problem data specifications

    unique value

    Number of unique values, fixed value.

    Compares the count of distinct values against a specified fixed value.

    If the unique value check fails:

    • If all values are unique, the system creates an empty problem data table.

    • If duplicate values exist, they are saved to the problem data table as problem data.

    Number of unique values/Total rows.

    Compares the ratio of unique values to the total number of rows against a fixed value.

    null value

    Number of null values, fixed value.

    Compares the number of null values in the field against a fixed value.

    Note

    The check uses the SQL is null operator.

    The system saves all rows containing a null value in the monitored field to the problem data table. These rows constitute the problem data.

    Number of null values/Total rows, fixed value.

    Compares the ratio of null values to the total number of rows against a fixed value.

    Note

    The fixed value is a decimal number.

    duplicate value

    Number of duplicate values/Total rows, fixed value.

    Compares the ratio of duplicate values to the total number of rows against a fixed value.

    The system saves the duplicate values for the field to the problem data table. All duplicate values are considered problem data.

    Number of duplicate values, fixed value.

    Calculates the number of duplicate values by subtracting the distinct value count from the total row count and compares the result to a fixed value.

  • Custom SQL statement

    After you enable Retain problem data, problem data is saved whenever a custom SQL monitoring rule triggers an alert.