Data Quality overview
Data Quality is a feature of the Dataphin platform that provides an end-to-end solution for managing data quality. It includes quality rule configuration, quality monitoring, scheduling configuration, smart alerts, and quality verification and administration.
Prerequisites
You have purchased the Data Quality value-added service and enabled the Data Quality module for the current tenant.
Background information
As businesses across all industries build, manage, and use more big data, data scenarios in Dataphin become increasingly complex, and the quality of raw data from business systems is often inconsistent. Dataphin helps you define data quality standards so that your data is timely, accurate, complete, consistent, and valid, which lets you make business decisions based on reliable data.
Data Quality process guide
The Data Quality process guide walks you through the entire workflow: (Optional) Configure rule templates -> Import monitored objects -> Configure quality rules -> Verify rules -> View verification records and quality reports -> Perform quality corrections.
Scenarios for quality rules
During data development, ensuring the quality of data assets is central to data quality assurance.
If verification detects a data quality issue, Data Quality checks whether the rule is a strong rule or a soft rule. This property determines whether downstream nodes are blocked to prevent low-quality data from propagating.
If a strong rule fails verification, an alert is sent and downstream task nodes are blocked.
If a soft rule fails verification, an alert is sent, but downstream task nodes are not blocked.
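The strong/soft distinction can be sketched as a small decision function. This is a minimal illustration, not Dataphin's implementation or API: the names `RuleStrength`, `RuleResult`, and `evaluate` are hypothetical, and it assumes that every failed rule raises an alert while downstream nodes are blocked only if at least one failed rule is a strong rule.

```python
from dataclasses import dataclass
from enum import Enum


class RuleStrength(Enum):
    """Hypothetical model of the strong/soft rule property."""
    STRONG = "strong"  # failure sends an alert and blocks downstream nodes
    SOFT = "soft"      # failure sends an alert only


@dataclass
class RuleResult:
    rule_name: str  # hypothetical rule identifier
    strength: RuleStrength
    passed: bool


def evaluate(results: list[RuleResult]) -> tuple[list[str], bool]:
    """Return (alerts, block_downstream) for a set of verification results.

    Every failed rule produces an alert; downstream nodes are blocked only
    if at least one failed rule is a strong rule.
    """
    alerts = [f"ALERT: rule '{r.rule_name}' failed" for r in results if not r.passed]
    block = any(not r.passed and r.strength is RuleStrength.STRONG for r in results)
    return alerts, block


if __name__ == "__main__":
    results = [
        RuleResult("order_id_not_null", RuleStrength.STRONG, passed=True),
        RuleResult("amount_in_range", RuleStrength.SOFT, passed=False),
    ]
    alerts, block = evaluate(results)
    for alert in alerts:
        print(alert)
    print("block downstream:", block)
```

In this example only the soft rule fails, so an alert is printed but downstream nodes are not blocked. Changing `amount_in_range` to `RuleStrength.STRONG` would make `evaluate` return `True` for blocking.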

Function overview
Data Quality supports rule verification and correction for Dataphin data tables, global data tables, metrics, data sources, and real-time metadata tables.
Dataphin data tables: Supports rule verification and correction for multiple table types, such as physical tables, logical fact tables, logical dimension tables, and logical aggregate tables.
Global data tables: Supports rule verification and correction for data tables from various data source types, such as MaxCompute, Hive, MySQL, Oracle, Microsoft SQL Server, PostgreSQL, SAP HANA, AnalyticDB for PostgreSQL, ClickHouse, IBM DB2, DM, Hologres, and ArgoDB.
Metrics: Lets you monitor metrics for issues with the number of field groups, duplicate field values, field stability, and field volatility, receive alerts, and make corrections.
Data sources: Lets you monitor data source connectivity and table schema changes, receive alerts, and make corrections.
Real-time metadata tables: Lets you detect statistical values, compare real-time data with offline data, compare data across multiple real-time ingestion endpoints, receive alerts for anomalies, and make corrections.
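The metric checks above can be illustrated with simple calculations over a field's values. This is a hedged sketch under assumed definitions, not how Dataphin computes them: here the group count is the number of distinct values, duplicate values are rows whose value already appeared earlier, and volatility is the relative change of a metric versus the previous period. The function names and the `max_change` threshold are hypothetical.

```python
def group_count(values: list) -> int:
    """Assumed definition: number of distinct groups (distinct values) in a field."""
    return len(set(values))


def duplicate_count(values: list) -> int:
    """Assumed definition: number of rows whose value already appeared earlier."""
    return len(values) - group_count(values)


def volatility(current: float, previous: float) -> float:
    """Assumed definition: relative change versus the previous period (0.25 means +25%)."""
    if previous == 0:
        raise ValueError("previous value is zero; relative change is undefined")
    return (current - previous) / previous


def volatility_ok(current: float, previous: float, max_change: float) -> bool:
    """Pass if the absolute relative change stays within the hypothetical threshold."""
    return abs(volatility(current, previous)) <= max_change


if __name__ == "__main__":
    field = ["a", "b", "a", "c", "a"]
    print("groups:", group_count(field))          # 3 distinct values
    print("duplicates:", duplicate_count(field))  # 2 repeated rows
    print("volatility:", volatility(125, 100))    # 0.25
    print("within 20%:", volatility_ok(125, 100, 0.2))  # False
```

A check like `volatility_ok` would typically be attached to a rule, so a `False` result triggers an alert as described for strong and soft rules.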
Data Quality provides an end-to-end solution that includes quality verification, quality monitoring, smart alerts, report generation, and issue correction for data tables, data sources, metrics, and real-time metadata tables. This helps ensure that your data is reliable during production and use, so that you avoid making poor business decisions based on low-quality data.
Data Quality includes Quality Overview, Quality Monitoring, and Quality Administration:
Quality Overview displays the number of tables verified and the number of tables that failed verification. This helps you quickly find and fix abnormal results.
Quality Monitoring provides a list of quality rules. It lets you configure rules, view verification records, and view quality reports.
Quality Administration lets you review issues found during verification. You can then handle each issue, for example by making corrections, ignoring the issue, or sending notifications. This creates a complete Plan-Do-Check-Act (PDCA) cycle, from planning to correction, that helps you continuously improve data quality.