Feature updates (2023)

This topic describes Dataphin releases in 2023.

November 2023

Corresponding product version: V3.12

Beijing: Released on November 21, 2023.

Shenzhen and Hangzhou: Released on November 23, 2023.

Shanghai: Released on January 14, 2024.

Feature Name

Description

References

Public calendar

Adds fiscal calendar templates. You can select a fiscal calendar such as 4-4-5, 5-4-4, 4-5-4, or 4×13, and customize the start day of the week and the fiscal base day.

Create and manage a public calendar
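
As a rough illustration of what a week-based fiscal template means, the sketch below splits a 52-week fiscal year into periods. The `TEMPLATES` table, the `fiscal_periods` function, and the chosen base day are hypothetical names and values for illustration only; they are not Dataphin's implementation or API.

```python
from datetime import date, timedelta

# Weeks per fiscal period over one 52-week year (hypothetical template table).
TEMPLATES = {
    "4-5-4": [4, 5, 4] * 4,   # 12 periods, repeating 4, 5, 4 weeks per quarter
    "5-4-4": [5, 4, 4] * 4,   # 12 periods, repeating 5, 4, 4 weeks per quarter
    "4x13": [4] * 13,         # 13 periods of 4 weeks each
}


def fiscal_periods(template, base_day):
    """Return (period_number, first_day, last_day) tuples for one fiscal year."""
    periods = []
    start = base_day
    for number, weeks in enumerate(TEMPLATES[template], start=1):
        end = start + timedelta(weeks=weeks) - timedelta(days=1)
        periods.append((number, start, end))
        start = end + timedelta(days=1)
    return periods


if __name__ == "__main__":
    for period in fiscal_periods("4-5-4", date(2023, 1, 29)):
        print(period)
```

Every template covers exactly 364 days, which is why week-based fiscal calendars keep each period aligned to whole weeks.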

Global variables

Supports selecting multiple code reviewers for global variables.

Variable groups and global variables

Statistical periods

Supports creating fiscal statistical periods based on a specified source fiscal calendar.

Create a statistical period

Data modeling

  • Supports creating logical aggregate tables without metrics.

  • Business filters support configuring different computing logic for different data timestamp ranges.

Compute tasks

  • Supports full data download for the complete results of a query statement. You can manage download approvals in Management Center > Data Download.

  • When you submit a SPARK_JAR_ON_MAX_COMPUTE task, the system automatically removes control characters (\r and \n) from the beginning and end of code copied from a Windows operating system.

  • MaxCompute now supports the `qualify` and `tablesample` syntax.

  • MaxCompute SQL supports using system reserved words and keywords as field names. However, we recommend that you avoid doing so.
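
The control-character cleanup described above for SPARK_JAR_ON_MAX_COMPUTE tasks can be pictured with a minimal sketch. It assumes that only `\r` and `\n` at the two ends of the code are removed and that line breaks inside the code are kept; the function name is hypothetical.

```python
def strip_edge_newlines(code: str) -> str:
    """Remove \\r and \\n from both ends only; inner line breaks are kept."""
    return code.strip("\r\n")


if __name__ == "__main__":
    pasted = "\r\nspark-submit --class Main app.jar\r\n"
    print(repr(strip_edge_newlines(pasted)))
```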

Functions

Supports selecting archive resources, such as .zip files, when you create a function.

Function operations

Data sources

  • For FTP data sources with the protocol type set to SFTP, supports using a username and key file for authentication.

  • Adds the Salesforce data source. You can extract data from Salesforce using batch synchronization.

  • OSS data sources support folder configuration. This is useful in scenarios where an account has only folder-level permissions.

  • Supports configuring a data source encoding. After you configure the encoding, you can reference tables in MySQL, Hologres, and MaxCompute data sources in a Flink SQL node using the datasource_encoding.table or datasource_encoding.schema.table format.
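
To make the two reference formats concrete, here is a small sketch that splits a table reference into its parts. `TableRef` and `parse_table_ref` are hypothetical helpers, and the sketch assumes that none of the parts contains a dot.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    datasource_encoding: str
    schema: Optional[str]
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Parse datasource_encoding.table or datasource_encoding.schema.table."""
    parts = ref.split(".")
    if len(parts) == 2:
        return TableRef(parts[0], None, parts[1])
    if len(parts) == 3:
        return TableRef(parts[0], parts[1], parts[2])
    raise ValueError(f"expected 2 or 3 dot-separated parts, got {len(parts)}: {ref!r}")


if __name__ == "__main__":
    print(parse_table_ref("ds_mysql.orders"))
    print(parse_table_ref("ds_holo.public.orders"))
```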

Security settings

You can configure safe mode settings, such as whether to allow cross-project table creation and whether to allow writing to the production environment from the development environment, on a per-project basis.

Security settings

Offline integration

  • The operational logs for batch synchronization tasks now display structured information that is easy to read and understand:

    • Preview results: Preview data when you develop a synchronization task.

    • Log information: View operational information, channel information, step measures, and raw logs.

    • Error information: If a task fails, view the error message in the operational log. Intelligent diagnostics provide clear error causes and potential solutions.

    • Runtime code: View the Dlink code for the current run.

  • Adds submission validation details:

    • Validates whether the pipeline and component configurations are complete.

    • Verifies permissions for data sources, data tables, keys, and quality monitoring objects.

    • Checks for duplicate source and sink tables.

  • The new Salesforce input component enables you to extract data from a Salesforce instance offline.

  • Adds FTP as a source database for whole-database migration. You can upload and parse a formatted Excel file (.xlsx) to create FTP file synchronization tasks in batches.

  • The FTP reader component supports adding output fields by specifying the start and end character positions in the source file.

  • The SQL Server reader component supports Hint syntax.

  • Optimizes the OSS reader and writer components. You can now read Excel files and configure a policy to handle file prefix conflicts.
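
Defining output fields by character position, as the FTP reader component allows, amounts to slicing each line of a fixed-width file. The sketch below assumes 0-based start positions and exclusive end positions; the product's actual convention may differ, and `slice_fields` is a hypothetical name.

```python
def slice_fields(line, positions):
    """Cut named fields out of a fixed-width line.

    positions maps a field name to (start, end): 0-based, end exclusive
    (an assumption made for this sketch).
    """
    return {name: line[start:end].strip() for name, (start, end) in positions.items()}


if __name__ == "__main__":
    layout = {"ds": (0, 8), "city": (8, 12), "amount": (12, 18)}
    print(slice_fields("20231121HZ  000123", layout))
```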

Real-time development

  • Optimizes the development experience for real-time tasks:

    • Flink SQL nodes support accessing tables in MySQL, Hologres, and MaxCompute data sources using data source encodings.

    • You can specify an owner for metadata tables and runtime images during development. The owner can be modified on the Asset page after the table or image is published.

  • Adds a real-time task migration tool. You can migrate real-time tasks from the Flink VVP platform to Dataphin with one click.

Scheduling Configuration

  • Supports scheduling tasks at fixed intervals without using cron expressions.

  • Enables cross-cycle dependency settings:

    • For dependency cycle, you can select Current cycle (current day), Previous cycle (previous day), Previous N days, or Last 24 hours.

    • For dependency policy, you can select All instances, First instance, Last instance, or Latest instance.

Basic O&M

  • Supports configuring Not completed alerts for tasks that are scheduled by hour or minute.

  • Optimizes the startup of real-time instances:

    • When you stop a real-time instance, you can choose to stop it with its current state retained or perform a stateless stop.

    • When you start a real-time task or instance, you can choose to perform a stateless start or start from the latest state.

AIOps

Supports configuring accounts that can manage baseline monitoring tasks.

Create a baseline monitor

Data Catalog

  • Adds a configuration center for the data catalog. You can manage subject groups and configure data profiling. This feature requires the Data Quality module.

    • Subject group management: Centrally manage created subject groups, modify basic group information, and quickly navigate to the subject square to view all subjects within a group.

    • Data profiling configuration: Centrally configure the scope of physical and logical tables for automatic data profiling. Configure the number of concurrent profiling tasks, the timeout period for a single task, and the retention period for profiling records.

  • Optimizes the data catalog feature:

    • The subject details page supports an edit mode and a view mode.

    • You can filter physical tables by data plate and subject area.

    • Adds a search entry on the asset details page to look for other assets.

    • Optimizes the preview feature for partitioned tables. By default, the latest partition with data is queried. However, in some scenarios, engine and partition type limitations may prevent the retrieval of the latest partition with data. In such cases, the latest partition is queried by default, which may return no data.

Data profiling

Adds the data profiling feature. This feature requires the Data Quality module. You can configure automatic and manual profiling tasks for physical tables, physical views, and logical tables.

  • Configure the data scope, profiling frequency (for automatic profiling only), field scope, and profiling scenarios (null value statistics, field value distribution, unique value statistics, subject to data type limitations). You can also configure permissions to view profiling results.

  • Preview the generated profiling SQL.

  • View profiling records and their corresponding logs. You can stop a running profiling task with one click. You can also view the configuration for each profiling record and quickly start a new profiling task based on an existing configuration.

  • For successful profiling records, you can view a profiling report. The report includes a configuration overview and result cards for each profiled field and scenario.

Data Standard

  • Data standard sets are used to define management properties for standards, such as view permissions and approval templates, to enable more precise control over data standards.

  • Data standard templates are used to define the properties that must be filled in when creating a data standard, ensuring standardized definitions.

  • Data standard templates support defining uniform specifications and constraints that data standards created from the template must follow. This is done by defining standard properties, which mainly include business, technical, and management properties.

  • Optimizes batch operations for data standards. You can import and export data standards across different standard sets in batches. You can also view batch import records and submit data standards for publishing in batches.

Data Quality

  • Quality reports support selecting all table partitions.

  • The consistency comparison of statistical values for fields in two tables supports multiple join types, such as Left Join, Right Join, Inner Join, and Full Join.

  • Supports configuring scoring weights for monitored objects of Dataphin tables and global data tables. You can also configure scoring weights for quality rules of Dataphin tables (custom configuration) and global data tables.

Data security

  • Security scans are adapted for scenarios where the partition field is not `ds` or where there are no partitions. The scan is performed using a full table scan with a LIMIT clause.

  • The highest data classification level of table fields can be displayed in the Data Catalog, personal center, and basic table information.

Tag Factory

  • Offline tag service tasks support selecting sink tables across projects.

  • Supports using physical tables from other projects as source tables to create offline views.

  • Offline views and behavior relationships support manual dependency parsing and manual configuration of upstream dependency nodes.

  • Filter conditions for composite tags, behavior statistics, and preference tags support fuzzy matching (like) and fuzzy non-matching (not like).

  • Adds a user guide for the tag platform to help you get started quickly.
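
For readers unfamiliar with fuzzy matching, the sketch below emulates standard SQL `LIKE` semantics, where `%` matches any sequence of characters and `_` matches exactly one. It illustrates the matching rule only and assumes case-sensitive comparison without escape characters; how Tag Factory evaluates filter conditions internally is not described in this topic.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    return like_to_regex(pattern).fullmatch(value) is not None


def not_like(value, pattern):
    return not like(value, pattern)


if __name__ == "__main__":
    print(like("hangzhou_store", "hangzhou%"))
    print(not_like("beijing", "%zhou"))
```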

Analytics Platform

Adds an SQL query feature. You can save, precompile, accelerate, run, and share queries, as well as configure parameters, format code, and locate SQL query tasks. You can also view the query results, task logs, and code for SQL tasks.

DataService Studio

  • DataService Studio is integrated with Tag Service, and the service module includes built-in tag projects.

  • DataService Studio removes the strict validation check for members who have not transferred their permissions.

    • In tenant member management, you can delete members from a service project.

    • When adding members to a service project, you can select any active account within the current tenant (excluding members deleted from tenant member management).

    • When you query member information within the service, the information is displayed correctly even if the member has been deleted from tenant member management.

  • The frontend input length for API debugging and testing is limited to 1,000 characters.

  • Supports creating APIs through service orchestration. You can manage API versions, and test, debug, publish, and delete orchestrated APIs.

August 2023

Corresponding product version: V3.11

Hangzhou: Released on August 8, 2023.

Beijing: Released on August 10, 2023.

Shanghai: Released on August 13, 2023.

Shenzhen: Released on August 15, 2023.

The Tag Factory module will be launched in the Hangzhou, Shanghai, Shenzhen, and Beijing regions on August 15, 2023.

Feature Name

Description

References

Resource statistics

  • Adds support for downloading weekly or monthly snapshots of resource statistics details.

  • Virtual nodes are not included in data processing unit (DPU) statistics. This reduces DPU consumption in scenarios where virtual nodes are used as common upstream and downstream dependencies for batch node management.

  • The billing check is moved from the submission stage to the publishing stage. This reduces the impact of test tasks on resource usage.

  • Adds a pre-check and a billing check for DPUs during the publishing process and displays validation details.

Notification Center

Adds a notification settings feature. You can configure notification channels to inform relevant personnel when an approval task is initiated or an approval action is taken. Adds email and DingTalk group notifications.

Notification settings

Smart Development

  • Optimizes logical table field configuration. You can set asset administration properties, such as matching roots and standards to fields, and setting whether a field is unique, nullable, and its security classification.

  • Optimizes logical operators for business filters. You can use AND, OR, and NOT operators to create derived business filters.

  • Supports using `max_pt` to get the latest hash partition when you query a logical table.

General Development

  • Optimizes offline physical table field configuration. You can set asset administration properties, such as matching roots, standards, and security classifications to fields.

  • Upgrades and optimizes the editor:

    • Supports using the at sign (@) to select table fields in batches in the `SELECT` list or `WHERE` clause.

    • Hover over a table name to view its schema. Hover over a function in SQL to view its description.

    • When you enter a `set` parameter, the system automatically suggests a list of available parameters for the current engine and provides their descriptions.

    • The system automatically detects bad syntax in SQL, provides error descriptions, and offers quick fix options.

    • Automatically recognizes statements, provides quick run options, and offers a wide range of keyboard shortcuts.

  • Offline physical tables support setting subject areas, viewing historical versions, and comparing versions.

Data sources

  • Optimizes the following data sources: MySQL, PolarDB-x, PolarDB, AnalyticDB for MySQL, AnalyticDB for PostgreSQL, TiDB, GoldenDB, StarRocks, PostgreSQL, GreenPlum, SQL Server, Vertica, SAP Hana, DB2, OceanBase, ClickHouse, DM, KingbaseES, Gbase 8a, and Apache Doris.

    • Adds the connectTimeout (if available) and socketTimeout (if available) configuration items.

    • When you create a data source, the default value of connectTimeout is 15 minutes.

    • The default value of socketTimeout is 30 minutes. This prevents synchronization tasks from running for an extended period and consuming resources if a timeout is not configured.

  • Kafka data sources now support SASL/SCRAM-SHA-256 and SASL/SCRAM-SHA-512 authentication.
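
For orientation, this is roughly what SCRAM authentication looks like in standard Apache Kafka client properties. The property names and the `ScramLoginModule` class come from Apache Kafka; the helper function and its parameters are hypothetical, and the sketch does not show how Dataphin stores or applies these settings.

```python
def scram_client_properties(username, password, mechanism="SCRAM-SHA-256", use_tls=True):
    """Build standard Kafka client properties for SASL/SCRAM authentication."""
    if mechanism not in ("SCRAM-SHA-256", "SCRAM-SHA-512"):
        raise ValueError(f"unsupported SCRAM mechanism: {mechanism}")
    return {
        "security.protocol": "SASL_SSL" if use_tls else "SASL_PLAINTEXT",
        "sasl.mechanism": mechanism,
        "sasl.jaas.config": (
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{username}" password="{password}";'
        ),
    }


if __name__ == "__main__":
    for key, value in scram_client_properties("alice", "example-password").items():
        print(f"{key}={value}")
```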

Offline integration

  • Adds batch synchronization for Apache Doris data sources.

  • Supports MaxCompute as the destination database for whole-database migration tasks.

  • Optimizes whole-database migration:

    • Adds a configuration item for the loading policy of the destination database.

    • Adds a table name validation feature. This feature checks for and displays tables with the same name in the destination database. If a table with the same name exists, you can configure table name replacement or select the option to automatically delete the table with the same name from the data source.

    • Optimizes the synchronization method. If you select daily synchronization, a recurring task scheduled daily is generated. If you select one-time synchronization, a one-time task is generated. If you select both, a recurring task and a one-time task are generated.

  • Optimizes the channel configuration for synchronization tasks to reduce invalid waiting times caused by various exceptions:

    • The default number of database connection retries is changed from 7 to 1. This reduces long retries and waiting times caused by database connection issues.

    • Adds an SQL execution timeout period, which is 30 minutes by default. If the execution time of a preparatory or concluding statement exceeds this period, the task fails.

    • Adds a no-traffic time threshold, which is 30 minutes by default. If the duration of no data being read or written during a synchronization task exceeds this threshold, the task fails.
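
The no-traffic threshold can be thought of as a watchdog that fails a task once no data has moved for too long. The following is a minimal sketch of that idea under stated assumptions: the class name, its fields, and the use of plain second counters are all hypothetical, and only the 30-minute default comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class NoTrafficWatchdog:
    """Flags a task once no data was read or written for longer than the threshold."""

    threshold_seconds: float = 30 * 60   # 30 minutes, matching the stated default
    last_activity: float = 0.0

    def record_activity(self, now: float) -> None:
        # Call whenever a batch of records is read or written.
        self.last_activity = now

    def should_fail(self, now: float) -> bool:
        return now - self.last_activity > self.threshold_seconds


if __name__ == "__main__":
    watchdog = NoTrafficWatchdog()
    watchdog.record_activity(now=0.0)
    print(watchdog.should_fail(now=29 * 60))   # still within the threshold
    print(watchdog.should_fail(now=31 * 60))   # idle for too long
```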

Real-time development

  • Improves the O&M features for real-time tasks. You can configure alerts and the number of sending attempts after a runtime failure.

  • Optimizes the development experience for real-time tasks:

    • In a Flink SQL node, you can create a metadata table with one click based on a native DDL statement.

    • Optimizes metadata tables by improving the accuracy of Flink field type mapping for data sources such as Kafka, MySQL, and Hologres.

    • Optimizes the object publishing experience. When you publish objects in batches, you can build the publishing order based on dependencies to increase the success rate of task publishing.

    • Supports automatic generation of real-time instances after a real-time task is submitted or published. This ensures real-time synchronization with the task status on the Flink VVP side.
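
Building a publishing order from dependencies is essentially a topological sort: an object is published only after everything it depends on. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) follows; the object names are invented for illustration, and the algorithm Dataphin actually uses is not stated in this topic.

```python
from graphlib import TopologicalSorter


def publishing_order(dependencies):
    """Return objects in an order where each one follows its dependencies.

    dependencies maps an object name to the set of objects it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    deps = {
        "flink_task": {"meta_table", "udf"},
        "meta_table": {"datasource"},
    }
    print(publishing_order(deps))
```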

Scheduling and O&M

Adds support for creating data backfill tasks:

  • You can configure data backfill tasks to be triggered on a schedule or manually. This reduces manual effort in scenarios where the node range and data timestamp for backfilling follow a fixed, predictable pattern.

  • The data timestamp for data backfill tasks supports options such as the last N calendar weeks and calendar months.

  • You can save a one-time data backfill configuration as a data backfill task with one click, including node range selection and runtime rule configuration.

  • Supports timed scheduling for data backfill tasks. The system automatically backfills historical data periodically.

General data backfill instances
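
As an illustration of a "last N calendar weeks" data timestamp range, the sketch below computes the first and last day of the N complete weeks before the current one. It assumes that weeks start on Monday and that the current, incomplete week is excluded; both are assumptions of this sketch, and the function name is hypothetical.

```python
from datetime import date, timedelta


def last_n_calendar_weeks(today, n):
    """Return (first_day, last_day) of the n full Monday-to-Sunday weeks before today's week."""
    this_monday = today - timedelta(days=today.weekday())
    first_day = this_monday - timedelta(weeks=n)
    last_day = this_monday - timedelta(days=1)
    return first_day, last_day


if __name__ == "__main__":
    print(last_n_calendar_weeks(date(2023, 8, 16), 2))
```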

Data Catalog

  • Adds the subject square feature for asset classification.

    • Adds the Operations Administrator role, which is responsible for managing asset subjects and subject groups.

    • Adds the asset subject square, where you can quickly search for and view asset subjects for which you have permissions. A guide for creating new subjects is provided to reduce the learning curve.

    • Supports creating subject groups and asset subjects. You can set view permissions and directory organization methods for subjects. You can create up to five levels of directories under a subject and add asset objects (data tables) to the subject.

  • Upgrades the asset list to optimize search and viewing paths.

    • The data table list supports querying and filtering assets by subject.

    • Upgrades the search box. You can quickly switch asset types in the search box while retaining the search term.

    • In the data table list, you can hover over View Details to quickly view more asset summary information.

  • Supports custom asset labels for data tables and searching by asset labels.

Data Standard

  • Adds a feature to export data standards in batches.

  • Supports importing mapped relationships in batches from Excel files and viewing import records.

  • Supports viewing the source details (generated by mapping rules, associated during development, or imported manually in batches) and the operator of a mapping relationship.

  • Supports manual deactivation of mapping relationships.

  • Supports configuring quality monitoring in Data Standard:

    • In the mapped relationship list, you can add quality monitoring rules configured for the related data standards to asset objects in the mapping relationship in batches and add a scheduling method. These rules affect the standard mapping evaluation rate in the standard mapping evaluation details, completing the data standard monitoring link.

    • Quality monitoring rules configured on the standard side are visible in the rule list and quality reports of the Data Quality module but cannot be edited or deleted. They can only be edited on the standard side, thus enforcing strong constraints for standards.

  • Supports configuring global standard mapping evaluation tasks for auditing manually added mapped relationships and updating the monitoring results of all quality rules created on the data standard side.

  • Upgrades the standard mapping evaluation details by adding statistics for data standard monitoring rules, which affects the calculation of the standard mapping pass rate.

Asset Quality

  • Supports viewing quality rules, validation records, the administration workbench, and rectification processes from global, project, and personal perspectives. It also controls the data scope that individuals can view based on their permissions, making the platform more secure and improving administration efficiency.

  • The quality partition now supports multi-level partition writing and is no longer limited to a fixed partition expression. For example:

    • To validate all data partitions from the last 7 days: ds>=$[yyyyMMdd-7d] and ds<=$[yyyyMMdd].

    • To validate the data partition for Hangzhou yesterday: ds=$[yyyyMMdd-1d] and city="hangzhou".

  • Downstream quality reports and quality administration items are split according to the actual validated partitions. Independent quality reports are generated for different validated partitions.

  • The scheduling method now includes partitions updated by tasks. Validation is performed based on the partitions actually updated in the code, without needing to specify partitions in advance.

  • Quality reports and validation records support date positioning. You can only select dates with validation data, which helps you quickly locate the last executed quality report and validation record.
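
The partition expressions shown in the examples above use placeholders such as `$[yyyyMMdd-7d]`. A minimal sketch of how such placeholders could expand into concrete partition values is given below. It handles only the `$[yyyyMMdd]` and `$[yyyyMMdd-Nd]` forms and resolves them against a reference day passed in by the caller; which date the product actually uses as the reference is not stated here, so treat that as an assumption. The function name is hypothetical.

```python
import re
from datetime import date, timedelta

# Matches $[yyyyMMdd] and $[yyyyMMdd-Nd]; other placeholder forms are left untouched.
_PLACEHOLDER = re.compile(r"\$\[yyyyMMdd(?:-(\d+)d)?\]")


def resolve_partition_expr(expr, reference_day):
    """Replace date placeholders in a partition expression with yyyyMMdd values."""

    def _substitute(match):
        offset_days = int(match.group(1) or 0)
        return (reference_day - timedelta(days=offset_days)).strftime("%Y%m%d")

    return _PLACEHOLDER.sub(_substitute, expr)


if __name__ == "__main__":
    day = date(2023, 8, 15)
    print(resolve_partition_expr("ds>=$[yyyyMMdd-7d] and ds<=$[yyyyMMdd]", day))
    print(resolve_partition_expr('ds=$[yyyyMMdd-1d] and city="hangzhou"', day))
```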

Asset Security

  • Secure data classification supports industry-level multi-level classification systems, allowing for multi-layered data classification based on subjects. Data classification is bound to data levels and recognition methods, serving as the basis for downstream data recognition and data masking.

  • Data classification has built-in security industry templates for out-of-the-box use. You can view the built-in industry templates and import their settings into your enterprise's data classification system. The built-in templates include various recognition features and data classifications for personal sensitive data and enterprise data.

  • Supports table owners modifying the classification and level of the tables they are responsible for, improving the operational efficiency of the security module.

  • The security encryption and decryption algorithm supports the Hex format.

Database Permission

  • Supports requesting permissions on multiple data tables in a single batch, even when the tables belong to different table types, projects, or plates.

  • Supports one-click application for missing permissions during the publishing process, improving the efficiency of permission applications.

Request, renew, and release table permissions

DataService Studio

  • Adds support for registering external APIs. You can publish, manage permissions (request, grant, etc.), and call registered APIs. Supported O&M and monitoring operations include configuring rate limiting, alerts, and call statistics.

  • API tasks support multi-version management, with each version having isolated draft, development, and production states.

  • The direct connection data source mode now supports StarRocks data sources.

  • API O&M and monitoring adds four metric data points: number of published APIs, number of online APIs, number of API calls, and online API call rate.

  • Supports enabling service upgrade mode in system configuration to ensure normal API calls during upgrades.

  • Project management adds API groups. You can specify the group to which an API belongs when creating it.

Analytics Platform

Adds the Analytics Platform module. The Analytics Platform is a fast and convenient data platform for individual users. This release supports Notebook tasks, allowing you to write rich media content that combines SQL code and Markdown text in a notebook format.

  • Each run of an SQL cell generates a temporary table. You can save and display the results, and copy, query, or reuse them directly.

  • Supports using projects in the current user's Dev/Basic environment with ad hoc query permissions as SQL running projects.

  • The Analytics Platform supports binding a dedicated compute source and running SQL (the compute source cannot be modified after binding).

  • The Analytics Platform supports sharing Notebook tasks, setting the temporary table lifecycle, and configuring download approval policies.

  • Supports configuring non-username/password global variables and Notebook local variables.

  • Supports viewing and managing tables that the current user created with `CREATE TABLE` statements under the dedicated compute source of the Analytics Platform.

Analytics Platform overview

Tag Factory

Adds the Tag Factory module, which includes features for offline tags, offline views, behavior relationships, the tag marketplace, offline tag services, and application management.

Tag Factory overview

Management Center - Cross-tenant publishing

  • In addition to full export, deployment packages now support incremental export and export of specified objects.

  • When you import a deployment package, you can configure the import rules for the whole package globally.

  • You can preview the details of a deployment package and view the overall publishing status of the package.

  • Optimizes the publishing experience. You can view objects to be published and publishing records on the same page.

  • Supports cross-tenant publishing of tag objects, including entities, entity IDs, tag projects, and offline tags (metric mapping, rule-based composite) in the tag platform.

  • Supports exporting objects created in historical versions (before Dataphin v2.9) for cross-tenant publishing.

July 2023

Hangzhou, Beijing, Shenzhen, and Shanghai: Released on July 11, 2023.

Feature Name

Description

References

Billing

Expands the data processing unit (DPU) specifications for Dataphin Smart Development Edition and Basic Development Edition to include 3500, 4000, 4500, and 5000.

June 2023

Hangzhou: Released on June 20, 2023.

Beijing and Shenzhen: Released on June 27, 2023.

Shanghai: Released on July 1, 2023.

Feature Name

Description

References

Management Center - Specification settings

  • Adds a data download settings feature. You can set watermarks and file formats for data downloads.

  • Adds a database permission approval policy feature. You can set approval policies based on project and plate ownership, environment, table type, data security level, and permission type.

Management Center - Approval templates

  • Optimizes approval template management. You can modify the approval node settings of built-in approval flows or customize approval templates.

  • Enhances the capabilities of built-in template approval processes. You can add or remove the change-owner and add-signature operations, and add custom approval operations.

Management Center - Data sources

  • Optimizes the metadata retrieval and creation process for custom data sources.

    • For RDBMS (relational database management system) type databases, you need to upload a database driver to create a data source type. The system automatically generates reader and writer components, from which you can retrieve table and field metadata. You can create synchronization tasks through integration pipelines and the code editor.

    • For non-RDBMS type databases, you need data source configuration items and reader/writer plugins to create a data source type. You can create synchronization tasks through the code editor.

  • Supports configuring data source labels for better classification and management of data sources.

  • Supports displaying the application scenarios for each data source based on the currently activated features.

Smart Development

Upgrades the overall functionality of the Smart Development edition, including the following:

  • Adds a non-empty required validation for statistical period expressions.

  • The owner of a logical aggregate table defaults to the plate architect.

  • Logical fact tables associated with a logical dimension table can be displayed within the logical dimension table.

  • Modifications to parameters in custom materialization are recorded in the version details.

  • Adds Delete, Unpublish, and Unpublish and Delete buttons to the derived metric details page.

  • Add derived metrics

  • Create a statistical period

  • Create a common dimension logical table

General Development

  • Supports disabling recurring scheduling for Basic projects and production projects in Dev-Prod mode.

  • Optimizes the ad hoc query feature by adding a query acceleration switch, which supports 5 concurrent queries by default.

  • Optimizes scheduling configuration. The system can automatically generate upstream dependencies and output names.

  • The compute task list supports batch operations.

  • Optimizes the operation flow for compute tasks and synchronization tasks. Compute tasks support viewing input tables, output tables, and downstream task information.

Real-time development

  • Supports writing DDL to read and write Iceberg data in Flink SQL nodes. Supports normal precompilation, debugging, running, and O&M for DDL tasks.

  • Under the Flink VVP engine, supports using Elasticsearch data sources as dimension tables, sink tables, and source tables. Supports using Hudi data sources as real-time data sources, serving as sink tables and source tables.

  • Supports data profiling for JSON, CSV, canal-json, maxwell-json, and debezium-json data formats in Kafka.

  • Supports data profiling for Kafka with no authentication and username/password authentication in non-SSL mode.

  • The Dataphin system supports upgrading the Flink VVP engine to version 1.15.

  • Real-time computing templates support configuring runtime parameters. Tasks created based on the template automatically inherit these parameters.

  • Supports configuring Kafka data sources with usernames and passwords, so users do not need to configure them manually in tasks.

  • Supports configuring checkpoints and selecting dependency files on the Flink SQL node page.

  • Optimizes field type matching for runtime images. Field type matching is now based on Flink SQL field types instead of original field types.

Basic O&M

  • Instance count statistics are updated in real time and can also be manually refreshed.

  • Supports configuring project-level monitoring and alerting based on task type and scheduling type. These settings apply to all objects in the project, and newly added tasks that meet the criteria will automatically be configured with the corresponding monitoring and alerting.

  • Supports monitoring and alerting for entire logical tables. Newly added fields will automatically be configured with corresponding monitoring and alerting.

  • Optimizes data backfill. Search results can accurately locate a specific data backfill instance.

  • When you backfill data for dry-run tasks, you can choose whether the tasks run normally, which makes data backfill more flexible.

Offline integration

  • Expands data source support to include OceanBase (Oracle mode) and writing to Redis data sources. Whole-database migration tasks now support IBM DB2 and OceanBase as source databases.

  • Integration component optimization:

    • In one-click table creation and whole-database migration, MaxCompute table creation supports outputting Chinese field names.

    • Adds a database retry policy configuration for synchronization tasks. This prevents synchronization tasks from failing due to occasional database connection failures or from being blocked for too long due to connection failures.

    • The logical table synchronization component is adapted to support the field types of logical tables.

    • Optimizes the FTP component separator. Reading CSV files supports custom single-character field and row separators. Reading TEXT files supports custom multi-character field separators. Writing CSV and TEXT files supports custom multi-character field and row separators.

    • Supports uploading local files of Excel and Text types.

    • The MongoDB component supports getting metadata in anonymous mode and allows users to configure the authSource parameter.

    • Supports getting metadata from all versions of ClickHouse data sources.

    • One-click table creation supports automatically getting the table comment from the reader component and writing it into the table creation statement.

    • The MySQL and AnalyticDB for MySQL 3.0 writer components now have a one-click table creation feature.

  • Adds support for quickly configuring quality monitoring rules for data sources or tables related to a task when creating a synchronization task:

    • Supports configuring quality rules for schema changes, table stability, and table fluctuations for data tables in a synchronization task.

    • Scheduling methods support timed scheduling, pre-task scheduling, and post-task scheduling.

    • In a pipeline, you can view all quality rules configured for all data tables in the pipeline task.
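The database retry policy mentioned above can be pictured with a minimal sketch. It assumes a bounded number of retries with a fixed wait between attempts. The function name `call_with_retry` and its parameters are hypothetical and are not Dataphin configuration keys.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    connect: Callable[[], T],
    max_retries: int = 3,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect`, retrying on ConnectionError up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempt >= max_retries:
                # Retries are exhausted: fail instead of blocking indefinitely.
                raise
            attempt += 1
            sleep(interval_seconds)
```

Bounding both the retry count and the interval is what lets a policy like this absorb occasional connection errors while still capping how long a task can stay blocked.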

Data Standard

  • Adds system properties for value range, whether a value is unique, whether a value can be null, and whether a value can be an empty string, with built-in corresponding standard mapping monitoring configurations. When you create a standard, the system automatically creates corresponding content quality monitoring rules based on the property values you enter.

  • Adds system properties for data classification and data level, which are compared with the results from the security module to enhance the association between standards and security.

  • Data standard sets support adding user groups as members. Standard set membership is updated dynamically when the members of a user group change.

  • Data standards support version comparison, allowing you to clearly and intuitively view changes in data standard information.

  • Adds a mapping relationship feature. You can view the mapping relationship checklist between data standards and asset objects from an asset summary perspective or a full display perspective. You can also perform operations to set a mapping as "invalid" or cancel the invalid status.

  • Supports importing invalid mapping relationships in batches from Excel files, viewing import records, and downloading details of abnormal and skipped records.

Data Standard - Quality

Adds a data standard lookup table comparison template. You can reference data standard lookup tables for data validity checks.

Asset folder

  • Adds a personal data center feature. You can view all data tables for which you are the owner and transfer ownership of data tables in batches.

  • Supports modifying the owner of a data table on the asset details page.

  • Supports viewing owner transfer records.

  • The logical table details page supports generating DDL statements.

Asset Quality

  • Adds an administration workbench feature. You can initiate rectification for quality issues, re-validate, and manage whitelists. This completes the quality improvement loop from quality rule configuration and quality issue detection to quality issue rectification.

  • Exception archiving supports recording complete exception data into a data table for subsequent analysis.

  • Adds permission management to protect sensitive data.

  • Under the fixed task scheduling method in scheduling configuration, supports prioritizing quality rule validation before task scheduling and supports source data validation.

  • Supports resource owners adding the resources they are responsible for as quality monitoring objects.

  • Adds a data standard lookup table comparison template. You can select a published lookup table from a data standard for comparison, supporting the selection of code values, code names, and English code names.

Asset Security

  • Real-time security recognition supports triggering real-time recognition upon data changes.

  • Supports project administrators using security features. Project administrators can edit the classification and level of table fields within their projects.

  • Supports built-in function masking. You can implement masking using native database functions without installing a security policy.

  • Optimizes key management permissions. You can restrict the management of top-secret keys to super administrators and key owners.

  • Optimizes recognition rule testing. Supports testing with specified tables and multiple rules to improve testing efficiency.

DataService Studio

  • Expands data source support. Direct connection APIs now support Lindorm and ClickHouse data sources.

  • Single physical table service units support deleting fields that have been deleted from the source table.

  • Supports calling Dataphin data sources using Python.

  • Supports downloading API documentation from the API marketplace.

  • The One-click Transfer Owner feature on the member management page supports transferring the owners of service units, APIs, data sources, and applications, along with alert recipients in DataService Studio.

  • The query time range for O&M and monitoring is configurable. You can customize the storage and queryable time for O&M and monitoring data.

  • Call examples now include Python examples.

  • Optimizes advanced SQL compilation and parsing parameters. Historical parsing results can be retained without re-parsing.

April 2023

Hangzhou: Released on April 25, 2023.

Shenzhen: Released on April 27, 2023.

Beijing: Released on May 9, 2023.

Shanghai: Released on May 13, 2023.

Feature Name

Description

References

Engine compatibility

Adds support for the Hologres compute engine. When MaxCompute is the selected compute engine, a project can be bound to either a Hologres or a MaxCompute compute source.

None

Member management

  • Allows you to create custom project roles and replace multiple roles simultaneously.

  • Allows for the one-click transfer of ownership and permissions for objects, such as data sources, from one user to another. You can also view the transfer history.

Data sources

  • Offline integration now supports OpenGauss, SAP Table, StarRocks, Hudi, Doris, and Greenplum data sources.

  • Real-time development now supports StarRocks data sources.

  • Optimizes the following data source features:

    • Kafka data sources now support uploading JAAS files for Kerberos authentication and are compatible with the Confluent Kafka Schema Registry.

    • MySQL data sources now support Secure Sockets Layer (SSL) encryption.

    • API data sources now support API key and token authentication methods.

    • MongoDB data sources now support versions 3.2, 3.4, and later.

Compute sources

You can create multiple MaxCompute and Flink compute sources simultaneously.

Compute settings

When MaxCompute is set as the compute engine, you can switch endpoints in Compute Settings.

Set MaxCompute as the compute engine for a Dataphin instance

Common business logic

Adds a public calendar feature that lets you create multiple calendars, mark dates as holidays or workdays, and manage date tags.

Project management

Flink SQL tasks now support global variables.

Create a project

Standardized modeling

  • Optimizes the standardized modeling submission process to reduce submission time.

  • Event-based logical fact tables now support delayed data processing. Instances of these tables are automatically rerun for the configured period.

  • Event-based logical fact tables now support backfilling data for multiple periods in a single instance. You can backfill up to 30 days of partitions for the current node.

  • You can delete partitions from logical tables in this section.

  • Adds the following measurement units to atomic metrics: hand, sheet, piece, time, package, unit, ton, and kilogram.

  • Optimizes the compute logic configuration, including validation and preview.

  • Dependency parsing and non-empty validation are automatically triggered when you open the Scheduling Configuration or Parameter Settings page.

  • Create and configure logical fact tables

  • Create atomic metrics

  • Scheduling Configuration

Offline integration

  • Adds new input and output components: Greenplum input and output, Kudu input, Elasticsearch input, and API output.

  • Optimizes component features by adding the GET_JSON_OBJECT function to parse JSON strings and return content from a specified path, and the COALESCE function to return the first non-null value.

  • Adds data source support for reading from and writing to OpenGauss and StarRocks data sources. Adds SAP Table input to read SAP data using RFC.

  • Optimizes the following integration components:

    • Oracle output components allow you to set login and query timeout values.

    • FTP input components now support the .xls and .xlsx formats.

    • FTP output components now support exporting compressed files in ZIP or GZIP format, or with no compression. A filename conflict error policy is added that causes the task to fail if a filename conflict occurs.

    • Hologres input and output components do not support views.

    • For Confluent Kafka, you can now use Schema Registry. Kafka Avro is added as a key and value type for input and output components.

    • API input components now include a URL path configuration option.

    • MaxCompute input and output components now support reading Date types and writing Date, Tinyint, Smallint, and Float types.

    • OSS input and output components allow you to set row delimiters.

  • Full-database migration tasks now support upstream dependencies.
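The behavior described above for GET_JSON_OBJECT and COALESCE can be illustrated with a small Python sketch. It is an approximation and not the component's implementation. It assumes a simple dotted path such as `$.order.id`, and the real function's path syntax (for example, array indexes) and return types may differ.

```python
import json
from typing import Any, Optional


def get_json_object(json_text: str, path: str) -> Optional[Any]:
    """Return the value at a dotted path such as '$.order.id', or None."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_text)
    except (TypeError, ValueError):
        return None  # invalid JSON yields no value in this sketch
    for key in [part for part in path[1:].split(".") if part]:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None
```

For example, with the hypothetical row `'{"order": {"id": 42, "coupon": null}}'`, `get_json_object(row, "$.order.id")` returns `42`, and `coalesce(get_json_object(row, "$.order.coupon"), "none")` falls back to `"none"`.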

Offline development

  • When you create or edit MaxCompute user-defined functions, you can select resources with .jar or .py extensions.

  • Adds table management features that allow you to create, delete, modify, and view physical tables, and import data into them. You can also view physical table columns and edit or import columns directly from the list.

  • Supports all window functions and window syntax for MaxCompute engines.

  • In SQL tasks, you can use variables to define table names in DDL statements.
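As a rough illustration of using a variable for a table name in a DDL statement, the sketch below substitutes a placeholder before the statement is run. The `${table_name}` placeholder style, the variable name, and the identifier check are assumptions made for this example. They do not describe how Dataphin resolves variables.

```python
import re
from string import Template

# Only plain identifiers are accepted as table names in this sketch.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_ddl(ddl_template: str, variables: dict) -> str:
    """Substitute ${name} placeholders, allowing only identifier-safe values."""
    for name, value in variables.items():
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"unsafe value for {name}: {value!r}")
    return Template(ddl_template).substitute(variables)


ddl = render_ddl(
    "CREATE TABLE IF NOT EXISTS ${table_name} (id BIGINT, name STRING)",
    {"table_name": "dim_customer_tmp"},
)
print(ddl)
```

Validating the value before substitution is a design choice of the sketch: a table name that comes from a variable should not be able to smuggle extra SQL into the statement.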

Create a user-defined function

Real-time development

  • Flink SQL tasks now cache the most recent debug test data.

  • Flink VVP engines now support StarRocks data sources as source or target tables.

  • Flink VVP engines allow you to restart stopped tasks from their last state.

  • Improves the meta-table experience. You can now copy code for Flink DDL export with a single click. Different sample SQL imports are provided for meta-table fields based on the data source type.

  • Real-time code templates now support running the same code with multiple configurations.

  • Flink SQL now supports native Flink DDL statements. You can configure whether to allow native Flink DDL statements.

  • Flink VVP engines allow you to configure the compute source version and queue for development environments in tasks.

Scheduling Configuration

  • For tasks with Normal scheduling, you can enable conditional scheduling. This allows a single task to use different scheduling methods for different combinations of conditions:

    • You can set scheduling properties using calendar attributes of the data timestamp and combinations of input parameters.

    • You can configure multiple sets of scheduling conditions.

    • After you enable conditional scheduling, both recurring instances and data backfill instances use the conditional scheduling rules.

  • Adds task-level settings for runtime timeout, automatic rerun attempts upon failure, and rerun intervals. Each task can use either the tenant-level default settings or custom settings.
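The relationship between tenant-level defaults and task-level custom settings can be sketched as a simple fallback. All class and field names here are hypothetical. The sketch also assumes that each setting falls back independently, which the release note does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunSettings:
    timeout_minutes: int
    rerun_attempts: int
    rerun_interval_minutes: int


@dataclass(frozen=True)
class TaskOverrides:
    # None means "use the tenant-level default" for that setting (assumption).
    timeout_minutes: Optional[int] = None
    rerun_attempts: Optional[int] = None
    rerun_interval_minutes: Optional[int] = None


def effective_settings(tenant: RunSettings, task: TaskOverrides) -> RunSettings:
    """Resolve each setting from the task override, else the tenant default."""

    def pick(custom: Optional[int], default: int) -> int:
        return default if custom is None else custom

    return RunSettings(
        timeout_minutes=pick(task.timeout_minutes, tenant.timeout_minutes),
        rerun_attempts=pick(task.rerun_attempts, tenant.rerun_attempts),
        rerun_interval_minutes=pick(
            task.rerun_interval_minutes, tenant.rerun_interval_minutes
        ),
    )
```

Comparing against `None` rather than relying on truthiness keeps an explicit custom value of `0`, such as zero rerun attempts, from being replaced by the default.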

Scheduling Configuration

Basic O&M

  • Data backfill:

    • Backfill by period: You can specify fixed weekly or monthly dates. You can also select end-of-month to automatically calculate the exact date for each month.

    • Configure the order of data backfill instances: You can run instances in ascending or descending order of their data timestamps.

    • Optimizes filters in the recurring task and recurring instance lists.

    • If selected task nodes have cross-cycle dependencies, you can set the concurrency group count to 1 to ensure correct data dependencies.

    • You can export or view the list of selected nodes.

    • You can quickly exclude paused nodes and their downstream nodes.

    • You can select backfill nodes using one of the following methods: all downstream nodes of the current node, first-level child nodes and their downstream nodes, or all nodes along the path from a start node to a specified end node.

  • Other optimizations:

    • Adds an automatic rerun indicator to operational logs to distinguish automatic reruns from manual reruns.

    • You can manually mark unrun instances as successful.
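The end-of-month option and the instance ordering described for backfill by period can be illustrated with a short sketch. It shows one way to compute the last day of each month in a range and to order the resulting dates. The function name and parameters are hypothetical, and the product's own calculation may differ.

```python
import calendar
from datetime import date
from typing import List


def month_end_dates(start: date, end: date, descending: bool = False) -> List[date]:
    """List the last calendar day of each month that falls within [start, end]."""
    dates = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # monthrange returns (weekday of day 1, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        candidate = date(year, month, last_day)
        if start <= candidate <= end:
            dates.append(candidate)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    # Ascending or descending order of the data timestamps.
    return sorted(dates, reverse=descending)
```

Because the last day is computed for each month, February resolves to the 28th or 29th without any fixed day-of-month setting.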

  • Data backfill instances

  • View and manage script tasks

  • View and manage script instances

Asset overview and catalog

  • Expands data table production tasks to include integration tasks, custom lineage tasks, and system-parsed lineage tasks where the table is the output. Optimizes the display of production information, allowing you to view the average start and end times, average runtime, and instance lists for each task.

  • For tables with quality monitoring rules, you can quickly view quality reports. For tables without rules, you can quickly create new quality rules.

  • Expands lineage visualization to cover the full-domain table lineage, including synchronization from business systems to compute sources and backflow from compute sources to business systems.

  • Asset overview

  • Asset catalog

  • Asset details

Asset permissions

  • Displays the security classification and level for fields when you apply for table permissions.

  • Adds user group permission management.

  • Removes default global database permissions for system administrators. System administrators must now join a project and grant themselves permissions.

Data standards

  • Splits public standard set attributes into system and custom attributes. When you create standard set attributes, you can reference built-in system attributes or custom public attributes.

  • Built-in system data type attributes monitor compliance using metadata from asset objects.

  • Adds range-value attributes to define constraints, such as value domains. This feature supports enumerations, lookup tables, and interval definitions.

  • You can view global compliance details from both the standard and asset object perspectives.

  • You can create quality monitoring rules when you build data standards. For assets that are mapped to the standard, you can quickly apply matching quality rules in Data Quality to monitor content quality.

  • Adds a functional overview and workflow guide for data standards.

  • In the standard set list, you can quickly create data standards, batch export data standards, and create mapping rules.

Asset quality

  • Supports batch configuration of quality rules.

  • You can view schema change validation records for data source tables and compare the changes side-by-side.

  • Adds a quality rule template for cross-source field statistical consistency, which is used for cross-data-source comparisons.

  • Adds LIKE expressions to validity and field format validation templates to match at the start or end of a string. Adds built-in expressions for phone numbers, landlines, ID numbers, email addresses, and bank card numbers.

  • Adds the built-in hourly partition expression ds='${yyyyMMdd HH}' to quality partition filter expressions.

  • You can configure whether to trigger quality checks using partition filter expressions.

  • You can directly reference quality monitoring rules that are configured for data standards and mapped to the fields of the current table. You can also actively audit data standards.

  • For timed scheduling, you can configure scheduling conditions to trigger quality rules only on specific dates.
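To make the hourly partition expression `ds='${yyyyMMdd HH}'` concrete, the sketch below expands the placeholder for a given time. It assumes that the pattern letters follow the usual `yyyy`, `MM`, `dd`, and `HH` date-format meanings and that the placeholder resolves against a single timestamp. Which timestamp the product actually uses is not specified here.

```python
import re
from datetime import datetime

# Pattern tokens mapped to strftime directives (only the four used above).
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def render_partition_filter(expression: str, data_time: datetime) -> str:
    """Expand every ${pattern} placeholder in a partition filter expression."""

    def expand(match: "re.Match[str]") -> str:
        pattern = match.group(1)
        for token, directive in _TOKENS.items():
            pattern = pattern.replace(token, directive)
        return data_time.strftime(pattern)

    return re.sub(r"\$\{([^}]*)\}", expand, expression)
```

Under these assumptions, `render_partition_filter("ds='${yyyyMMdd HH}'", datetime(2023, 4, 25, 13))` returns `ds='20230425 13'`.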

Asset security

  • You can view detection result details, including active and hit rules, and arbitration results.

  • You can specify projects for scan tasks to save compute resources.

  • Caches data samples for user security detection to reduce resource consumption and improve detection stability, execution history, and accuracy.

  • You can add detection records manually for a single field, in batches, or by uploading an Excel file.

  • You can lock detection records manually. Locked records cannot be changed.

  • Optimizes detection rule configuration. You can combine the detection scope and method with AND/OR logic up to two levels and set detection thresholds.

  • Includes built-in detection rules for common sensitive data, such as ID numbers, addresses, and Chinese names.

  • Hologres engines now support native database functions for data masking. No algorithm packages are required.

  • Adds default masking policies. You can configure policies by data classification to mask sensitive data that is not covered by detection rules.

  • Manage detection details

  • Create, configure, and manually trigger detection rules

DataService Studio

  • Approval is no longer required for granting permissions on applications, service units, APIs, or Dataphin data sources. Approval is also not required for transferring the responsibilities of an application, service unit, or application owner.

  • You can write query logic using MyBatis tag SQL syntax. The supported tags include if, choose, when, otherwise, trim, foreach, and where.

  • DataService Studio now supports page watermarks.

  • Adds Hologres data sources to the Dataphin data source selection list.

  • When you create an API, you can return to the first step to switch the creation method.

  • Optimizes the interactions in the API wizard mode for a better user experience.
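For readers unfamiliar with MyBatis tag SQL, the following Python sketch emulates what the `where` and `if` tags do together: a condition is included only when its parameter is present, and a leading AND or OR is stripped from the assembled clause. It is an emulation written for illustration and is not MyBatis or the DataService Studio parser. The table and column names are hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple


def build_query(
    base_sql: str,
    conditional_clauses: List[Tuple[str, str]],
    params: Dict[str, Any],
) -> Tuple[str, List[Any]]:
    """Emulate a <where> tag wrapping <if test="param != null"> fragments.

    Each clause is (param_name, sql_fragment); the fragment holds one '?'.
    """
    fragments, values = [], []
    for name, fragment in conditional_clauses:
        if params.get(name) is not None:  # behaves like <if test="name != null">
            fragments.append(fragment)
            values.append(params[name])
    body = " ".join(fragments).strip()
    # Like <where>, drop a leading AND/OR so the clause stays valid SQL.
    body = re.sub(r"^(AND|OR)\b\s*", "", body, flags=re.IGNORECASE)
    if not body:
        return base_sql, values  # like <where>, emit nothing when empty
    return f"{base_sql} WHERE {body}", values


clauses = [("city", "AND city = ?"), ("min_age", "AND age >= ?")]
sql, args = build_query("SELECT * FROM users", clauses, {"min_age": 18})
print(sql, args)
```

With only `min_age` supplied, the `city` fragment is skipped and the leading `AND` is removed, so the result is `SELECT * FROM users WHERE age >= ?` with the argument list `[18]`.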

  • Develop Dataphin data sources and accelerate queries

  • SQL mode

  • Codeless UI

  • Direct connection mode

Alert center

  • Adds built-in message template options.

  • You can configure recipient variables to dynamically update recipients.

Message templates

Task Hub

Adds space information, such as the project and business module, to the details of data table permission approval tasks.

View and process tasks

January 2023

Beijing and Hangzhou: Released on January 12, 2023.

Shanghai and Shenzhen: Released on January 18, 2023.

Feature Name

Description

References

Quick start

The quick start guide introduces Dataphin's features and walks you through the basic process of building a data model.

Dataphin quick start

Built-in model

Dataphin has a built-in retail industry data model that you can quickly import to experience the model construction feature. Note: Only new customers who activate the Dataphin service after the upgrade can use the built-in model.

Import a pre-built model

Browser compatibility

Adds support for QiAnXin Browser.

Limits

Data sources

  • Adds support for GBase 8a, TiDB, KingBase, and GoldenDB data sources.

  • Kafka data sources support connectivity tests.

Data modeling

The process of creating and editing logical tables is now wizard-based to improve the user experience.

Offline Integration

  • Adds reader and writer components for five data sources: GBase 8a, TiDB, KingBase, GoldenDB, and Impala.

  • The API component supports outputting Chinese field names.

  • The Teradata reader and writer components do not support selecting views.

  • Adds a function to convert binary types to hexadecimal strings.

  • Offline integration tasks support automatic dependency parsing:

    • Supports automatically parsing and adding upstream physical nodes or logical table nodes.

    • Supports automatically parsing and adding task output names.

    • Supports one-click addition of the root vertex as an upstream dependency node.
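The binary-to-hexadecimal conversion listed above can be illustrated in a few lines. This is a generic sketch of the transformation, not the integration component's function. The function name and the `uppercase` option are assumptions.

```python
def binary_to_hex(value: bytes, uppercase: bool = False) -> str:
    """Render a binary value as a hexadecimal string, two digits per byte."""
    hex_text = value.hex()
    return hex_text.upper() if uppercase else hex_text


print(binary_to_hex(b"\x00\xffDP"))
```

Each byte maps to exactly two hexadecimal digits, so the string is always twice as long as the binary value and can be converted back without loss.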

Offline development

  • Spark tasks on the MaxCompute engine support accessing logical tables and include permission verification.

  • Supports custom configuration of lineage for non-SQL tasks, including input and output tables and fields, which are displayed on the lineage page.

  • Adds a running record feature that lets you view code and execution logs, stop runs, and download results.

Real-time development

  • Optimizes the creation and use of metadata tables.

  • Flink SQL nodes support SQL Hints.

  • Stream-batch unified tasks under the Alibaba Cloud Realtime Compute for Flink engine support binding to different compute sources.

Publish

Supports enabling release approval in project settings and supports custom approvers.

  • Managing Publish Tasks

  • Create a project

Basic O&M

Supports configuring tenant-level task timeout, auto retry count, and retry interval.

Data Catalog

Data table lineage now displays custom lineage and supports viewing the output task type corresponding to the lineage relationship.

  • Data catalog

  • Asset details

Asset Permission

Supports table-level permission requests, grants, and authentication, which makes requesting and granting permissions more efficient.

Request, renew, and release table permissions

Data Standard

  • Improves the experience of creating and using data standard sets. Standard set properties support configuring default values, visibility scopes, and approval templates.

  • Supports viewing all standards and allows searching across standard sets.

  • Supports viewing the execution record list of standard mapping rules and the detailed standard mapping results of a single task execution.

  • Adds common standard set properties.

  • Adds support for custom standard approval process templates.

Asset Quality

  • Adds support for data source connectivity and schema change detection for more data sources.

  • Global asset quality now supports more data sources, including IBM DB2, AnalyticDB for PostgreSQL, Hologres, ClickHouse, DM, and HANA.

Asset Security

  • Recognition rules support manual execution and updates.

  • Adds a display for security recognition tasks, allowing you to view historical recognition task results and field recognition details.

  • Create, configure, and manually trigger recognition rules

  • Manage recognition records

DataService Studio

Call examples: Adds call examples and operating instructions for Python.

Call a Dataphin data source