Feature Updates (2024)


This topic describes the features released for Dataphin (fully managed) in 2024.

December 2024

Product version: V4.4

Beijing: Released on December 3, 2024.

Hangzhou and Shenzhen: Released on December 5, 2024.

Shanghai: Released on December 8, 2024.

Feature name | Feature description | Related documentation

Data source management

  • AnalyticDB for PostgreSQL data sources support high-availability IP configuration in JDBC URLs.

  • FTP data sources now show a prompt that SFTP key files must be in PEM format.
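
As a rough illustration of a high-availability JDBC URL, the sketch below builds a multi-host URL in the comma-separated form that the PostgreSQL JDBC driver accepts. The helper name, host addresses, port, and database name are hypothetical. The exact format that Dataphin accepts for AnalyticDB for PostgreSQL is an assumption and should be checked against the data source configuration page.

```python
def build_ha_jdbc_url(hosts, database):
    """Join (host, port) pairs into a multi-host PostgreSQL-style JDBC URL."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    return f"jdbc:postgresql://{host_list}/{database}"


# Hypothetical primary and standby addresses.
url = build_ha_jdbc_url([("10.0.0.1", 5432), ("10.0.0.2", 5432)], "analytics")
print(url)
```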

Announcement settings

Add system announcements. You can publish important notices to the current tenant as text (with optional links) or images for a specified time period.

Create and manage system announcements

Offline integration

  • Add performance diagnostics. Diagnose slow-running integration tasks using read/write duration, time-consuming operations, and garbage collection (GC) duration.

  • Add a run history list. Manage run histories for all offline integration tasks in the current project.

  • PostgreSQL, Oracle, Hologres, SAP HANA, and Microsoft SQL Server input components support cross-schema data reading. Schema is no longer required when registering data sources.

  • SelectDB, StarRocks, and Doris output components support JSON data format.

  • Filter component script mode supports parentheses and nested conditions.

  • Optimize API input components. The request body supports JSON input. Response results support object parsing.

  • In the integration task list, filter by unstructured data sources such as FTP. Table search now also matches topic and index names.

  • When submitting or publishing integration tasks, verify that tables exist in the production environment. Also verify table consistency between development and production environments.

Offline development

  • After running a DROP TABLE statement, choose whether to auto-generate a pending publish item for the dropped table. You can also delete pending publish items for dropped tables.

  • In SQL query results, select all column headers and navigate to any column.

  • Convert draft recurring tasks and one-time tasks into each other.

  • When publishing tasks, use the global view. When selecting pending publish items, the system prompts you about dependencies.

  • Add custom window syntax for MaxCompute.

  • Database SQL now supports PostgreSQL, StarRocks, and ClickHouse data sources.

  • SQL compute tasks support resource group-based container allocation.

Real-time development

  • Flink VVP real-time development supports batch configuration of concurrency and Chaining policy updates for custom resources.

  • Improve error messages for Access Key verification in Flink VVP compute sources so that they are clearer.

Customize Ververica Flink real-time task resources

Data standards

  • Add a standard template library. The system includes common lookup tables based on national standards, such as those for people, economy, and regions.

  • Optimize parsing logic for the data classification column in bulk import templates. Classification is now supported if the value starts with a forward slash (/) or directly with the category name.
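
The parsing behavior for the data classification column can be pictured with the minimal sketch below, which treats a value with or without a leading forward slash as the same category path. The function name and the sample categories are hypothetical, and the use of "/" as the separator between levels is an assumption rather than documented Dataphin behavior.

```python
def parse_classification(value):
    """Split a classification path into category names, ignoring a leading slash."""
    # Assumption: "/" separates the levels of the classification path.
    segments = [part.strip() for part in value.strip().split("/")]
    return [part for part in segments if part]


print(parse_classification("/Personal/Contact"))
print(parse_classification("Personal/Contact"))
```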

Resource governance

  • Jump quickly from the governance analysis page to view unresolved governance items for storage or compute projects.

  • In your governance list, select up to 50 objects for batch operations.

Metadata Center

Add big data storage sources: StarRocks, Hive (MySQL metadata database), and Hologres. These sources collect table, field, and partition information. You can find these tables in the asset inventory and Metadata Center, and publish them to the asset catalog.

Publish assets—metric assets

  • Define custom metrics during table publishing. Publish them with the table or separately.

  • Search for metrics in the asset catalog by statistical granularity, dimensions, and source tables.

  • Preview data and apply for permissions for custom metrics whose source tables are physical tables.

Manage quasi-assets

Publish assets

  • Add usage instructions. After enabling, view them in the asset catalog details to clarify asset content and value.

  • Enter custom attribute values defined in Planning > Attribute Management when editing assets or configuring auto-publish rules.

  • View execution details for auto-publish rules to check asset change records and assess rule configuration.

Data Service Studio

  • Add API cloning. Create new APIs by cloning existing ones.

  • Improve API publish and delete permission guidance. Update service project publish control settings for clarity. When publishing or deleting APIs, quickly revoke API permissions or unbind composite APIs.

  • Optimize the API development list. When the latest API version is published, editing automatically creates a new version. Version numbers are auto-generated and filled.

  • Add service log functionality. View call detail logs using specific filters. View up to 90 days of statistical logs.

  • Set storage locations for call detail logs and call statistics. The system deletes logs automatically based on total count and retention period.

  • APIs support downloading Open YAML files in Alibaba Cloud Model Studio custom plugin format and standard OpenAPI specification format.

Tag Factory

  • Adjust asset categories for published tags and audience groups in the asset marketplace.

  • For offline datasets and behavioral relationship source tables, tag and audience group offline service target tables have no restrictions on partition field names or formats. Users can define them freely.

  • Tag asset lineage now supports service node and service target table nodes. Jump from compute source tables to the asset inventory to view table lineage.

  • Support batch publishing and unpublishing of project-level tags and audience groups. Optimize dependency detection interactions.

  • Batch remove applied tags and audience groups from projects and applications.

Cross-tenant publishing

Add cross-tenant publishing for cross-node parameters.

Manage R&D pending publish objects

October 2024

Product version: V4.3

Beijing: Released on October 9, 2024.

Hangzhou and Shenzhen: Released on October 10, 2024.

Shanghai: Released on October 13, 2024.

Feature name | Feature description | Related documentation

Permission management

Added support for applying for and granting permissions on tables in MySQL and Oracle data sources.

Data source management

  • Data source connection tests can now return a 'Success with risk' status. Data sources with this status can be used only in Data Service Studio and Data Quality. They cannot be used in Data Development or Data Integration.

  • Elasticsearch data sources now support HTTPS connections and self-signed certificates.

Projects

When you delete a project, a list of existing objects is displayed to help you identify any blockers.

Delete a project

Offline integration

  • PolarDB-X input components now support the QuerySQL mode.

  • KingbaseES database strings now use the concat function instead of the || operator.

  • PostgreSQL output components support one-click table creation.

  • Elasticsearch input components support reusing existing index schemas.

  • The default row and column delimiters for StarRocks, SelectDB, and Doris output components are changed to uncommon characters to reduce conflicts with business data.

  • Local file input components now support Chinese and special characters. The limits on the file upload size and parsing size are increased.

  • For offline full-database migration tasks, you can now specify custom name prefixes and suffixes. You can also create and move files in system-generated directories.

  • Component validation is improved. The system now checks for inconsistent input structures from multiple sources to prevent dirty or misaligned data.
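
To make the KingbaseES change above concrete, the sketch below shows the two ways a string concatenation can be written in generated SQL: with the concat function or with the || operator. The helper and column names are hypothetical, and this is not Dataphin's actual SQL generator.

```python
def concat_expr(columns, use_concat_function=True):
    """Build a SQL string-concatenation expression for the given column names."""
    if len(columns) < 2:
        raise ValueError("need at least two columns to concatenate")
    if use_concat_function:
        # Function form, e.g. concat(a, b)
        return "concat(" + ", ".join(columns) + ")"
    # Operator form, e.g. a || b
    return " || ".join(columns)


# Hypothetical column names.
print(concat_expr(["first_name", "last_name"]))
print(concat_expr(["first_name", "last_name"], use_concat_function=False))
```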

Offline development

  • The configuration of offline compute tasks is optimized. You can now configure runtime resources (CPU and memory) for non-SQL tasks. You can also configure priority settings for new tasks. The default priority can be configured globally.

  • When you publish offline physical tables, pending items for the same table are automatically merged.

  • For SQL tasks that reference global variables for accounts or passwords, you can hide the MaxCompute Logview URL.

  • Global variables for accounts and passwords can now be automatically filled during development and at runtime.

  • Built-in Python now supports Python 3.11.

Real-time development

  • SAP HANA metadata tables now support blob, clob, and nclob field types.

  • You can choose a single update time field of the timestamp type or enter an SAP HANA SQL time string expression, such as concat(column_date,column_time).

Create and manage metadata tables

Metadata Center

Four new metadata acquisition sources are added: AnalyticDB for MySQL, PolarDB-X, IBM DB2, and SAP HANA.

Create and manage metadata acquisition tasks

Asset inventory

  • Lineage graphs can now display only the direct lineage.

  • You can now view lists of data tables, metrics, and fields from a project perspective.

Data standards

  • The number of supported levels for standard directories, lookup table directories, and standard document directories is extended to 10.

  • When you edit a data standard, you can switch templates. If the property names, field types, and input methods match, the values are automatically filled.

  • You can now export up to 100 data standards in a single bulk operation.

  • For public system properties in standard templates, you can toggle the required status. The system automatically generates monitoring items based on whether property values exist.

Data Quality

You can now export custom SQL rules.

Data security

A built-in template library for data security is added. The library includes templates for industries such as energy and power, Internet of vehicles, and intelligent connected vehicles.

Import data classification from template library

Asset catalog

A metric object list is added. You can view metrics by subject or catalog.

Catalog management

  • Metric publishing management is supported. You can manage the publishing and unpublishing of metrics that are generated by standardized modeling. You can also configure metric tags, display names, and catalogs.

  • The auto-publish feature is added. Assets are published or their attributes are updated based on configured rules. This reduces manual effort.

  • You can view the change history of an asset, including attribute and status changes, for traceability.

Data Service Studio

  • The direct-connect data source mode now supports SelectDB data sources. APIs support SQL injection detection.

  • The Dataphin data source functionality is deprecated. Only users who previously created a Dataphin data source can continue to use it.

Tag platform

  • When you create metric-mapped tags, you can add dataset metrics in batches to streamline the creation process.

  • You can view the basic information and version information for audience groups on their detail pages.

Cross-tenant publishing

When you import deployment packages, you can use DDL replacement rules for offline physical tables. For example, you can replace the host or storage class in external table location URLs.

Import deployment packages
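
A DDL replacement rule of the kind described for cross-tenant publishing can be thought of as a targeted text substitution. The sketch below swaps the host inside a LOCATION clause. The function name, the DDL statement, and both host names are hypothetical, and the real rule syntax in Dataphin may differ.

```python
import re


def replace_location_host(ddl, old_host, new_host):
    """Replace the host inside LOCATION '...' clauses of a DDL statement."""
    pattern = re.compile(r"(LOCATION\s+')([^']*)(')", re.IGNORECASE)

    def swap(match):
        # Only the text between the quotes is rewritten.
        url = match.group(2).replace(old_host, new_host)
        return match.group(1) + url + match.group(3)

    return pattern.sub(swap, ddl)


# Hypothetical external table definition and hosts.
ddl = (
    "CREATE EXTERNAL TABLE orders_ext (id BIGINT) "
    "LOCATION 'oss://oss-dev.example.com/bucket/orders/'"
)
print(replace_location_host(ddl, "oss-dev.example.com", "oss-prod.example.com"))
```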

August 2024

Product version: V4.2.4

Beijing: Released on August 27, 2024.

Hangzhou and Shenzhen: Released on August 29, 2024.

Shanghai: Released on September 1, 2024.

Feature name | Feature description | Related documentation

Sales

  • Add Personal Edition and Agile Development Edition, both supporting database compute engines for diverse business scenarios.

  • Add Asset Operations module purchasing, available in Standard Edition and Basic Edition.

    • Standard Edition: Up to five asset subjects (excluding metric assets), supports catalog publishing approval and publishing.

    • Basic Edition: Up to two asset subjects (excluding metric assets), supports catalog publishing approval, and unlimited published assets.

Data integration

API input and output components support HTTP request retries.
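
The general idea behind HTTP request retries can be sketched as follows. The retry count, the exponential backoff, and the function names are assumptions chosen for illustration; the release note does not state which retry policy the API components use.

```python
import time


def call_with_retries(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds or the retries run out.

    send is any zero-argument callable that raises an exception on failure.
    """
    attempt = 0
    while True:
        try:
            return send()
        except Exception:
            if attempt >= max_retries:
                raise
            # Assumed policy: exponential backoff between attempts.
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# Simulated request that fails twice and then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return 200


status = call_with_retries(flaky_request, sleep=lambda seconds: None)
print(status, calls["count"])
```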

Offline development

Add a default scheduling time for recurring tasks in R&D platform settings. Choose either a random time within a range (to distribute server load and improve stability) or a fixed time.

R&D platform settings
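
Choosing a random default scheduling time within a range can be sketched as below. The function name and the 00:00 to 03:00 window are hypothetical, and the minute-level granularity is an assumption.

```python
import random
from datetime import time


def pick_schedule_time(start, end, rng=random):
    """Pick a random minute-level time between start and end (inclusive)."""
    start_min = start.hour * 60 + start.minute
    end_min = end.hour * 60 + end.minute
    if end_min < start_min:
        raise ValueError("end must not be earlier than start")
    minute = rng.randint(start_min, end_min)
    return time(minute // 60, minute % 60)


# Hypothetical window; spreading tasks across it avoids a load spike at one instant.
chosen = pick_schedule_time(time(0, 0), time(3, 0), random.Random(42))
print(chosen)
```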

Data Quality

Export and download issue lists as Excel files for easy business reporting and archiving.

Add and manage issue lists

July 2024

Product version: V4.2

Beijing: Released on July 25, 2024.

Hangzhou and Shenzhen: Released on July 30, 2024.

Shanghai: Released on August 4, 2024.

Feature name | Feature description | Related documentation

Member management

Permissions are added for the Metadata Center and asset catalog features.

Global role management

Data source management

For custom data sources, you can now re-upload JAR packages and edit JSON configurations.

Resource scheduling

For compute and integration tasks in development projects, you can now configure scheduling resource groups for execution and preview in the development environment.

Offline development

  • Database SQL tasks now support Presto data sources.

  • When you create a database SQL task, you must specify a schema.

  • In Basic-edition projects, you can restore or permanently delete compute tasks from the recycle bin.

Basic O&M

The project-level monitoring and alerting feature is optimized. You can configure separate incomplete alerts for daily, weekly, and monthly tasks, and for hourly and minute-based tasks.

Configure project monitoring and alerting rules

Offline integration

  • openGauss output components support one-click table creation.

  • Relational database pipeline tasks support using datetime fields as split keys to increase the concurrency of data import.
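
Using a datetime field as a split key amounts to cutting the value range into contiguous slices that can be read concurrently. The sketch below shows one way to compute such slices. The function name, the created_at column, and the even split are assumptions, not Dataphin's actual splitting algorithm.

```python
from datetime import datetime


def split_datetime_range(start, end, parts):
    """Split [start, end) into `parts` contiguous half-open ranges."""
    if parts < 1 or end <= start:
        raise ValueError("need parts >= 1 and end > start")
    step = (end - start) / parts
    bounds = [start + step * i for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_datetime_range(datetime(2024, 1, 1), datetime(2024, 1, 5), 4)
for lo, hi in ranges:
    # Each range could back one concurrent reader, for example (hypothetical column):
    # SELECT ... WHERE created_at >= lo AND created_at < hi
    print(lo, hi)
```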

Metadata Center

The Metadata Center module is added. This module lets you manage tasks, instances, and business systems, and view the metadata inventory. You can extract, process, store, and manage metadata from business systems to support data governance and improve internal data organization, retrieval, and analysis.

Metadata Center

Asset inventory

  • In the asset inventory search box, you can switch between Dataphin assets and business system assets that are imported from data sources.

  • You can view recently browsed, favorited, and used assets in My Footprint. You can go to the Personal Data Center to view all your favorited assets.

  • Dataphin assets include compute source tables, metrics, functions, projects, data sources, and APIs.

    • The metrics list now supports navigation by data domain or subject area for quick switching.

    • Logical table field details now show the remarks that are configured during development.

  • Business system assets include assets that are imported from data sources.

    • You can view imported data source tables by source data source and assigned business system.

    • You can filter data source tables by source data source, schema, and business system.

    • You can view data source table details, including attributes, fields, lineage and impact, and quality overview (production tables only).

    • Quick actions are available. You can view DDL statements, generate SELECT statements, report quality issues (production tables only), export fields, and apply for data source permissions.

Data security

  • First-level data classification directories support setting administrators and managing classification details. This lets you delegate classification management and protect sensitive configurations.

  • Dynamic classification range selection is added. You can select all classifications or a specified directory and its subdirectories.

    • All classifications: The selected data range is matched against the global data classifications each time.

    • Specified directory: The data range is matched against the latest classifications under the selected directory and its subdirectories each time. New classifications do not require manual rule updates.

Asset catalog

The new asset catalog feature provides an entry point for listed assets. You can use it to search for data and view asset details.

Catalog management

The original asset subject management feature now supports multiple subjects and multi-level directories for each subject. You can manually publish assets to different catalogs and configure display names and view permissions to improve data discovery and reduce the costs of understanding asset operations.

Manual publishing

Tag platform

  • The offline tag services and audience group offline services are optimized:

    • openGauss target data sources are added. You can generate tables with a single click and export code names and tag values.

    • You can backfill data for running offline services. By default, T-1 data is backfilled to ensure immediate execution after creation.

  • Automatic task categorization is supported for more user-friendly task management:

    • Tasks are automatically categorized by owner and purpose.

    • The system automatically expands directories when the task count exceeds the threshold.

  • The tag lookup table capabilities are enhanced. Composite tags support auto-generated lookup tables:

    • Composite tags can automatically generate lookup tables based on hierarchical definitions.

    • The asset marketplace displays the distribution of composite tags by lookup table.

    • Offline service exports support exporting lookup tables for composite tags.

Personal Data Center

  • You can view all the assets that you favorited in the asset inventory, including data tables, APIs, metrics, and standards.

  • You can view all the table assets that you own.

Cross-tenant publishing

Cross-tenant publishing is added for behavioral relationships, behavioral preference tags, and behavioral statistic tags.

Manage tag pending publish objects

Maintenance/upgrade

During cross-tenant publishing, you can perform operations that do not affect production data. You can access the O&M, Publish, Analyze, Asset Inventory, and Permission Management modules. You can also submit integration tasks and development objects in development projects. Publishing is not allowed.

Maintain/upgrade Dataphin

Filename conventions

The filename convention feature is added. You can validate file prefixes and suffixes for files that are uploaded to Dataphin. Files must meet both the prefix and suffix requirements to be uploaded.

Manage filename conventions
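
The filename check described above requires both conditions to hold. A minimal sketch, with a hypothetical function name and hypothetical prefix and suffix lists:

```python
def filename_allowed(filename, prefixes, suffixes):
    """Return True only if the name matches an allowed prefix AND an allowed suffix."""
    return filename.startswith(tuple(prefixes)) and filename.endswith(tuple(suffixes))


# Hypothetical convention: names start with "dp_" and end with ".csv" or ".txt".
print(filename_allowed("dp_orders.csv", ["dp_"], [".csv", ".txt"]))
print(filename_allowed("orders.csv", ["dp_"], [".csv", ".txt"]))
```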

June 2024

Product version: V4.1

Beijing: Released on June 13, 2024.

Hangzhou and Shenzhen: Released on June 20, 2024.

Shanghai: Released on June 23, 2024.

Feature name | Feature description | Related documentation

Global

  • Menu navigation optimization: You can add menu items to your favorites for quick access. You can also directly view second-level navigation menus on the homepage. The top navigation bar is upgraded.

  • Global visual optimization: All pages except Data Development and Data Service Studio now use a light theme instead of a dark theme. The lighter colors help reduce eye strain, focus your attention on the main content, and improve efficiency.

No documentation

Data source management

  • FTP data sources support the PORT (active) and PASV (passive) modes when authentication is set to FTP or FTPS.

  • Elasticsearch data sources now support Elasticsearch 8.x.

  • Oracle 19c data sources support the ZHS16GBK character set.

  • DM data sources support multiple IP addresses when load balancing is enabled.

Offline integration

  • For MaxCompute source databases in full-database migration, partition filtering is now optional.

  • Relational database input and output components support configuring the number of batch reads.

  • DM output components support one-click target table generation, including the table name, field types, and precision.

  • Filter components now support the NOT LIKE operator.

  • The table selection for full-database migration is optimized. You can enter table names in bulk to automatically select exact matches or show fuzzy matches for batch selection.

Offline development

  • Code search is upgraded:

    • The entry points for code search are optimized. Entry points are added to the navigation tree, task list, and global search.

    • You can search for code across all projects for which you have permissions.

    • Code search results show more details and support filtering.

  • The code editor experience is optimized. You can hover over project names that are formatted as variables to view table information. The system displays domain or project names that are formatted as variables and their associated table names. The system also displays the role names of logical table dimensions. Project or domain names in SQL compute tasks are automatically formatted as variables.

  • The merge partition syntax is added.

  • The following built-in public calendar functions are added:

    • dpc_last_workday: Obtains the most recent workday for a specified date.

    • dpc_last_label_date: Obtains the most recent date that contains a specified label.

    • dpc_is_labelled: Checks whether a specified date has a given label.

    • dpc_is_workday: Checks whether a specified date is a workday.

    • dpc_last_workdays: Obtains a list of the N most recent workdays for a specified date.

    • dpc_workdays: Obtains a list of workdays in a specified date range.

    • dpc_last_labelled: Obtains a list of the N most recent dates that contain a specified label.

    • dpc_labelled: Obtains a list of dates that contain a specified label in a date range.

    • dpc_last_multi_labelled: Obtains a list of the N most recent dates that contain a specified set of labels.

    • dpc_multi_labelled: Obtains a list of dates that contain a specified set of labels in a date range.
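
The semantics of the workday-related calendar functions above can be approximated in plain Python. This is only a sketch: the function names are not the dpc_ functions themselves, the holiday set is hypothetical, and treating the most recent workday as strictly before the given date is an assumption. In Dataphin, workdays come from the configured public calendar.

```python
from datetime import date, timedelta

HOLIDAYS = {date(2024, 6, 10)}  # hypothetical public holiday (a Monday)


def is_workday(day):
    """Monday to Friday and not listed in HOLIDAYS."""
    return day.weekday() < 5 and day not in HOLIDAYS


def last_workday(day):
    """Most recent workday strictly before `day` (an assumption)."""
    current = day - timedelta(days=1)
    while not is_workday(current):
        current -= timedelta(days=1)
    return current


def workdays(start, end):
    """All workdays from start to end, inclusive."""
    span = (end - start).days
    candidates = [start + timedelta(days=i) for i in range(span + 1)]
    return [day for day in candidates if is_workday(day)]


print(last_workday(date(2024, 6, 11)))
print(workdays(date(2024, 6, 7), date(2024, 6, 12)))
```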

Modeling development

Cascading updates to subject areas are now supported for downstream logical tables, atomic metrics, business filters, and derived metrics.

Basic O&M

  • The scheduled run times for backfill tasks now support the end of month option.

  • You can configure retention periods for instances and operational logs. You can also configure periodic cleanup of backfill workflows and one-time instances.

Data standards

  • The standard change subscription feature is added:

    • You can subscribe to changes for four states: effective, invalid, unpublished, and deleted. Notifications are sent using internal messages and email.

    • You can subscribe to change notifications in bulk from the data standards list.

    • You can select tables in bulk from the Personal Data Center on the tables I own tab, and then subscribe to change notifications for related field-mapped data standards.

    • You can view push records in the Alert Center.

  • Standard templates now support the configuration of standard code generation rules:

    • You can generate rules using auto-increment sequences, fixed strings, and standard set codes. The rules are more standardized, which reduces manual configuration and management costs.

    • After you enable code rules, you can manually enter codes. You can also enable strict validation for custom codes against the rules to balance flexibility and standardization.

    • After you update code rules, you can batch-correct previously generated standard codes using the latest rule. You can also view correction records.

  • The configuration of standard attributes is optimized:

    • For range-value attributes, including value-domain attributes, you can defer the selection of the range type until standard creation. Standards that are created from the same template can use different range types for greater flexibility.

    • You can reference system attributes, such as data domain, project, user, and user group. This enhances global adaptability and unified management.

  • The bulk import and export of data standards are optimized:

    • For new bulk imports, you can pre-configure the standard set, owner, and effective time to populate templates and reduce input effort.

    • You can choose how to import standard code values. You can use the values in the configuration file or regenerate values using code rules, which ignores the file values.

    • No limit is imposed on the number of standards that can be exported at a time. You can download the generated files from Batch Operations Records—Bulk Export Records.

  • Associated attribute information is added to lookup tables that are linked to data standards. Lookup tables can be associated with the values of standard attributes.
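
The standard code generation rules mentioned above combine a standard set code, a fixed string, and an auto-increment sequence. The sketch below shows one possible combination. The function name, the separator, the padding width, and the "HR" and "STD" values are all hypothetical and do not reflect Dataphin's actual rule format.

```python
from itertools import count


def make_code_generator(standard_set_code, fixed="STD", width=4, start=1):
    """Return a function that yields codes like HR-STD-0001 (format is hypothetical)."""
    sequence = count(start)

    def next_code():
        # standard set code + fixed string + zero-padded auto-increment number
        return f"{standard_set_code}-{fixed}-{next(sequence):0{width}d}"

    return next_code


next_code = make_code_generator("HR")
print(next_code())
print(next_code())
```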

Data Quality

  • The bulk creation of quality rules is optimized:

    • The field list shows the field type.

    • You can filter monitored objects by resource owner.

    • You can configure the quality owner and quality score weight for monitored objects in bulk.

  • The quality monitoring granularity for custom SQL rules now supports the entire table or specific fields.

  • Quality rule templates support setting parameters to resolve errors in large-data scenarios. The configured templates apply to all rules that are created from them.

Data Service Studio

  • APIs that are created using the wizard mode for service units now support sorting.

  • For List-type API requests, you can enable result pagination. If you enable this feature, you must specify a sort field to ensure stable query results. If you disable this feature, you can set pagination parameters on the test page.

  • When you submit APIs that are created using service units, the system validates whether the service units of the referenced fields exist. When you publish them, the system validates whether the referenced fields exist in the production environment. When you submit APIs that are created using data sources, the system validates whether the referenced fields exist in the development data source. When you publish them, the system validates whether the referenced fields exist in the production data source.
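
The pagination item above requires a sort field because paging is only stable when rows have a deterministic order. A minimal sketch of that idea, with a hypothetical function name and in-memory rows standing in for query results:

```python
def paginate(rows, sort_field, page, page_size):
    """Sort rows by sort_field, then return the requested 1-based page."""
    ordered = sorted(rows, key=lambda row: row[sort_field])
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


# Hypothetical rows that arrive in arbitrary order.
rows = [{"id": 3}, {"id": 1}, {"id": 2}, {"id": 4}]
print(paginate(rows, "id", page=1, page_size=2))
print(paginate(rows, "id", page=2, page_size=2))
```

Without the sort step, two requests for consecutive pages could see the rows in different orders and return overlapping or missing records.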

Tag Factory

  • After you publish offline datasets, you can modify metrics and change the source fields for existing metrics.

  • Offline datasets support one-click backfill for downstream tag data.

  • Offline datasets and behavioral relationships support scheduling using public calendars and cross-node parameters as condition parameters.

  • Behavioral relationships let you configure lookup tables for entity attributes, behavior attributes, and object attributes. After you configure these lookup tables, you can use them to process and preview behavioral preference tags and behavioral statistic tags.

  • The processing time windows and date filters for datasets support public calendar filtering. The behavioral time for preference and statistic tags supports public calendar filtering.

  • One-click table creation is supported for tag and audience group offline services under compute sources.

  • You can edit published and unpublished tag and audience group offline services. You can switch output tables and output tags. You must re-publish the services after editing.

  • When you export tags from tag and audience group offline services, if the tags have lookup tables, you can choose whether to export code descriptions for easier downstream system calls.

Cross-tenant publishing

  • One-click publishing is added. The system publishes pending objects in sequence.

  • Quality rule publish objects now show business attribute information in the object details and version comparison details.

May 2024

Product version: V4.0

Shenzhen: Released on May 16, 2024.

Beijing and Hangzhou: Released on May 21, 2024.

Shanghai: Released on May 26, 2024.

Feature name | Feature description | Related documentation

Tag Sale

You can purchase more tags. Different Tag Factory editions support different maximum tag counts:

  • Trial edition: 10 tags by default. No additional tags are allowed.

  • Basic edition: 50 tags by default. No additional tags are allowed.

  • Standard and Premium editions: 300 tags by default. You can add 500, 700, 1,000, or 1,500 tags.

Custom resource groups

Custom resource groups are added. You can divide your tenant's scheduling resources into isolated groups based on a specified ratio. You can then assign different scheduling resources to tasks and specify dedicated resource groups for temporary O&M scenarios, such as backfilling.
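
Dividing scheduling resources by ratio can be illustrated with the sketch below. The function name, the group names, the percentages, and the rule for handing out the rounding remainder are hypothetical. How Dataphin rounds or enforces the ratios is not stated in this release note.

```python
def allocate_by_ratio(total_units, ratios):
    """Split total_units across groups in proportion to integer percentages."""
    if sum(ratios.values()) != 100:
        raise ValueError("ratios must sum to 100")
    allocation = {name: total_units * pct // 100 for name, pct in ratios.items()}
    # Assumption: any remainder from integer division goes to the largest group.
    remainder = total_units - sum(allocation.values())
    largest = max(ratios, key=ratios.get)
    allocation[largest] += remainder
    return allocation


# Hypothetical tenant with 64 scheduling units and three isolated groups.
print(allocate_by_ratio(64, {"default": 60, "backfill": 25, "adhoc": 15}))
```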

Global role management

  • You can create custom global roles to control user access to modules and features within modules, and to control view or manage permissions.

  • You can enable or disable built-in global roles.

Global role management

Python third-party packages

You can install Python modules online using Python mirrors.
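
Outside Dataphin, installing a module from a mirror with plain pip looks roughly like the sketch below, which only builds the command. The helper name, the mirror URL, and the package name are hypothetical. In Dataphin the installation is configured in the console, so this shows the underlying idea rather than the product workflow.

```python
import sys


def pip_install_command(package, index_url):
    """Build a pip command that installs `package` from the given mirror."""
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


# Hypothetical mirror URL; replace it with a mirror you trust.
cmd = pip_install_command("requests", "https://mirror.example.com/simple")
print(" ".join(cmd))
# To run it: subprocess.run(cmd, check=True)
```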

Data sources

  • SelectDB data sources are added for offline integration.

  • SAP Table data sources support load-balanced addresses.

Offline integration

  • Data classification and grading information is displayed when you use data that is stored in compute engines as source data.

  • You can switch offline pipelines to the offline script mode or clone them as scripts. This improves the efficiency of script-mode configuration. You can also directly switch components to the script mode. You cannot switch back from the script mode.

  • Elasticsearch input components support reading documents using index aliases.

  • MongoDB input components support converting Document and Array field types to JSON output.

  • FTP input components support scheduled checks for completion marker files. You can also split strings by a specified field length.

  • FTP output components can write single files in the ZIP format. You can configure whether to compress file paths.

  • Integration components are optimized:

    • Output components without metadata can automatically generate fields from the field information of input components.

    • You can add frequently used components to your favorites and pin them to the top.

    • You can search for components using common aliases.

    • An unpublish action is added to the offline integration task list.

    • Encryption components provide a shortcut to create keys.

    • You can copy prompt text.

Offline development

  • Database SQL tasks now support AnalyticDB for PostgreSQL data sources with data-source-level authentication. You can also execute stored procedures.

  • The features of offline compute tasks now show dependent global variables, public calendars, resources, and Python third-party packages.

  • The features of offline physical tables now show the tasks that read the table.

  • Conditional scheduling supports using fiscal month, week, and day from fiscal calendars as scheduling parameters.

  • When you publish offline physical tables to the production environment, MaxCompute tables skip lifecycle validation against the development environment. External tables skip path validation against the development environment.

  • SQL query results are optimized. You can hide result columns, view single-row details, split query logs by statement, and disable result copying.

  • The split_size hint and compact syntax are added for MaxCompute. The show create table <domain_name.logical_table_name> and show partitions <domain_name.logical_table_name> syntaxes are supported.

Modeling development

  • When you create or edit a logical table, you can customize the system-generated name of the logical table. Naming compliance is not enforced.

  • You can configure consistency validation for the calculation logic and field types in dimension and fact logical tables. You can also disable strong consistency validation.

  • For dimension and fact logical tables, you can view field-level downstream dependency tasks in Related—Downstream Dependencies.

Real-time development

  • Real-time tasks support writing data to OSS data sources.

  • Flink batch tasks support configuring the number of retry attempts and the retry interval after real-time instance failures.

  • The state transitions of real-time tasks are optimized to prevent deadlocks and unresponsive tasks.

Create a Flink SQL task

Basic O&M

  • For backfill tasks, you can now select nodes by name to specify the backfill scope.

  • The massive backfill mode supports up to 5,000 nodes. The list mode supports filtering by node type and adds a node ID column.

  • Global configuration supports automatically retrying failed tasks after a timeout.

  • The backfill of logical tables is optimized to reduce complexity and improve performance.

Asset catalog

  • The field list is optimized. You can now filter by project, table, and table owner.

  • Identifiers for internal and external tables are added to the physical table storage type.

  • The actions on the asset detail page are optimized:

    • View DDL statements: You can view the DDL statements for different data sources. This is useful for quickly creating tables that have a matching structure in integration targets.

    • Generate SELECT statements: You can customize escape characters.

Data standards

  • The standard document feature is added:

    • You can upload documents in formats such as PDF, DOC, PPT, and Excel.

    • You can group documents for management.

    • You can associate standard documents with data standards.

  • Intelligent recommendation for mapping relationships based on identification features is added:

    • Identification features support both the intelligent recommendation of data standard mappings and classification and grading tagging. Feature scanning affects both the mapping rules of the standard module and the identification rules of the security module.

    • Intelligent matching by identification feature is added. When you create mapping rules, the system recommends mappings based on feature definitions and the content or metadata values of asset objects to improve configuration efficiency.

Data Quality

  • The manual entry of quality issues is added:

    • You can report quality issues from asset or tag detail pages. Alternatively, in Quality—Governance Workbench—Issue List, you can select the asset and describe the issue.

    • You can view auto-detected and manually entered issues in the issue list. You can also initiate and track rectification in a unified manner.

    • You can categorize quality issues and manage issue types.

    • The rectification process now shows operation records.

  • Custom quality rule attributes are added. When you create rules, you can fill in business attributes to enrich the rule information.

    • You can configure the attribute name, required status, input method, and search and filter options. You can also enable or disable attributes.

    • You can batch modify the business attributes for quality rules. You can append or overwrite existing values.

  • The scheduling of Dataphin table quality rules is optimized:

    • For timed scheduling, you can click a button to fill in the recommended time based on the average output time of the monitored table.

    • For fixed task-triggered scheduling, you can click a button to fill in the recommended task based on the lineage of the monitored table.

  • The quality alert time and push records are optimized. Alert reasons now show rule names for quick problem identification.

Tag Factory

  • For manual datasets, tags, audience groups, and offline services, you can specify business dates at runtime.

  • You can process statistics-based preference tags. You can use a behavioral relationship attribute as the statistical object and select another attribute as the tag value based on the results.

  • Behavioral tags now support count distinct processing.

  • Offline views are now referred to as offline datasets for better understanding.

  • The asset marketplace experience is optimized:

    • The shopping cart provides a batch add button and supports clearing the cart.

    • You can quickly filter tags by entity.

    • The tag and audience group detail pages are optimized. Key information is displayed at the top.

Cross-tenant publishing

  • Cross-tenant publishing is added for global variables and quality rule attributes.

  • When you import and export deployment packages, you can search for data objects in batches by ID or name.

April 2024

Product version: V3.14

Beijing: Released on April 16, 2024.

Shenzhen and Hangzhou: Released on April 18, 2024.

Shanghai: Released on April 21, 2024.

Feature name

Feature description

Related documentation

Table permissions

  • Domain architects and project administrators can now grant or revoke table permissions in batches.

  • Super administrators, domain architects, and project administrators can grant and revoke table permissions at the project and domain levels for production accounts. After the permissions are granted, the production account has permissions on all tables in the project or domain, including new tables.

Data source permissions

Users who have execution permissions on a data source can create, run, and analyze database SQL tasks that are based on that data source.

Apply for, renew, or return data source permissions

Style configuration

Page watermark settings are added. You can customize the watermark content and style, including the font size, bold, rotation angle, font color, and spacing.

Watermark settings

Task Hub

You can now approve tasks in batches.

View and process tasks

Python third-party packages

Global management for Python third-party packages is added. You can install a package once and reference it multiple times to improve the development efficiency of Python tasks.

Install Python modules

Data sources

  • Kafka data sources support bidirectional SSL encryption with no authentication or with username and password authentication.

  • Dameng data sources support SSL encryption.

Offline integration

  • The FTP output components are optimized:

    • The number of written files is not tied to concurrency. You can choose to write a single file or multiple files.

    • If the load strategy is overwrite or file conflict error, single files have no suffix. Multiple files can have sequential suffixes (_0, _1, _2) or universally unique identifier (UUID) suffixes.

    • If the load strategy is append, single or multiple files can have only UUID suffixes.

    • You can customize the content of completion marker files. File-level and task-level marker files are supported.

    • The supported file-level parameters are $filename, $filenamewithpath, $filesize, and $rowcount.

    • You can schedule tasks using global, cross-node, and pipeline parameters.

  • MySQL, AnalyticDB for MySQL 3.0, and TiDB output components support ON DUPLICATE KEY UPDATE for some fields.

  • OSS and Amazon S3 output components support choosing whether to output field names as the first row.

  • The daily sync mode for full-database migration supports creating and writing to non-partitioned tables.

  • For FTP sources in full-database migration, file templates support more configurations and read control options, for example:

    {
      "textReaderConfig": {
        "caseSensitive": true,
        "useTextQualifier": false,
        "textQualifier": "\\",
        "trimWhitespace": false
      }
    }

  • Greenplum output components support one-click table creation.

  • PostgreSQL is added as a source database for full-database migration.
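
The FTP output suffix rules listed above can be sketched as a small naming function. This is only an illustration, not Dataphin's implementation: the function name, the strategy labels (`overwrite`, `conflict_error`, `append`), and the hex form of the UUID suffix are assumptions made for the example.

```python
import uuid


def output_file_names(base_name, load_strategy, file_count=1, suffix_style="sequence"):
    """Return the file names an FTP output step would write under the stated rules.

    load_strategy: "overwrite", "conflict_error", or "append" (hypothetical labels).
    suffix_style:  "sequence" or "uuid"; only used for multi-file, non-append writes.
    """
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    if load_strategy not in ("overwrite", "conflict_error", "append"):
        raise ValueError(f"unknown load strategy: {load_strategy}")

    if load_strategy == "append":
        # Append: single or multiple files can only carry UUID suffixes.
        return [f"{base_name}_{uuid.uuid4().hex}" for _ in range(file_count)]

    if file_count == 1:
        # Overwrite or file conflict error with a single file: no suffix.
        return [base_name]

    if suffix_style == "sequence":
        # Multiple files with sequential suffixes: _0, _1, _2, ...
        return [f"{base_name}_{i}" for i in range(file_count)]
    if suffix_style == "uuid":
        return [f"{base_name}_{uuid.uuid4().hex}" for _ in range(file_count)]
    raise ValueError(f"unknown suffix style: {suffix_style}")
```

For example, `output_file_names("orders", "overwrite", 3)` yields `orders_0`, `orders_1`, and `orders_2`, whereas any call with `append` yields UUID-suffixed names.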

Offline development

  • Scheduling and ad hoc queries are now supported for database SQL tasks that are based on MySQL and Oracle data sources.

  • The run function in the code editor supports running with default saved parameters to reduce test clicks. A run with parameters function is added to reset parameters before running.

Modeling development

  • For logical tables, you can now customize the names of partition fields and the date formats for date-type partition fields.

  • Aggregate logical tables support setting parameters that apply to all their derived metrics.

  • You can configure an independent scheduling cycle and conditional scheduling for individual derived metrics.

  • The lists for logical tables, atomic metrics, and business filters are optimized to show more information and provide more filters. For logical tables, you can batch submit tables, batch modify run parameters, and batch modify dependencies.

  • The English name configuration for atomic metrics and business filters now matches Chinese descriptions with pre-configured root words from data standards. You can select recommended root words for the English names of business entities.

Real-time development

  • Real-time tasks can now reference global variables to prevent plaintext passwords from being exposed in DDL statements.

  • The development of real-time tasks is optimized:

    • Flink compute sources support multi-level resource queues.

    • The details and version comparison of real-time tasks are optimized. You can compare two historical versions.

    • The precompile and test permission checks are optimized. You can validate all unauthorized objects at once.

    • The validation process, items, and results are shown during task submission.

Create a Flink SQL task

Basic O&M

  • In the Operation Center, you can operate on cross-project nodes in a Directed Acyclic Graph (DAG) without switching projects if you have the required permissions.

  • Real-time O&M adds a completed run status.

  • You can modify the O&M owner for a single metric.

  • Refreshing the backfill instance list no longer collapses expanded instances.

  • In instance statistics, clicking the materialized node name of a logical table redirects you to the corresponding logical table node.

  • Abnormal statistics exclude virtual nodes and logical table control nodes.

Asset catalog

Asset subject directories can be sorted by name. The asset catalog supports filtering physical tables and fields by data domain and subject area.

Dataphin asset details

Data standards

  • The lookup table directory feature is added. You can categorize lookup tables into a directory structure with up to five levels.

  • The lookup table list supports edit and view modes for different roles.

  • You can import and export lookup tables and root words in batches using Excel files.

  • The approval configuration for publishing and unpublishing standards is upgraded:

    • Standards follow the approval settings of the standard set for publishing and unpublishing. If this feature is enabled, you can configure separate approval templates for each process.

    • You can choose whether to merge approval tasks in batches:

      • Merged: You can select multiple standards for a single approval. All standards are passed or rejected at once.

      • Split: Each standard generates a separate approval task for individual review.

  • The batch operations for data standards are optimized. You can unpublish and delete standards in batches. You can also view the records and details of batch operations, including successful, failed (with reasons), and skipped (with reasons) objects.

  • You can view standard sets by standard set directory.

  • You can export mapping relationships to Excel in batches.
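
The difference between merged and split approval tasks can be pictured with a short grouping sketch. The function and its arguments are hypothetical and only model the behavior described above: merged mode produces one approval task for all selected standards, and split mode produces one task per standard.

```python
def build_approval_tasks(standard_ids, merge):
    """Group selected standards into approval tasks.

    merge=True  -> a single task covering every standard (all pass or all are rejected).
    merge=False -> one task per standard, each reviewed individually.
    """
    ids = list(standard_ids)
    if not ids:
        return []
    if merge:
        return [ids]
    return [[standard_id] for standard_id in ids]
```

With three selected standards, merged mode returns one task containing all three IDs, and split mode returns three single-ID tasks.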

Data Quality

  • The alert configuration for quality monitoring is upgraded:

    • You can configure different alert methods for different rules to distinguish between alerts. The alert scope can be all rules, all strong rules, all soft rules, or custom rules.

    • You can set the policy that determines which alert configurations take effect: only the first matched configuration, or all configurations.

      • First matched: The configured alerts are sorted. Only the first matched alert takes effect.

      • All configurations: All alerts in the list apply to the quality rules of the monitored object.

  • The feedback for batch importing quality rules from Excel is optimized. You can choose whether to import duplicate records.
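
The two alert effective policies can be sketched as follows. This is a simplified model under stated assumptions: the `AlertConfig` class, the scope labels (`all`, `strong`, `soft`, `custom`), and the policy names are hypothetical, and the list order stands in for the configured sort order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertConfig:
    name: str                              # hypothetical label, e.g. the alert method
    scope: str                             # "all", "strong", "soft", or "custom"
    custom_rules: frozenset = frozenset()  # rule names, used only when scope == "custom"

    def matches(self, rule_name: str, strength: str) -> bool:
        """Return True if this configuration covers the given rule."""
        if self.scope == "all":
            return True
        if self.scope in ("strong", "soft"):
            return self.scope == strength
        if self.scope == "custom":
            return rule_name in self.custom_rules
        return False


def effective_alerts(configs, rule_name, strength, policy):
    """Select the alert configurations that apply to one rule.

    policy="first_matched": only the first matching configuration in list order.
    policy="all":           every matching configuration.
    """
    matched = [c for c in configs if c.matches(rule_name, strength)]
    if policy == "first_matched":
        return matched[:1]
    if policy == "all":
        return matched
    raise ValueError(f"unknown policy: {policy}")
```

Under `first_matched`, reordering the configuration list changes which alert fires, which is why the configured alerts are sorted.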

Data security

  • Identification rules now support the automatic inheritance of upstream classification and grading from the data lineage. When combined with default masking policies, this ensures that inherited results trigger masking algorithms, which improves security. The management of identification results is optimized. When you modify a classification or grading, the system automatically provides recommendations based on identification records.

  • Automatic inheritance from lineage supports two scenarios: identification rule runs and lineage updates. It also supports two inheritance rules: single result and multiple results.

  • Identification rules support batch running only effective rules or all rules. You can choose whether to trigger auto-inheritance tasks.

  • The identification result details are optimized. You can quickly view effective and other results. You can also set a specific record as effective, modify a record to the system recommendation with a single click, or quickly assign a data classification for results that have only a grading from lineage inheritance.

  • The identification result list is optimized. The identification method is shown. You can also edit identification results.

  • Data classification supports quick referencing of preset models in the effective model list. The preset model list supports quick selection of a classification or grading to add to the effective models.

  • The feedback for batch uploading identification results from Excel is optimized. Online duplicate records and import validation errors are separated. You can choose whether to import duplicates.

  • Format-Preserving Encryption (FPE) and decryption are added to the security algorithms.

Tag Factory

  • Offline datasets now support lookup table configuration. When you create an offline tag, the corresponding tag field is automatically matched with the lookup table. When you use this tag for downstream filtering, the name of the code value is displayed.

  • The behavior time of behavioral relationships supports the date and text data types. For the text data type, you can set a date format for conversion.

  • The usage statistics on the tag detail page support filtering the call history by time range and show the top 10 calling applications. The call statistic metrics are optimized:

    • Total calling applications: The unique count of online and offline calling applications in the specified time range.

    • Online calling applications: The unique count of real-time query applications that call the tag in the specified time range, excluding marketplace tag tests.

    • Offline calling applications: The unique count of tag offline service applications that reference the tag in the specified time range.

    • Total calls: The total number of online and offline calls in the specified time range.

    • Online calls: The number of real-time queries for the tag by applications in the specified time range.

    • Offline calls: The number of successful instances of tag offline services that reference the tag in the specified time range.

    • Reference count: The number of times a published tag version is directly referenced by other tags.

  • For numeric tags without custom intervals, distribution statistics are automatically calculated for distribution groups and intervals based on the tag value distribution.

  • The composite tag and offline service lists are optimized. An "All" tab is added to quickly select all available tags. In the applied tags list, selecting a parent category shows all tags in that category and its subcategories.

  • You can search for tags by description. You can create tag marketplaces without project dependency. You can also bind projects to multiple marketplaces, including public and private ones.
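
The call statistic definitions above can be expressed as a short aggregation over call records. This is a sketch, not the product's actual computation: the `Call` record shape is hypothetical, and two points are assumptions because the text does not state them, namely that marketplace tag tests are also left out of the online call count, and that an application appearing in both online and offline calls is counted once in the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    app: str                        # calling application name
    kind: str                       # "online" (real-time query) or "offline" (offline service)
    succeeded: bool = True          # only meaningful for offline service instances
    marketplace_test: bool = False  # True for marketplace tag tests


def call_statistics(calls):
    """Compute the tag usage metrics for calls already filtered to a time range."""
    # Assumption: marketplace tag tests are excluded from both online metrics.
    online = [c for c in calls if c.kind == "online" and not c.marketplace_test]
    offline = [c for c in calls if c.kind == "offline"]

    online_apps = {c.app for c in online}
    offline_apps = {c.app for c in offline}
    online_calls = len(online)
    # Offline calls count only successful offline service instances.
    offline_calls = sum(1 for c in offline if c.succeeded)

    return {
        "total_calling_applications": len(online_apps | offline_apps),
        "online_calling_applications": len(online_apps),
        "offline_calling_applications": len(offline_apps),
        "total_calls": online_calls + offline_calls,
        "online_calls": online_calls,
        "offline_calls": offline_calls,
    }
```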

Analysis platform

Notebooks and SQL queries can now access data in MySQL and Oracle databases. You can also run SQL scripts.

Data Service Studio

  • In the direct-connect data source mode, the SQL mode configuration now supports setting request parameters in SQL functions.

  • If the SQL mode is Advanced SQL and you are parsing SQL parameters, you can choose whether to keep manually configured parameter information.

  • You can specify the storage location for API cached data. The data can be stored in Dataphin's system Redis, a specified Redis data source instance, or in-app storage.

Cross-tenant publishing

  • For objects related to data standards, you can now view details and compare versions. These objects include standard set directories, standard sets, public standard attributes, root words, and lookup tables.

  • Cross-tenant publishing is added for lookup table directories, public calendars, offline code templates, and offline physical table objects.

  • When you export tag objects in a deployment package, you can also export view dependencies. Publishing supports the automatic granting of permissions.

February 2024

Product version: V3.13

Beijing and Shenzhen: Released on February 27, 2024.

Hangzhou: Released on February 29, 2024.

Shanghai: Released on March 3, 2024.

Feature name

Feature description

Related documentation

Q&A

Add ticket-based support. Quickly access the support bot or submit a ticket from the bottom right of the page to get your questions answered.

Support

Compute settings

If no compute source is created, you can modify the MaxCompute region and network connection type in the compute settings.

Set MaxCompute as the compute engine for a Dataphin instance

R&D platform settings

  • Add exclusive edit lock configuration. When enabled, other users cannot take over the lock while a task is being edited. The lock holder must release it, reducing issues of lost or overwritten changes.

  • Supported R&D objects: integration tasks, real-time compute tasks, offline compute tasks, logical tables, offline physical tables, real-time tables, real-time/offline compute templates, and ad hoc queries.

R&D platform settings

Project role management

In custom project roles, a role with Member Management—Edit permission cannot configure a user as a project administrator (including themselves).

Project role management

Member management

Optimize one-click owner transfer. Support transferring data table owners, task O&M owners, and monitoring owners.

Data source management

  • Add Amazon S3 and TDengine data sources.

  • Supported Oracle versions now include Oracle 11g, 12c, 18c, 19c, 21c, and 23c.

  • Some data sources support configuring the database connection timeout and the number of retry attempts. These settings apply to offline integration tasks and enterprise-wide data quality monitoring rules. Offline integration tasks can additionally configure retry attempts at the task level. Supported data sources include MySQL, PolarDB-X (formerly DRDS), PolarDB, AnalyticDB for MySQL 2.0, AnalyticDB for MySQL 3.0, TiDB, GoldenDB, StarRocks, PostgreSQL, AnalyticDB for PostgreSQL, Greenplum, Microsoft SQL Server, Vertica, SAP HANA, IBM DB2, OceanBase, ClickHouse, Dameng, KingbaseES, GBase 8a, and Doris.

  • MySQL data sources add an RDS MySQL version option that uses the MySQL 8.0 driver, supports OpenSSL 3.0, and is compatible with MySQL 5.6 and 5.7.

  • Elasticsearch data sources support HTTPS protocol URLs.

Offline integration

  • Add integration task list. Support batch submit, batch unpublish and delete, batch schedule/dependency/parameter/runtime configuration, batch move directory, batch transfer developer, and batch acquire lock.

  • Add Amazon S3 input and output components.

  • The character limit for script mode input/output components is increased to 500,000.

  • Filter components add START WITH and END WITH functions.

  • Support writing Timestamp data type fields to Lindorm (wide table).

  • Configure connections for each database at runtime for each task. Retry attempts configured in offline integration tasks have higher priority than data source configurations.
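
The precedence between task-level and data-source-level retry settings can be sketched as a simple resolution function. The function name, the use of `None` for "not configured", and the fallback default are assumptions for illustration only.

```python
def effective_retry_attempts(task_retries, data_source_retries, default=0):
    """Resolve the retry count for a task run.

    A value configured on the offline integration task takes priority over
    the value configured on the data source. None means "not configured".
    """
    if task_retries is not None:
        return task_retries
    if data_source_retries is not None:
        return data_source_retries
    return default
```

Note that an explicit task-level value of 0 still wins over the data source setting in this sketch, because only `None` is treated as unset.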

Compute tasks

  • Optimize circular dependency error messages during compute task submission. Show specific node names, IDs, and dependency paths.

  • MaxCompute now supports pivot and unpivot syntax.

  • Offline compute and integration tasks support editing node output names when there are no downstream dependencies.

Create a MaxCompute SQL task

Offline compute templates

  • Add SPARK_JAR_ON_MAX_COMPUTE and MAX_COMPUTE_MR node type compute templates.

  • When creating tasks from compute templates, the latest template version is referenced by default. If the template changes, SQL compute tasks require manual update of the referenced version. Other tasks will automatically switch to the latest version.

Create an offline compute template

Ad hoc query

Optimize ad hoc and analysis query results. Bigint and Decimal types now show full precision. Bigint supports the range -2^63+1 to 2^63-1.

Query and download data
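
To illustrate why full-precision display matters for Bigint and Decimal query results, the sketch below shows how a large integer loses digits when it passes through a 64-bit float, while integer and `Decimal` formatting keep every digit. The helper names are hypothetical, and the range constants simply restate the bounds given above.

```python
from decimal import Decimal

# Bounds as stated for Bigint in the query results.
BIGINT_MIN = -2**63 + 1
BIGINT_MAX = 2**63 - 1


def in_bigint_range(value: int) -> bool:
    """Check whether an integer falls within the stated Bigint range."""
    return BIGINT_MIN <= value <= BIGINT_MAX


def format_full_precision(value) -> str:
    """Render a value without scientific notation or rounding."""
    if isinstance(value, Decimal):
        return format(value, "f")
    return str(value)


# A 64-bit float has a 53-bit significand, so this integer cannot round-trip.
big = 2**53 + 1
lossy = int(float(big))  # differs from big: the last digit is lost
```

Rendering values as exact integers or decimals, rather than through floating point, avoids the kind of digit loss seen in `lossy`.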

Resources

Support using from dataphin import odps/hive in .py resource files, with the same conditions and limits as in Python tasks. This statement supports reading logical tables and performing table authentication.

Upload and reference resources

Offline and modeling development

  • Optimize how variable parameter values are resolved when previewing logical table data in the development environment. Local variables default to their configured parameter values. Date/time global variables take the latest value.

  • Optimize auto-fill logic for parameter run values when running compute tasks and previewing logical table data in development environment. Modified run values are retained until changed again or cache is cleared. Add quick restore to default values.

  • Add cross-node parameter feature. Python, SQL, and Shell compute tasks can assign values to one or more variable parameters and pass them directly to dependent downstream nodes, enabling variable value transfer between upstream and downstream tasks.

  • All offline task types (compute, integration, logical table) can receive cross-node output parameters from direct upstream tasks. Assign cross-node parameter values to local variables and determine scheduling based on cross-node conditional scheduling.

  • Optimize the duplicate check for atomic metric calculation logic. If the calculation expressions are the same but the event times (statistical period identifiers) are different, the logic is not considered a duplicate.

  • After creating logical tables and metrics, the creator is granted query permissions by default.
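
The atomic metric duplicate check described above can be modeled as a comparison on the pair of calculation expression and event time. The function names are hypothetical, and collapsing whitespace before comparing expressions is an assumption; the actual normalization used by the product is not described.

```python
def normalize_expression(expression: str) -> str:
    """Collapse runs of whitespace so formatting differences do not matter (assumption)."""
    return " ".join(expression.split())


def is_duplicate_metric(expression, event_time, existing):
    """Return True if an existing metric has the same expression AND event time.

    existing: iterable of (expression, event_time) pairs for existing atomic metrics.
    The same expression with a different event time is not a duplicate.
    """
    key = (normalize_expression(expression), event_time)
    return any((normalize_expression(e), t) == key for e, t in existing)
```

For instance, `sum(amount)` on `pay_time` and `sum(amount)` on `order_time` are treated as two distinct metrics.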

Real-time development

  • Flink CDC extraction from MongoDB supports filtering delete operations.

  • Flink SQL tasks can directly access tables in Oracle and StarRocks data sources using data source codes.

  • Flink SQL tasks support referencing sample code for quick task creation, including CDC real-time data synchronization to data lakes or warehouses, and Kafka real-time data processing.

  • Add real-time compute task list. Support batch submit, batch unpublish and delete, batch runtime configuration, batch move directory, batch transfer developer, and batch acquire lock.

  • Real-time compute tasks support configuring null values for set parameters.

Create a Flink SQL task

Publish Center

  • Publish approval task details now include project, data domain, developer, and O&M owner information.

  • In publish validation details, if permission check fails, clicking Apply for Permissions redirects to the permission application page and auto-selects only the required fields and permission types.

Manage publish tasks

Basic O&M

  • Backfill instance summary run status statistics ignore paused status. A separate prompt appears if there are paused instances.

  • Recurring/backfill instances support sorting by instance name (recurring only), scheduled run time, start time, end time, and duration.

  • Support batch running of one-time tasks.

  • Support batch modification of logical table O&M owners and priorities in the O&M list.

  • Optimize list summary information when expanding DAGs for recurring and backfill instances. Show instance name, business date, and run status.

  • When project compute source is a Flink VVP project, support configuring real-time task monitoring and alerting.

Manage modeling task instances

Data standards

  • Mapping rule configuration for mapping relationships now supports the IN operator.

  • Data standards support configuring associated standards. Support auto-identifying associated lookup tables based on attribute values. View association information in the standard attribute drawer and jump to associated object details.

  • Standard template attributes support sorting. Support previewing attribute styles when creating new standards.

Data Quality

  • Add a knowledge base feature. Record configuration issues and their solutions to help users understand related quality problems and policies, and associate knowledge base entries with quality rules.

  • Dataphin data tables and metrics support creating custom statistical metrics and data detail validation templates using custom SQL.

  • Support batch uploading quality monitoring rules based on custom SQL templates via Excel.

  • The time function comparison template under Timeliness supports two ways to compute the time difference: validation field - comparison field, or comparison field - validation field.

  • For instances that fail due to quality monitoring rule validation failure, view the failure reason in the Operation Center instance run log. For rules involving statistical metric comparison, the log directly shows the statistical metric value and the configured passing metric requirement.

  • In data table monitoring object scheduling configuration, validation partition is upgraded to validation range, which works with the rule's filter conditions to filter validation data and serves as the minimum viewing granularity for quality reports. Configurable validation range is expanded from partitioned tables to all data tables.

  • Optimize quality alert information. Support sending core metric data from quality rule validation:

    • Key metrics: Support sending rule validation configuration and actual validation results.

    • Complete metrics: Support sending the specific metric values used for validation in the rule configuration.

  • Optimize statistical metric display in validation record results. Display order is: current validation metric > actual calculated statistical metric value > intermediate process metrics for calculating statistical metrics, for more intuitive analysis of quality issues.
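
The two time difference directions offered by the timeliness template (validation field - comparison field, and comparison field - validation field) differ only in sign, as the sketch below shows. The function name and the direction labels are hypothetical.

```python
from datetime import datetime, timedelta


def time_difference(validation_value: datetime, comparison_value: datetime,
                    direction: str = "validation_minus_comparison") -> timedelta:
    """Compute the time difference in the configured direction."""
    if direction == "validation_minus_comparison":
        return validation_value - comparison_value
    if direction == "comparison_minus_validation":
        return comparison_value - validation_value
    raise ValueError(f"unknown direction: {direction}")
```

Choosing the direction lets a rule express "the validation field must be no more than N hours after the comparison field" or the reverse without negating the threshold.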

Data security

  • Data classification built-in models add a finance industry classification and grading template.

  • Configuring scheduled manual scan rules now shows an estimated number of tables to be scanned.

  • Support batch resetting of identification rule tagging results.

Tag Factory

  • Remarks and descriptions for tag-related objects are expanded to 1,000 characters.

  • Rule-based composite tag hierarchical values support special characters.

Analysis platform—manual tables

  • Add manual table feature:

    • Quickly create manual tables by importing DDL statements or existing tables.

    • Edit manual table data online. Choose whether to sync changes to the compute engine and save.

    • Share manual tables with other users. View historical versions of published manual tables.

  • Notebook outline supports jumping to corresponding cells.

Data Service Studio

  • Advanced SQL supports dynamically specifying query return fields via parameters. Parameter names must start with var_cols_ (e.g., var_cols_xxx) and be included in the query statement. Pass the desired fields when calling the API to get dynamic results.

  • Advanced SQL requires defining complete API return parameters. If the query statement includes return fields but the API return parameters are not configured, the call will fail.

  • API test page supports viewing API script content to check if test results match the query logic.

  • Optimize API change and publish process:

    • If an API is not bound to an application and not referenced by a composite API, you can directly change its parameter configuration and republish.

    • Support configuring API publish control mechanisms at the project level. When an API change affects downstream usage, choose different control mechanisms based on impact scope and severity.

    • If an API is bound to an application or referenced by a composite API, republishing requires comparison with the online version. If there are changes like adding required request parameters, deleting request or return parameters, or changing request parameter data types, the system will determine if publishing is allowed based on the project's API publish control mechanism.

  • APIs created from service units, regardless of data source type, support a maximum of 10,000 returned records.

  • Add direct-connect APIs for TDengine data sources. When using basic SQL mode, parsed request parameters are required. Public request parameters for paging and sorting are unavailable; define them in the SQL.

  • Creating APIs with direct-connect data sources supports custom complex SQL. After re-parsing, decide whether to automatically overwrite manual configurations.

  • Optimize code comment prompts for service call example downloads and SDK downloads.
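
The comparison with the online version that happens when a bound or referenced API is republished can be sketched as a diff over parameter definitions. This is an illustration only: the dictionary layout and function name are hypothetical, and the checks cover just the change types named above (added required request parameters, deleted request or return parameters, and changed request parameter data types).

```python
def breaking_changes(online, draft):
    """List changes in `draft` that could affect callers of the `online` version.

    Both arguments use the hypothetical shape:
    {"request": {name: {"type": str, "required": bool}}, "response": {name: {"type": str}}}
    """
    changes = []

    # New request parameters only matter to existing callers if they are required.
    for name, spec in draft["request"].items():
        if name not in online["request"] and spec.get("required", False):
            changes.append(f"added required request parameter: {name}")

    # Deleted request parameters, or request parameters whose data type changed.
    for name, spec in online["request"].items():
        if name not in draft["request"]:
            changes.append(f"deleted request parameter: {name}")
        elif draft["request"][name]["type"] != spec["type"]:
            changes.append(f"changed request parameter type: {name}")

    # Deleted return parameters.
    for name in online["response"]:
        if name not in draft["response"]:
            changes.append(f"deleted return parameter: {name}")

    return changes
```

An empty result means none of the listed change types were found; whether a non-empty result blocks publishing would depend on the project's API publish control mechanism.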

Cross-tenant publishing

  • Upgrade cross-tenant publishing settings:

    • Dev-Prod development projects support two methods: validate personal permissions, or ignore personal permission validation on submission. Production project permissions support two methods: validate production account permissions, or auto-grant permissions.

    • Publish validation supports two methods: validate operator permissions, or ignore permission validation.

    • For business objects being brought online, or R&D objects with publish approval or code review enabled, support ignoring approval and publishing directly.

    • Data standard publishing supports two methods: ignore approval, or follow standard set configuration.

  • Data standard-related objects support cross-tenant publishing, including standard templates, standard sets, standard set directories, public standard attributes (custom), data standards, mapping rules, mapping relationships, lookup tables, and root words. If data standard monitoring configurations reference quality monitoring templates or security classification/grading, they can also be exported.

  • Asset security-related objects support cross-tenant publishing, including keys, identification results, and data classification and grading (including identification features).

  • When exporting integration tasks for cross-tenant publishing, if encryption/decryption components are referenced, key dependencies are also exported. During publishing, keys are filled and replaced based on dependencies.

  • When exporting logical table tasks, you can also export the association between fields and data standards.