Data Integration is a simple and efficient data synchronization platform built on Dataphin. It provides powerful data pre-processing capabilities and high-speed, stable data synchronization between various disparate data sources.
Background information
As big data applications expand across industries, data integration faces increasing demands. These demands include the ability to efficiently configure sync tasks for numerous data tables, integrate multiple disparate data sources, perform light pre-processing on data, and optimize data sync tasks with features such as fault tolerance, speed limits, and concurrency.
Function overview
If you purchased Dataphin after April 2020, the data synchronization feature has been upgraded to Data Integration.
Dataphin has upgraded its data integration capabilities to help you build a simple, efficient, secure, and reliable data synchronization platform:
You can improve data integration efficiency using full database migration to quickly generate batch sync tasks and create destination tables with one click. When you sync data to MaxCompute, you do not need to manually create tables. For more information, see Configure an integration task by migrating an entire database.
You can use the Flow and Transform components to pre-process data from a data source. Pre-processing includes data cleansing, transformation, field masking, calculation, merging, distribution, and filtering. For more information, see Create an integration task from a single pipeline.
You can use either the Dev-Prod or the Basic development mode, depending on your needs.
You can quickly sync logical tables created in Dataphin to a destination database.
You can create custom components for data sources that the system does not support, to meet data synchronization needs in different scenarios. Custom components for relational database management systems (RDBMS) connect through Java Database Connectivity (JDBC). For non-RDBMS components, you must upload a JAR package.
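The pre-processing steps listed above (filtering, field masking, calculation) can be pictured as small functions chained over a stream of records. The following is only an illustrative sketch in plain Python, not Dataphin's implementation or API: the names `filter_rows`, `mask_field`, `add_calculated`, and `run_pipeline`, and the sample fields, are all hypothetical.

```python
import hashlib


def filter_rows(predicate):
    """Hypothetical filter step: keep only records that satisfy predicate."""
    def step(records):
        return (r for r in records if predicate(r))
    return step


def mask_field(name):
    """Hypothetical masking step: replace a field with a truncated SHA-256 digest."""
    def step(records):
        for r in records:
            masked = dict(r)  # copy so the source record is not modified
            digest = hashlib.sha256(str(r[name]).encode("utf-8")).hexdigest()
            masked[name] = digest[:12]
            yield masked
    return step


def add_calculated(name, fn):
    """Hypothetical calculation step: add a field computed from each record."""
    def step(records):
        for r in records:
            out = dict(r)
            out[name] = fn(r)
            yield out
    return step


def run_pipeline(source, steps):
    """Apply each step in order to the record stream and collect the output."""
    stream = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)


# Sample data, invented for illustration only.
rows = [
    {"id": 1, "phone": "13800000000", "price": 10.0, "qty": 3},
    {"id": 2, "phone": "13900000000", "price": 5.0, "qty": 0},
]

result = run_pipeline(rows, [
    filter_rows(lambda r: r["qty"] > 0),                        # filtering
    mask_field("phone"),                                        # field masking
    add_calculated("amount", lambda r: r["price"] * r["qty"]),  # calculation
])
```

In this sketch each step takes an iterable of records and returns another iterable, so steps can be reordered or removed without changing the others, which is roughly the idea behind assembling components in a pipeline.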
Data Integration supports various component types. You can drag, configure, and assemble these components to generate an offline single pipeline. Data Integration also lets you quickly generate batch sync tasks. For full database migration, the source can be MySQL, SQL Server, or Oracle, and the destination must be MaxCompute. You can also create custom components for data sources that the system does not support.
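The source and destination constraint for full database migration can be expressed as a simple check that runs before one sync task per table is generated. This is a minimal sketch under assumptions: the function `plan_full_database_migration`, the `SyncTask` class, and the `prefix` parameter are hypothetical names used for illustration and are not part of Dataphin.

```python
from dataclasses import dataclass

# Source types and destination taken from the constraint described above.
SUPPORTED_SOURCES = {"mysql", "sqlserver", "oracle"}
REQUIRED_DESTINATION = "maxcompute"


@dataclass(frozen=True)
class SyncTask:
    """Hypothetical description of one table-level sync task."""
    source_table: str
    destination_table: str


def plan_full_database_migration(source_type, destination_type, tables, prefix=""):
    """Validate the source/destination pair, then plan one task per table."""
    normalized_source = source_type.lower().replace(" ", "")
    if normalized_source not in SUPPORTED_SOURCES:
        raise ValueError(f"Unsupported source type: {source_type}")
    if destination_type.lower() != REQUIRED_DESTINATION:
        raise ValueError(f"Destination must be MaxCompute, got: {destination_type}")
    # The destination table name is derived from the source table name;
    # the optional prefix is an assumption made for this sketch.
    return [SyncTask(t, f"{prefix}{t}") for t in tables]


tasks = plan_full_database_migration(
    "MySQL", "MaxCompute", ["orders", "users"], prefix="ods_"
)
```

Rejecting an unsupported pair up front means no partial batch of tasks is ever produced for a configuration that cannot run.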
Access Data Integration
Quick access (recommended)
On the Dataphin home page, click Data Import in the product path to quickly access Data Integration.

Standard access
On the Dataphin home page, choose Develop > Data Integration from the top menu bar to go to the Data Integration page.

Connect the data source to the Dataphin network
To synchronize data, you must establish a network connection between your data source and your Dataphin project. For more information, see Network connectivity solutions.
Scenarios
Scenario | Description | Instructions |
Build a sync task using a pipeline script | Develop a pipeline node based on an existing pipeline script to synchronize data. | |
Build a sync task using an offline single pipeline | An offline data pipeline defines the source and destination data sources and datasets. It provides an abstract set of Input, Output, Flow, and Transform components. This framework uses a simplified intermediate data transmission format to transmit data between data sources. | |
Build a sync task using offline full database migration | Full database migration is a tool that improves user efficiency and reduces costs. It lets you quickly upload all tables from a MySQL, Oracle, or SQL Server database to MaxCompute. This greatly reduces the configuration and migration costs of the initial cloud setup. | |
Build a sync task using a custom component | Data Integration lets you create custom components for data sources that the system does not support, to meet data synchronization needs for various business scenarios. | |