Integration component library development instructions

Prerequisites

To develop an offline pipeline, you must first create the corresponding offline pipeline script. For more information, see Create an integration task through a single pipeline.

Offline pipeline component development entry

To open the offline pipeline component development page, follow these steps:

On the Dataphin home page, choose Development -> Data Integration from the top menu bar.
Select a project (in Dev-Prod mode, also select an environment) -> click Batch Pipeline -> click the offline pipeline that you want to develop -> click Component Library.

Offline component library development instructions

Typically, a complete offline pipeline is composed of one or more Inputs, zero or more Transforms and Flows, and one or more Outputs.

On the development page for a single offline pipeline script, click Component Library in the upper right corner to reveal components such as Favorite, Inputs, Transforms, Flows, Outputs, and Custom components.

Favorite components

Click Favorite to view the components that your account has favorited in the other component libraries, so that you can quickly reuse frequently used components.

Input components

Input components define the data source. Select the component that matches your data type and drag it onto the pipeline canvas to start data ingestion. For details about each input component, see the configuration details for each component.

Input components cannot have ancestor nodes.
The descendant node of an Input can be a Transform, Output, or Flow.
When an Input component is connected to multiple descendant nodes, such as Outputs or Transforms, you must select a Data Sending Method for the Input component:
- Replication: Each descendant node receives a full copy of the data from the ancestor node.
- Round-robin Distribution: The data from the ancestor node is distributed among the descendant nodes in turn. Each descendant node receives part of the data, and the combined data of all descendant nodes equals that of the ancestor node.
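
The difference between the two Data Sending Methods can be illustrated with a minimal Python sketch. This is not Dataphin code: the function names `replicate` and `round_robin` are hypothetical, and the sketch assumes that distribution happens record by record, which the product documentation does not specify.

```python
def replicate(records, n_descendants):
    """Replication: every descendant node receives the full data set."""
    records = list(records)
    return [list(records) for _ in range(n_descendants)]


def round_robin(records, n_descendants):
    """Round-robin: record i goes to descendant i % n (assumed granularity)."""
    outputs = [[] for _ in range(n_descendants)]
    for i, record in enumerate(records):
        outputs[i % n_descendants].append(record)
    return outputs


rows = ["r1", "r2", "r3", "r4", "r5"]
replicated = replicate(rows, 2)    # each descendant gets all five records
distributed = round_robin(rows, 2)  # the five records are split across descendants
```

With Replication the total volume written downstream grows with the number of descendant nodes, whereas with Round-robin Distribution the descendants together hold exactly the records of the ancestor node.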

Output components

Select the output component that matches your target data store and drag it onto the pipeline canvas. For details about each output component, see the configuration details of each component.

Output components cannot have descendant nodes.

Flow components

Dataphin provides two flow control components for data integration: throttling and conditional distribution. For details about each component, see the configuration details of each component.

Flow components cannot be the first or last node of an offline pipeline, but they can be placed anywhere between the start and the end of the pipeline script.
When a Flow component is connected to multiple descendant nodes, such as Transforms, Outputs, or Flows, you must select a Data Sending Method, as you do for an Input component.
If the Flow component is a Conditional Distribution component, you must specify the distribution condition when you connect it to each descendant node:
- Select Condition Result Is True to send data downstream when the ancestor node's condition evaluates to true.
- Select Condition Result Is False to send data downstream when the ancestor node's condition evaluates to false.
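
The two connection options of Conditional Distribution can be sketched as follows. This is an illustrative Python model only: the function `conditional_distribute`, the sample `orders` data, and the per-record evaluation of the condition are assumptions, not part of the Dataphin product.

```python
def conditional_distribute(records, condition):
    """Split records into a 'true' branch and a 'false' branch."""
    true_branch, false_branch = [], []
    for record in records:
        # Assumption: the condition is evaluated once per record.
        if condition(record):
            true_branch.append(record)
        else:
            false_branch.append(record)
    return true_branch, false_branch


orders = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 500},
    {"id": 3, "amount": 120},
]
# The descendant connected with "Condition Result Is True" would receive `large`;
# the one connected with "Condition Result Is False" would receive `small`.
large, small = conditional_distribute(orders, lambda r: r["amount"] >= 100)
```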

Transform components

Transform components process source data from input components by computing, filtering, or encrypting data fields. For details about each transform component, see the configuration details of each component.

Transform components can be connected to multiple descendant nodes, such as Transforms, Outputs, and Flows. When a Transform component has multiple descendant nodes, you must select a Data Sending Method, as you do for an Input component.
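
The placement rules stated in the component sections above (inputs have no ancestor nodes, outputs have no descendant nodes, flows are never first or last, and a pipeline has at least one input and one output) can be expressed as a small check. The following Python sketch is a hypothetical model for reasoning about a pipeline design; `validate_pipeline` and the component names are assumptions and do not correspond to any Dataphin API.

```python
from collections import defaultdict


def validate_pipeline(kinds, edges):
    """Return rule violations for a pipeline.

    kinds: {component_name: "input" | "transform" | "flow" | "output"}
    edges: list of (upstream_name, downstream_name) directed connections
    """
    ancestors = defaultdict(list)
    descendants = defaultdict(list)
    for upstream, downstream in edges:
        descendants[upstream].append(downstream)
        ancestors[downstream].append(upstream)

    errors = []
    for required in ("input", "output"):
        if required not in kinds.values():
            errors.append(f"pipeline needs at least one {required}")
    for name, kind in kinds.items():
        if kind == "input" and ancestors[name]:
            errors.append(f"{name}: an input cannot have an ancestor node")
        if kind == "output" and descendants[name]:
            errors.append(f"{name}: an output cannot have a descendant node")
        if kind == "flow" and not (ancestors[name] and descendants[name]):
            errors.append(f"{name}: a flow cannot be the first or last node")
    return errors


kinds = {"mysql_in": "input", "filter": "transform",
         "throttle": "flow", "oss_out": "output"}
ok_edges = [("mysql_in", "filter"), ("filter", "throttle"), ("throttle", "oss_out")]
bad_edges = [("mysql_in", "filter"), ("filter", "oss_out"), ("mysql_in", "throttle")]

ok_errors = validate_pipeline(kinds, ok_edges)    # no violations
bad_errors = validate_pipeline(kinds, bad_edges)  # throttle ends a branch
```

In the second example the throttling component has an ancestor but no descendant, so it would be a terminal node, which the rules for Flow components do not allow.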

Directed connections

After selecting components, use directed connections to link upstream input components to downstream transform, flow, and output components. The runtime executes each component sequentially along these connections. The following figure shows the upstream and downstream relationships.

Canvas operations

The pipeline canvas supports building multiple pipeline scripts simultaneously. Right-click the canvas to access the following operations.

Operation	Description
Copy	Copy existing components on the pipeline canvas.
Paste	Paste the copied pipeline components onto the pipeline canvas.
Delete	Delete the selected components from the canvas.
Select All	Select all components on the pipeline canvas.
Lasso Select	Use the mouse to lasso and select multiple components on the canvas.

Switch to code editor components

Input and output components, except the LogicalTable, code editor, and local file components, can be switched to Code Editor mode in the configuration dialog box. This switch is irreversible. The following table uses the MySQL input component as an example.

Before switching	After switching

Component configuration instructions

For configuration instructions for each Dataphin component, see the following tables:

Input components

Component Name	Component Configuration
MySQL	MySQL input component
Oracle	Oracle input component
Vertica	Vertica input component
FTP	FTP input component
LogicalTable	LogicalTable input component
AnalyticDB for PostgreSQL	AnalyticDB for PostgreSQL input component
PolarDB	PolarDB input component
Local file	Local file input component
Teradata	Teradata input component
OceanBase	OceanBase input component
Hologres	Hologres input component
DataHub	DataHub input component
DM	DM (Dameng) input component
TiDB	TiDB input component
GBase 8a	GBase 8a input component
SAP Table	SAP Table input component
StarRocks	StarRocks input component
Elasticsearch	Elasticsearch input component
Salesforce	Salesforce input component
SelectDB	SelectDB input component
Microsoft SQL Server	Microsoft SQL Server input component
PostgreSQL	PostgreSQL input component
PolarDB-X (formerly DRDS)	PolarDB-X (formerly DRDS) input component
MaxCompute	MaxCompute input component
MongoDB	MongoDB input component
AnalyticDB for MySQL 3.0	AnalyticDB for MySQL 3.0 input component
Log Service	Log Service input component
OSS	OSS input component
SAP HANA	SAP HANA input component
IBM DB2	IBM DB2 input component
Code editor input	Code editor input component
ClickHouse	ClickHouse input component
Kafka	Kafka input component
API	API input component
KingbaseES	KingbaseES input component
GoldenDB	GoldenDB input component
Impala	Impala input component
OpenGauss	OpenGauss input component
Greenplum	Greenplum input component

Output components

Component Name	Configuration Instructions
MySQL	MySQL output component
Oracle	Oracle output component
Vertica	Vertica output component
FTP	FTP output component
AnalyticDB for MySQL 2.0	AnalyticDB for MySQL 2.0 output component
AnalyticDB for MySQL 3.0	AnalyticDB for MySQL 3.0 output component
PolarDB	PolarDB output component
SAP HANA	SAP HANA output component
IBM DB2	IBM DB2 output component
Code editor output	Code editor output component
ClickHouse	ClickHouse output component
Kafka	Kafka output component
KingbaseES	KingbaseES output component
GoldenDB	GoldenDB output component
Impala	Impala output component
StarRocks	StarRocks output component
Greenplum	Greenplum output component
Microsoft SQL Server	Microsoft SQL Server output component
PostgreSQL	PostgreSQL output component
PolarDB-X (formerly DRDS)	PolarDB-X (formerly DRDS) output component
MaxCompute	MaxCompute output component
MongoDB	MongoDB output component
Elasticsearch	Elasticsearch output component
AnalyticDB for PostgreSQL	AnalyticDB for PostgreSQL output component
OSS	OSS output component
Teradata	Teradata output component
OceanBase	OceanBase output component
Hologres	Hologres output component
DataHub	DataHub output component
DM	DM (Dameng) output component
TiDB	TiDB output component
GBase 8a	GBase 8a output component
OpenGauss	OpenGauss output component
API	API output component
SelectDB	SelectDB output component

Transform components

Component Name	Component Configuration
Field Selection	Field Selection Transform Component
Field Calculation	Field Calculation Transform Component
Filter	Filter Transform Component
Encryption	Encrypt Transform Component
Decryption	Decrypt Transform Component

Flow components

Component Name	Configuration Instructions
Throttling	Throttling Flow Component
Conditional Distribution	Conditional Distribution Flow Component

Custom components

To use custom components in Dataphin, you must first create them on the platform. For instructions, see Create an offline custom source type.