Configure the DM input component
Configure the DM input component to read data from a DM data source into Dataphin for data integration and development.
Prerequisites
-
A DM data source is created. For more information, see Create a DM data source.
-
The account used to configure the DM input component has read-through permission for the data source. If you do not have this permission, request it. For more information, see Request, renew, and return data source permissions.
Procedure
-
On the menu bar at the top of the Dataphin home page, choose Development > Data Integration.
-
On the menu bar at the top of the Data Integration page, select a project. In Dev-Prod mode, also select an environment.
-
In the navigation pane on the left, click Batch Pipeline. In the Batch Pipeline list, click the target batch pipeline to open its configuration page.
-
In the upper-right corner of the page, click Component Library to open the Component Library panel.
-
In the navigation pane of the Component Library panel, choose Input. Find the DM component in the list and drag it to the canvas.
-
Click the icon on the DM input component card to open the DM Input Configuration dialog box.
-
In the DM Input Configuration dialog box, configure the parameters.
Parameter
Description
Step Name
The name of the DM input component. Dataphin automatically generates a step name, which you can change. The naming conventions are as follows:
-
The name can contain only Chinese characters, letters, underscores (_), and digits.
-
The name cannot exceed 64 characters in length.
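The two naming rules above can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation: the function name `is_valid_step_name` is hypothetical, and "Chinese characters" is approximated here by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); the product's real check may differ.
_STEP_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and has 1 to 64 characters."""
    return _STEP_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_step_name("dm_input_1")` is accepted, while a name that contains a hyphen or has 65 characters is rejected.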
Datasource
The drop-down list displays all DM data sources in the current Dataphin project, including those for which you have read-through permission and those for which you do not. Click the icon to copy the data source name.
-
For a data source where you lack read-through permission, click Request next to the data source to request the permission. For more information, see Request, renew, and return data source permissions.
-
If you do not have a DM data source, click Create Data Source to create one. For more information, see Create a DM data source.
Number of Source Tables
Select whether to use a single table or multiple tables with the same schema as input. Valid values: Single Table and Multiple Tables.
-
Single Table: Syncs data from one source table to one destination table.
-
Multiple Tables: Syncs data from multiple source tables to the same destination table. The union algorithm is used to merge data from multiple tables into a single table.
For more information about union, see INTERSECT, UNION, and EXCEPT.
Table Match Method
Select General Rule or Database Regex.
Note: This parameter is available only when you set Number of Source Tables to Multiple Tables.
Table
Select the source table or tables:
-
If you set Number of Source Tables to Single Table, enter a keyword to search for the table, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the name of the selected table.
-
If you set Number of Source Tables to Multiple Tables, enter an expression to add tables based on the selected table match method.
-
If you select General Rule for Table Match Method: In the input box, enter a table expression to filter for tables with the same structure. The system supports enumerations, regular expression-like patterns, and a mix of both. For example: table_[001-100];table_102;
-
If you select Database Regex for Table Match Method: In the input box, enter a regular expression that the current database supports. The system matches tables in the source database based on this expression. At runtime, the node evaluates the expression again, so tables that newly match are included in the synchronization.
After you enter the expression, click Exact Match to view the list of matched tables in the Confirm Match Details dialog box.
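To preview which names a General Rule expression such as `table_[001-100];table_102;` might cover, the sketch below expands it locally. It assumes that semicolons separate entries, that `[start-end]` is an inclusive numeric range, and that the zero-padding width follows the start bound; the function `expand_table_expression` is hypothetical, and the matching that Dataphin actually performs may differ.

```python
import re

# Matches one numeric range such as [001-100] inside an entry.
_RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_table_expression(expression: str) -> list[str]:
    """Expand an expression such as 'table_[001-100];table_102;' into table names."""
    tables: list[str] = []
    for item in expression.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        match = _RANGE_RE.search(item)
        if match is None:
            tables.append(item)  # plain enumeration entry
            continue
        start_text, end_text = match.groups()
        width = len(start_text)  # assumed: padding width follows the start bound
        prefix, suffix = item[:match.start()], item[match.end():]
        for number in range(int(start_text), int(end_text) + 1):
            tables.append(prefix + str(number).zfill(width) + suffix)
    return tables
```

Under these assumptions, the example expression expands to 101 names: `table_001` through `table_100`, followed by `table_102`.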
Split Key (Optional)
The system partitions data based on the configured split key. Use this parameter together with the concurrency parameter to enable concurrent reads. Select a column from the source table as the split key. For best performance, use a primary key or an indexed column.
Important: If you select a column of a date and time type as the split key, the system identifies the maximum and minimum values and performs a rough split based on the total time range and the concurrency. The splits are not guaranteed to be even.
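The idea of splitting by key range can be illustrated with a short sketch. It assumes an integer split key and divides the range between the minimum and maximum values into as many contiguous ranges as the concurrency allows; `split_ranges` is a hypothetical helper, not the algorithm that Dataphin uses.

```python
def split_ranges(min_value: int, max_value: int, concurrency: int) -> list[tuple[int, int]]:
    """Split [min_value, max_value] into up to `concurrency` inclusive ranges."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    total = max_value - min_value + 1
    parts = min(concurrency, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    lower = min_value
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        upper = lower + size - 1
        ranges.append((lower, upper))
        lower = upper + 1
    return ranges
```

Each range could then back one concurrent read, for example with a condition of the form `key BETWEEN lower AND upper`. The ranges are even in key values only; if the keys are unevenly distributed, the number of rows per range can still differ.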
Batch Read Size (Optional)
The number of records to read per batch. Specify a batch size, such as 1024 records, instead of reading one record at a time to reduce data source interactions, improve I/O efficiency, and lower network latency.
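The effect of a batch size can be sketched with Python's DB-API `fetchmany` call. The example uses an in-memory SQLite database purely as a stand-in for a DM data source, and the helpers `read_in_batches` and `demo_batch_sizes` are hypothetical; it shows the general technique, not how Dataphin reads data internally.

```python
import sqlite3


def read_in_batches(cursor, batch_size=1024):
    """Yield lists of rows, fetching at most `batch_size` rows per call."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def demo_batch_sizes(total_rows=2500, batch_size=1024):
    """Load `total_rows` rows into SQLite and return the size of each batch read."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real data source
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(total_rows)])
    cursor = conn.execute("SELECT id FROM t ORDER BY id")
    sizes = [len(batch) for batch in read_in_batches(cursor, batch_size)]
    conn.close()
    return sizes
```

With 2,500 rows and a batch size of 1024, the rows arrive in three fetches instead of 2,500 single-row reads.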
Input Filter (Optional)
A filter condition for input fields. For example, ds=${bizdate}. The Input Filter applies to the following scenarios:
-
Filtering a fixed portion of data.
-
Parameter-based filtering.
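Parameter-based filtering means that a placeholder such as `${bizdate}` is replaced with a concrete value before the filter is applied. The sketch below mimics that substitution with Python's `string.Template`; the function `render_filter` and the sample date value are assumptions for illustration, and Dataphin resolves its parameters with its own mechanism.

```python
from string import Template


def render_filter(filter_template: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in a filter condition with the given values."""
    # substitute() raises KeyError if a placeholder has no value.
    return Template(filter_template).substitute(params)
```

For example, `render_filter("ds=${bizdate}", {"bizdate": "20240101"})` yields `ds=20240101`, while a fixed filter without placeholders is returned unchanged.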
Output Fields
Displays all fields from the selected tables that match the filter criteria. You can perform the following operations:
-
Field Management: To exclude certain fields from downstream components, delete them:
-
Single field deletion: To delete a small number of fields, click the icon in the Actions column to delete the extra fields.
-
Batch field deletion: To delete many fields, click Field Management. In the Field Management dialog box, select multiple fields, click the left arrow icon to move the selected fields to the unselected list, and then click Confirm.
-
Batch Add: Click Batch Add to configure fields in a batch using JSON, TEXT, or DDL format.
Note: After you add fields in a batch and click Confirm, the existing field configuration is overwritten.
-
To configure in a batch using JSON format, for example:
// Example:
[
  { "index": 1, "name": "id", "type": "int(10)", "mapType": "Long", "comment": "comment1" },
  { "index": 2, "name": "user_name", "type": "varchar(255)", "mapType": "String", "comment": "comment2" }
]
Note: The `index` parameter specifies the column number of the object. The `name` parameter defines the field name, and the `type` parameter defines the field type after import. For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column from the file is imported, with 'user_id' as the field name and 'String' as the field type.
-
To configure in a batch using TEXT format, for example:
// Example:
1,id,int(10),Long,comment1
2,user_name,varchar(255),String,comment2
-
The row delimiter separates the information for each field. The default delimiter is a line feed (\n). You can also use a semicolon (;) or a period (.).
-
The column delimiter separates the field name from the field type. The default delimiter is a comma (,). The field type is optional.
-
To configure in a batch using DDL format, for example:
CREATE TABLE tablename (
    user_id serial,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(255),
    created_on TIMESTAMP
);
-
Add Output Field: Click +Add Output Field, enter the Column, Type, and Comment, and select the Mapping Type. After you configure the row, click the icon to save.
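The JSON and TEXT formats accepted by Batch Add describe the same field attributes, so a definition can be checked locally before you paste it. The following is a minimal sketch: the `Field` class and both parser functions are hypothetical, and the TEXT column order (index, name, type, mapping type, comment) is inferred from the example in this topic rather than from a documented specification.

```python
import json
from dataclasses import dataclass


@dataclass
class Field:
    index: int
    name: str
    type: str
    map_type: str
    comment: str


def parse_json_fields(text: str) -> list[Field]:
    """Parse a JSON array of field definitions."""
    return [
        Field(item["index"], item["name"], item["type"],
              item.get("mapType", ""), item.get("comment", ""))
        for item in json.loads(text)
    ]


def parse_text_fields(text: str, row_delimiter: str = "\n",
                      column_delimiter: str = ",") -> list[Field]:
    """Parse TEXT-format definitions; column order is assumed from the example."""
    fields = []
    for row in text.split(row_delimiter):
        row = row.strip()
        if not row:
            continue  # skip blank rows
        parts = [part.strip() for part in row.split(column_delimiter)]
        parts += [""] * (5 - len(parts))  # pad omitted trailing columns
        index, name, type_, map_type, comment = parts[:5]
        fields.append(Field(int(index), name, type_, map_type, comment))
    return fields
```

Parsing the JSON example and the equivalent TEXT rows should produce the same list of fields, which is a quick way to catch a mistyped column. A type that itself contains the column delimiter, such as a decimal type with precision and scale, is not handled by this simple split.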
-
Click Confirm to complete the configuration of the DM input component properties.