Configure a DataHub input component to read data from a DataHub data source into the storage system connected to the big data platform, where the data can be integrated and processed further.
Prerequisites
-
A DataHub data source has been created. For more information, see Create a DataHub Data Source.
-
The account used to configure DataHub input component properties must have read-through permission on the data source. If you lack the required permission, request it first. For more information, see Request Data Source Permission.
Procedure
-
On the Dataphin home page, select Development > Data Integration from the top menu bar.
-
In the top menu bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.
-
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch pipeline that you want to develop to open its configuration page.
-
Click Component Library in the upper-right corner of the page to open the Component Library panel.
-
In the left-side navigation pane of the Component Library panel, select Input, find the DataHub component in the input component list on the right, and drag the component to the canvas.
-
Click the icon in the DataHub input component card to open the DataHub Input Configuration dialog box.
-
In the DataHub Input Configuration dialog box, configure the parameters according to the following table.
Parameter
Description
Step Name
The name of the DataHub input component. Dataphin automatically generates the step name, but you can modify it as needed. The naming convention is as follows:
-
Can only contain Chinese characters, letters, underscores (_), and numbers.
-
Cannot exceed 64 characters.
Datasource
Lists all DataHub-type data sources in the current Dataphin instance, including those you have read-through permission for and those you do not. Click the copy icon to copy the current data source name.
-
For data sources without read-through permission, click Request next to the data source to request read-through permission. For more information, see Request, Renew, and Return Data Source Permission.
-
If you do not have a DataHub-type data source, click Create to create one. For more information, see Create a DataHub Data Source.
Subject
The name of the DataHub topic. Select the topic you want to read from the drop-down list.
Consumption Start Time
The start offset for data consumption. Specify a time in yyyyMMddHHmmss format as the left boundary of the time range. This value must be used with schedule parameters. For example, if the schedule parameter is startTime=${20220101000000}, set Consumption Start Time to ${startTime}.
Consumption End Time
The end offset for data consumption. Specify a time in yyyyMMddHHmmss format as the right boundary of the time range. This value must be used with schedule parameters. For example, if the schedule parameter is endTime=${20220101000000}, set Consumption End Time to ${endTime}.
Batch Read Count
The number of records read per batch. Configure a batch size (such as 1024 records) to reduce interactions with the data source, improve I/O efficiency, and lower network latency.
Output Fields
Displays all fields matched by the selected table and filter criteria. To exclude fields from downstream components, delete them:
-
Single Field Deletion Scenario: To delete a few fields, click the delete icon in the operation column to remove each field.
-
Batch Field Deletion Scenario: To delete many fields at once, click Field Management, select the fields in the Field Management dialog box, click the shift left icon to move them to the unselected list, and click OK.
-
Click OK to complete the DataHub input component configuration.
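The Step Name rules and the yyyyMMddHHmmss time boundaries described in the parameter table can be checked offline before you enter values in the dialog box. The following Python sketch is illustrative only and is not part of Dataphin. The function names are hypothetical, the character range used for "Chinese characters" (U+4E00 to U+9FFF) is an assumption, and rejecting empty names is an assumption. Dataphin itself resolves schedule parameters such as ${startTime} at run time.

```python
import re
from datetime import datetime, timedelta

# Assumed reading of the naming rule: CJK Unified Ideographs (U+4E00-U+9FFF),
# ASCII letters, digits, and underscores, with at most 64 characters.
# Rejecting an empty name is also an assumption.
STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")

# Python strftime/strptime equivalent of the yyyyMMddHHmmss pattern.
TIME_FORMAT = "%Y%m%d%H%M%S"


def is_valid_step_name(name: str) -> bool:
    """Return True if the name satisfies the assumed Step Name rules."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


def parse_boundary(value: str) -> datetime:
    """Parse a yyyyMMddHHmmss string, rejecting anything but 14 digits."""
    if re.fullmatch(r"\d{14}", value) is None:
        raise ValueError(f"expected 14 digits in yyyyMMddHHmmss form: {value!r}")
    return datetime.strptime(value, TIME_FORMAT)


def consumption_window(start: datetime, hours: int) -> dict:
    """Build hypothetical startTime/endTime values for a window of `hours`."""
    end = start + timedelta(hours=hours)
    return {
        "startTime": start.strftime(TIME_FORMAT),  # left boundary
        "endTime": end.strftime(TIME_FORMAT),      # right boundary
    }


if __name__ == "__main__":
    window = consumption_window(datetime(2022, 1, 1), hours=24)
    print(window)
    # {'startTime': '20220101000000', 'endTime': '20220102000000'}
    print(is_valid_step_name("datahub_input_1"))  # True
    print(is_valid_step_name("bad-name"))         # False (hyphen not allowed)
```

The two generated strings correspond to the values that the startTime and endTime schedule parameters would carry, and parse_boundary can be used to confirm that the left boundary is earlier than the right boundary.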