Configure DataHub Input Components

Configure a DataHub input component to read data from a DataHub data source into the storage system connected to the big data platform, where the data can be integrated and further processed.

Prerequisites

  • A DataHub data source has been created. For more information, see Create a DataHub Data Source.

  • The account used to configure DataHub input component properties must have read-through permission on the data source. If you lack the required permission, request it first. For more information, see Request Data Source Permission.

Procedure

  1. On the Dataphin home page, select Development > Data Integration from the top menu bar.

  2. In the top menu bar of the Data Integration page, select a project. In Dev-Prod mode, you must also select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Input, find the DataHub component in the input component list on the right, and drag the component to the canvas.

  6. Click the icon on the DataHub input component card to open the DataHub Input Configuration dialog box.

  7. In the DataHub Input Configuration dialog box, configure the parameters according to the following table.

    Parameter

    Description

    Step Name

    The name of the DataHub input component. Dataphin automatically generates the step name, but you can modify it as needed. The naming convention is as follows:

    • Can contain only Chinese characters, letters, digits, and underscores (_).

    • Cannot exceed 64 characters.

    Datasource

    Lists all DataHub data sources in the current Dataphin instance, whether or not you have read-through permission on them. Click the copy icon to copy the name of the selected data source.

    Topic

    The name of the DataHub topic. Select the topic you want to read from the drop-down list.

    Consumption Start Time

    The start offset for data consumption: a time in yyyyMMddHHmmss format that marks the left boundary of the time range. Set this value through a schedule parameter. For example, if the schedule parameter is startTime=${20220101000000}, set Consumption Start Time to ${startTime}.

    Consumption End Time

    The end offset for data consumption: a time in yyyyMMddHHmmss format that marks the right boundary of the time range. Set this value through a schedule parameter. For example, if the schedule parameter is endTime=${20220101000000}, set Consumption End Time to ${endTime}.

    Batch Read Count

    The number of records read in each batch, for example 1024. Reading in batches reduces the number of requests sent to the data source, which improves I/O efficiency and lowers network latency.

    Output Fields

    Displays all fields matched by the selected table and filter criteria. To exclude fields from downstream components, delete them:

    • Delete individual fields: To delete a small number of fields, click the delete icon in the Actions column of each field.

    • Delete fields in batches: To delete many fields at once, click Field Management. In the Field Management dialog box, select the fields, click the left-arrow icon to move them to the unselected list, and then click OK.

  8. Click OK to complete the DataHub input component configuration.
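The Step Name rules in step 7 can be checked locally before you type a name into the dialog box. The following is a minimal Python sketch, not part of Dataphin. The function name is_valid_step_name is hypothetical, and the sketch assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "letters" and "digits" mean ASCII characters.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00 to U+9FFF);
# "letters" and "digits" = ASCII a-z, A-Z, 0-9. Length must be 1 to 64.
_STEP_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name follows the documented Step Name convention."""
    # fullmatch requires the whole string to match, which also enforces
    # the 64-character upper bound.
    return _STEP_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["DataHub_input_1", "数据输入_01", "bad-name", "x" * 65]:
        print(candidate[:20], is_valid_step_name(candidate))
```

A regular expression keeps the character-set rule and the length rule in one place; if Dataphin counts length differently (for example, in bytes), the quantifier would need to change.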
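Consumption Start Time and Consumption End Time in step 7 combine two ideas: a ${name} reference that is replaced by a schedule parameter value, and a timestamp in yyyyMMddHHmmss format. The following minimal Python sketch shows both steps and converts the resulting window to Unix epoch milliseconds. The helper names (resolve, to_epoch_millis, consumption_window), the UTC+8 time zone, and the millisecond representation are assumptions for illustration, not behavior confirmed by this topic.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

# Assumption: times are interpreted in UTC+8. Change TZ to match your project.
TZ = timezone(timedelta(hours=8))
_PARAM_REF = re.compile(r"\$\{(\w+)\}")


def resolve(expr: str, params: Dict[str, str]) -> str:
    """Replace ${name} references with schedule-parameter values.

    Raises KeyError if a referenced parameter is not defined.
    """
    return _PARAM_REF.sub(lambda m: params[m.group(1)], expr)


def to_epoch_millis(value: str) -> int:
    """Parse a yyyyMMddHHmmss string and return Unix epoch milliseconds."""
    if not re.fullmatch(r"[0-9]{14}", value):
        raise ValueError(f"expected yyyyMMddHHmmss, got {value!r}")
    dt = datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=TZ)
    # The format has one-second precision, so the timestamp is integral.
    return int(dt.timestamp()) * 1000


def consumption_window(start_expr: str, end_expr: str,
                       params: Dict[str, str]) -> Tuple[int, int]:
    """Resolve both boundaries and reject a window whose end precedes its start."""
    start = to_epoch_millis(resolve(start_expr, params))
    end = to_epoch_millis(resolve(end_expr, params))
    if end < start:
        raise ValueError("Consumption End Time precedes Consumption Start Time")
    return start, end


if __name__ == "__main__":
    params = {"startTime": "20220101000000", "endTime": "20220102000000"}
    print(consumption_window("${startTime}", "${endTime}", params))
```

With the sample parameters above, the call returns (1640966400000, 1641052800000), a one-day window. Whether the boundaries are inclusive or exclusive is not stated in this topic, so the sketch only checks their order.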
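Batch Read Count in step 7 trades the number of requests against the size of each request. The following minimal Python sketch shows the pattern with a hypothetical offset-based fetch(offset, limit) function. A real DataHub reader locates records with cursors rather than offsets, and read_in_batches is not a Dataphin or DataHub API.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical fetch signature: fetch(offset, limit) returns up to `limit`
# records starting at `offset`.
Fetch = Callable[[int, int], Sequence[dict]]


def read_in_batches(fetch: Fetch, batch_size: int = 1024) -> Iterator[List[dict]]:
    """Yield batches of records, issuing one fetch call per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = 0
    while True:
        batch = list(fetch(offset, batch_size))
        if batch:
            yield batch
        # A short or empty batch means the source has no more records.
        if len(batch) < batch_size:
            return
        offset += len(batch)


if __name__ == "__main__":
    source = [{"id": i} for i in range(10_000)]
    calls = [0]  # mutable counter shared with the closure below

    def fake_fetch(offset: int, limit: int) -> List[dict]:
        calls[0] += 1
        return source[offset:offset + limit]

    total = sum(len(b) for b in read_in_batches(fake_fetch, batch_size=1024))
    print(total, calls[0])  # 10000 10
```

With 10,000 records and a batch size of 1024, the loop makes 10 fetch calls instead of 10,000 single-record calls. Larger batches mean fewer round trips but more memory per request.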
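Deleting a field from Output Fields in step 7 means that downstream components no longer receive it. Conceptually, this is a projection of each record onto the retained field list, as in the following minimal Python sketch. project_fields is a hypothetical helper, and returning None for a retained field that is missing from a record is an assumption.

```python
from typing import Iterable, Iterator, Mapping, Sequence


def project_fields(records: Iterable[Mapping[str, object]],
                   output_fields: Sequence[str]) -> Iterator[dict]:
    """Yield each record reduced to output_fields, in that field order."""
    for record in records:
        # Deleted fields are simply not copied; a retained field that is
        # absent from the record becomes None (assumption).
        yield {name: record.get(name) for name in output_fields}


if __name__ == "__main__":
    rows = [{"id": 1, "name": "a", "ts": 5}, {"id": 2, "name": "b", "ts": 6}]
    print(list(project_fields(rows, ["id", "ts"])))
```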