Configure the dataset input component


The dataset input component reads data from a metadata table within a dataset. This allows you to combine unstructured data with structured data for data warehouse development.

Prerequisites

  • To use the dataset input component, you must first purchase the unstructured data feature.

  • A dataset has been created. For more information, see Dataset.

Procedure

  1. In the top navigation bar of the Dataphin home page, choose Develop > Data Integration.

  2. On the Data Integration page, select a project. If the project uses Dev-Prod mode, also select an environment.

  3. In the left-side navigation pane, click Batch Integration. Then, click the target batch pipeline to open its configuration page.

  4. In the upper-right corner, click Component Library to open the Component Library panel.

  5. In the Component Library panel, select Input, and then drag the Dataset component onto the canvas.

  6. On the dataset input component card, click the image icon to open the Dataset Input Configuration dialog box.

  7. In the Dataset Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step name

    The name of the dataset input component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must adhere to the following rules:

    • Can contain only Chinese characters, letters, digits, and underscores (_).

    • Cannot exceed 64 characters in length.

    Dataset

    Select a table dataset or a hybrid dataset from the current project. You can search for a dataset by entering a keyword from its name. After you select a dataset, you must also select its version. To copy the name of the current dataset, click the copy icon.

    Batch read size (Optional)

    The number of records to read in each batch. With a batch size such as 1024, the system reads records in chunks rather than one at a time. This reduces the number of round trips to the data source, which improves I/O efficiency and lowers network overhead.

    Filter

    Enter a filter condition in PostgreSQL syntax. Enter only the expression that would follow the WHERE keyword, and omit the keyword itself. The condition can reference system-level global variables, such as the business date ${bizdate}.

    Output fields

    Displays all fields from the metadata of the selected dataset.
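
To illustrate the Filter parameter described above, the following is a minimal Python sketch for previewing a filter condition locally before you paste it into the dialog box. It is not part of Dataphin. The helper name preview_filter, the column names upload_date and file_size, and the yyyyMMdd value used for ${bizdate} are assumptions made for this example only.

```python
import re
from string import Template


def preview_filter(condition: str, variables: dict) -> str:
    """Return the condition with ${...} variables filled in, for local inspection only.

    Hypothetical helper: Dataphin performs the real substitution at run time.
    """
    text = condition.strip()
    # The Filter field expects only the condition, so reject a leading WHERE keyword.
    if re.match(r"where\b", text, flags=re.IGNORECASE):
        raise ValueError("omit the WHERE keyword; enter only the condition")
    # Template.substitute raises KeyError if a referenced variable has no value.
    return Template(text).substitute(variables)


# Hypothetical columns; the bizdate format is assumed to be yyyyMMdd.
condition = "upload_date = '${bizdate}' AND file_size > 1024"
print(preview_filter(condition, {"bizdate": "20240101"}))
# prints: upload_date = '20240101' AND file_size > 1024
```

The sketch only checks for a leading WHERE keyword and fills in variables. It does not validate that the condition is a legal PostgreSQL expression.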

  8. Click OK to save the configuration for the dataset input component.
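
The step name rules in step 7 can be checked before you open the dialog box, for example when step names are generated by a script. The following is a minimal Python sketch, not Dataphin's own validation. It assumes that "letters" means ASCII letters, that "Chinese characters" can be approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and that an empty name is not allowed. The function name is_valid_step_name is hypothetical.

```python
import re

# Assumed character set: ASCII letters, digits, underscores, and CJK Unified
# Ideographs (U+4E00 to U+9FFF) as an approximation of "Chinese characters".
# Length: 1 to 64 characters (the non-empty requirement is an assumption).
STEP_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\u4e00-\u9fff]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if the whole name matches the assumed step name rules."""
    return STEP_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_step_name("dataset_input_1"))  # True
print(is_valid_step_name("bad-name"))         # False: hyphen is not allowed
print(is_valid_step_name("a" * 65))           # False: longer than 64 characters
```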