The dataset input component reads data from a metadata table within a dataset. This allows you to combine unstructured data with structured data for data warehouse development.
Prerequisites
To use the dataset input component, you must first purchase the unstructured data feature.
A dataset has been created. For more information, see Dataset.
Procedure
In the top navigation bar of the Dataphin home page, choose Develop > Data Integration.
On the data integration page, select a project. If the project uses Dev-Prod mode, you must also select an environment.
In the left-side navigation pane, click Batch Integration. Then, click the target batch pipeline to open its configuration page.
In the upper-right corner, click Component Library to open the Component Library panel.
In the Component Library panel, select Input, and then drag the Dataset component onto the canvas.
On the dataset input component card, click the icon to open the Dataset Input Configuration dialog box.
In the Dataset Input Configuration dialog box, configure the parameters.
Parameter
Description
Step name
The name of the dataset input component. Dataphin automatically generates a step name, which you can modify based on your business scenario. The name must adhere to the following rules:
Can contain only Chinese characters, letters, digits, and underscores (_).
Cannot exceed 64 characters in length.
Dataset
Select a table dataset or hybrid dataset from the current project. You can search for a dataset by entering keywords from its name. After selecting a dataset, you must also select its version. Click the icon to copy the name of the current dataset.
Batch read size (Optional)
Specifies the number of records to read per batch. Setting a batch size, such as 1024, allows the system to read data in chunks instead of one record at a time. This reduces the number of round trips to the data source, which improves I/O efficiency and lowers network overhead.
Filter
Enter a conditional expression supported by PostgreSQL to filter data. Specify the condition that would follow a WHERE keyword. Do not include the WHERE keyword itself. You can also use system-level global variables, such as the business date ${bizdate}.
Output fields
Displays all fields from the metadata of the selected dataset.
Click OK to save the configuration for the dataset input component.
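The step name rules in the parameter table can be checked before you type a name into the dialog box. The following is a minimal sketch, not Dataphin's own validation: the function name `is_valid_step_name` is hypothetical, "letters" is assumed to mean ASCII letters, "Chinese characters" is approximated by the CJK Unified Ideographs range, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF), and "letters" means ASCII letters.
# Dataphin's actual check may accept a different character set.
_STEP_NAME_RE = re.compile(r"[A-Za-z0-9_\u4e00-\u9fff]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name uses only allowed characters and has 1 to 64 characters."""
    return _STEP_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_step_name("dataset_input_1")` passes, while a name that contains a hyphen or is longer than 64 characters is rejected.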
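To illustrate the Filter parameter, the sketch below prepares a condition for the Filter field and previews how it might read once `${bizdate}` is resolved. It is only a local preview helper under stated assumptions: the function `build_filter`, the field names `file_type` and `ds`, and the sample date value are all hypothetical, and Dataphin performs the actual variable substitution at run time.

```python
import re


def build_filter(condition: str, bizdate: str) -> str:
    """Return the condition without a leading WHERE and with ${bizdate} resolved."""
    cleaned = condition.strip()
    # The Filter field expects only the condition, so drop a leading WHERE
    # keyword if one was pasted in by mistake.
    cleaned = re.sub(r"^where\b\s*", "", cleaned, flags=re.IGNORECASE)
    # Preview only: replace the global variable with a sample value.
    return cleaned.replace("${bizdate}", bizdate)


# Hypothetical field names; use fields listed under Output fields instead.
preview = build_filter("WHERE file_type = 'pdf' AND ds = '${bizdate}'", "20240101")
print(preview)
```

In the dialog box itself you would enter only the condition, for example `file_type = 'pdf' AND ds = '${bizdate}'`, and leave the variable unresolved.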
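The effect of the Batch read size parameter can be pictured with a generic chunked reader. This sketch does not show how Dataphin reads a dataset internally; `read_in_batches` is a hypothetical helper that only demonstrates how a batch size of 1024 groups records so that each fetch returns many records at once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_in_batches(records: Iterable[T], batch_size: int = 1024) -> Iterator[List[T]]:
    """Yield lists of up to batch_size records until the source is exhausted."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(records)
    while True:
        # Pull at most batch_size records in one step.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 2,500 records with a batch size of 1024 need three reads instead of 2,500.
batch_lengths = [len(batch) for batch in read_in_batches(range(2500), 1024)]
```

A larger batch size means fewer reads but more records held in memory per read, so the value is a trade-off and not a setting to maximize.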