Configure the Snowflake input component

更新时间:
复制 MD 格式

The Snowflake input component reads data from a Snowflake data source. To sync data from a Snowflake data source to another data source, you must first configure the Snowflake input component to read from the source, and then configure the component for the destination. This topic describes how to configure the Snowflake input component.

Prerequisites

Procedure

  1. On the Dataphin home page, in the top menu bar, choose Develop > Data Integration.

  2. On the Data Integration page, in the top menu bar, select a Project. If you are in Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the Component Library panel, select Input from the navigation pane on the left. From the list of input components on the right, find the Snowflake component and drag it to the canvas.

  6. On the Snowflake input component card, click the image icon to open the Snowflake Input Configuration dialog box.

  7. In the Snowflake Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the Snowflake input component. Dataphin automatically generates a name. You can change it as needed. The naming convention is as follows:

    • The name can contain only letters, digits, and underscores (_).

    • The name must be 64 characters or less.

    Datasource

    From the drop-down list, select the Snowflake data source. The list shows all Snowflake data sources, including those for which you have or do not have read-through permission. Click the image icon to copy the data source name.

    Time Zone

    Data with a time format is processed based on the selected time zone. The default is the time zone configured in the data source. This parameter cannot be modified.

    Note

    For nodes created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. The default selection is Channel Configuration Time Zone.

    • Data Source Default Configuration: The default time zone of the selected data source.

    • Channel Configuration Time Zone: The time zone configured for the current integration node under Properties > Channel Configuration.

    Schema (optional)

    Select the schema that contains the table. You can select tables from different schemas. If you leave this parameter empty, the schema configured in the data source is used by default.

    Table

    Select or enter the source table. To find a specific table, enter its exact name and click Exact Match. After you select a table, the system automatically checks its status. Click the image icon to copy the name of the selected table.

    Split Key (optional)

    The system partitions data based on the configured split key. This feature, when used with the concurrency setting, enables concurrent reads. You can use a column from the source table as the split key. To ensure optimal performance, use a primary key or an indexed column as the split key.

    Important

    If you select a date or time data type, the system identifies the minimum and maximum values. It then splits the data into chunks based on the total time range and the degree of concurrency. The data chunks are not guaranteed to be of equal size.

    Input Filter (optional)

    Enter a conditional expression that Snowflake supports. Enter only the content that follows the `WHERE` keyword. Do not include the `WHERE` keyword. You can use global variables, such as the data timestamp ${bizdate}.

    Output Fields

    The Output Fields area displays all fields from the selected table that match your filter. You can perform the following operations:

    • Manage fields: Delete fields that you do not want to output to downstream components.

      • To delete a single field: In the row of the field you want to delete, click the sgaga icon in the Operation column.

      • To delete fields in a batch: Click Field Management. In the Field Management dialog box, select the fields you want to remove. Click the image left arrow icon to move them to the unselected list. Then, click OK.

        image..png

    • Batch Add: Click Batch Add to configure fields in a batch using JSON, TEXT, or DDL format.

      Note

      After you add fields in a batch and click OK, the existing field configuration is overwritten.

      • To configure fields in a batch using the JSON format, see the following example:

        // Example:
          [{
             "index": 0,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 1,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        index specifies the column number. name specifies the field name after import. type specifies the field type after import.

        For example, "index":3,"name":"user_id","type":"String" imports the fourth column from the file, names the field user_id, and sets the field type to String.

      • To configure fields in a batch using the TEXT format, see the following example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),Long,comment2
        • The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter is used to separate field names from field types. The default is a comma (,). The comma `,` is supported. The field type can be omitted, and the default delimiter is `,`.

      • To configure fields in a batch using the DDL format, see the following example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR (255),
        	created_on TIMESTAMP,
        );
    • Create Output Field: Click + Create Output Field. Enter the Column, Type, and Comment, and select a Mapping Type. After you configure the row, click the image icon to save.

  8. Click Confirm to save the configuration of the Snowflake input component.