Configure the Snowflake input component

更新时间:
复制 MD 格式

The Snowflake input component reads data from a Snowflake data source. To synchronize Snowflake data to another data source, configure the Snowflake input component as the source, and then configure the destination.

Prerequisites

  • You have created a Snowflake data source. For more information, see .

  • The account used to configure the Snowflake input component must have read-through permissions on the data source. If you do not have the required permissions, request them. For more information, see Request, renew, and return data source permissions.

Procedure

  1. In the top menu bar of the Dataphin home page, choose Developer > Data Integration.

  2. In the top menu bar of the integration page, select a project. If you are in Dev-Prod mode, also select an Environment.

  3. In the navigation pane on the left, click Offline Integration. In the Offline Integration list, click the offline pipeline that you want to develop to open its configuration page.

  4. In the upper-right corner of the page, click Component Library to open the Component Library panel.

  5. In the navigation pane on the left of the Component Library panel, choose Input. Find the Snowflake component in the list of input components on the right and drag it to the canvas.

  6. Click the image icon on the Snowflake input component card to open the Snowflake Input Configuration dialog box.

  7. In the Snowflake Input Configuration dialog box, configure the following parameters.

    Parameter

    Description

    Step Name

    The name of the Snowflake input component. Dataphin automatically generates a step name, which you can change. The naming convention is as follows:

    • The name can contain only Chinese characters, letters, underscores (_), and digits.

    • The maximum length is 64 characters.

    Datasource

    Lists all Snowflake data sources, including those you have read-through permissions for and those you do not. Click the image icon to copy the current data source name.

    • For data sources you do not have read-through permissions for, click Request next to the data source to request read permissions. For more information, see Request, renew, and return data source permissions.

    • If you do not have a Snowflake data source, click Create Data Source to create one. For more information, see .

    Time Zone

    Time-formatted data is processed based on the current time zone. By default, this is the time zone configured in the selected data source and cannot be changed.

    Note

    For nodes created before version 5.1.2, you can select Data source default configurations or Channel configuration time zone. The default selection is Channel configuration time zone.

    • Data source default configurations: The default time zone of the selected data source.

    • Channel configuration time zone: The time zone configured for the current integration node in Properties > Channel Configuration.

    Schema (Optional)

    Select the schema where the table resides. You can select tables across different schemas. If you leave this blank, the schema configured in the data source is used by default.

    Table

    Select or enter the source table. You can enter the exact table name and click Exact Search. After you select a table, the system automatically checks its status. Click the image icon to copy the name of the selected table.

    Split Key (Optional)

    The system partitions data based on the split key field, which works with the concurrency setting to enable parallel reads. You can use a column from the source table as the split key. Use a primary key or indexed column as the split key for optimal transfer performance.

    Important

    When you select a date/time type, the system performs a brute-force split based on the total time range and concurrency by identifying the maximum and minimum values. The splits are not guaranteed to be even.

    Input Filter (Optional)

    Enter a filter expression supported by Snowflake. Enter only the content after the WHERE keyword, without including WHERE itself. You can use system global variables such as the data timestamp ${bizdate}.

    Output Fields

    Displays all fields from the selected table that match the filter criteria. The following operations are supported:

    • Field Management: If you do not need certain fields in downstream components, delete them:

      • To delete a single field: Click the sgaga icon in the Actions column to remove the field.

      • To delete fields in a batch: Click Field Management. In the Field Management dialog box, select the fields, click the image left arrow icon to move them to the unselected list, and then click OK.

        image..png

    • Batch Add: Click Batch Add to configure fields in a batch using JSON, TEXT, or DDL format.

      Note

      Adding fields in a batch and clicking OK overwrites the existing field information.

      • To configure in a batch using JSON format, for example:

        // Example:
          [{
             "index": 0,
             "name": "id",
             "type": "int(10)",
             "mapType": "Long",
             "comment": "comment1"
           },
           {
             "index": 1,
             "name": "user_name",
             "type": "varchar(255)",
             "mapType": "String",
             "comment": "comment2"
         }]
        Note

        index specifies the column number of the object. name specifies the field name after import. type specifies the field type after import.

        For example, "index":3,"name":"user_id","type":"String" indicates that the fourth column in the file is imported, the field name is user_id, and the field type is String.

      • Batch configure fields in TEXT format. For example:

        // Example:
        0,id,int(10),Long,comment1
        1,user_name,varchar(255),Long,comment2
        • The row delimiter separates the information for each field. The default is a line feed (\n). Semicolons (;) and periods (.) are also supported.

        • The column delimiter separates the field name from the field type. The default is a half-width comma (,). It supports ','. The field type is optional, and the default is ','.

      • To configure in a batch using DDL format, for example:

        CREATE TABLE tablename (
        	user_id serial,
        	username VARCHAR(50),
        	password VARCHAR(50),
        	email VARCHAR (255),
        	created_on TIMESTAMP,
        );
    • Create Output Field: Click + Create Output Field and enter the Column, Type, and Remarks, and select the Mapping Type. Click the image icon to save the row.

  8. Click Confirm to complete the property configuration for the Snowflake input component.