configure snowflake output component-Dataphin(Dataphin)-阿里云帮助中心

The Snowflake output component writes data to a Snowflake data source. When synchronizing data from other data sources to a Snowflake data source, you must configure the target data source for the Snowflake output component after configuring the source data information.

Prerequisites

You have created a Snowflake data source. For more information, see Create a Snowflake Data Source.
The account that configures the Snowflake output component properties must have write-through permission on the data source. If you do not have this permission, you must apply for data source permissions. For more information, see Apply for, Renew, and Return Data Source Permissions.

Procedure

On the Dataphin home page, in the top menu bar, select Development > Data Integration .
On the Integration page, in the top menu bar, select Project . (In Dev-Prod mode, select an environment.)
In the navigation pane on the left, click Batch Pipeline . In the Batch Pipeline list, click the offline pipeline to open its configuration page.
Click Component Library in the upper-right corner of the page to open the Component Library panel.
In the navigation pane on the left of the Component Library panel, select Output . In the output component list on the right, find the Snowflake component and drag it to the canvas.
Click and drag the icon of the target input component to connect it to the current Snowflake output component.
Click the icon in the Snowflake output component card to open the Snowflake Output Configuration dialog box.

In the Snowflake Output Configuration dialog box, configure the following parameters.

Parameter		Description
Basic Settings	Step Name	The name of the Snowflake output component. Dataphin automatically generates the step name. Modify it as needed. Naming conventions are as follows: The value can contain only Chinese characters, letters, numbers, and underscores (_). Length cannot exceed 64 characters.
	Datasource	The data source drop-down list displays all Snowflake data sources, including those for which you have write-through permissions and those for which you do not. Click the icon to copy the current data source name. For a data source that does not have write-through permission, you can click Request next to the data source to request the permission. For more information, see Request data source permissions. If you do not have a Snowflake data source, click the Create icon to create a data source. For detailed steps, see Creating a Snowflake Data Source.
	Time Zone	Time-formatted data is processed based on the current time zone. The default time zone is configured in the selected data source and cannot be modified. Note For tasks created before V5.1.2, you can select Data Source Default Configuration or Channel Configuration Time Zone. Channel Configuration Time Zone is selected by default. Data Source Default Configuration: The default time zone of the selected data source. Channel Configuration Time Zone: The time zone configured for the current integration task in Properties > Channel Configuration.
	Schema (Optional)	Enter the schema where the table is located. Cross-schema table selection is supported. If left blank, the schema configured in the data source is used by default.
	Table	Select the target table for output data. You can enter a table name keyword to search, or enter the exact table name and click Precise Search. After selecting a table, the system automatically performs a table status check. Click the icon to copy the name of the currently selected table. If the Snowflake data source does not have a target table for data synchronization, use the one-click table creation feature to quickly generate a target table. The detailed procedure is as follows: Click One-Click Table Creation. Dataphin automatically matches the code for creating the target table, including the target table name (source table name by default), field types (initially converted based on Dataphin fields), and other information. You can modify the SQL script for creating the target table as needed, then click New. After the target table is successfully created, Dataphin automatically sets the newly created target table as the target table for output data. Note If a table with the same name exists in the development environment, Dataphin reports an error indicating that the table already exists after you click Create. If there are no matching items, integration based on manually entered table names is also supported.
	Production Table Missing Policy	Specifies the policy for when the production table does not exist. You can select Do Not Create or Automatic Creation. The default is Automatic Creation. If you select Do not create, the production table is not created when the task is published. If you select Automatic creation, a table with the same name is created in the target environment when the task is published. Do Not Process: If the target table does not exist, a message indicating that the target table does not exist is displayed upon submission, but the task can still be published. In this case, create the target table in the production environment before executing the task. Automatic Creation: requires you to Edit Table Creation Statement. The field is pre-filled by default with the table creation statement of the selected table, and you can adjust it. The table name in the table creation statement uses the placeholder `${table_name}`, and you can only use this placeholder, which will be replaced with the actual table name during execution. If the target table does not exist, create the table based on the table creation statement. If table creation fails, the check result is failed upon publishing. Modify the table creation statement based on the error message, and then publish again. If the target table already exists, table creation is not performed. Note This option is supported only in Dev-Prod mode projects.
	Loading Policy	Select the policy for writing data to the target table. Loading Policy includes: Append Data (insert into): If a primary key/constraint violation occurs, a dirty data error is reported. Update on Primary Key Conflict (merge into): If a primary key/constraint violation occurs, the data of the mapped fields is updated on existing records.
	Bulk Write Data Volume (Optional)	The amount of data written at one time. You can also set the Number of Bulk Write Records. During writing, the system writes data based on which of the two configurations reaches the upper limit first. The default value is 32 MB.
	Number of Bulk Write Records (Optional)	The default value is 2048 records. When data is synchronized and written, a batch write strategy is used. The parameters include Number of Bulk Write Records and Bulk Write Data Volume. When the accumulated data volume reaches either of the set upper limits (that is, the bulk write data volume or the number of records limit), the system considers a batch of data to be full and immediately writes this batch of data to the target. Set the bulk write data volume to 32 MB. For the upper limit of bulk insert records, adjust it flexibly based on the actual size of a single record. Usually, set a larger value to fully utilize the advantages of batch writing. For example, if the size of a single record is about 1 KB, set the bulk insert byte size to 16 MB. Considering this condition, set the number of bulk insert records to a value greater than 16 MB divided by the single record size of 1 KB (that is, greater than 16384 records). Here, assume it is set to 20000 records. After this configuration, the system triggers batch write operations based on the bulk insert byte size. Each time the accumulated data volume reaches 16 MB, a write action is performed.
	Preparation Statement (Optional)	The SQL script executed on the database before data import. For example, to ensure continuous service availability, before the current step writes data, create target table Target_A, write data to Target_A. After the current step finishes writing data, rename Service_B (the table continuously providing service in the database) to Temp_C, then rename Target_A to Service_B, and finally delete Temp_C.
	Completion Statement (Optional)	The SQL script executed on the database after data import.
Field Mapping	Input Fields	Displays input fields based on upstream output.
	Output Fields	Displays output fields. The following operations are supported: Field Management: Click Field Management to select output fields. Click the icon to move Selected Input Fields to Unselected Input Fields. Click the icon to move Unselected Input Fields to Selected Input Fields. Batch Add: Click Batch Add to perform batch configuration in JSON, TEXT, or DDL format. Batch configuration in JSON format. For example: `// Example: [{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note name indicates the name of the imported field, and type indicates the type of the imported field. For example, `"name":"user_id","type":"String"` indicates that the field named user_id is imported and its type is set to String. Batch configuration in TEXT format. For example: `// Example: user_id,String user_name,String` The row delimiter separates information for each field. The default value is a line feed (\n). Line feeds (\n), semicolons (;), and periods (.) are supported. The column delimiter separates the field name and field type. The default value is a comma (,). Batch configuration in DDL format. For example: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create Output Field: Click + Create Output Field, fill in Column and select Type as prompted. Once you finish configuring the current row, click the icon to save.
	Mapping	Manually select field mappings based on upstream input and target table fields. Quick Mapping includes Row Mapping and Name Mapping. Name Mapping: Maps fields with the same field name. Row Mapping: The field names of the source table and target table are inconsistent, but the data in the corresponding rows needs to be mapped. Only fields in the same row are mapped.

Click Confirm to complete the property configuration for the Snowflake output component.