Configure the Doris Output Component

Use the Doris output component to write data from external databases into Doris. You can also replicate and push data from storage systems connected to your big data platform into Doris for integration and further processing.

Prerequisites

You have created a Doris data source. For more information, see Create a Doris Data Source.
The account used to configure the Doris output component must have write-through permission on the data source. If you do not have this permission, request it. For more information, see Request Data Source Permissions.

Procedure

On the Dataphin homepage, in the top menu bar, click Develop, and then click Data Integration.
On the Data Integration page, in the top menu bar, click Project. In Dev-Prod mode, also select an environment.
In the left navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the batch (offline) pipeline that you want to develop. The configuration page for that pipeline opens.
In the upper-right corner of the page, click Component Library to open the Component Library panel.
In the left navigation pane of the Component Library panel, click Output. In the output component list on the right, locate the Doris component and drag it onto the canvas.
Click and drag the icon of a target input, transform, or flow component to connect it to the Doris output component.
In the Doris output component card, click the icon to open the Doris Output Configuration dialog box.

In the Doris Output Configuration dialog box, configure the parameters in the following table.

Parameter		Description
Basic Settings	Step Name	The name of the Doris output component. Dataphin generates a default name that you can change. Naming rules: Use only Chinese characters, letters, underscores (_), and digits. Keep the name no longer than 64 characters.
	Datasource	Lists all Doris data sources, regardless of your permissions. Click the icon to copy the data source name. If you do not have write-through permission for a data source, click Request next to the data source to request write-through permission. For more information, see Request Data Source Permissions. If you do not have a Doris data source, click Create Data Source to create one. For more information, see Create a Doris Data Source.
	Table	Select the target table for output data. Enter a keyword to search for tables, or enter the exact table name and click Exact Match. After you select a table, the system automatically checks its status. Click the icon to copy the selected table name. If the target table does not exist in the Doris data source, use the one-click table creation feature to generate it: Click One-Click Table Creation. Dataphin automatically generates the SQL script that creates the target table, including the table name (default: source table name) and field types (converted from Dataphin fields). Modify the SQL script as needed, then click Create. After the table is created, Dataphin uses it as the target table for output data. Note: If a table with the same name already exists in the development environment, clicking Create returns an error. If no matching table is found, you can still integrate data by entering the table name manually.
	Production Table Missing Strategy	Choose how to handle a target table that is missing in the production environment: No Action or Automatic Creation. Default: Automatic Creation. No Action: The task is published without creating the production table. If the target table does not exist, the system warns you during submission but still lets you publish the task. You must manually create the target table in the production environment before running the task. Automatic Creation: When the task is published, Dataphin creates a table with the same name in the target environment. You must review the statement under Edit Table Creation Statement. The creation statement of the selected table is pre-filled by default, and you can adjust it. The table name in the statement uses the placeholder `${table_name}`, which is the only supported placeholder and is replaced with the actual table name during execution. If the target table does not exist, Dataphin first runs the CREATE TABLE statement. If table creation fails, publishing fails. Fix the statement based on the error message, then republish. If the target table already exists, no action is taken. Note: This setting is available only for projects in Dev-Prod mode.
	Data Format	Select CSV or JSON. If you select CSV, also configure CSV Column Delimiter and CSV Row Delimiter.
	CSV Column Delimiter (Optional)	The column delimiter for StreamLoad CSV import. Default: `_@dp@_`. Leave this field empty to use the default. If your data contains `_@dp@_`, specify a different delimiter.
	CSV Row Delimiter (Optional)	The row delimiter for StreamLoad CSV import. Default: `_#dp#_`. Leave this field empty to use the default. If your data contains `_#dp#_`, specify a different delimiter.
	Bulk Write Size (Optional)	The maximum data size per batch write. Works together with Bulk Write Count: the system writes data when either limit is reached. Default: 32 MB.
	Bulk Write Count (Optional)	The maximum number of rows per batch write. Default: 2,048 rows. Works together with Bulk Write Size: when the accumulated data reaches either limit (size or count), it is treated as a full batch and written to the target at once. We recommend setting the bulk write size to 32 MB and adjusting the bulk write count based on the average record size. Set the count high enough that the size limit, rather than the count limit, normally triggers the write. For example, if each record is about 1 KB and the bulk write size is 16 MB, set the bulk write count to more than 16,384 (16 MB ÷ 1 KB), such as 20,000 rows. With this setup, the system triggers a batch write when the accumulated data reaches 16 MB.
	Pre-SQL Statement (Optional)	An SQL script to run on the database before importing data. For example, to maintain service availability, use this sequence: before the import, create the target table Target_A; write the data to Target_A; after the import completes, rename the live service table Service_B to Temp_C, rename Target_A to Service_B, and delete Temp_C.
	Post-SQL Statement (Optional)	An SQL script to run on the database after importing data.
Field Mapping	Input Fields	Lists input fields from upstream components.
	Output Fields	Lists output fields. You can do the following. Manage fields: Click Field Management to select output fields. Click the icon to move Selected Input Fields to Unselected Input Fields. Click the icon to move Unselected Input Fields to Selected Input Fields. Batch add: Click Batch Add to configure fields in bulk in JSON, TEXT, or DDL format. JSON format example: `[{ "name": "user_id", "type": "String" }, { "name": "user_name", "type": "String" }]` Note: `name` specifies the name of the field to import, and `type` specifies the field type after import. For example, `"name":"user_id","type":"String"` imports the field named `user_id` and sets its field type to `String`. TEXT format example, with one field entry per row: `user_id,String` `user_name,String` The row delimiter separates field entries. Default: line feed (\n). Supported row delimiters: \n, semicolon (;), and period (.). The column delimiter separates the field name from the field type. Default: comma (,). DDL format example: `CREATE TABLE tablename ( id INT PRIMARY KEY, name VARCHAR(50), age INT );` Create output field: Click + Create Output Field. Enter the Column name and select the Type. Click the icon to save the row.
	Mapping	Map input fields to target table fields. Quick Mapping includes Row Mapping and Name Mapping. Name Mapping: Maps fields with identical names. Row Mapping: Maps fields by position when source and target field names differ but their row positions match.
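
The interaction between Bulk Write Size and Bulk Write Count can be illustrated with a minimal Python sketch. This is not Dataphin's implementation; the `BatchBuffer` class and its attribute names are hypothetical and only model the "flush when either limit is reached" rule, using the 1 KB record example from the table.

```python
from dataclasses import dataclass, field


@dataclass
class BatchBuffer:
    """Hypothetical buffer that flushes when either limit is reached."""
    max_bytes: int = 32 * 1024 * 1024  # Bulk Write Size default: 32 MB
    max_rows: int = 2048               # Bulk Write Count default: 2,048 rows
    rows: list = field(default_factory=list)
    size: int = 0
    flushed: list = field(default_factory=list)  # stands in for the write target

    def add(self, row: bytes) -> None:
        self.rows.append(row)
        self.size += len(row)
        # Whichever limit is hit first triggers the batch write.
        if self.size >= self.max_bytes or len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.flushed.append(self.rows)
            self.rows, self.size = [], 0


# 1 KB records, 16 MB size limit, 20,000-row count limit.
buf = BatchBuffer(max_bytes=16 * 1024 * 1024, max_rows=20_000)
record = b"x" * 1024
for _ in range(20_000):
    buf.add(record)
buf.flush()  # write whatever remains at the end of the run

batch_sizes = [len(batch) for batch in buf.flushed]
print(batch_sizes)  # [16384, 3616]
```

In this sketch the size limit fires first, at 16,384 rows, because the count limit of 20,000 rows is set above 16 MB ÷ 1 KB. The remaining 3,616 rows are written by the final flush.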

Click Confirm to finish configuring the Doris output component.
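
To make the TEXT format for Batch Add more concrete, the following Python sketch parses field entries into the same name/type structure that the JSON format uses. The function `parse_text_fields` is hypothetical and only illustrates how the row delimiter and column delimiter described in the Output Fields row divide the input; Dataphin's own parser may behave differently, for example in how it treats whitespace.

```python
def parse_text_fields(text, row_delim="\n", col_delim=","):
    """Split 'name,type' entries into a list of {"name", "type"} dicts."""
    fields = []
    for entry in text.split(row_delim):
        entry = entry.strip()
        if not entry:
            continue  # skip blank rows, such as a trailing line feed
        # The column delimiter separates the field name from the field type.
        name, _, type_ = entry.partition(col_delim)
        fields.append({"name": name.strip(), "type": type_.strip()})
    return fields


# Default delimiters: line feed between entries, comma within an entry.
fields = parse_text_fields("user_id,String\nuser_name,String")
print(fields)

# The same entries with a semicolon as the row delimiter.
same_fields = parse_text_fields("user_id,String;user_name,String", row_delim=";")
```

Both calls produce two entries, `user_id` and `user_name`, each with the type `String`, which matches the JSON example in the table.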