Create a Doris data source
By creating a Doris data source, you can enable Dataphin to read business data from Doris or write data to Doris. This topic describes how to create a Doris data source.
Background information
Doris, also known as Apache Doris, is a high-performance, real-time analytical database based on an MPP architecture. It can return query results over massive datasets with sub-second response times, and it supports both high-concurrency point queries and high-throughput complex analysis. These capabilities make it suitable for report analysis, ad hoc queries, unified data warehouse construction, and data lake federated query acceleration. For more information, see the Doris official website.
Permissions
Only custom global roles with the permission to create data sources and the roles of super administrator, data source administrator, domain architect, and project administrator can create data sources.
Procedure
On the Dataphin homepage, click Management Center > Data Source Management in the top navigation bar.
On the Datasource page, click +Create Data Source.
In the Big Data section of the Create Data Source page, select Doris.
If you have recently used Doris, you can also select it in the Recently Used section, or enter Doris in the search box to find it quickly.
On the Create Doris Data Source page, configure the connection parameters.
Configure the basic information of the data source.
Parameter
Description
Datasource Name
The name must meet the following requirements:
It can contain only Chinese characters, uppercase and lowercase letters, digits, underscores (_), and hyphens (-).
It cannot exceed 64 characters in length.
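The naming rules above can be checked before you fill in the form. The following is a minimal sketch, not Dataphin's own validation: the function name is hypothetical, and "Chinese characters" is approximated by the CJK Unified Ideographs range, which is an assumption.

```python
import re

# Assumption: "Chinese characters" is approximated by U+4E00..U+9FFF.
# Dataphin's actual check may accept a wider or narrower range.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64


def is_valid_datasource_name(name: str) -> bool:
    """Return True if name follows the documented naming rules."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    # fullmatch ensures every character belongs to the allowed set.
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_datasource_name("doris_prod-01") passes, while a name containing a space is rejected.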
Datasource Code
After you configure the data source code, you can reference tables in the data source in Flink_SQL nodes using the format datasource_code.table_name or datasource_code.schema.table_name. To automatically access the data source of the current environment, use the variable format ${datasource_code}.table or ${datasource_code}.schema.table. For more information, see Develop Flink_SQL nodes.
Important: The data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
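The reference formats described for the data source code can be illustrated with a small string helper. This is only a sketch of the naming pattern; the function name and the sample code doris_ds are hypothetical, and the helper does not check whether a given data source type is supported in Flink SQL.

```python
from typing import Optional


def table_reference(datasource_code: str, table: str,
                    schema: Optional[str] = None,
                    use_variable: bool = False) -> str:
    """Build a table reference in one of the documented formats."""
    # The variable form ${code} resolves to the current environment's source.
    prefix = "${" + datasource_code + "}" if use_variable else datasource_code
    parts = [prefix] + ([schema] if schema else []) + [table]
    return ".".join(parts)


print(table_reference("doris_ds", "orders"))
print(table_reference("doris_ds", "orders", schema="sales", use_variable=True))
```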
Data Source Description
A brief description of the Doris data source. It cannot exceed 128 characters.
Data Source Configuration
Based on whether the business data source distinguishes between production and development data sources:
If the business data source distinguishes between production and development data sources, select Production + Development Data Source.
If the business data source does not distinguish between production and development data sources, select Production Data Source.
Tag
You can use tags to categorize data sources. For information about how to create tags, see Manage data source tags.
Configure the connection parameters between the data source and Dataphin.
If you select Production + Development Data Source for Data Source Configuration, you need to configure the connection information for both the production and the development data source. If you select Production Data Source, you only need to configure the connection information for the production data source.
Note: Typically, production and development data sources should be configured as separate data sources to achieve environment isolation and reduce the impact of development data sources on production data sources. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
JDBC URL
Enter the endpoint of the data source. The endpoint format is jdbc:mysql://host:port/dbname.
Username, Password
The username and password used to log on to the Doris data source.
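The JDBC URL format jdbc:mysql://host:port/dbname can be assembled and sanity-checked with a short helper before you paste it into the form. This is a sketch under stated assumptions: the function names are hypothetical, the host, port, and database in the usage line are placeholders, and the pattern only covers the simple host:port form with an optional query string.

```python
import re

# Matches jdbc:mysql://host:port/dbname with an optional ?key=value suffix.
_JDBC_PATTERN = re.compile(
    r"jdbc:mysql://(?P<host>[^:/?#]+):(?P<port>\d+)/(?P<dbname>[^/?#]+)(?:\?.*)?"
)


def build_jdbc_url(host: str, port: int, dbname: str) -> str:
    """Assemble a JDBC URL in the documented format."""
    return f"jdbc:mysql://{host}:{port}/{dbname}"


def parse_jdbc_url(url: str):
    """Split a JDBC URL into (host, port, dbname); raise ValueError if malformed."""
    match = _JDBC_PATTERN.fullmatch(url)
    if match is None:
        raise ValueError(f"not a jdbc:mysql://host:port/dbname URL: {url!r}")
    return match["host"], int(match["port"]), match["dbname"]


# Placeholder values for illustration only.
print(build_jdbc_url("192.0.2.10", 9030, "demo_db"))
```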
FE Node URL
Enter the endpoint of the FE node to access the FE node through a web server. The endpoint format is {FE IP}:{Http Port}. The default port is 8030. You can configure multiple FE nodes. Separate them with commas (,).
SSL Encryption
To establish an encrypted connection through SSL, enable SSL encryption, upload the truststore certificate, and enter the truststore certificate password.
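As an illustration of the FE Node URL format, the sketch below splits a comma-separated list of {FE IP}:{Http Port} entries. The function name is hypothetical, the addresses are placeholders, and falling back to port 8030 when an entry has no port is an assumption of this sketch rather than documented Dataphin behavior. IPv6 addresses are not handled.

```python
from typing import List, Tuple

DEFAULT_FE_HTTP_PORT = 8030  # default FE HTTP port stated in the parameter description


def parse_fe_nodes(value: str) -> List[Tuple[str, int]]:
    """Parse 'ip:port,ip:port' into a list of (host, port) tuples."""
    nodes = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        host, sep, port = item.partition(":")
        # Assumption: use the default port when none is given.
        nodes.append((host, int(port) if sep else DEFAULT_FE_HTTP_PORT))
    return nodes


print(parse_fe_nodes("192.0.2.10:8030, 192.0.2.11:8031"))
```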
Configure advanced settings for the data source.
Parameter
Description
connectTimeout
The connection timeout for the database, in milliseconds. The default is 900,000 milliseconds (15 minutes).
Note: If the JDBC URL includes a connectTimeout setting, the value in the JDBC URL is used as the connectTimeout.
For data sources created before Dataphin V3.11, the default connectTimeout is -1, which indicates no timeout limit.
socketTimeout
The socket timeout for the database, in milliseconds. The default is 1,800,000 milliseconds (30 minutes).
Note: If the JDBC URL includes a socketTimeout setting, the value in the JDBC URL is used as the socketTimeout.
For data sources created before Dataphin V3.11, the default socketTimeout is -1, which indicates no timeout limit.
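The precedence described for connectTimeout and socketTimeout (a value in the JDBC URL wins over the advanced setting, which in turn wins over the default) can be sketched as follows. The function name and the configured_ms parameter are hypothetical, and this only models the documented rule; it is not how Dataphin resolves the values internally.

```python
from urllib.parse import parse_qs

# Documented defaults for data sources created in Dataphin V3.11 or later.
DEFAULTS_MS = {"connectTimeout": 900_000, "socketTimeout": 1_800_000}


def effective_timeout_ms(jdbc_url: str, name: str, configured_ms=None) -> int:
    """Resolve a timeout: JDBC URL value, then configured value, then default."""
    _, _, query = jdbc_url.partition("?")
    values = parse_qs(query).get(name)
    if values:
        return int(values[-1])  # a setting in the URL takes precedence
    if configured_ms is not None:
        return configured_ms
    return DEFAULTS_MS[name]


url = "jdbc:mysql://192.0.2.10:9030/demo_db?connectTimeout=5000"
print(effective_timeout_ms(url, "connectTimeout"))
print(effective_timeout_ms(url, "socketTimeout"))
```

Here the first call resolves to the URL value and the second falls back to the 30-minute default, because the URL sets only connectTimeout.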
Connection Retry Count
If the database connection times out, the system will automatically retry the connection until the specified number of retries is reached. If the connection still fails after the maximum number of retries, the connection is considered failed.
Note: The default retry count is 1, and you can configure a value between 0 and 10.
The connection retry count will be applied by default to offline integration tasks and global quality (requires the asset quality function module to be enabled). In offline integration tasks, you can configure task-level retry counts separately.
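The retry behavior can be pictured with the sketch below. It assumes one initial attempt followed by up to retry_count retries, and it treats only timeouts as retryable; both points are assumptions of this illustration. The function name and the connect callable are hypothetical.

```python
def connect_with_retries(connect, retry_count: int = 1):
    """Call connect(); on timeout, retry up to retry_count more times."""
    if not 0 <= retry_count <= 10:
        raise ValueError("retry_count must be between 0 and 10")
    last_error = None
    # One initial attempt plus retry_count retries (assumed interpretation).
    for _ in range(retry_count + 1):
        try:
            return connect()
        except TimeoutError as exc:
            last_error = exc
    # Every attempt timed out: the connection is considered failed.
    raise ConnectionError(
        f"connection failed after {retry_count} retries"
    ) from last_error
```

With the default retry count of 1, a connection that times out once and then succeeds is reported as successful. With a retry count of 0, the first timeout is final.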
Select a Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.
Click Test Connection, or click OK directly, to save the settings and complete the creation of the Doris data source.
Click Test Connection, and the system will test whether the data source can connect to Dataphin normally. If you directly click OK, the system will automatically test the connection for all selected clusters. However, even if all selected clusters fail the connection test, the data source can still be created normally.
Important: If the connection test fails, you can troubleshoot based on common network connectivity issues. For more information, see Network connectivity solutions.