Create an OushuDB data source
An OushuDB data source allows Dataphin to read data from or write data to OushuDB. This topic describes how to create one.
Limitations
To create data sources, you must have a custom global role with the New Data Source permission or one of the following system roles: super administrator, data source administrator, data domain architect, or project administrator.
Procedure
On the Dataphin home page, choose Administration Center > Data Source Management from the top navigation bar.
On the Data Sources page, click + New Data Source.
On the New Data Source page, in the Relational Database section, select OushuDB.
If you have recently used OushuDB, you can also select OushuDB from the Recently Used section. Alternatively, you can enter keywords for OushuDB in the search box to perform a quick search.
On the New OushuDB Data Source page, configure the connection parameters for the data source.
Configure the basic information for the data source.
Parameter
Description
Data source name
The naming conventions are as follows:
The name can contain only Chinese characters, uppercase and lowercase letters, digits, underscores (_), and hyphens (-).
It must be 64 characters or less.
Data source code
After you configure the data source code, you can reference tables from this data source in Flink_SQL tasks in the format data_source_code.table_name or data_source_code.schema.table_name. To automatically access the data source of the current environment, use the variable format ${data_source_code}.table or ${data_source_code}.schema.table. For more information, see Dataphin data source table development methods.
Important
The data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.
Currently, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources can be referenced in Flink SQL.
Version
Only version 6.4.0 is supported.
Data source description
A description of the OushuDB data source, up to 128 characters long.
Time zone
Dataphin uses the selected time zone to process time-based data in integration tasks. The default is Asia/Shanghai. Click Modify to select a different time zone:
GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, and GMT+14:00.
Daylight saving time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, and Pacific/Honolulu.
Data source configuration
Select an option based on whether your data source has separate production and development environments:
If your data source is separated into production and development environments, select production + dev data source.
If your data source does not have separate environments, select production data source.
Tag
You can add tags to classify the data source. To learn how to create tags, see Manage data source tags.
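Two rules in the table above are mechanical enough to check in code: the data source name format and the table reference format that a data source code enables. The following Python sketch is a hypothetical helper, not Dataphin's own validation logic. It assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that the 64-character limit counts Unicode characters.

```python
import re
from typing import Optional

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); Dataphin's exact rule may differ.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]+")
MAX_NAME_LENGTH = 64


def is_valid_data_source_name(name: str) -> bool:
    """Check a name against the documented naming conventions (hypothetical helper)."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    current_env: bool = False) -> str:
    """Format a Flink_SQL table reference for a data source code (hypothetical helper).

    With current_env=True the code is wrapped as ${code}, the variable form
    that resolves to the data source of the current environment.
    """
    prefix = f"${{{code}}}" if current_env else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)
```

For example, table_reference("ds_code", "orders", schema="public", current_env=True) returns "${ds_code}.public.orders". Per the table, OushuDB is not among the data source types that can currently be referenced in Flink SQL, so the reference helper only illustrates the string format.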
Configure the connection parameters between the data source and Dataphin.
If you selected production + dev data source, you must configure connection information for both the production and development environments. If you selected production data source, configure connection information for the production environment only.
Note
As a best practice, configure the production and development data sources as separate instances to isolate the environments and keep development activities from affecting production. However, Dataphin also lets you use the same data source for both environments by entering identical parameter values.
For Configuration method, select JDBC URL or Host. The default is JDBC URL.
JDBC URL
Parameter
Description
JDBC URL
The connection URL uses the following format:
jdbc:oushudb://host1:port1,host2:port2/database
Schema
Enter the schema associated with the username.
Username and Password
Enter the username and password for database authentication. To ensure that tasks run as expected, make sure the account has the required data permissions.
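Before you paste a JDBC URL into this field, it can help to split it into its parts and confirm that each host entry has a port and that a database name is present. The following Python sketch is a hypothetical helper, not part of Dataphin. It assumes the jdbc:oushudb://host1:port1,host2:port2/database format shown above and, as a further assumption, PostgreSQL-style ?key=value&... query parameters, which is how a setting such as connectTimeout would appear in the URL. IPv6 addresses are not handled.

```python
from urllib.parse import parse_qs


def parse_oushudb_jdbc_url(url: str):
    """Split jdbc:oushudb://host1:port1,host2:port2/database[?k=v&...] into parts.

    Returns (hosts, database, params), where hosts is a list of (host, port)
    tuples and params maps each query parameter to its last value.
    """
    prefix = "jdbc:oushudb://"
    if not url.startswith(prefix):
        raise ValueError(f"expected URL to start with {prefix!r}")
    rest = url[len(prefix):]
    # Assumption: query parameters follow a "?" (PostgreSQL-style syntax).
    rest, _, query = rest.partition("?")
    hosts_part, sep, database = rest.partition("/")
    if not sep or not database:
        raise ValueError("missing database name")
    hosts = []
    for entry in hosts_part.split(","):
        host, colon, port = entry.rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError(f"invalid host:port entry: {entry!r}")
        hosts.append((host, int(port)))
    params = {key: values[-1] for key, values in parse_qs(query).items()}
    return hosts, database, params
```

The host names, ports, and database name in any example input are placeholders; use the values of your own OushuDB deployment.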
Host
Host configuration
Parameter
Description
Server address
Enter the IP address and port number of the server.
Click + Add to add more IP address and port entries, or click the delete icon to remove an entry. At least one entry is required.
dbname
Enter the database name.
Parameter configuration
Parameter
Description
Parameter
Parameter name: Select an existing parameter name or enter a custom one.
A custom parameter name can contain only uppercase and lowercase letters, digits, periods (.), underscores (_), and hyphens (-).
Parameter value: This field is required if a parameter name is selected. The value can contain only letters, digits, periods (.), underscores (_), and hyphens (-), and must be 256 characters or less.
Note
Click + Add Parameter to add more parameters, or click the delete icon to remove a parameter. You can add up to 30 parameters.
Schema
Enter the schema that is associated with the username.
Username and Password
Enter the username and password for database authentication. To ensure that tasks run as expected, make sure the account has the required data permissions.
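The rules for custom parameters can be checked before you enter them. The following Python sketch is a hypothetical helper that mirrors the documented rules: names and values may contain only letters, digits, periods, underscores, and hyphens; a value is required for each name; values are at most 256 characters; and at most 30 parameters are allowed. Treating "letters" as ASCII letters is an assumption, and no name length limit is applied because none is documented.

```python
import re

# Hypothetical validator mirroring the documented rules for custom parameters.
# Assumption: "letters" means ASCII letters A-Z and a-z.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
MAX_VALUE_LENGTH = 256
MAX_PARAMETERS = 30


def validate_parameters(params: dict) -> list:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    if len(params) > MAX_PARAMETERS:
        problems.append(f"too many parameters: {len(params)} > {MAX_PARAMETERS}")
    for name, value in params.items():
        if not _ALLOWED.fullmatch(name):
            problems.append(f"invalid parameter name: {name!r}")
        if not value:
            # A value is required whenever a parameter name is set.
            problems.append(f"missing value for parameter: {name!r}")
        elif len(value) > MAX_VALUE_LENGTH or not _ALLOWED.fullmatch(value):
            problems.append(f"invalid value for parameter: {name!r}")
    return problems
```

The parameter names used when trying this helper, such as example.param, are placeholders rather than documented OushuDB settings.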
Note
If you create a data source by using the host method and later switch to the JDBC URL method, the system automatically constructs the JDBC URL from the specified IP addresses and ports.
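The conversion in the note above can be pictured as joining the server entries into the documented URL format. The following Python sketch is a hypothetical illustration, not Dataphin's implementation. Including the dbname value as the database segment is an assumption based on the URL format; the note itself mentions only IP addresses and ports, and how any custom parameters are carried over is not documented, so they are left out.

```python
def build_jdbc_url(servers, dbname: str) -> str:
    """Assemble jdbc:oushudb://host1:port1,host2:port2/database from host entries.

    servers is a sequence of (ip_address, port) tuples, in the order entered.
    Assumption: dbname becomes the database segment of the URL.
    """
    if not servers:
        # The host configuration requires at least one server entry.
        raise ValueError("at least one server address is required")
    hosts = ",".join(f"{host}:{port}" for host, port in servers)
    return f"jdbc:oushudb://{hosts}/{dbname}"
```

For example, build_jdbc_url([("10.0.0.1", 5432), ("10.0.0.2", 5433)], "analytics") returns "jdbc:oushudb://10.0.0.1:5432,10.0.0.2:5433/analytics". The addresses and database name are placeholders.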
Configure advanced settings for the data source.
Parameter
Description
connectTimeout
The connection timeout for the database, in seconds. The default value is 900 seconds (15 minutes).
Note
If you also configure the connectTimeout parameter in the JDBC URL, the value in the JDBC URL takes precedence.
socketTimeout
The socket timeout for the database, in seconds. The default value is 1,800 seconds (30 minutes).
Note
If you also configure the socketTimeout parameter in the JDBC URL, the value in the JDBC URL takes precedence.
Connection retries
If the database connection times out, the system automatically retries the connection until the specified number of retries is reached. If the connection still fails after the maximum number of retries, the connection attempt fails.
Note
The default number of retries is 1. You can set this value to an integer from 0 to 10.
By default, the number of connection retries applies to offline integration tasks and Global Quality (the Global Quality feature must be enabled). For offline integration tasks, you can also configure the number of retries at the task level.
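The retry behavior described above means that a retry count of N allows up to N + 1 connection attempts: the first attempt plus N retries after timeouts. The following Python sketch illustrates that counting with a hypothetical helper; it is not how Dataphin implements retries. It assumes that only timeouts trigger a retry, as the description states, and that there is no delay between attempts unless one is passed in, because the actual retry interval is not documented.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T], retries: int = 1,
                         delay_seconds: float = 0.0) -> T:
    """Call connect(); after each timeout, retry up to `retries` more times.

    Hypothetical helper. Assumptions: only TimeoutError triggers a retry, and
    the optional delay between attempts is not a documented Dataphin setting.
    """
    if not isinstance(retries, int) or not 0 <= retries <= 10:
        raise ValueError("retries must be an integer from 0 to 10")
    last_error = None
    for attempt in range(retries + 1):  # the first attempt plus `retries` retries
        try:
            return connect()
        except TimeoutError as exc:
            last_error = exc
            if attempt < retries and delay_seconds:
                time.sleep(delay_seconds)
    # Every attempt timed out, so the connection attempt fails.
    raise ConnectionError(f"connection failed after {retries} retries") from last_error
```

With the default value of 1, a connection that times out twice fails after two attempts in total.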
Note
Duplicate parameters are resolved based on the following precedence rules:
If a parameter is specified in the JDBC URL, advanced settings, and host configuration, the value in the JDBC URL takes precedence.
If a parameter is specified in both the JDBC URL and advanced settings, the value in the JDBC URL takes precedence.
If a parameter is specified in both the advanced settings and the host configuration, the value in the advanced settings takes precedence.
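The three rules above reduce to a single order: JDBC URL, then advanced settings, then host configuration. The following Python sketch shows that order as a dictionary merge in which later sources override earlier ones. The function name and the dictionaries are hypothetical; only the precedence order comes from the rules above.

```python
def resolve_parameters(host_config: dict, advanced: dict, jdbc_url: dict) -> dict:
    """Merge duplicate parameters: JDBC URL > advanced settings > host configuration.

    Hypothetical helper. In a dict merge, later sources override earlier ones,
    so the sources are listed from lowest to highest precedence.
    """
    return {**host_config, **advanced, **jdbc_url}
```

For example, if connectTimeout is 900 in the advanced settings and 60 in the JDBC URL, resolve_parameters({}, {"connectTimeout": "900"}, {"connectTimeout": "60"}) returns {"connectTimeout": "60"}.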
Select a default resource group. This resource group is used to run tasks related to the data source, such as database SQL, offline full-database migration, and data preview.
Click Test Connection, or click OK to save the configuration and create the OushuDB data source.
When you click Test Connection, the system checks whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for all selected clusters. You can create the data source even if the connection tests fail.
Test Connection tests connectivity from the Default Cluster or from Registered Scheduling Clusters that are registered in Dataphin and in normal use. The Default Cluster is selected by default and cannot be deselected. Connection testing is not supported for a Registered Scheduling Cluster that has no resource groups; create a resource group for it before you test the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
A connection test usually takes less than 2 minutes. If it times out, click the corresponding icon to view the reason, and then retry.
Whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system records the time at which the final result was generated.
Note
Only the test results for the Default Cluster can have three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin have only two connection statuses: Connection Successful and Connection Failed.
If the test result is Connection Failed, click the corresponding icon to view the reason for the failure.
If the test result is Succeeded With Warning, the application cluster connected successfully but the scheduling cluster connection failed, and the data source cannot be used for data development or integration. Click the corresponding icon to view the log information.
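The status rules above can be summarized as a small decision function. The following Python sketch is a hypothetical illustration, not Dataphin code. The Default Cluster reports Succeeded With Warning when the application cluster connects but the scheduling cluster does not. Reporting Connection Failed whenever the application cluster connection fails is an assumption, because the text does not cover that case explicitly.

```python
from enum import Enum


class ConnectionStatus(Enum):
    """Connection test statuses named in the Dataphin console."""

    SUCCESSFUL = "Connection Successful"
    WARNING = "Succeeded With Warning"
    FAILED = "Connection Failed"


def default_cluster_status(app_cluster_ok: bool, scheduling_cluster_ok: bool) -> ConnectionStatus:
    """Derive the Default Cluster status from the two connection checks (hypothetical)."""
    if app_cluster_ok and scheduling_cluster_ok:
        return ConnectionStatus.SUCCESSFUL
    if app_cluster_ok:
        # The application cluster connected but the scheduling cluster did not.
        return ConnectionStatus.WARNING
    # Assumption: a failed application cluster connection is reported as failed.
    return ConnectionStatus.FAILED


def registered_cluster_status(connected: bool) -> ConnectionStatus:
    """Registered Scheduling Clusters report only success or failure."""
    return ConnectionStatus.SUCCESSFUL if connected else ConnectionStatus.FAILED
```

Because Registered Scheduling Clusters never report a warning, registered_cluster_status takes a single success flag.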