Create an OpenSearch data source to write data from Dataphin to OpenSearch. This topic describes how to create an OpenSearch data source.
Limits
Before you create a data source for an Alibaba Cloud product in Dataphin, you must add the Dataphin IP address to the database whitelist or security group. This step ensures network connectivity between the data source and Dataphin. For more information, see Data source whitelist configuration.
Permissions
Only custom global roles with the Create Data Source permission and the following system roles can create data sources: super administrator, data source administrator, board architect, and project administrator.
Procedure
In the top menu bar of the Dataphin homepage, choose Management Center > Datasource Management.
On the Datasource page, click +New Data Source.
On the New Data Source page, in the NoSQL area, select OpenSearch.
If you recently used OpenSearch, you can also select it from the Recently Used area. Alternatively, you can enter `OpenSearch` in the search box to find it quickly.
On the Create OpenSearch Data Source page, configure the data source connection parameters.
Configure the basic information for the data source.
Parameter
Description
Datasource Name
Enter a name for the data source. The naming conventions are as follows:
The name can contain only Chinese characters, uppercase and lowercase letters, digits, underscores (_), and hyphens (-).
The maximum length is 64 characters.
Datasource Code
After you configure the data source code, you can directly access Dataphin data source tables in Flink_SQL nodes or by using the Dataphin JDBC client. Use the format `data_source_code.table_name` or `data_source_code.schema.table_name` for quick access. To automatically switch data sources based on the task execution environment, use the variable format `${data_source_code}.table` or `${data_source_code}.schema.table`. For more information, see Develop Flink_SQL nodes.
Important: The data source code cannot be modified after it is configured.
You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.
In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
Version
Only version 2.x is supported.
Data Source Description
A brief description of the data source, up to 128 characters in length.
Data Source Configuration
Select the data source type to configure:
If your business data source has separate production and development environments, select Production + Development Data Source.
If your business data source does not have separate environments, select Production Data Source.
Tag
Add tags to the data source for classification. For more information about how to create tags, see Manage data source tags.
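The naming rules and reference formats in the table above can be sketched in Python as follows. This is a minimal illustration, not part of Dataphin: the function names are hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs range, and a minimum name length of one character is assumed.

```python
import re
from typing import Optional

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); a minimum length of 1 is assumed.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Check the documented rules: allowed characters and at most 64 characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(text: str) -> bool:
    """The description may be up to 128 characters long."""
    return len(text) <= 128


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    by_environment: bool = False) -> str:
    """Build a table reference from a data source code.

    With by_environment=True the code is wrapped as ${code}, the variable
    format that switches data sources with the task execution environment.
    """
    prefix = "${" + code + "}" if by_environment else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)
```

For example, `table_reference("os_code", "orders")` yields `os_code.orders`, where `os_code` and `orders` are placeholder names.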
Configure the connection parameters between the data source and Dataphin.
If you selected Production + Development Data Source in the previous step, the configuration page for both is displayed. If you selected Production Data Source, only the configuration page for the production data source is displayed.
Note: Typically, the production and development data sources should be configured as different sources to isolate the environments and reduce the impact of the development environment on the production environment. However, Dataphin also supports configuring them as the same data source with identical parameter values.
Parameter
Description
Endpoint
Enter the endpoint for OpenSearch, for example, `http://opensearch-host:9200`.
Username, Password
If authentication is enabled for the OpenSearch instance, enter the username and password to access the instance.
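Before entering the Endpoint, Username, and Password values, it can help to check them from a machine that can reach the instance. The following Python sketch is one way to do that, under stated assumptions: the endpoint exposes an OpenSearch-compatible REST API whose root path returns a JSON document containing `version.number`, and authentication (if enabled) is HTTP Basic. The function names are hypothetical and this is not how Dataphin itself tests the connection.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(endpoint: str, username: Optional[str] = None,
                  password: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the root path of the endpoint."""
    request = urllib.request.Request(endpoint.rstrip("/") + "/", method="GET")
    if username is not None and password is not None:
        # Assumption: the instance accepts HTTP Basic authentication.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def is_supported_version(cluster_info: dict) -> bool:
    """Return True if the reported version is 2.x (the only supported version)."""
    # Assumption: the root response has the shape {"version": {"number": "2.x.y"}}.
    number = str(cluster_info.get("version", {}).get("number", ""))
    return number.split(".")[0] == "2"


def fetch_cluster_info(endpoint: str, username: Optional[str] = None,
                       password: Optional[str] = None,
                       timeout: float = 10.0) -> dict:
    """Call the endpoint and parse the JSON body; raises on network or HTTP errors."""
    request = build_request(endpoint, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `is_supported_version(fetch_cluster_info("http://opensearch-host:9200"))` performs the actual network check; the host name is the placeholder from the example above.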
Select the Default Resource Group. This resource group is used to run nodes related to the current data source, such as database SQL, offline full database migration, and data preview.
Click Test Connection or click OK to save the configuration and create the OpenSearch data source.
When you click Test Connection, the system tests the connectivity between the data source and Dataphin. If you click OK directly, the system automatically tests the connection for all selected clusters. The data source can be created even if all selected clusters fail the connection test.
Test Connection tests the connection for the Public Scheduling Cluster or Registered Scheduling Clusters that have been registered in Dataphin and are in normal use. The Public Scheduling Cluster is selected by default and cannot be deselected. If a Registered Scheduling Cluster has no resource groups, connection testing is not supported for it. Create a resource group before you test the connection.
The selected clusters are only used to test network connectivity with the current data source and are not used for running related tasks later.
The test connection usually takes less than 2 minutes. If it times out, you can click the icon to view the specific reason and retry.
Regardless of whether the test result is Connection Failed, Connection Successful, or Succeeded With Warning, the system will record the generation time of the final result.
Note: Only the test results for the Public Scheduling Cluster include three connection statuses: Succeeded With Warning, Connection Successful, and Connection Failed. The test results for Registered Scheduling Clusters in Dataphin only include two connection statuses: Connection Successful and Connection Failed.
When the test result is Connection Failed, you can click the icon to view the specific failure reason.
When the test result is Succeeded With Warning, it means that the application cluster connection is successful but the scheduling cluster connection failed. The current data source cannot be used for data development and integration. You can click the icon to view the log information.
Important: If the connection test fails, you can troubleshoot the issue based on common network connectivity problems. For more information, see Network connectivity solutions.
If the connection test returns the `VPC_GRANT_ACCESS_API_ERROR` error, see Solution for the VPC_GRANT_ACCESS_API_ERROR error to resolve the issue.
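The connection test statuses described in this step can be summarized in a small Python sketch. The enum and function names are hypothetical and only restate the rules given above; they are not a Dataphin API.

```python
from enum import Enum


class ConnectionStatus(Enum):
    SUCCESSFUL = "Connection Successful"
    WARNING = "Succeeded With Warning"
    FAILED = "Connection Failed"


def possible_statuses(cluster_kind: str) -> set:
    """Statuses a test can report; cluster_kind is 'public' or 'registered'."""
    if cluster_kind == "public":
        # Only the Public Scheduling Cluster can report all three statuses.
        return set(ConnectionStatus)
    if cluster_kind == "registered":
        return {ConnectionStatus.SUCCESSFUL, ConnectionStatus.FAILED}
    raise ValueError(f"unknown cluster kind: {cluster_kind!r}")


def describe(status: ConnectionStatus) -> str:
    """Restate what each status means according to the text above."""
    if status is ConnectionStatus.WARNING:
        return ("Application cluster connected, scheduling cluster failed; "
                "the data source cannot be used for data development and integration.")
    if status is ConnectionStatus.FAILED:
        return "Connection failed; view the failure reason and check network connectivity."
    return "Connection successful."
```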