Create a PolarDB data source


By creating a PolarDB data source, you can enable Dataphin to read business data from PolarDB or write data to PolarDB. This topic describes how to create a PolarDB data source.

Prerequisites

To create a data source based on an Alibaba Cloud product in Dataphin, you must ensure that the IP address of Dataphin has been added to the database whitelist (or security group) to enable network connectivity between the data source and Dataphin. For more information, see Data source whitelist configuration.

Background information

PolarDB is a new-generation relational database developed by Alibaba Cloud. It is a cloud-hosted database product that is compatible with MySQL and PostgreSQL syntax. Built on a storage-compute decoupled architecture, it combines software and hardware optimizations to provide database services with high elasticity, high performance, mass storage, and reliability. If you use PolarDB, you must create a PolarDB data source before you can use its data for data development in Dataphin. For more information about PolarDB, see What is PolarDB.

Policy description

Only custom global roles with the Create Data Source permission and the super administrator, data source administrator, domain architect, and project administrator roles can create data sources.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Management Center > Datasource Management.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, in the Relational Database section, select PolarDB.

    If you have recently used PolarDB, you can also select it in the Recently Used section, or enter the keyword PolarDB in the search box to find it quickly.

  4. On the Create PolarDB Data Source page, configure the parameters for connecting to the data source.

    1. Configure the basic information of the data source.

      Parameter

      Description

      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • The name can be up to 64 characters in length.
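
The naming rules above can be checked before you fill in the form. The following is a minimal sketch, not part of Dataphin: the function name `is_valid_datasource_name` is hypothetical, and it assumes that "Chinese characters" means the CJK Unified Ideographs range (U+4E00 to U+9FFF) and that an empty name is not allowed.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00-U+9FFF); Dataphin may accept a wider range.
_NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if name uses only the allowed characters and is 1-64 long."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_datasource_name("polardb_prod-01")` passes, while a name that contains a space or has 65 characters is rejected.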

      Datasource Code

      After you configure a data source code, you can access tables of the Dataphin data source in Flink_SQL tasks or through the Dataphin JDBC client by using the format data source code.table name or data source code.schema.table name. To switch data sources automatically based on the task execution environment, use the variable format ${data source code}.table or ${data source code}.schema.table. For more information, see Flink_SQL task development method.

      Important
      • The data source code cannot be modified after it is configured.

      • You can preview data on the object details page in the asset directory and asset checklist only after the data source code is configured.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, SelectDB, and GaussDB data warehouse service (DWS) data sources are currently supported.
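
The two reference formats for a data source code (plain and variable) can be illustrated with a small helper. This is a sketch only: the function `table_reference`, the code `polar_orders`, and the table and schema names are hypothetical and do not come from Dataphin.

```python
from typing import Optional


def table_reference(code: str, table: str, schema: Optional[str] = None,
                    env_aware: bool = False) -> str:
    """Build a table reference from a data source code.

    env_aware=True wraps the code as ${code}, the variable form used to
    switch data sources based on the task execution environment.
    """
    prefix = "${" + code + "}" if env_aware else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)


# Hypothetical data source code and table names, for illustration only.
plain = table_reference("polar_orders", "t_order")
with_schema = table_reference("polar_orders", "t_order", schema="public")
variable = table_reference("polar_orders", "t_order", env_aware=True)
```

Here `plain` is `polar_orders.t_order`, `with_schema` is `polar_orders.public.t_order`, and `variable` is `${polar_orders}.t_order`.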

      Data Source Description

      A brief description of the PolarDB data source. It cannot exceed 128 characters.

      Database Type

      Based on the underlying engine type of PolarDB, select the corresponding database type:

      • If the underlying engine of PolarDB is MySQL, select MYSQL.

      • If the underlying engine of PolarDB is PostgreSQL, select POSTGRE_SQL.

      Time Zone

      Time-formatted data in integration nodes is processed based on the current time zone. The default time zone is Asia/Shanghai. Click Modify to select a destination time zone. The following options are available:

      • GMT: GMT-12:00, GMT-11:00, GMT-10:00, GMT-09:30, GMT-09:00, GMT-08:00, GMT-07:00, GMT-06:00, GMT-05:00, GMT-04:00, GMT-03:00, GMT-03:00, GMT-02:30, GMT-02:00, GMT-01:00, GMT+00:00, GMT+01:00, GMT+02:00, GMT+03:00, GMT+03:30, GMT+04:00, GMT+04:30, GMT+05:00, GMT+05:30, GMT+05:45, GMT+06:00, GMT+06:30, GMT+07:00, GMT+08:00, GMT+08:45, GMT+09:00, GMT+09:30, GMT+10:00, GMT+10:30, GMT+11:00, GMT+12:00, GMT+12:45, GMT+13:00, GMT+14:00.

      • Daylight Saving Time: Africa/Cairo, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, America/Sao_Paulo, Asia/Bangkok, Asia/Dubai, Asia/Kolkata, Asia/Shanghai, Asia/Tokyo, Atlantic/Azores, Australia/Sydney, Europe/Berlin, Europe/London, Europe/Moscow, Europe/Paris, Pacific/Auckland, Pacific/Honolulu.
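
To make the GMT options concrete, the sketch below converts an option label such as GMT+05:45 into a fixed UTC offset and renders one instant under two of the listed offsets. It only illustrates what a fixed offset means; how Dataphin applies the selected time zone internally is not described here, and the function name `gmt_option_to_timezone` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_GMT_PATTERN = re.compile(r"GMT([+-])(\d{2}):(\d{2})")


def gmt_option_to_timezone(option: str) -> timezone:
    """Convert an option such as 'GMT+05:45' to a fixed-offset timezone."""
    match = _GMT_PATTERN.fullmatch(option)
    if match is None:
        raise ValueError(f"not a GMT offset option: {option!r}")
    sign = 1 if match.group(1) == "+" else -1
    offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return timezone(sign * offset)


# The same instant rendered under two of the listed offsets.
instant = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
at_plus_8 = instant.astimezone(gmt_option_to_timezone("GMT+08:00"))
at_plus_545 = instant.astimezone(gmt_option_to_timezone("GMT+05:45"))
```

Unlike the region-based options (for example, America/New_York), a GMT option never shifts for daylight saving time.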

      Data Source Configuration

      Based on whether the business data source distinguishes between production data sources and development data sources:

      • If the business data source distinguishes between production data sources and development data sources, select Production + Development Data Source.

      • If the business data source does not distinguish between production data sources and development data sources, select Production Data Source.

      Tag

      You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.

    2. Configure the connection parameters between the data source and Dataphin.

      If you select Production + Development Data Source for Data Source Configuration, you must configure connection information for both the production data source and the development data source. If you select Production Data Source, you only need to configure connection information for the production data source.

      Note

      Typically, the production data source and the development data source should be configured as different data sources. This isolates the two environments and reduces the impact of development work on production. However, Dataphin also supports configuring them as the same data source, that is, with identical parameter values.

      Parameter

      Description

      JDBC URL

      Configure the corresponding connection address based on the selected database type.

      • MySQL: Format: jdbc:mysql://host:port/dbname

      • PostgreSQL: Format: jdbc:postgresql://host:port/dbname
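
The two URL formats can be assembled from the Database Type value selected earlier. The helper below is a sketch: `build_jdbc_url` is a hypothetical name, and the host, port, and database name in the usage lines are placeholders, not values from your PolarDB cluster.

```python
def build_jdbc_url(database_type: str, host: str, port: int, dbname: str) -> str:
    """Build a JDBC URL in the format listed above for the given database type."""
    # Keys mirror the Database Type options: MYSQL and POSTGRE_SQL.
    schemes = {"MYSQL": "mysql", "POSTGRE_SQL": "postgresql"}
    if database_type not in schemes:
        raise ValueError(f"unsupported database type: {database_type!r}")
    return f"jdbc:{schemes[database_type]}://{host}:{port}/{dbname}"


# Placeholder endpoint values, for illustration only.
mysql_url = build_jdbc_url("MYSQL", "db.example.com", 3306, "sales")
pg_url = build_jdbc_url("POSTGRE_SQL", "db.example.com", 5432, "sales")
```

Use the endpoint and port shown for your own PolarDB cluster in place of the placeholders.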

      Schema

      Required if you selected POSTGRE_SQL as the database type in the previous step. Schema is the database name of the PostgreSQL instance.

      Username, Password

      The username and password used to log on to the underlying engine (MySQL or PostgreSQL) of the PolarDB data source.

      Type

      Supports Directly Connectable Database, Alibaba Cloud Database, and Self-managed Database On ECS (VPC). You can select and configure based on the database type and business requirements:

      • Directly Connectable Database: Connect directly to the database through the default scheduling cluster or a registered scheduling cluster. This is suitable for: ① public network databases, ② databases in the same network environment as the registered scheduling cluster. If the database uses an access whitelist, add the public outbound IP address of the Dataphin default scheduling cluster to it: 47.102.192.174.

      • Alibaba Cloud Database: Databases purchased on Alibaba Cloud. Supports access via VPC Proxy or Direct Connection.

        • VPC Proxy: Use this option when the database is an Alibaba Cloud database in a VPC network environment. Add the authorized IP range 100.104.0.0/16 to the database whitelist.

          • Region: The region where the database is located. Only databases in the same region as the Dataphin instance are supported. If your Dataphin instance is located in China (Shanghai), only China (Shanghai) region can be selected.

          • VPC ID: Enter the VPC ID of the VPC network where the database is located. You can log on to the VPC console to view it. As shown in the following figure:

            image..png

          • VPC Instance ID: Enter the VPC instance ID of the database, which is VpcInstanceId. If an error message indicates that the instance cannot be found, you can obtain VpcCloudInstanceId by calling the DescribeDrdsInstance API.

        • Direct Connection: Connect directly to the database through the default scheduling cluster or a registered scheduling cluster. If the database uses an access whitelist, add the public outbound IP address of the Dataphin default scheduling cluster to it: 47.102.192.174.

      • Self-managed Database On ECS (VPC): Use this option when the database is self-managed on an ECS instance in a VPC network environment. Add the authorized IP range 100.104.0.0/16 to the whitelist.

        • Region: The region where the database is located. Only databases in the same region as the Dataphin instance are supported. If your Dataphin instance is located in China (Shanghai), only China (Shanghai) region can be selected.

        • VPC ID: Enter the VPC ID of the VPC network where the ECS is located. You can log on to the VPC console to view it. As shown in the following figure:

          image..png

        • ECS ID: Enter the ECS ID of the ECS server where the database is deployed. You can log on to the ECS console to view it. As shown in the following figure:

          image..png
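
Before testing the connection, it can help to confirm that the whitelist you configured on the database side actually covers the addresses quoted above. The check below is a sketch that uses Python's standard `ipaddress` module; the function `is_allowed` is hypothetical, and the whitelist entries passed to it are examples.

```python
import ipaddress
from typing import List

# Values quoted in the connection type descriptions above.
VPC_PROXY_RANGE = ipaddress.ip_network("100.104.0.0/16")
DEFAULT_CLUSTER_EGRESS_IP = ipaddress.ip_address("47.102.192.174")


def is_allowed(source_ip: str, whitelist: List[str]) -> bool:
    """Return True if source_ip falls inside any whitelist entry.

    Entries may be single addresses or CIDR ranges.
    """
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(entry, strict=False)
               for entry in whitelist)
```

Note that the public outbound IP address is not inside 100.104.0.0/16, so a whitelist prepared for VPC Proxy access does not cover direct connections, and the reverse also holds.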

    3. Configure advanced settings for the data source.

      Parameter

      Description

      connectTimeout

      The connection timeout of the database, in milliseconds. Default value: 900000 (15 minutes).

      Note
      • If connectTimeout is configured in the JDBC URL, the value in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default connectTimeout is -1, indicating no timeout limit.

      socketTimeout

      The socket timeout of the database, in milliseconds. Default value: 1800000 (30 minutes).

      Note
      • If socketTimeout is configured in the JDBC URL, the value in the JDBC URL takes precedence.

      • For data sources created before Dataphin V3.11, the default socketTimeout is -1, indicating no timeout limit.
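
The precedence described in the notes for connectTimeout and socketTimeout (a value in the JDBC URL wins over the advanced setting) can be sketched as follows. The function `effective_timeout_ms` is hypothetical and not Dataphin's implementation; it assumes the URL value uses the same unit as the form field and does not model the -1 default of data sources created before V3.11.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Defaults stated above, in milliseconds.
DEFAULTS_MS = {"connectTimeout": 900_000, "socketTimeout": 1_800_000}


def effective_timeout_ms(jdbc_url: str, name: str,
                         configured_ms: Optional[int] = None) -> int:
    """Resolve a timeout: JDBC URL value, then form value, then default."""
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary URL.
    url = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    values = parse_qs(urlsplit(url).query).get(name)
    if values:
        return int(values[0])
    if configured_ms is not None:
        return configured_ms
    return DEFAULTS_MS[name]


# Placeholder URL, for illustration only.
url = "jdbc:mysql://db.example.com:3306/sales?connectTimeout=5000"
connect_ms = effective_timeout_ms(url, "connectTimeout", configured_ms=60_000)
socket_ms = effective_timeout_ms(url, "socketTimeout")
```

In this example `connect_ms` resolves to the URL value and `socket_ms` falls back to the default, because the URL sets only connectTimeout.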

      Connection Retries

      If the database connection times out, it will automatically retry connecting until the set number of retries is completed. If the maximum number of retries is reached and the connection is still unsuccessful, the connection fails.

      Note
      • The default number of retries is 1. You can set a value from 0 to 10.

      • The connection retry count will be applied by default to offline integration tasks and global quality (requires activation of the asset quality function module). In offline integration tasks, you can configure task-level retry counts separately.
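
The retry behavior described for Connection Retries can be pictured as a bounded loop: one initial attempt, then up to the configured number of retries when the connection times out. The sketch below illustrates that reading and is not Dataphin's implementation; `connect_with_retries` and the `connect` callable are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(connect: Callable[[], T], retries: int = 1) -> T:
    """Call connect(); on TimeoutError, retry up to `retries` more times."""
    if not 0 <= retries <= 10:
        raise ValueError("retries must be between 0 and 10")
    for attempt in range(retries + 1):
        try:
            return connect()
        except TimeoutError:
            if attempt == retries:
                raise  # retries exhausted: the connection fails
    raise AssertionError("unreachable")
```

With the default of 1, a connection that times out once and then succeeds is reported as successful; with 0, the first timeout is final.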

  5. Select Default Resource Group, which is used to run tasks related to the current data source, including database SQL, offline database migration, data preview, and more.

  6. Perform a Test Connection or directly click OK to save and complete the creation of the PolarDB data source.

    If you click Test Connection, the system tests whether Dataphin can connect to the data source. If you click OK directly, the system automatically tests the connection for all selected clusters. The data source is still created even if every selected cluster fails to connect.

    Important