Import database data to DataHub

DataHub and Data Integration jointly developed the "Real-time Database Import" feature, which streamlines importing database data into DataHub in real time. The process is as follows:

[Figure: demo1]

Walkthrough

For Alibaba Cloud accounts:

Log in to the DataHub console on the public cloud, select the target Project, and click 'Real-time Database Import' in the upper-right corner.

If no DataWorks project exists, Data Integration automatically creates a default workspace for you named Data Integration default workspace (di_${Alibaba Cloud account Uid}).

Click 'Create Import Task'.

In the Add DataHub data source dialog box, enter the Data source name, Data source description, DataHub endpoint, DataHub Project, AccessKey ID, and AccessKey Secret, and then click Complete.

Create a MySQL data source

When you create a MySQL data source, select a Data source type (Alibaba Cloud instance mode or connection string mode) and enter required information such as Data source name, Data source description, region, RDS instance ID, Primary account ID of RDS instance, Default database name, Username, and Password. You can enable the Read from standby instance first switch as needed.

Create an exclusive resource group for Data Integration and then test the connection.

Next, configure the synchronization source and rules.

Then, set the destination Topic.

Complete the configuration and start the task.
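Before opening the dialogs, it can help to confirm that every value the steps above ask for is at hand. The following is a minimal sketch in Python, not part of DataHub or Data Integration. The field labels are copied from the dialogs described above. The helper `missing_fields`, the sample values, and the assumption that every listed field is mandatory are hypothetical.

```python
# Field labels as listed for the Add DataHub data source dialog box.
DATAHUB_SOURCE_FIELDS = (
    "Data source name",
    "Data source description",
    "DataHub endpoint",
    "DataHub Project",
    "AccessKey ID",
    "AccessKey Secret",
)

# Field labels as listed for a MySQL data source.
MYSQL_SOURCE_FIELDS = (
    "Data source name",
    "Data source description",
    "region",
    "RDS instance ID",
    "Primary account ID of RDS instance",
    "Default database name",
    "Username",
    "Password",
)


def missing_fields(values, required):
    """Return the required labels whose value is absent or blank."""
    missing = []
    for name in required:
        value = values.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical values, for illustration only.
datahub_source = {
    "Data source name": "datahub_demo",
    "Data source description": "Target for real-time import",
    "DataHub endpoint": "",  # left blank on purpose
    "DataHub Project": "test_ss",
    "AccessKey ID": "<your AccessKey ID>",
    "AccessKey Secret": "<your AccessKey Secret>",
}

print(missing_fields(datahub_source, DATAHUB_SOURCE_FIELDS))
```

With the sample values, the check reports the blank DataHub endpoint. The same helper can be called with `MYSQL_SOURCE_FIELDS` for the MySQL data source.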

For RAM users:

A RAM user with the required permissions can log in to the DataHub console on the public cloud, select the target Project, and follow the same procedure as an Alibaba Cloud account. If the RAM user does not have a DataWorks project, Data Integration automatically creates a default workspace for them named Data Integration default workspace (di_${Alibaba Cloud account Uid}).

Best practices for granting permissions to a RAM user:

To grant a RAM user access to only a specific Project (for example, test_ss), use the following custom policy:

{
  "Statement": [
    {
      "Action": [
        "dhs:ListProject",
        "dhs:InitializeDataImportProcess"
      ],
      "Effect": "Allow",
      "Resource": "acs:dhs:*:*:projects/*"
    },
    {
      "Action": [
        "dhs:GetProject"
      ],
      "Effect": "Allow",
      "Resource": "acs:dhs:*:*:projects/test_ss"
    },
    {
      "Action": [
        "dhs:*Topic",
        "dhs:*Shard",
        "dhs:*Subscription",
        "dhs:*Connector",
        "dhs:*Records"
      ],
      "Effect": "Allow",
      "Resource": "acs:dhs:*:*:projects/test_ss/topics/*"
    },
    {
      "Action": "ram:CreateServiceLinkedRole",
      "Resource": "*",
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "ram:ServiceName": [
            "dwconnection.datahub.aliyuncs.com"
          ]
        }
      }
    }
  ],
  "Version": "1"
}
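
To reuse the policy above for a Project other than test_ss, the project name has to change in two places: the `dhs:GetProject` resource and the topic-level resource. The following is a minimal sketch in Python that generates the same policy for a given project name. The function name `build_project_policy` is hypothetical. The actions, resources, and service name are copied from the policy above.

```python
import json


def build_project_policy(project):
    """Build the custom policy shown above, scoped to one DataHub Project."""
    project_resource = f"acs:dhs:*:*:projects/{project}"
    return {
        "Statement": [
            {
                # Account-wide actions needed to list Projects and start the import flow.
                "Action": ["dhs:ListProject", "dhs:InitializeDataImportProcess"],
                "Effect": "Allow",
                "Resource": "acs:dhs:*:*:projects/*",
            },
            {
                # Read access limited to the one target Project.
                "Action": ["dhs:GetProject"],
                "Effect": "Allow",
                "Resource": project_resource,
            },
            {
                # Topic-level actions limited to Topics inside the target Project.
                "Action": [
                    "dhs:*Topic",
                    "dhs:*Shard",
                    "dhs:*Subscription",
                    "dhs:*Connector",
                    "dhs:*Records",
                ],
                "Effect": "Allow",
                "Resource": f"{project_resource}/topics/*",
            },
            {
                # Allows creating the service-linked role for the named service only.
                "Action": "ram:CreateServiceLinkedRole",
                "Resource": "*",
                "Effect": "Allow",
                "Condition": {
                    "StringEquals": {
                        "ram:ServiceName": ["dwconnection.datahub.aliyuncs.com"]
                    }
                },
            },
        ],
        "Version": "1",
    }


policy = build_project_policy("test_ss")
print(json.dumps(policy, indent=2))
```

Calling `build_project_policy("test_ss")` reproduces the policy above. Passing a different project name changes only the two project-scoped resources.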