Data import methods

DataWorks is an end-to-end big data development and governance platform from Alibaba Cloud that integrates data integration, data development, and data operations. You can use DataWorks to configure an import task that performs a full import of data from sources such as MySQL, PolarDB, PostgreSQL, Oracle, SQL Server, and Cassandra into LindormTable. This topic describes how to configure a Lindorm import task in DataWorks.

Prerequisites

Add the client IP address to the Lindorm whitelist.

Usage notes

  • To use public network access or a single-node Lindorm instance, you must first upgrade the SDK and update the configuration. For more information, see Step 1 in Connect to and use LindormTable by using the HBase Java API.

  • If your application on an ECS instance accesses the Lindorm instance over a Virtual Private Cloud (VPC), the ECS instance and the Lindorm instance must meet the following conditions to ensure network connectivity:

    • The instances must be in the same region. We recommend that you place them in the same zone to reduce network latency.

    • The ECS instance and the Lindorm instance must be in the same VPC.
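Once the region and VPC conditions above are met, a quick TCP check from the ECS instance can confirm that the Lindorm endpoint is reachable. The following is a minimal sketch, not an official tool: the function name `can_reach` is hypothetical, and you must supply the host and port of your own Lindorm endpoint.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    This only verifies network reachability. It does not authenticate
    against Lindorm or confirm that the whitelist allows real requests.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A result of `False` usually points to a region or VPC mismatch, or to a missing whitelist entry, and is worth checking before you configure the import task.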

Step 1: Create a workspace

Before you configure an import task, create a workspace in DataWorks to manage subsequent data development and tasks.

Step 2: Create a resource group

A resource group helps you allocate resources and manage user permissions within your account.

The following table describes the available resource group types.

| Resource group type | Configuration document | Notes |
| --- | --- | --- |
| Exclusive resource group | Exclusive resource group mode | Exclusive resource groups cannot be used across regions. For example, an exclusive resource group in the China (Shanghai) region can be used only by workspaces in the same region and cannot be bound to VPCs in other regions. Additionally, an exclusive resource group cannot access a Lindorm cluster across VSwitches. |
| Default resource group | None | DataWorks charges additional fees for accessing Lindorm over the public network. |

Step 3: Network configuration

Configure the network for your resource group type to ensure connectivity between DataWorks and your Lindorm instance.

Exclusive resource group

  1. On the Instance Details page of your Lindorm instance, find the VPC and VSwitch information in the Basic Information section.

  2. Bind the DataWorks exclusive resource group to the VPC of the Lindorm instance.

  3. In the VPC console, get the IPv4 CIDR block of the VPC and VSwitch that are bound to the DataWorks exclusive resource group. To do this, click VPC in the left navigation pane, and then click the target VPC instance to go to its details page. On the Basic Information tab, copy the value of IPv4 CIDR Block.

  4. Add this IPv4 CIDR block to the Lindorm whitelist.
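To double-check the whitelist entry from the steps above, you can verify that an address used by the resource group falls inside the CIDR block you added. The following is a minimal sketch that uses Python's standard `ipaddress` module. The function name `is_covered` and the sample addresses are illustrative assumptions, not values from Lindorm or DataWorks.

```python
import ipaddress


def is_covered(ip: str, cidr_blocks: list[str]) -> bool:
    """Return True if ip belongs to at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    for block in cidr_blocks:
        # strict=False tolerates blocks written with host bits set,
        # for example 192.168.1.10/24 instead of 192.168.1.0/24.
        network = ipaddress.ip_network(block, strict=False)
        if addr in network:
            return True
    return False


# Hypothetical whitelist containing one VPC CIDR block.
whitelist = ["192.168.0.0/16"]
print(is_covered("192.168.1.25", whitelist))
print(is_covered("10.0.0.5", whitelist))
```

If an address is not covered, add the matching IPv4 CIDR block to the Lindorm whitelist before you run the task.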

Default resource group

See Add a whitelist to find the IP addresses for your region, and add them to the Lindorm whitelist.

Step 4: Create a sync task

Create an offline sync task for the data import.

Step 5: Modify the task configuration

Important

The lindorm.client.seedserver parameter in the sample script specifies the HBase Java API-compatible endpoint for LindormTable.
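Because a malformed endpoint is a common cause of failed tasks, it can help to check the value before you paste it into the task script. The following is a minimal sketch that assumes the endpoint is written as `host:port`. The helper name `parse_seedserver`, the host name, and the port are placeholders for illustration only. This is not the sample script itself, and only the `lindorm.client.seedserver` key comes from this topic.

```python
def parse_seedserver(value: str) -> tuple[str, int]:
    """Split a host:port endpoint string and validate the port range."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port must be numeric, got {port_text!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


# Placeholder endpoint. Replace it with the HBase Java API-compatible
# endpoint of your own LindormTable instance.
configuration = {
    "lindorm.client.seedserver": "ld-example-proxy-lindorm.example.com:30020",
}

host, port = parse_seedserver(configuration["lindorm.client.seedserver"])
print(host, port)
```

The check only validates the shape of the string. It does not confirm that the endpoint exists or that the resource group can reach it.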

Step 6: Submit and publish the task

To run the task periodically, publish it to the production environment. For more information about publishing tasks, see Publish a task.