Migrate Bigtable data to a Lindorm wide table

Migrate data from Google Bigtable to a Lindorm wide table using a Spark job on the Lindorm Compute Engine. The migration runs over a virtual private cloud (VPC) connection and authenticates to Bigtable using a service account key.

Migration flow: Set up Bigtable authentication (steps 1–3) → Configure and submit the Spark job (step 4) → Monitor job status.

Prerequisites

Before you begin, make sure you have:

LindormTable activated. See Activate LindormTable
The Lindorm Compute Engine activated. See Activate the Lindorm compute engine
Bigtable and Lindorm deployed in the same VPC. Migration is only supported over a VPC connection; placing both instances in the same VPC minimizes latency

Step 1: Enable the Bigtable API

Enable the Bigtable API on your Google Cloud project. See Control access to Bigtable with IAM.

Step 2: Create a service account key

Create a service account key, download it, and rename the file to GOOGLE_APPLICATION_CREDENTIALS.json.
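
Before uploading the key, it can help to confirm that the renamed file is a well-formed service account key. The following Python sketch is a minimal local check, not part of the migration tool. The function names are hypothetical, and the field list reflects what a Google Cloud service account key file normally contains.

```python
import json

# Fields that a Google Cloud service account key file normally contains.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed key file (empty if none)."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_key_file(path: str = "GOOGLE_APPLICATION_CREDENTIALS.json") -> list:
    """Load the renamed key file from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_service_account_key(json.load(handle))
```

An empty list from check_key_file() means the file parsed as JSON and has the expected fields. It does not prove that the key is still valid or that the service account can read the table.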

Step 3: Upload the key to Lindorm

Log on to the Lindorm console.
In the upper-left corner, select the region where your instance is deployed.
On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.
In the left navigation pane, click Compute Engine.
Click the Job Management tab.
Click Upload Resources.
In the Upload Resources dialog box, select GOOGLE_APPLICATION_CREDENTIALS.json.
Click Upload in the upper-left corner of the dialog box.

Step 4: Configure and submit the migration job

Build the main program parameters

Define your Bigtable source and Lindorm destination in JSON. Replace the placeholder values with your actual configuration.

{
  "bigtable.projectId": "myproject",
  "bigtable.instanceId": "myInstanceId",
  "bigtable.tableName": "bigtable_name",

  "lindorm.seedServer": "ld-bp1hn6yq0yb34****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020",
  "lindorm.namespace": "default",
  "lindorm.tableName": "lindorm_table_name",
  "lindorm.userName": "root",
  "lindorm.password": "test****",
  "lindorm.batchSize": 16
}

Parameter descriptions

Parameter	Description
`bigtable.projectId`	Project ID of the Bigtable source
`bigtable.instanceId`	Instance ID of the Bigtable source
`bigtable.tableName`	Name of the source table in Bigtable
`lindorm.seedServer`	HBase Java API endpoint for LindormTable
`lindorm.namespace`	Destination namespace in Lindorm
`lindorm.tableName`	Destination table name in Lindorm
`lindorm.userName`	Username for LindormTable
`lindorm.password`	Password for LindormTable
`lindorm.batchSize`	Number of rows written per batch. Default: `16`. Increase for higher write throughput; reduce if you see memory overflow or garbage collection (GC) pauses.
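
A quick check of the parameter set before encoding can catch a missing key early. The Python sketch below is illustrative only. Treating every key except lindorm.batchSize as required is an assumption based on the table above, and validate_params is a hypothetical helper name.

```python
# Keys from the parameter table. Treating all of them as required is an
# assumption; only lindorm.batchSize has a documented default (16).
REQUIRED_PARAMS = (
    "bigtable.projectId",
    "bigtable.instanceId",
    "bigtable.tableName",
    "lindorm.seedServer",
    "lindorm.namespace",
    "lindorm.tableName",
    "lindorm.userName",
    "lindorm.password",
)


def validate_params(params: dict) -> list:
    """Return a list of problems (empty when the parameters look complete)."""
    problems = [
        f"missing or empty: {key}" for key in REQUIRED_PARAMS if not params.get(key)
    ]
    batch_size = params.get("lindorm.batchSize", 16)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size <= 0:
        problems.append("lindorm.batchSize must be a positive integer")
    return problems
```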

Encode parameters as Base64

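Convert the JSON above to Base64. The following example output was encoded from a copy of the JSON in which the endpoint ID, username, and password are masked with asterisks, so your own output will differ: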

ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9
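
One way to produce the Base64 string is with the Python standard library. This is a minimal sketch: encode_params and decode_params are hypothetical helper names, and it assumes the job accepts any valid UTF-8 JSON regardless of whitespace.

```python
import base64
import json


def encode_params(params: dict) -> str:
    """Serialize the parameters as JSON and return the Base64 text."""
    raw = json.dumps(params, indent=2).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_params(encoded: str) -> dict:
    """Reverse encode_params, useful for checking a string before submitting it."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))
```

Decoding the string again before you paste it into the job configuration is a cheap way to confirm that nothing was truncated during copying.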

Create and run the job

Log on to the Lindorm console.
In the upper-left corner, select the region where your instance is deployed.
On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.
In the left navigation pane, click Compute Engine.
Click the Job Management tab.
Click Create Job.
Enter a Job Name and select a job type.
Paste the following job configuration, replacing the token value with your own token and the args value with your Base64-encoded parameters:

{
  "token": "bf198279-5d1f-4aca-97f7-d16eda2f****",
  "appName": "bigtable-to-lindorm",
  "mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
  "mainClass": "com.alibaba.bds.Main",
  "configs": {
    "spark.dynamicAllocation.maxExecutors": "50",
    "spark.executor.cores": 4,
    "spark.executor.memory": "11264m",
    "spark.executor.memoryOverhead": "5120m",
    "spark.kubernetes.executor.disk.size": 500
  },
  "args": ["ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9"]
}

Parameter descriptions

Parameter	Description
`token`	Authentication token for submitting Spark jobs to the compute resource. Find it in the Lindorm console: click the instance ID, go to Database Connections, then switch to the Compute Engine tab.
`appName`	Job identifier name
`mainResource`	Path to the JAR package in Hadoop Distributed File System (HDFS) or Object Storage Service (OSS)
`mainClass`	Entry point class of the JAR job
`configs`	Spark executor settings. Key parameters: `spark.dynamicAllocation.maxExecutors` (maximum number of executors), `spark.executor.cores` (CPU cores per executor), `spark.executor.memory` (heap memory per executor in MB, for example `11264m`), `spark.executor.memoryOverhead` (off-heap memory per executor in MB, for example `5120m`), `spark.kubernetes.executor.disk.size` (local disk size per executor in GB)
`args`	Base64-encoded main program parameters

Click Save, then click Run in the upper-right corner.
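
If you prepare the job configuration outside the console, it can be assembled programmatically and then pasted in. The Python sketch below reuses the values from the sample configuration in this step. build_job_config is a hypothetical helper, and the token and encoded arguments are placeholders that you supply.

```python
import json


def build_job_config(token: str, encoded_args: str, max_executors: int = 50) -> str:
    """Return the job configuration as JSON text, ready to paste into the console."""
    config = {
        "token": token,
        "appName": "bigtable-to-lindorm",
        "mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
        "mainClass": "com.alibaba.bds.Main",
        "configs": {
            # Kept as a string to match the sample configuration.
            "spark.dynamicAllocation.maxExecutors": str(max_executors),
            "spark.executor.cores": 4,
            "spark.executor.memory": "11264m",
            "spark.executor.memoryOverhead": "5120m",
            "spark.kubernetes.executor.disk.size": 500,
        },
        # The job takes a single argument: the Base64-encoded parameters.
        "args": [encoded_args],
    }
    return json.dumps(config, indent=2)
```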

Monitor job status

Click Jobs to view the status and details of the running job.

Troubleshooting

Bigtable read errors

Symptom: The job fails immediately or cannot connect to Bigtable.

API not enabled: Verify the Bigtable API is enabled in your Google Cloud project. See Step 1.
Authentication error: Confirm that GOOGLE_APPLICATION_CREDENTIALS.json was uploaded correctly and that the service account has read access to the Bigtable table.

Lindorm write errors

Symptom: The job starts but fails partway through with errors related to memory or GC, or it cannot connect to LindormTable.

Memory overflow or GC pressure: Reduce lindorm.batchSize (for example, from 16 to 8) to lower the per-batch memory footprint, or increase spark.executor.memory and spark.executor.memoryOverhead.
Connection failure: Verify that lindorm.seedServer is the correct HBase Java API endpoint for your LindormTable instance and that both instances are in the same VPC.
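
To narrow down a connection failure, a plain TCP check against the lindorm.seedServer value can show whether the endpoint is reachable at all. The Python sketch below is a rough diagnostic with hypothetical helper names. It must run from a host inside the same VPC, and a successful connection only shows that the port is open, not that the credentials or table settings are correct.

```python
import socket


def split_endpoint(seed_server: str) -> tuple:
    """Split 'host:port' into (host, port), raising ValueError if malformed."""
    host, _, port = seed_server.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {seed_server!r}")
    return host, int(port)


def can_connect(seed_server: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    host, port = split_endpoint(seed_server)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```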

Performance tuning

Two settings have the most impact on migration throughput:

`spark.dynamicAllocation.maxExecutors`: More executors increase parallelism but consume more cluster resources. Start with 50 and adjust based on your cluster capacity and data volume.
`lindorm.batchSize`: A larger value increases write throughput per batch. If you see memory overflow or frequent GC pauses, reduce this value.
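
When choosing a value for spark.dynamicAllocation.maxExecutors, it helps to estimate what the job could request at full scale. The Python sketch below multiplies the per-executor settings by the executor cap. It is a rough upper bound under the assumption that every executor is allocated, it ignores the driver, and the helper names are hypothetical.

```python
def parse_mib(value: str) -> int:
    """Convert a Spark memory string such as '11264m' or '16g' to MiB."""
    value = value.strip().lower()
    units = {"m": 1, "g": 1024}
    if not value or value[-1] not in units:
        raise ValueError(f"unsupported memory value: {value!r}")
    return int(value[:-1]) * units[value[-1]]


def peak_resources(configs: dict) -> dict:
    """Estimate the total executor resources if all executors are allocated."""
    executors = int(configs["spark.dynamicAllocation.maxExecutors"])
    per_executor_mib = parse_mib(configs["spark.executor.memory"]) + parse_mib(
        configs["spark.executor.memoryOverhead"]
    )
    return {
        "cores": executors * int(configs["spark.executor.cores"]),
        "memory_gib": executors * per_executor_mib / 1024,
        "disk_gb": executors * int(configs["spark.kubernetes.executor.disk.size"]),
    }
```

With the sample configuration (50 executors, 4 cores, 11264m plus 5120m of memory, and 500 GB of disk per executor), this gives 200 cores, 800 GiB of memory, and 25,000 GB of local disk.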