Migrate data from Google Bigtable to a Lindorm wide table using a Spark job on the Lindorm Compute Engine. The migration runs over a virtual private cloud (VPC) connection and authenticates to Bigtable using a service account key.
Migration flow: Set up Bigtable authentication (steps 1–3) → Configure and submit the Spark job (step 4) → Monitor job status.
Prerequisites
Before you begin, make sure you have:
LindormTable activated. See Activate LindormTable.
The Lindorm Compute Engine activated. See Activate the Lindorm compute engine.
Bigtable and Lindorm deployed in the same VPC. Migration is supported only over a VPC connection; placing both instances in the same VPC minimizes latency.
Step 1: Enable the Bigtable API
Enable the Bigtable API on your Google Cloud project. See Control access to Bigtable with IAM.
Step 2: Create a service account key
Create a service account key, download it, and rename the file to GOOGLE_APPLICATION_CREDENTIALS.json.
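Before uploading the key, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch, not part of the migration tool: `check_key` and `check_key_file` are hypothetical helper names, and the field list reflects what a Google Cloud service account key file normally contains.

```python
import json
from pathlib import Path

# Fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(data):
    """Return a list of problems found in the parsed key (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems


def check_key_file(path="GOOGLE_APPLICATION_CREDENTIALS.json"):
    """Load the key file from disk and run the same checks."""
    return check_key(json.loads(Path(path).read_text(encoding="utf-8")))


if __name__ == "__main__":
    issues = check_key_file()
    print("key file looks OK" if not issues else "\n".join(issues))
```

An empty result only means the file has the expected shape; it does not prove that the service account has read access to the Bigtable table.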
Step 3: Upload the key to Lindorm
Log on to the Lindorm console.
In the upper-left corner, select the region where your instance is deployed.
On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.
In the left navigation pane, click Compute Engine.
Click the Job Management tab.
Click Upload Resources.
In the Upload Resources dialog box, select GOOGLE_APPLICATION_CREDENTIALS.json.
Click Upload in the upper-left corner of the dialog box.
Step 4: Configure and submit the migration job
Build the main program parameters
Define your Bigtable source and Lindorm destination in JSON. Replace the placeholder values with your actual configuration.
{
"bigtable.projectId": "myproject",
"bigtable.instanceId": "myInstanceId",
"bigtable.tableName": "bigtable_name",
"lindorm.seedServer": "ld-bp1hn6yq0yb34****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020",
"lindorm.namespace": "default",
"lindorm.tableName": "lindorm_table_name",
"lindorm.userName": "root",
"lindorm.password": "test****",
"lindorm.batchSize": 16
}
Parameter descriptions
| Parameter | Description |
|---|---|
| bigtable.projectId | Project ID of the Bigtable source |
| bigtable.instanceId | Instance ID of the Bigtable source |
| bigtable.tableName | Name of the source table in Bigtable |
| lindorm.seedServer | HBase Java API endpoint for LindormTable |
| lindorm.namespace | Destination namespace in Lindorm |
| lindorm.tableName | Destination table name in Lindorm |
| lindorm.userName | Username for LindormTable |
| lindorm.password | Password for LindormTable |
| lindorm.batchSize | Number of rows written per batch. Default: 16. Increase for higher write throughput; reduce if you see memory overflow or garbage collection (GC) pauses. |
Encode parameters as Base64
Convert the JSON above to Base64. Example output:
ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9
Create and run the job
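Before creating the job, the Base64 string can be produced locally with a few lines of Python, which avoids pasting the password into an online converter. This is a minimal sketch: `encode_params` is a hypothetical helper name, the values are the placeholders from the parameter JSON, and it assumes the job accepts compact JSON (the sample string contains line breaks, so whitespace appears not to matter).

```python
import base64
import json


def encode_params(params):
    """Serialize the main program parameters to JSON and Base64-encode them."""
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder values; replace them with your actual configuration.
params = {
    "bigtable.projectId": "myproject",
    "bigtable.instanceId": "myInstanceId",
    "bigtable.tableName": "bigtable_name",
    "lindorm.seedServer": "ld-bp1hn6yq0yb34****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020",
    "lindorm.namespace": "default",
    "lindorm.tableName": "lindorm_table_name",
    "lindorm.userName": "root",
    "lindorm.password": "test****",
    "lindorm.batchSize": 16,
}

if __name__ == "__main__":
    print(encode_params(params))
```

The printed string is the value to place in the `args` array of the job configuration.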
Log on to the Lindorm console.
In the upper-left corner, select the region where your instance is deployed.
On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.
In the left navigation pane, click Compute Engine.
Click the Job Management tab.
Click Create Job.
Enter a Job Name and select a job type.
Paste the following job configuration, replacing the args value with your Base64-encoded parameters:
{
"token": "bf198279-5d1f-4aca-97f7-d16eda2f****",
"appName": "bigtable-to-lindorm",
"mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
"mainClass": "com.alibaba.bds.Main",
"configs": {
"spark.dynamicAllocation.maxExecutors": "50",
"spark.executor.cores": 4,
"spark.executor.memory": "11264m",
"spark.executor.memoryOverhead": "5120m",
"spark.kubernetes.executor.disk.size": 500
},
"args": ["ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9"]
}
Parameter descriptions
| Parameter | Description |
|---|---|
| token | Authentication token for submitting Spark jobs to the compute resource. Find it in the Lindorm console: click the instance ID, go to Database Connections, then switch to the Compute Engine tab. |
| appName | Job identifier name |
| mainResource | Path to the JAR package in Hadoop Distributed File System (HDFS) or Object Storage Service (OSS) |
| mainClass | Entry point class of the JAR job |
| configs | Spark executor settings. Key parameters: spark.dynamicAllocation.maxExecutors (max number of executors), spark.executor.cores (compute slots per executor), spark.executor.memory (heap memory per executor in MB), spark.executor.memoryOverhead (off-heap memory per executor in MB), spark.kubernetes.executor.disk.size (local disk size per executor in GB) |
| args | Base64-encoded main program parameters |
Click Save, then click Run in the upper-right corner.
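If you prepare the job configuration outside the console, for example to keep it in version control without the token, it can be assembled programmatically and pasted in afterwards. The sketch below is illustrative only: `build_job_config` is a hypothetical helper, and the values mirror the sample configuration from the steps above.

```python
import json


def build_job_config(token, encoded_args, max_executors=50):
    """Return the job configuration as a dict, using the sample values from this guide."""
    return {
        "token": token,
        "appName": "bigtable-to-lindorm",
        "mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
        "mainClass": "com.alibaba.bds.Main",
        "configs": {
            "spark.dynamicAllocation.maxExecutors": str(max_executors),
            "spark.executor.cores": 4,
            "spark.executor.memory": "11264m",
            "spark.executor.memoryOverhead": "5120m",
            "spark.kubernetes.executor.disk.size": 500,
        },
        # The job expects a single Base64 string as its only argument.
        "args": [encoded_args],
    }


if __name__ == "__main__":
    # Both arguments are placeholders to be replaced with real values.
    print(json.dumps(build_job_config("<your-token>", "<base64-parameters>"), indent=2))
```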
Monitor job status
Click Jobs to view the status and details of the running job.
Troubleshooting
Bigtable read errors
Symptom: The job fails immediately or cannot connect to Bigtable.
API not enabled: Verify the Bigtable API is enabled in your Google Cloud project. See Step 1.
Authentication error: Confirm that GOOGLE_APPLICATION_CREDENTIALS.json was uploaded correctly and that the service account has read access to the Bigtable table.
Lindorm write errors
Symptom: The job starts but fails partway through, with errors related to memory or GC.
Memory overflow or GC pressure: Reduce lindorm.batchSize (for example, from 16 to 8) to lower the per-batch memory footprint, or increase spark.executor.memory and spark.executor.memoryOverhead.
Connection failure: Verify that lindorm.seedServer is the correct HBase Java API endpoint for your LindormTable instance and that both instances are in the same VPC.
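When a job fails on connection or configuration errors, decoding the submitted `args` value shows exactly what the job received. The following is a rough sketch under stated assumptions: `inspect_args` is a hypothetical helper, the required keys come from the parameter table in this guide, `lindorm.batchSize` is treated as optional because it has a default, and the endpoint check only looks for a `host:port` shape.

```python
import base64
import json

# Keys listed in the parameter table; lindorm.batchSize is optional (default 16).
REQUIRED_KEYS = (
    "bigtable.projectId", "bigtable.instanceId", "bigtable.tableName",
    "lindorm.seedServer", "lindorm.namespace", "lindorm.tableName",
    "lindorm.userName", "lindorm.password",
)


def inspect_args(encoded):
    """Decode the Base64 args string and return a list of problems (empty if none)."""
    try:
        params = json.loads(base64.b64decode(encoded, validate=True))
    except ValueError as exc:  # covers invalid Base64 and invalid JSON
        return [f"cannot decode args: {exc}"]
    if not isinstance(params, dict):
        return ["decoded args is not a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in params]

    # The endpoint is expected to look like host:port.
    _, sep, port = str(params.get("lindorm.seedServer", "")).rpartition(":")
    if "lindorm.seedServer" in params and not (sep and port.isdigit()):
        problems.append("lindorm.seedServer is not in host:port form")

    batch = params.get("lindorm.batchSize", 16)
    if not isinstance(batch, int) or batch <= 0:
        problems.append("lindorm.batchSize must be a positive integer")
    return problems
```

Calling `inspect_args` with the string from the job configuration returns an empty list when the structure looks right; it cannot confirm that the endpoint is reachable or that the credentials are valid.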
Performance tuning
Two settings have the most impact on migration throughput:
`spark.dynamicAllocation.maxExecutors`: More executors increase parallelism but consume more cluster resources. Start with 50 and adjust based on your cluster capacity and data volume.
`lindorm.batchSize`: A larger value increases write throughput per batch. If you see memory overflow or frequent GC pauses, reduce this value.
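To judge whether a given `spark.dynamicAllocation.maxExecutors` value fits your cluster, it helps to work out the peak resources the executors could request. The sketch below is a back-of-the-envelope estimate only: `to_mib` and `estimate_peak` are hypothetical helpers, only the `m` and `g` size suffixes are handled, and driver resources are ignored.

```python
def to_mib(size):
    """Convert a Spark-style size such as '11264m' or '16g' to MiB."""
    size = size.strip().lower()
    units = {"m": 1, "g": 1024}
    if not size or size[-1] not in units:
        raise ValueError(f"unsupported size: {size!r}")
    return int(size[:-1]) * units[size[-1]]


def estimate_peak(configs):
    """Estimate peak executor resources if dynamic allocation reaches its maximum."""
    executors = int(configs["spark.dynamicAllocation.maxExecutors"])
    per_executor_mib = (
        to_mib(configs["spark.executor.memory"])
        + to_mib(configs["spark.executor.memoryOverhead"])
    )
    return {
        "max_cores": executors * int(configs["spark.executor.cores"]),
        "max_memory_gib": executors * per_executor_mib / 1024,
        "max_disk_gb": executors * int(configs["spark.kubernetes.executor.disk.size"]),
    }


# The sample configs from the job configuration in this guide.
sample = {
    "spark.dynamicAllocation.maxExecutors": "50",
    "spark.executor.cores": 4,
    "spark.executor.memory": "11264m",
    "spark.executor.memoryOverhead": "5120m",
    "spark.kubernetes.executor.disk.size": 500,
}

if __name__ == "__main__":
    print(estimate_peak(sample))
```

With the sample values, each executor requests 11264 + 5120 = 16384 MiB (16 GiB), so 50 executors can request up to 200 cores, 800 GiB of memory, and 25000 GB of local disk.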