Import Bigtable data

Migrate data from Google Bigtable to a Lindorm wide table using a Spark job on the Lindorm Compute Engine. The migration runs over a virtual private cloud (VPC) connection and authenticates to Bigtable using a service account key.

Migration flow: Enable the Bigtable API and set up authentication (steps 1–3) → Configure and submit the Spark job (step 4) → Monitor the job status.

Prerequisites

Before you begin, make sure you have:

  • LindormTable activated. See Activate LindormTable

  • The Lindorm Compute Engine activated. See Activate the Lindorm compute engine

  • Bigtable and Lindorm deployed in the same VPC. Migration is supported only over a VPC connection, and placing both instances in the same VPC minimizes latency.

Step 1: Enable the Bigtable API

Enable the Bigtable API on your Google Cloud project. See Control access to Bigtable with IAM.

Step 2: Create a service account key

Create a service account key in JSON format, download it, and rename the file to GOOGLE_APPLICATION_CREDENTIALS.json. The service account must have read access to the source Bigtable table.
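Before uploading, you can sanity-check the renamed key file locally. The following Python sketch is hypothetical: the function name and the set of checked fields are assumptions. It only confirms that the file has the required name and looks like a Google Cloud service account key (a JSON object whose `type` is `service_account` and that contains `project_id`, `private_key`, and `client_email`). It does not verify IAM permissions on the Bigtable table.

```python
import json
from pathlib import Path

# Hypothetical check: a subset of the fields a Google Cloud service account key normally has.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")
EXPECTED_NAME = "GOOGLE_APPLICATION_CREDENTIALS.json"


def check_service_account_key(path):
    """Return a list of problems with the key file; an empty list means none were found."""
    p = Path(path)
    problems = []
    if p.name != EXPECTED_NAME:
        problems.append(f"file is named {p.name!r}, expected {EXPECTED_NAME!r}")
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # Unreadable file or invalid JSON (json.JSONDecodeError subclasses ValueError).
        problems.append(f"cannot load key as JSON: {exc}")
        return problems
    if not isinstance(data, dict):
        problems.append("key file is not a JSON object")
        return problems
    missing = [name for name in EXPECTED_FIELDS if not data.get(name)]
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    elif data["type"] != "service_account":
        problems.append(f"type is {data['type']!r}, expected 'service_account'")
    return problems
```

For example, `print(check_service_account_key("GOOGLE_APPLICATION_CREDENTIALS.json"))` prints an empty list when no problems are found.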

Step 3: Upload the key to Lindorm

  1. Log on to the Lindorm console.

  2. In the upper-left corner, select the region where your instance is deployed.

  3. On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.

  4. In the left navigation pane, click Compute Engine.

  5. Click the Job Management tab.

  6. Click Upload Resources.

  7. In the Upload Resources dialog box, select GOOGLE_APPLICATION_CREDENTIALS.json.

  8. Click Upload in the upper-left corner of the dialog box.

Step 4: Configure and submit the migration job

Build the main program parameters

Define your Bigtable source and Lindorm destination in JSON. Replace the placeholder values with your actual configuration.

{
  "bigtable.projectId": "myproject",
  "bigtable.instanceId": "myInstanceId",
  "bigtable.tableName": "bigtable_name",

  "lindorm.seedServer": "ld-bp1hn6yq0yb34****-proxy-lindorm.lindorm.rds.aliyuncs.com:30020",
  "lindorm.namespace": "default",
  "lindorm.tableName": "lindorm_table_name",
  "lindorm.userName": "root",
  "lindorm.password": "test****",
  "lindorm.batchSize": 16
}

Parameter descriptions

| Parameter | Description |
| --- | --- |
| `bigtable.projectId` | Project ID of the Bigtable source |
| `bigtable.instanceId` | Instance ID of the Bigtable source |
| `bigtable.tableName` | Name of the source table in Bigtable |
| `lindorm.seedServer` | HBase Java API endpoint for LindormTable |
| `lindorm.namespace` | Destination namespace in Lindorm |
| `lindorm.tableName` | Destination table name in Lindorm |
| `lindorm.userName` | Username for LindormTable |
| `lindorm.password` | Password for LindormTable |
| `lindorm.batchSize` | Number of rows written per batch. Default: 16. Increase it for higher write throughput; reduce it if you see memory overflow or garbage collection (GC) pauses. |
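A pre-flight check on the parameters can catch a missing key before the Spark job starts. In this minimal Python sketch, the function `normalize_params` is hypothetical, and its rules are assumptions drawn from the example above rather than from the job's own validation: every key in the example except `lindorm.batchSize` is treated as required, `lindorm.batchSize` falls back to the documented default of 16, and `lindorm.seedServer` must be in `host:port` form.

```python
# Assumption: every key in the example except lindorm.batchSize is required.
REQUIRED_PARAMS = (
    "bigtable.projectId", "bigtable.instanceId", "bigtable.tableName",
    "lindorm.seedServer", "lindorm.namespace", "lindorm.tableName",
    "lindorm.userName", "lindorm.password",
)
DEFAULT_BATCH_SIZE = 16  # documented default for lindorm.batchSize


def normalize_params(params):
    """Validate the main program parameters and return a copy with defaults filled in."""
    missing = [key for key in REQUIRED_PARAMS if not params.get(key)]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    # Assumption: the endpoint is host:port, as in the example.
    host, sep, port = params["lindorm.seedServer"].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("lindorm.seedServer must look like host:port")
    result = dict(params)  # do not modify the caller's dict
    batch = result.setdefault("lindorm.batchSize", DEFAULT_BATCH_SIZE)
    if isinstance(batch, bool) or not isinstance(batch, int) or batch < 1:
        raise ValueError("lindorm.batchSize must be a positive integer")
    return result
```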

Encode parameters as Base64

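Convert the JSON above to a Base64 string. The following example output encodes a version of these parameters in which the endpoint, username, and password are masked, so it does not match the JSON above exactly: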

ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9
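The conversion can be scripted. This minimal Python sketch assumes standard Base64 with padding (as produced by `base64.b64encode`) over UTF-8 JSON; the helper names are hypothetical. Because whitespace and key order affect the bytes, your string will not be identical to the example even for the same values. The decoder is included so you can check what a given `args` string contains.

```python
import base64
import json


def encode_params(params):
    """Serialize the parameters as UTF-8 JSON and return the Base64 text for args."""
    raw = json.dumps(params, ensure_ascii=False).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_params(encoded):
    """Reverse encode_params; raises an error if the text is not valid Base64 JSON."""
    return json.loads(base64.b64decode(encoded, validate=True).decode("utf-8"))
```

For example, `encode_params({"bigtable.projectId": "myproject", "lindorm.batchSize": 16})` returns a string that `decode_params` turns back into the same dictionary.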

Create and run the job

  1. Log on to the Lindorm console.

  2. In the upper-left corner, select the region where your instance is deployed.

  3. On the Instances page, click the ID of the target instance, or click View Instance Details in the Actions column.

  4. In the left navigation pane, click Compute Engine.

  5. Click the Job Management tab.

  6. Click Create Job.

  7. Enter a Job Name and select a job type.

  8. Paste the following job configuration, replacing the token value with your own token and the args value with your Base64-encoded parameters:

{
  "token": "bf198279-5d1f-4aca-97f7-d16eda2f****",
  "appName": "bigtable-to-lindorm",
  "mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
  "mainClass": "com.alibaba.bds.Main",
  "configs": {
    "spark.dynamicAllocation.maxExecutors": "50",
    "spark.executor.cores": 4,
    "spark.executor.memory": "11264m",
    "spark.executor.memoryOverhead": "5120m",
    "spark.kubernetes.executor.disk.size": 500
  },
  "args": ["ewoiYmlndGFibGUucHJvamVjdElkIjoibXlwcm9qZWN0IiwKImJpZ3RhYmxlLmluc3RhbmNlSWQiOiJteUluc3RhbmNlSWQiLAoiYmlndGFibGUudGFibGVOYW1lIjoiYmlndGFibGVfbmFtZSIsCgoibGluZG9ybS5zZWVkU2VydmVyIjoibGQtKioqKioqKioqKi1wcm94eS1saW5kb3JtLmxpbmRvcm0ucmRzLmFsaXl1bmNzLmNvbTozMDAyMCIsCiJsaW5kb3JtLm5hbWVzcGFjZSI6ImRlZmF1bHQiLAoibGluZG9ybS50YWJsZU5hbWUiOiJsaW5kb3JtX3RhYmxlX25hbWUiLAoibGluZG9ybS51c2VyTmFtZSI6IioqKioqKioqKiIsCiJsaW5kb3JtLnBhc3N3b3JkIjoiKioqKioqKioqIiwKImxpbmRvcm0uYmF0Y2hTaXplIjoxNgp9"]
}

Parameter descriptions

| Parameter | Description |
| --- | --- |
| `token` | Authentication token for submitting Spark jobs to the compute resource. To find it in the Lindorm console, click the instance ID, go to Database Connections, and then switch to the Compute Engine tab. |
| `appName` | Name that identifies the job |
| `mainResource` | Path to the JAR package in Hadoop Distributed File System (HDFS) or Object Storage Service (OSS) |
| `mainClass` | Entry point class of the JAR job |
| `configs` | Spark executor settings. Key parameters: `spark.dynamicAllocation.maxExecutors` (maximum number of executors), `spark.executor.cores` (compute slots per executor), `spark.executor.memory` (heap memory per executor, in MB), `spark.executor.memoryOverhead` (off-heap memory per executor, in MB), and `spark.kubernetes.executor.disk.size` (local disk size per executor, in GB) |
| `args` | Base64-encoded main program parameters |

  9. Click Save, then click Run in the upper-right corner.
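If you prepare the configuration outside the console, a helper like the hypothetical `build_job_config` below can assemble the JSON pasted in step 8. It copies the example values, including the JAR path, main class, and executor settings, and lets you override individual `configs` entries. It only produces the JSON text; you still paste it into the console as described above.

```python
import json


def build_job_config(token, encoded_args, configs=None):
    """Return the job configuration JSON, based on the example values."""
    job = {
        "token": token,
        "appName": "bigtable-to-lindorm",
        "mainResource": "hdfs:///ldps-user-resource/ldps-bigtable-to-lindorm-1.0-SNAPSHOT.jar",
        "mainClass": "com.alibaba.bds.Main",
        "configs": {
            "spark.dynamicAllocation.maxExecutors": "50",
            "spark.executor.cores": 4,
            "spark.executor.memory": "11264m",
            "spark.executor.memoryOverhead": "5120m",
            "spark.kubernetes.executor.disk.size": 500,
        },
        # args is a list holding the single Base64-encoded parameter string.
        "args": [encoded_args],
    }
    if configs:
        job["configs"].update(configs)  # override selected Spark settings
    return json.dumps(job, indent=2)
```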

Monitor job status

Click Jobs to view the status and details of the running job.

Troubleshooting

Bigtable read errors

Symptom: The job fails immediately or cannot connect to Bigtable.

  • API not enabled: Verify the Bigtable API is enabled in your Google Cloud project. See Step 1.

  • Authentication error: Confirm that GOOGLE_APPLICATION_CREDENTIALS.json was uploaded correctly and that the service account has read access to the Bigtable table.

Lindorm write errors

Symptom: The job starts but fails partway through, with errors related to memory or GC.

  • Memory overflow or GC pressure: Reduce lindorm.batchSize (for example, from 16 to 8) to lower the per-batch memory footprint, or increase spark.executor.memory and spark.executor.memoryOverhead.

  • Connection failure: Verify that lindorm.seedServer is the correct HBase Java API endpoint for your LindormTable instance and that both instances are in the same VPC.
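To rule out a network problem, you can test whether the `lindorm.seedServer` endpoint accepts TCP connections from a host in the same VPC. This Python sketch uses only the standard library, and the function name `can_reach` is hypothetical. A successful connection shows only that the port is reachable. It does not confirm that the address is the HBase Java API endpoint or that the credentials are correct.

```python
import socket


def can_reach(seed_server, timeout=5.0):
    """Try a TCP connection to 'host:port'; return True if the port accepts it."""
    host, _, port = seed_server.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # DNS failure, refused or timed-out connection, or a non-numeric port.
        return False
```

For example, call `can_reach("<your-seed-server>:30020")` with your own endpoint in place of the placeholder.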

Performance tuning

Two settings have the most impact on migration throughput:

  • `spark.dynamicAllocation.maxExecutors`: More executors increase parallelism but consume more cluster resources. Start with 50 and adjust based on your cluster capacity and data volume.

  • `lindorm.batchSize`: A larger value writes more rows per batch, which increases write throughput. If you see memory overflow or frequent GC pauses, reduce this value.
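To see what a configuration costs before you submit it, the following sketch estimates the peak footprint if Spark scales out to `spark.dynamicAllocation.maxExecutors`. The helper names `to_mb` and `peak_footprint` are hypothetical. The `rows_in_flight` figure relies on a hypothetical model in which each running task (one per executor core) buffers at most one batch of `lindorm.batchSize` rows. The JAR's actual write behavior is not documented here.

```python
def to_mb(value):
    """Convert a Spark size such as '11264m' or '16g' to MB; a bare number is treated as MB."""
    text = str(value).strip().lower()
    if text.endswith("g"):
        return int(text[:-1]) * 1024
    if text.endswith("m"):
        return int(text[:-1])
    return int(text)


def peak_footprint(configs, batch_size=16):
    """Upper-bound resource use if every allowed executor is running at once."""
    executors = int(configs["spark.dynamicAllocation.maxExecutors"])
    cores = int(configs["spark.executor.cores"])
    per_executor_mb = to_mb(configs["spark.executor.memory"]) + to_mb(
        configs["spark.executor.memoryOverhead"]
    )
    return {
        "cores": executors * cores,
        "memory_mb": executors * per_executor_mb,
        "disk_gb": executors * int(configs["spark.kubernetes.executor.disk.size"]),
        # Hypothetical model: each running task holds at most one batch.
        "rows_in_flight": executors * cores * batch_size,
    }


example_configs = {
    "spark.dynamicAllocation.maxExecutors": "50",
    "spark.executor.cores": 4,
    "spark.executor.memory": "11264m",
    "spark.executor.memoryOverhead": "5120m",
    "spark.kubernetes.executor.disk.size": 500,
}
print(peak_footprint(example_configs))
# {'cores': 200, 'memory_mb': 819200, 'disk_gb': 25000, 'rows_in_flight': 3200}
```

With the example settings, each executor uses 11264 + 5120 = 16384 MB (16 GB), so 50 executors can reach about 800 GB of memory and 200 cores. Halving `lindorm.batchSize` from 16 to 8 halves the estimated rows in flight, and halving `spark.dynamicAllocation.maxExecutors` halves every figure.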