This topic describes how to incrementally archive data from HBase to MaxCompute.
Important
The Incremental Archive to MaxCompute feature was discontinued on June 16, 2023. This feature is not available for LTS instances purchased after this date. If your LTS instance was purchased before June 16, 2023, you can continue to use this feature.
Prerequisites
- LTS is activated.
- An HBase data source is added.
- A MaxCompute data source is added.
Supported versions
- Self-managed HBase 1.x and 2.x.
- EMR HBase.
- ApsaraDB for HBase Standard Edition, ApsaraDB for HBase Performance-enhanced Edition (in cluster mode), and Lindorm.
Limitations
- Real-time data archiving is based on the HBase write-ahead log (WAL). Data imported through bulk loading bypasses the WAL and therefore cannot be exported.
Log lifecycle
- After you enable archiving, logs are retained for 48 hours by default if they are not consumed. After this period, the log subscription is automatically canceled, and the retained data is automatically deleted.
- If you release an LTS instance without stopping the synchronization task, the task pauses and data consumption stops.
Submit an archiving job
- Go to the LTS console. In the left-side navigation pane, choose Lindorm/HBase Export > Incremental Archive to MaxCompute.
- Click Create Job. On the job creation page, specify a Task Name (optional), select the source cluster and destination cluster, and specify the HBase tables to export. In the Table Mapping section, configure the column mapping JSON. You can set the tableMode parameter to wideTable (wide table mode). Before you submit the job, verify that the log retention period in the source cluster, defined by the hbase.master.logcleaner.ttl parameter, is long enough to prevent job failures. After you complete the configuration, click Create. The example configuration archives real-time data from the wal-test HBase table to MaxCompute:
  - The archived columns are cf1:a, cf1:b, cf1:c, and cf1:d.
  - The mergeInterval parameter specifies the archiving interval in milliseconds. The default is 86400000 (one day).
  - The mergeStartAt parameter sets the job's start time in yyyyMMddHHmmss format. You can use a past timestamp, such as 20190930000000, which starts archiving data from 00:00:00 on September 30, 2019.
- View the table archiving progress. After submitting the job, go to the job details page. The Real-time Synchronization Channel section shows the synchronization latency and offset. The Table Merge section shows the merge jobs. After a merge is complete, you can query the latest partition of the table in MaxCompute. The page shows the status of each component. For example, the status for Table Creation Details is SUCCEEDED, the Real-time Synchronization Channel is RUNNING with a 4520 ms synchronization latency, and the Table Merge is RUNNING with 47.50% progress.
- Log on to the MaxCompute console and query the table data. Run a SELECT statement, such as SELECT * FROM hbase2odps.wal_test_xxx WHERE pt = 'xxxxxxxx', to query the archived data. Verify that the result contains columns such as rowkey, cf1_string, cf1_long, cf1_short, cf1_bigdecimal, cf1_double, cf1_float, cf1_boolean, cf1_null, and pt. This confirms that the data was successfully archived from HBase to MaxCompute.
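The relationship between mergeStartAt and mergeInterval can be illustrated with a short Python sketch. It assumes, purely for illustration, that merge points fall at mergeStartAt plus whole multiples of mergeInterval; the actual scheduling inside LTS is not described here and may differ. The function name merge_times is hypothetical.

```python
from datetime import datetime, timedelta

def merge_times(merge_start_at: str, merge_interval_ms: int = 86_400_000, count: int = 3):
    """Return the first `count` nominal merge points.

    Assumption (not confirmed by the service): points are spaced exactly
    mergeInterval milliseconds apart, starting at mergeStartAt.
    """
    # mergeStartAt uses the yyyyMMddHHmmss format, for example 20190930000000.
    start = datetime.strptime(merge_start_at, "%Y%m%d%H%M%S")
    step = timedelta(milliseconds=merge_interval_ms)
    return [start + i * step for i in range(count)]

for point in merge_times("20190930000000"):
    print(point.isoformat())
```

With the default interval of 86400000 ms, consecutive points are exactly one day apart.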
Parameters
The configuration for each exported table uses the following format:
hbaseTable/odpsTable {"cols": ["cf1:a|string", "cf1:b|int", "cf1:c|long", "cf1:d|short", "cf1:e|decimal", "cf1:f|double", "cf1:g|float", "cf1:h|boolean", "cf1:i"], "mergeInterval": 86400000, "mergeStartAt": "20191008100547"}
hbaseTable/odpsTable {"cols": ["cf1:a", "cf1:b", "cf1:c"], "mergeStartAt": "20191008000000"}
hbaseTable {"mergeEnabled": false} // No merge operation is performed.
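To check entries in this format before submitting a job, a small Python sketch such as the following can split an entry into its table names and JSON settings. It is not part of LTS. The names TableEntry, to_odps_name, split_col, and parse_entry are hypothetical, and the sketch assumes that every entry carries a JSON object and that the period and hyphen replacement applies to whichever destination name is used.

```python
import json
from typing import NamedTuple

class TableEntry(NamedTuple):
    hbase_table: str
    odps_table: str   # destination name after defaulting and character replacement
    tb_conf: dict

def to_odps_name(name: str) -> str:
    # Periods and hyphens are not supported in MaxCompute table names,
    # so both are replaced with underscores.
    return name.replace(".", "_").replace("-", "_")

def split_col(spec: str):
    # "cf1:a|string" -> ("cf1", "a", "string"); "cf1:i" -> ("cf1", "i", None).
    # None stands for an omitted data type.
    column, _, dtype = spec.partition("|")
    family, _, qualifier = column.partition(":")
    return family, qualifier, dtype or None

def parse_entry(line: str) -> TableEntry:
    # Everything before the first "{" holds the table name or names.
    names, brace, rest = line.partition("{")
    if not brace:
        raise ValueError("entry has no JSON settings object")  # assumption: always required
    # raw_decode parses one JSON value and ignores trailing text such as "// ...".
    tb_conf, _ = json.JSONDecoder().raw_decode(brace + rest)
    hbase_table, _, odps_table = names.strip().partition("/")
    return TableEntry(hbase_table, to_odps_name(odps_table or hbase_table), tb_conf)

entry = parse_entry('wal-test {"cols": ["cf1:a|string", "cf1:i"]}')
print(entry.odps_table, [split_col(c) for c in entry.tb_conf["cols"]])
```

The destination table that the service actually creates may carry additional suffixes, so treat the computed name only as the base name.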
The export configuration consists of three parts: hbaseTable, odpsTable, and tbConf.
- hbaseTable: Specifies the source HBase table.
- odpsTable: Optional. Specifies the name of the destination table. By default, this name is the same as the HBase table name. MaxCompute table names do not support periods (.) or hyphens (-); these characters are automatically converted to underscores (_).
- tbConf: Specifies the archiving behavior for the table. The following table describes the supported parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| cols | Specifies the columns to export and their data types. If you omit a data type, the system converts the value to the HexString format. | "cols": ["cf1:a", "cf1:b", "cf1:c"] |
| mergeEnabled | Specifies whether to convert a key-value (KV) table into a wide table. If set to false, no merge operation is performed. | "mergeEnabled": false |
| mergeStartAt | Specifies the start time for the merge operation. You can specify a past time. The value must be in the yyyyMMddHHmmss format. | "mergeStartAt": "20191008000000" |
| mergeInterval | Specifies the interval at which merge operations are performed, in milliseconds. The default value is 86400000 (one day). | "mergeInterval": 86400000 |