Configure HBase input component
The HBase input component reads data from an HBase data source. To synchronize data from HBase to other data sources, configure the HBase input component first, and then configure the target data source.
Prerequisites
-
You have purchased and enabled the high availability (HA) feature of the DataService Studio or Tag Service module. This prerequisite applies if you want to configure active/standby links for data sources.
-
You have created an HBase data source. For more information, see .
-
The account used to configure the HBase input component properties must have read-through permission on the data source. If you do not have the permission, you need to request it. For more information, see Request, renew, and return permissions on a data source.
Procedure
-
In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.
-
In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you also need to select an environment.
-
In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.
-
Click Component Library in the upper-right corner of the page to open the Component Library panel.
-
In the left-side navigation pane of the Component Library panel, select Inputs. Find the HBase component in the input component list on the right and drag it to the canvas.
-
Click the icon in the HBase input component card to open the HBase Input Configuration dialog box.
-
In the HBase Input Configuration dialog box, configure the parameters.
Parameter
Description
Step Name
The name of the HBase input component. Dataphin automatically generates a step name, which you can modify. The name must meet the following requirements:
-
It can contain only Chinese characters, letters, underscores (_), and digits.
-
It cannot exceed 64 characters in length.
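The naming rules above can be checked before a name is entered. The following is a minimal sketch, not Dataphin's own validation: the function name `is_valid_step_name` is hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by U+4E00..U+9FFF;
# the check that Dataphin actually applies may accept a different range.
_STEP_NAME = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name has only allowed characters and is 1-64 long."""
    return _STEP_NAME.fullmatch(name) is not None


print(is_valid_step_name("hbase_input_1"))  # letters, digits, underscores
print(is_valid_step_name("hbase-input"))    # contains a hyphen
```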
Datasource
All HBase data sources in the current Dataphin instance are listed, including those for which you may not have read-through permission. Click the icon to copy the data source name.
-
For data sources for which you do not have read-through permission, you can click Request next to the data source to request read-through permission. For more information, see Request permission on a data source.
-
If you do not have an HBase data source, click Create to create one. For more information, see .
Select Link
If you have enabled the high availability feature of Tag Service and the selected HBase data source has active/standby links, you can select either the Active Link or the Standby Link for integration. This setting affects only the production data source.
Table
You can enter a keyword to search for tables or enter the exact table name and click Exact Match. Click the icon to copy the name of the selected table.
Output Mode
Select an output mode: Normal Mode or Multi-version Mode (Vertical Table).
maxversion
If you select Multi-version Mode (Vertical Table) as the output mode, you need to specify maxversion.
maxversion specifies the number of versions to read. A value of -1 indicates that all versions are read.
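The effect of maxversion can be illustrated with a small sketch. It assumes that the newest versions are kept first, as HBase orders cell versions, and that values other than -1 must be positive. The `Cell` shape and the `select_versions` helper are hypothetical and do not describe the component's actual output format.

```python
from typing import List, Tuple

Cell = Tuple[int, str]  # (timestamp, value); hypothetical shape for illustration


def select_versions(versions: List[Cell], maxversion: int) -> List[Cell]:
    """Return the newest `maxversion` cell versions, or all of them for -1."""
    newest_first = sorted(versions, key=lambda cell: cell[0], reverse=True)
    if maxversion == -1:
        return newest_first
    if maxversion < 1:
        # Assumption: values other than -1 must be positive.
        raise ValueError("maxversion must be -1 or a positive integer")
    return newest_first[:maxversion]


history = [(1700000000, "v1"), (1700000300, "v3"), (1700000200, "v2")]
print(select_versions(history, 2))   # the two newest versions
print(select_versions(history, -1))  # every stored version
```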
File Encoding
Select a file encoding format: UTF-8 or GBK.
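The encoding choice matters because the same text is stored as different bytes under UTF-8 and GBK. The sketch below assumes that this setting controls how stored bytes are decoded into strings. The `decode_cell` helper is hypothetical.

```python
def decode_cell(raw: bytes, encoding: str) -> str:
    """Decode raw cell bytes with the selected encoding ("UTF-8" or "GBK")."""
    if encoding.upper() not in ("UTF-8", "GBK"):
        raise ValueError(f"unsupported encoding: {encoding}")
    return raw.decode(encoding)


stored = "数据".encode("gbk")  # bytes as a GBK-writing producer would store them
print(decode_cell(stored, "GBK"))

try:
    decode_cell(stored, "UTF-8")  # mismatched encoding
except UnicodeDecodeError as err:
    print("wrong encoding:", err.reason)
```

If the selected encoding does not match the one used when the data was written, decoding either fails or yields garbled text.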
Start Rowkey
The starting rowkey for scanning. All rows with rowkeys lexicographically greater than or equal to this value are included in the scan results. For example, aaa (string) or 10110 (binary).
End Rowkey
The end rowkey for scanning. All rows with rowkeys lexicographically less than this value are scanned. The end rowkey itself is excluded (left-closed, right-open interval). For example, to scan user records from user0001 through user9998 in an HBase table whose rowkeys are zero-padded to four digits, set the start rowkey to user0001 and the end rowkey to user9999. This returns all rows whose rowkeys fall between user0001 and user9999, excluding the row with rowkey user9999. Because rowkeys are compared lexicographically rather than numerically, a value such as user10000 sorts before user2000 and is not a suitable upper bound for this range.
Start Rowkey Type
Select the type of the start rowkey: String or Binary.
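The left-closed, right-open range described for Start Rowkey and End Rowkey can be sketched with byte-wise comparison, which is how HBase orders rowkeys. The sample rowkeys and the `in_scan_range` helper are hypothetical. The sketch also shows that under lexicographic ordering user10000 sorts before user2000, so it cuts a range off earlier than a numeric reading would suggest.

```python
def in_scan_range(rowkey: bytes, start: bytes, end: bytes) -> bool:
    """Left-closed, right-open check using byte-wise lexicographic order."""
    return start <= rowkey < end


# Hypothetical rowkeys, listed in lexicographic order.
rowkeys = [b"user0001", b"user0999", b"user1000", b"user10000",
           b"user2000", b"user9999"]

selected = [k for k in rowkeys if in_scan_range(k, b"user0001", b"user10000")]
print(selected)
```

With these sample keys, only user0001, user0999, and user1000 fall inside the range. user10000 is excluded as the end rowkey, and user2000 and user9999 sort after it.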
Output Fields
The output fields of the component.
-
Batch Add Fields.
-
Click Batch Add.
-
Configure in JSON format. For example:
// Example:
[
  { "name": "cf1:q1", "type": "string" },
  { "name": "cf1:q2", "type": "string" },
  { "name": "cf1:q3", "type": "string" }
]
Note: name is the column family and field name, and type is the field type. For example, "name":"cf1:a","type":"String" indicates that field a in column family cf1 is imported as type String.
-
Configure in TEXT format. For example:
// Example:
cf1:q1,string
cf1:q2,string
cf1:q3,string
-
The row delimiter is used to separate the information of each field. The default is a line feed (\n). Supported delimiters include line feed (\n), semicolon (;), and period (.).
-
The column delimiter is used to separate the field name and field type. The default is a comma (,).
-
Click OK.
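The two batch formats carry the same information, so a TEXT definition can be converted to the JSON form with a short script. This is a minimal sketch under the delimiter rules described above. The `text_to_fields` helper is hypothetical, and the check that each name contains a colon is an assumption based on the family:qualifier examples.

```python
import json
from typing import Dict, List


def text_to_fields(text: str, row_delim: str = "\n",
                   col_delim: str = ",") -> List[Dict[str, str]]:
    """Parse 'family:qualifier,type' rows into a list of name/type dicts."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip blank rows, e.g. after a trailing delimiter
        name, _, ftype = row.partition(col_delim)
        name, ftype = name.strip(), ftype.strip()
        if ":" not in name or not ftype:
            raise ValueError(f"malformed field row: {row!r}")
        fields.append({"name": name, "type": ftype})
    return fields


text = "cf1:q1,string\ncf1:q2,string\ncf1:q3,string"
print(json.dumps(text_to_fields(text), indent=2))
```

Passing `row_delim=";"` handles the semicolon-separated variant in the same way.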
-
Create an output field.
Click Create Output Field, enter the Column Family and Column, and select the Type as prompted.
-
Manage output fields.
You can perform the following operations on added fields:
-
Click and drag the icon next to Column to change the position of the field.
-
Click the icon in the Operation column to edit an existing field.
-
Click the icon in the Operation column to delete an existing field.
-
Click OK to complete the property configuration of the HBase input component.