Configure the HBase input component

Last updated: 2026-06-23 11:35:19

The HBase input component reads data from an HBase data source. To synchronize data from HBase to other data sources, configure the HBase input component first, and then configure the target data source.

Prerequisites

  • You have purchased and enabled the high availability (HA) feature of the DataService Studio or Tag Service module to configure primary/secondary links for data sources.

  • You have created an HBase data source. For more information, see .

  • The account used to configure the HBase input component properties must have read-through permission on the data source. If you do not have the permission, you need to request it. For more information, see Request, renew, and return permissions on a data source.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Integration.

  2. In the top navigation bar of the integration page, select a project. In Dev-Prod mode, you must also select an environment.

  3. In the left-side navigation pane, click Batch Pipeline. In the Batch Pipeline list, click the offline pipeline that you want to develop to open its configuration page.

  4. Click Component Library in the upper-right corner of the page to open the Component Library panel.

  5. In the left-side navigation pane of the Component Library panel, select Inputs. Find the HBase component in the input component list on the right and drag it to the canvas.

  6. Click the icon on the HBase input component card to open the HBase Input Configuration dialog box.

  7. In the HBase Input Configuration dialog box, configure the parameters.

    Parameter

    Description

    Step Name

    The name of the HBase input component. Dataphin automatically generates a step name, which you can modify. The name must meet the following requirements:

    • It can contain only Chinese characters, letters, underscores (_), and digits.

    • It cannot exceed 64 characters in length.
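The two naming rules above can be checked before a name is entered in the dialog box. The following is a minimal sketch, not Dataphin's own validation logic: the function name `is_valid_step_name` is hypothetical, "letters" is assumed to mean ASCII letters, "Chinese characters" is approximated by the CJK Unified Ideographs block, and an empty name is assumed to be invalid.

```python
import re

# Assumption: "Chinese characters" is approximated by the CJK Unified
# Ideographs block (U+4E00 to U+9FFF) and "letters" means ASCII letters.
# The product may accept a wider range.
_STEP_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name satisfies both documented naming rules."""
    # fullmatch enforces the character set and the 1-64 length in one pass.
    return _STEP_NAME_RE.fullmatch(name) is not None
```

For instance, `is_valid_step_name("hbase_input_1")` passes, while a name containing a hyphen or a name of 65 characters does not.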

    Datasource

    All HBase data sources in the current Dataphin instance are listed, including those for which you may not have read-through permission. Click the copy icon to copy the data source name.

    • For data sources for which you do not have read-through permission, you can click Request next to the data source to request read-through permission. For more information, see Request permission on a data source.

    • If you do not have an HBase data source, click Create to create one. For more information, see .

    Select Link

    If you have enabled the high availability feature of Tag Service and the selected HBase data source has active and standby links configured, you can select either Active Link or Standby Link for integration. This setting affects only the production data source.

    Table

    You can enter a keyword to search for tables, or enter the exact table name and click Exact Match. Click the copy icon to copy the name of the selected table.

    Output Mode

    Select an output mode: Normal Mode or Multi-version Mode (Vertical Table).

    maxversion

    If you select Multi-version Mode (Vertical Table) as the output mode, you need to specify maxversion.

    maxversion specifies the number of versions to read. A value of -1 indicates that all versions are read.

    File Encoding

    Select a file encoding format: UTF-8 or GBK.

    Start Rowkey

    The starting rowkey for scanning. All rows with rowkeys lexicographically greater than or equal to this value are included in the scan results. For example, aaa (string) or 10110 (binary).

    End Rowkey

    The end rowkey for scanning. All rows with rowkeys lexicographically less than this value are scanned; the end rowkey itself is excluded (left-closed, right-open interval). For example, if rowkeys are fixed-width values such as user0001 through user9999, setting the start rowkey to user0001 and the end rowkey to user9999 returns every row from user0001 up to, but not including, user9999. Because rowkeys are compared lexicographically rather than numerically, user10000 sorts before user1001 and therefore cannot serve as an upper bound for this range.
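Because rowkeys are compared byte by byte rather than as numbers, it can be worth checking which keys a given start/end pair actually covers. The following is a small sketch that mimics the half-open range test on Python byte strings; `rows_in_scan` and the sample keys are hypothetical and do not call HBase.

```python
def rows_in_scan(rowkeys, start, end):
    """Return the rowkeys that fall in the half-open range [start, end)."""
    # Python compares bytes lexicographically by unsigned byte value,
    # which matches the ordering described for the start and end rowkeys.
    return [key for key in sorted(rowkeys) if start <= key < end]


keys = [b"user0001", b"user1000", b"user1001", b"user9999", b"user10000"]

# An end rowkey of user10000 stops right after user1000.
narrow = rows_in_scan(keys, b"user0001", b"user10000")
# narrow == [b"user0001", b"user1000"]

# An end rowkey of user9999 covers everything below it, excluding itself.
wide = rows_in_scan(keys, b"user0001", b"user9999")
# wide == [b"user0001", b"user1000", b"user10000", b"user1001"]
```

Padding numeric suffixes to a fixed width keeps lexicographic order and numeric order in agreement, which makes range bounds easier to reason about.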

    Start Rowkey Type

    Select the type of the start rowkey: String or Binary.

    Output Fields

    The output fields of the component.

    • Batch Add Fields.

      1. Click Batch Add.

        • Configure in JSON format. For example:

          [{
            "name": "cf1:q1",
            "type": "string"
          },
          {
            "name": "cf1:q2",
            "type": "string"
          },
          {
            "name": "cf1:q3",
            "type": "string"
          }]

          Note

          name is the column family and column name in the form family:column, and type is the field type. For example, "name": "cf1:a", "type": "string" indicates that column a in column family cf1 is read as a string.

        • Configure in TEXT format. For example:

          cf1:q1,string
          cf1:q2,string
          cf1:q3,string

          • The row delimiter separates one field definition from the next. The default is a line feed (\n). Supported delimiters are line feed (\n), semicolon (;), and period (.).

          • The column delimiter is used to separate the field name and field type. The default is a comma (,).

      2. Click OK.
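The TEXT and JSON formats carry the same information, so a field list written in one can be converted to the other. The following is a rough sketch of that conversion under the delimiter rules described above; `text_to_field_list` is a hypothetical helper, not part of Dataphin, and the check that every name contains a colon is an added assumption.

```python
import json


def text_to_field_list(text, row_delim="\n", col_delim=","):
    """Parse TEXT-format field definitions into a list of name/type dicts."""
    fields = []
    for row in text.split(row_delim):
        row = row.strip()
        if not row:
            continue  # skip blank rows, such as a trailing line feed
        parts = [part.strip() for part in row.split(col_delim)]
        # Assumption: each row is exactly "family:column<col_delim>type".
        if len(parts) != 2 or ":" not in parts[0]:
            raise ValueError(f"malformed field definition: {row!r}")
        fields.append({"name": parts[0], "type": parts[1]})
    return fields


sample = "cf1:q1,string\ncf1:q2,string\ncf1:q3,string"
as_json = json.dumps(text_to_field_list(sample), indent=2)
```

Here `as_json` holds the equivalent JSON-format definition of the three sample fields. Passing `row_delim=";"` handles input that uses semicolons between fields.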

    • Create A New Output Field.

      Click Create Output Field, enter the Column Family and Column, and select the Type as prompted.

    • Manage output fields.

      You can perform the following operations on added fields:

      • Drag the icon next to a field to change the position of the field.

      • Click the edit icon in the Operation column to edit an existing field.

      • Click the delete icon in the Operation column to delete an existing field.

  8. Click OK to complete the property configuration of the HBase input component.
