Create a Real-Time Dataset Through HBase

Last Updated: 2026-06-17 09:50:13

Dataphin enables you to define dataset metrics by parsing fields from HBase data source tables using calculation scripts. Learn how to create and configure a real-time dataset from an HBase data source.

Prerequisites

  • A tag project for the dataset is created. For more information, see .

  • An HBase data source for the real-time dataset is created. For more information, see .

Procedure

  1. Navigate to the Dataphin home page and click Tag -> Tag Workbench from the top menu bar.

  2. Proceed as follows to access the Add Real-time Dataset dialog box:

    Select Tag Project -> click Real-time Dataset -> click Add Dataset.

  3. In the Add Real-time Dataset dialog box, select Hbase Dataset.

  4. On the Add Hbase configuration page, enter the basic information and configure the processing logic for the dataset.

  • Basic Information

    | Parameter | Description |
    | --- | --- |
    | Dataset Name | Enter the dataset name. The name can contain Chinese and English characters, numbers, and underscores (_), and can be up to 64 characters long. |
    | Dataset Code | Enter a unique identifier for the dataset. The identifier must start with a letter, can contain lowercase letters, numbers, and underscores (_), and can be up to 64 characters long. |
    | Owner | Select the owner of the real-time dataset. |
    | Description | Provide a brief description of the real-time dataset, limited to 1000 characters. |
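
The naming rules for Dataset Name and Dataset Code can be checked locally before the form is filled in. The following is a minimal sketch, not Dataphin's own validation. It assumes that "Chinese characters" means the CJK Unified Ideographs range U+4E00 to U+9FFF, and that the leading letter of a dataset code is lowercase like the rest of the code. The function names are hypothetical.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs (U+4E00..U+9FFF),
# "English characters" = ASCII letters. Length is 1 to 64 characters.
DATASET_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,64}")

# Assumption: the required leading letter is lowercase. One leading letter
# plus up to 63 further characters gives the 64-character limit.
DATASET_CODE_RE = re.compile(r"[a-z][a-z0-9_]{0,63}")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name satisfies the Dataset Name rule."""
    return DATASET_NAME_RE.fullmatch(name) is not None


def is_valid_dataset_code(code: str) -> bool:
    """Return True if the code satisfies the Dataset Code rule."""
    return DATASET_CODE_RE.fullmatch(code) is not None
```

Under these assumptions, `is_valid_dataset_code("user_profile_rt")` passes, while `is_valid_dataset_code("1abc")` fails because the code starts with a digit.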

  • Processing Logic

    | Parameter | Description |
    | --- | --- |
    | Data Source | Select the HBase data source. If none is available, create one first. For more information, see . |
    | Source Table | Select the source table from the HBase data source to process. |
    | RowKey Configuration Rule | Set up the RowKey rule. RowKey expressions support functions and variables. Variable: enclose it in ${}, for example ${variable}. String constant: enclose it in double quotation marks "", for example "hello world". Function: the md5() function takes a string as a parameter, for example md5("hello world") or md5(${user_id}). Expression: use the plus sign (+) to concatenate strings, for example ${user_id} + "hello world". |
    | Entity | After setting the RowKey configuration rule, click Entity Parsing to parse the entity list. Specify the Primary Key Name and the Value Type, which can be String or Long Integer. |
    | Metric Configuration | Define the metric by specifying the column family name, field name, field type, metric display name, value type, and a brief description. Supported value types are String, Long Integer, Double Precision Floating-Point Number, Date, Boolean, and Decimal. To configure additional metrics, click + Add. |
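
The fields of a Metric Configuration entry can be written down as a small record, which helps when metrics are planned outside the console. This is a sketch only: `MetricSpec` and its attribute names are hypothetical and are not part of any Dataphin API, and the field type value used in the note below is a placeholder. The `family:qualifier` form is the standard way HBase addresses a column.

```python
from dataclasses import dataclass

# Value types listed for Metric Configuration in this topic.
SUPPORTED_VALUE_TYPES = {
    "String",
    "Long Integer",
    "Double Precision Floating-Point Number",
    "Date",
    "Boolean",
    "Decimal",
}


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical container for one metric entry (not a Dataphin API)."""

    column_family: str
    field_name: str
    field_type: str
    display_name: str
    value_type: str
    description: str = ""

    def __post_init__(self) -> None:
        # Reject value types that this topic does not list as supported.
        if self.value_type not in SUPPORTED_VALUE_TYPES:
            raise ValueError(f"unsupported value type: {self.value_type}")

    @property
    def hbase_column(self) -> str:
        # HBase identifies a column as "<column family>:<qualifier>".
        return f"{self.column_family}:{self.field_name}"
```

For example, `MetricSpec("cf", "order_cnt", "string", "Order Count", "Long Integer").hbase_column` evaluates to `"cf:order_cnt"`, and an unlisted value type such as `"Float"` raises `ValueError`.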

  5. Click Publish to finish creating the real-time dataset.

    Note

    After saving, click Authenticate to test the processing logic with parameter values.
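
Before testing in the console, it can help to work out by hand what a RowKey rule produces for a given set of parameter values. The following sketch evaluates the syntax described for the RowKey Configuration Rule: ${} variables, double-quoted string constants, md5(), and + concatenation. It is not Dataphin's implementation. It assumes that md5() returns the lowercase hexadecimal digest of the UTF-8 encoded string and that md5() accepts any expression as its argument. The function name `evaluate_rowkey` is hypothetical.

```python
import hashlib
import re

_VARIABLE = re.compile(r"\$\{(\w+)\}")   # ${name}
_STRING = re.compile(r'"([^"]*)"')       # "constant" (no escape sequences)


def evaluate_rowkey(rule: str, params: dict) -> str:
    """Evaluate a RowKey rule against parameter values (illustrative only)."""
    pos = 0

    def skip_spaces() -> None:
        nonlocal pos
        while pos < len(rule) and rule[pos].isspace():
            pos += 1

    def parse_term() -> str:
        nonlocal pos
        skip_spaces()
        if rule.startswith("md5(", pos):
            pos += len("md5(")
            inner = parse_expression()
            skip_spaces()
            if not rule.startswith(")", pos):
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            # Assumption: lowercase hex digest of the UTF-8 bytes.
            return hashlib.md5(inner.encode("utf-8")).hexdigest()
        match = _VARIABLE.match(rule, pos)
        if match:
            pos = match.end()
            return str(params[match.group(1)])  # KeyError if not supplied
        match = _STRING.match(rule, pos)
        if match:
            pos = match.end()
            return match.group(1)
        raise ValueError(f"unexpected input at position {pos}")

    def parse_expression() -> str:
        nonlocal pos
        parts = [parse_term()]
        skip_spaces()
        while rule.startswith("+", pos):
            pos += 1
            parts.append(parse_term())
            skip_spaces()
        return "".join(parts)

    result = parse_expression()
    skip_spaces()
    if pos != len(rule):
        raise ValueError(f"unexpected input at position {pos}")
    return result
```

With `params = {"user_id": "u42"}`, the rule `${user_id} + "hello world"` evaluates to `u42hello world`, and `md5(${user_id})` evaluates to the MD5 hex digest of `u42`.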

What to do next

After creating and publishing the real-time dataset, create the corresponding real-time tags. For more information, see .
