Create a Hologres internal table

更新时间:
复制 MD 格式

You can create Hologres internal tables by using Data Definition Language (DDL) statements or by using the visual interface provided in DataWorks. This topic describes how to use the visual interface in DataWorks to create a Hologres internal table.

Prerequisites

Background information

Hologres supports two types of tables: internal tables and external tables. The following list describes the differences:

  • An internal table stores data directly from MaxCompute. You can synchronize data from a MaxCompute source table to a Hologres internal table for fast queries and analysis. This method offers better query performance than using an external table.

  • An external table does not store data directly. Instead, it maps to a MaxCompute source table to accelerate data queries and analysis. This method avoids data redundancy and eliminates the need for data import and export, allowing you to get query results quickly.

As a data development and processing platform, DataWorks provides a convenient visual interface for creating tables. You can also create tables directly in Hologres by using DDL statements. For more information, see CREATE TABLE.

Limitations

This feature is available only in the China (Shanghai) and China (Beijing) regions.

Procedure

  1. Go to the Data Studio page.

    Log on to the DataWorks console. In the target region, click Data Integration > Data Integration in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.

  2. Create a Workflow.

    If you already have a Workflow, skip this step.

    1. Hover over the 新建 icon and select Create Workflow.

    2. In the Create Workflow dialog box, enter a Workflow Name.

    3. Click Create.

  3. Create a Hologres internal table.

    1. Hover over the 新建 icon and select Create Table > Hologres > Table.

    2. In the Create Table dialog box, set Table Type to Internal Table, and configure the engine, path, and name for the table.

      For Engine Instance, select Hologres. After you complete the configuration, click Create.

  4. Configure the Hologres internal table.

    On the configuration page, configure the properties of the Hologres internal table.

    1. Configure basic properties.

      The following table describes the main basic properties.

      Parameter

      Description

      Storage Mode

      The storage mode of the table in Hologres. The default is columnar storage.

      • Column-oriented Storage: Suitable for online analytical processing (OLAP) scenarios. It is ideal for complex queries, data joins, scans, filters, and aggregations. The write and update performance is lower than that of row-oriented storage.

      • Row-oriented Storage: Suitable for key-value (KV) scenarios. It is ideal for point lookups and scans based on primary keys. It provides better write and update performance.

      • Hybrid Row-column Storage: Suitable for scenarios where both columnar and row-oriented storage are required. It supports both high-performance point lookups and OLAP analysis. This mode incurs higher storage overhead and internal data synchronization costs.

      Note

      For more information about storage formats, see the description of the orientation parameter in CREATE TABLE.

      Lifecycle

      The life cycle of the table, in seconds. By default, the life cycle is permanent.

      Note

      The life cycle countdown starts when data is first written to the table. After the specified life cycle ends, the table data is cleared within a random period of time.

      Binlog

      Specifies whether to enable binlog for the table. If you enable binlog, you must specify its life cycle. By default, the binlog life cycle is permanent.

      Note

      Only Hologres V0.9 and later versions support the table-level binlog feature. For more information about binlog, see Subscribe to Hologres binlogs.

    2. Configure business information.

      Note

      A table's business information is for management purposes only and does not affect the underlying logic.

      Parameter

      Description

      Theme

      The primary and secondary folders to which the table belongs. You can classify tables based on business purposes and assign tables of the same type to the same folder.

      Note

      The primary and secondary themes are just folder representations in DataWorks, designed to help you better manage your tables.

      Layer

      The table's physical data warehouse layer. Layers define and manage data warehouse tiers, which typically include the ODS layer, common layer, and analytics layer. You can assign the table to a suitable layer based on your business needs.

      Note

      Click the 新建 icon to customize layers. For more information, see Manage tables.

      Physical Category

      The physical category of the table, used for more detailed classification from a business perspective. Common categories include Basic Business, Advanced Business, and Other.

      Note

      Click the 新建 icon to customize physical categories. For more information, see Manage categories.

    3. Configure the table schema.

      Parameter

      Description

      Field Design

      Add and define the fields for the table. For more information about the data types that Hologres supports, see Data type summary.

      Storage Design

      Define the storage methods for the table fields.

      • Distribution Column: Specifies the distribution strategy for the table. Data is distributed to different shards based on the distribution column. Subsequent operations such as computing and scanning are performed at the shard level.

      • Segment key: Typically, you specify a time-related column as the segmented column. When a query includes this column, the system can quickly locate the data's storage location. This is suitable for data that is strongly time-related, such as logs and traffic data.

      • Clustering Key: Used to create a clustered index on a specified column. Hologres sorts the data based on the clustered index to accelerate range and filter queries on the indexed column.

      • Dictionary Encoding Columns: Used to build a dictionary mapping for the values in a specified column. Dictionary encoding can convert string comparisons into numeric comparisons, accelerating queries such as GROUP BY and FILTER.

      • Bitmap Column: Also known as a bit-encoded column, it allows for fast equality filtering of data within a storage file. Therefore, we recommend that you create a bitmap index for data that is frequently used in equality filter conditions.

      For more information about storage methods, see CREATE TABLE.

      Partition

      Define the partition field for the table.

      Note

      If a partitioned table has a primary key, the primary key must include the partition field.

  5. Commit and publish the Hologres internal table.

    After you define the table schema, you must commit it to the development environment and the production environment. After a successful commit, you can view the table in the engine project of that environment.

    Note

    If your workspace is in basic mode, you only need to commit the table to the production environment. For more information about the differences between basic mode and standard mode workspaces, see Differences between workspace modes.

    Actions

    Description

    Load from Development Environment

    Loads the table's information from the development environment and displays it on the current page.

    Note

    You can perform this operation only after the table has been committed to the development environment. After you perform this operation, the table's information from the development environment overwrites the information on the current page.

    Commit to Development Environment

    Commits the table to the development environment of DataWorks. This creates the current table in the development environment's Hologres database.

    After the commit, you can view the table schema in the Hologres directory of the corresponding workflow in DataStudio. The workflow is specified by the path you selected when creating the table.

    Load from Production Environment

    Loads the table's information from the production environment and displays it on the current page.

    Note

    You can perform this operation only after the table has been committed to the production environment. After you perform this operation, the table's information from the production environment overwrites the information on the current page.

    Commit to Production Environment

    Commits the table to the production environment of DataWorks. This creates the current table in the production environment's Hologres database.

Next steps

After you create the Hologres internal table, you can perform the following operations: