Use warehouses

更新时间:
复制 MD 格式

To further optimize query performance and management flexibility, StarRocks v3.3 introduces the warehouse concept. A warehouse is a group of compute nodes (CNs or BEs) within a StarRocks instance that performs computation. This topic describes how to use warehouses in StarRocks.

Background information

In a distributed database system, efficient resource allocation and utilization are key to ensuring high performance. StarRocks is a high-performance Massively Parallel Processing (MPP) analytical database designed for real-time analysis and processing of massive datasets. StarRocks distributes and executes queries in parallel, providing high throughput with low latency.

Common use cases for warehouses include:

  • Load balancing and resource isolation: Route query requests from different business units or user groups to different warehouses to achieve load balancing and prevent workloads from interfering with each other.

  • Performance optimization: Assign latency-sensitive queries or critical workloads to a high-performance warehouse that has more powerful computing resources or better network conditions.

  • Data security and compliance: Restrict queries on sensitive data to warehouses that meet specific security standards or are located in certain geographic regions to ensure compliance with data protection regulations.

To support these use cases, StarRocks provides features that let you flexibly use warehouses to optimize query performance and resource management.

Prerequisites

You have created an instance in EMR Serverless StarRocks.

Note

To add a new warehouse, you must create a StarRocks v3.3 or later Shared-data. A Shared-nothing supports only the default warehouse (default_warehouse) and does not support the multi-warehouse mode.

Usage notes

Currently, all asynchronous tasks, such as dynamic partition creation, compaction, schema change, materialized view refresh, and statistics collection, run in the default_warehouse. Therefore, for optimal performance, we recommend that you configure at least three compute nodes in the default_warehouse.

View warehouse and compute node information

View all warehouses

  • Syntax

    SHOW WAREHOUSES [LIKE '<pattern>'];

    The optional [LIKE '<pattern>'] clause filters warehouses by name.

    • The <pattern> supports the following wildcards:

      • %: Matches any sequence of zero or more characters.

      • _: Matches any single character.

    • <pattern> examples

      • LIKE 'test%': Matches all warehouses whose names start with test.

      • LIKE '%_warehouse': Matches all warehouses whose names end with _warehouse.

  • Examples

    • View all warehouses.

      SHOW WAREHOUSES;
    • Filter warehouses by name.

      SHOW WAREHOUSES LIKE 'test%';

View all compute nodes

The output includes the warehouse name for each compute node.

SHOW COMPUTE NODES;

View compute nodes in a warehouse

  • View compute nodes in a single warehouse.

    SHOW NODES FROM WAREHOUSE <warehouse_name>; 
  • View compute nodes in multiple warehouses.

    SHOW NODES FROM WAREHOUSES [LIKE '<pattern>']; 

    If you omit [LIKE '<pattern>'], the statement returns node information for all warehouses.

Use a warehouse for operations

Note

If no warehouse is specified, all operations are performed in the built-in default_warehouse by default.

Queries

For queries, you can only specify a warehouse by using the SET_VAR hint or by manually switching warehouses in the current session.

Use SET_VAR

In StarRocks, you can use the SET_VAR hint to dynamically specify the warehouse for the current query. This method is suitable for temporarily switching warehouses without affecting other queries or session configurations. For more information about the SET_VAR hint, see System variables.

The syntax is as follows:

SELECT /*+SET_VAR(warehouse="<warehouse_name>")*/ * FROM <database_name>.<table_name>;

In this statement, /*+SET_VAR(warehouse="<warehouse_name>")*/ is a fixed syntax for dynamically setting query-level parameters. Replace <warehouse_name> with your warehouse name, and <database_name> and <table_name> with the actual database and table names.

With the SET_VAR hint, you can flexibly specify a warehouse for a single query to achieve resource isolation and performance optimization. Note the following points:

  • Version requirement: StarRocks v3.3 or later is required.

  • Syntax: Ensure that the position and format of /*+SET_VAR(...)*/ are correct.

  • Permissions and resources: Ensure that the target warehouse exists and you have the required access permissions.

Current session

  1. Specify a warehouse.

    SET [session] warehouse [=] <warehouse_name>;

    The keywords session and = are optional.

  2. View the current warehouse.

    SHOW variables LIKE "%warehouse%";

    The specified warehouse is effective only for the current session. If no warehouse is specified, default_warehouse is used. Asynchronous tasks such as compaction, materialized view refresh, and statistics collection all use the default_warehouse.

  3. Run a query.

    SELECT * FROM my_db.my_table;

Data loads

  • Stream Load

    Add the warehouse information to the header of the HTTP request. The syntax is as follows:

    curl --location-trusted -u <username>:<password> \
        -H "label:<label_name>" \
        -H "timeout:100" \
        -H "max_filter_ratio:1" \
        -H "warehouse:<warehouse_name>" \
        -T <file_path> \
        -XPUT http://<fe_host>:<fe_http_port>/api/<database_name>/<table_name>/_stream_load

    The parameters are described in the following table.

    Parameter

    Required

    Description

    <username>:<password>

    Yes

    The username and password for your StarRocks instance. If the account has no password, specify only <username>:, for example, admin:.

    label

    No

    The label for the load job. If you do not specify a label, StarRocks automatically generates one. Data with the same label cannot be successfully loaded more than once, which prevents data duplication.

    timeout

    No

    The timeout for the load job, in seconds. The value must be in the range [1, 259200].

    max_filter_ratio

    No

    The maximum tolerance for data rows that can be filtered due to poor data quality. The value must be in the range [0, 1].

    warehouse

    Yes

    The warehouse to use for the Stream Load job.

    <file_path>

    Yes

    The path to the data file.

    fe_host

    Yes

    The private network address of an FE in the StarRocks instance. You can view this address on the Instance Details page.

    Note

    If you submit the load job directly to a BE or CN, you must provide the IP address of that BE or CN. For example, you can run SHOW COMPUTE NODES; to view the IP addresses.

    fe_http_port

    Yes

    The HTTP port of the FE. The default is 8030.

    Note

    If you submit the load job directly to a specific BE or CN, you must provide its HTTP port. The default is 8040.

    database_name

    Yes

    The name of the database where the target StarRocks table resides.

    table_name

    Yes

    The name of the target StarRocks table.

  • Transaction import

    When you call the /api/transaction/begin API to start a transaction, you must specify a label and add the warehouse information to the HTTP request header. The syntax is as follows:

    curl --location-trusted -u <username>:<password> \
        -H "label:<label_name>" \
        -H "db:<database_name>" \
        -H "table:<table_name>" \
        -H "warehouse:<warehouse_name>" \
        -XPOST http://<fe_host>:<fe_http_port>/api/transaction/begin

    Subsequently, when using the /api/transaction/load, /api/transaction/prepare, and /api/transaction/commit APIs, you must use the same label as in the /api/transaction/begin request, but you do not need to specify the warehouse in the HTTP request header again.

  • Broker Load

    LOAD LABEL [<database_name>.]<label_name>
    (
        data_desc[, data_desc ...]
    )
    WITH BROKER
    (
        StorageCredentialParams
    )
    [PROPERTIES
    (
        opt_properties
    )
    ]

    The "warehouse" parameter is added to opt_properties in the statement to create a load job. It defaults to the current session's warehouse. For example:

    LOAD LABEL my_db.path_extra_fields_label
    (
      DATA INFILE( "hdfs://<hdfs_ip>:9000/starrocks_test_data/hdfs_data/*.orc")
      INTO TABLE `my_tbl` 
      FORMAT AS "orc" 
    )  
    WITH BROKER hdfs_broker 
    (
      "hadoop.security.authentication" = "simple",
      "username" = "<hdfs_username>",
      "password" = "<hdfs_password>"
    ) 
     PROPERTIES
    (
      "warehouse"="test"
    );
  • Routine load

    The warehouse parameter is added to job_properties, defaulting to the current session's warehouse.

    CREATE ROUTINE LOAD <database_name>.<job_name> ON <table_name>
    [load_properties]
    [job_properties]
    FROM data_source
    [data_source_properties]

    The following example uses a Routine Load job to specify the warehouse named test, continuously consuming messages from Kafka and loading them into StarRocks.

    CREATE ROUTINE LOAD my_db.routine_load_basic_types_fe824e00_680a_11ee_81d7_00163e135aa9 
    ON my_table
    COLUMNS (k1,k2,k3,k4,k5,k6,v1,v2,v3,v4,v5,v6,v7,v8), 
    COLUMNS TERMINATED BY ',',where v1 != 0 
    PROPERTIES (
        "desired_concurrent_number"="1",
        "max_error_number"="1000",
        "max_batch_interval"="7",
        "warehouse"="test"
    )
    FROM KAFKA (
        "kafka_broker_list"="<kafka_broker_ip>:<kafka_broker_port>",
        "kafka_topic"="xxx",
        "kafka_partitions"="0",
        "kafka_offsets"="OFFSET_BEGINNING"
    );
  • Flink Connector

    StarRocks provides native support for Apache Flink, allowing you to efficiently write data to StarRocks using Flink. Starting from Flink connector v1.2.9, you can configure the sink.properties.warehouse parameter to write data to a specific warehouse for more flexible resource management and performance optimization.

    Use sink.properties.warehouse to specify the target warehouse.

    sink.properties.warehouse=<warehouse_name>
    • <warehouse_name>: The name of the target warehouse.

    • If this parameter is not specified, the default_warehouse is used.

Materialized views

The warehouse parameter has been added to the PROPERTIES of the CREATE MATERIALIZED VIEW statement. It defaults to the current session's warehouse. The syntax is as follows:

CREATE MATERIALIZED VIEW [IF NOT EXISTS] [database_name.]<mv_name>
[COMMENT ""]
[PROPERTIES ("key"="value", ...)]
AS 
<query_statement>

For example, to create a materialized view named order_mv and specify that subsequent rewrite operations run in the warehouse named test, use the following statement:

CREATE MATERIALIZED VIEW order_mv
DISTRIBUTED BY HASH(`order_id`)
REFRESH ASYNC EVERY (interval 1 MINUTE) 
PROPERTIES ("warehouse" = "test")
AS SELECT 
    order_list.order_id,
    sum(goods.price) as total
FROM order_list INNER JOIN goods ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;

After successful execution, you can view the warehouse information in two ways. The fe.audit.log also records the execution of materialized view rewrite statements, which includes the warehouse field.

  • Method 1: Run the following command to query the details of the materialized view "order_mv".

    SHOW materialized views WHERE name="order_mv";
  • Method 2: Run the following command to view the complete SQL statement for creating the materialized view "order_mv".

    SHOW CREATE materialized VIEW order_mv;

Compaction

By default, compaction runs in the default_warehouse. To specify a different warehouse, you can modify the lake_compaction_warehouse configuration item. You can either modify it dynamically in the current session or restart the FE after modifying it in fe.conf.

For example, to specify the warehouse test for compaction:

  • Method 1: You can modify FE dynamic parameters online using the ADMIN SET FRONTEND CONFIG command.

    ADMIN SET FRONTEND CONFIG ("lake_compaction_warehouse" = "test");
  • Method 2: Add lake_compaction_warehouse = test; to the fe.conf file and restart FE to apply the changes.

After the modification, you can run show proc '/compactions' ; to view the recent Compaction records. The last column shows the Warehouse information.

Connect to a warehouse

JDBC

The syntax is as follows:

jdbc.url=jdbc:mysql://<mysql_host>:3306/dbName?sessionVariables=warehouse=<warehouse_name>

Superset

Refer to Superset Support | StarRocks to install starrocks-sqlalchemy and use init_command to set the corresponding warehouse.

starrocks://<username>:<password>@<host>:<port>/<database_name>?init_command=set warehouse = <warehouse_name> 

Set a default warehouse for a user

Note

This feature is supported only in shared-data instances of v3.3.8 and later.

You can specify a default warehouse (e.g., sr_warehouse) for a user (e.g., sr_user) by either modifying the user's properties or setting it when creating the user.

  • Method 1: Modify the properties of an existing user with ALTER USER.

    ALTER USER sr_user SET PROPERTIES ('session.warehouse' = 'sr_warehouse');
  • Method 2: Create a new user and specify the default warehouse with CREATE USER.

    CREATE USER 'sr_user' PROPERTIES ('session.warehouse' = 'sr_warehouse');

After setting the property, you can run the SHOW PROPERTY command to view the properties for a specific user.

Permissions

You can grant permissions to users and roles using the GRANT statement. The syntax is as follows:

GRANT  
    { USAGE } 
    ON { WAREHOUSE <warehouse_name> [, <warehouse_name>,...] | ALL WAREHOUSE } 
    TO { ROLE | USER} {<role_name>|<user_identity>} [ WITH GRANT OPTION ]

Privilege

Description

USAGE

Allows a user to use the specified warehouse to run queries and other operations.

Having the USAGE privilege on a warehouse also implicitly grants the SHOW privilege.

Example: Grant the USAGE permission to the user user1.

GRANT USAGE ON warehouse waa TO user1;

For more information about privilege management, see Manage user privileges.