The StarRocks Connector uses the External Catalog mechanism to seamlessly access MaxCompute data sources—without importing data or creating external tables—and run complex SQL queries. This improves data analytics efficiency and reduces operational and maintenance (O&M) complexity and cost. This topic describes how to access MaxCompute using the StarRocks Connector.
Background information
StarRocks is a next-generation, high-speed Massively Parallel Processing (MPP) database that supports multiple enterprise-level analytical workloads, such as online analytical processing (OLAP)-based multidimensional analysis, custom report creation, real-time data analysis, and ad hoc queries. For details about StarRocks, see StarRocks Introduction. StarRocks supports the Catalog (data catalog) feature, which lets you manage both internal and external data within a single system and easily access and query data stored in various external sources. For more information, see Catalog Overview.
Prerequisites
-
You have deployed a StarRocks cluster of version 3.2.3 or later and ensured that the cluster can properly access the MaxCompute service. For more information about how to deploy a StarRocks cluster, see Deploy a StarRocks shared-nothing cluster using Docker Compose .
- Purchase a dedicated Data Transmission Service resource group (subscription) or activate open storage (pay-as-you-go) resources.
Limits
-
Supports reading standard tables, partitioned tables, clustered tables, Delta Tables, and materialized views. Does not support reading external tables or logical views.
-
Does not support reading JSON data types.
-
Only supports MaxCompute projects with Schema disabled. For details about Schema, see Schema operations.
Create a MaxCompute Catalog
Log on to StarRocks and create a MaxCompute Catalog in StarRocks. For more information about External Catalogs, see Create External Catalog.
Syntax
CREATE EXTERNAL CATALOG <catalog_name>
[COMMENT <comment>]
PROPERTIES
(
"type" = "odps",
CatalogParams,
ScanParams,
CachingMetaParams
)
Parameter descriptions
-
Common parameters
Parameter
Required
Description
catalog_name
Yes
Name of the MaxCompute Catalog. Follow these naming rules:
-
Must consist of letters (a–z or A–Z), digits (0–9), or underscores (_), and must start with a letter.
-
Maximum length is 1023 characters.
-
Catalog names are case-sensitive.
type
Yes
Type of the data source. Set this parameter to
odps.comment
No
Description of the MaxCompute Catalog.
CatalogParams
Yes
Parameters for StarRocks to access MaxCompute.
ScanParams
No
Parameters for StarRocks to access MaxCompute file storage.
CachingMetaParams
No
Parameters that define the metadata caching policy.
-
-
CatalogParams parameters
Parameter
Required
Description
odps.endpoint
Yes
Connection address of the MaxCompute service. Configure the endpoint based on the region and network connectivity method selected when you created the MaxCompute project. For endpoint values by region and network type, see Endpoint.
ImportantCurrently, only Alibaba Cloud VPC networks are supported.
odps.project
Yes
Name of the MaxCompute project. Log on to the MaxCompute console and go to the page to obtain the project name.
NoteIf you created a workspace in standard mode, distinguish between the production environment and development environment (_dev) project names when configuring this parameter.
odps.access.id
Yes
AccessKey ID of your Alibaba Cloud account or Resource Access Management (RAM) user. Go to the AccessKey management page to obtain the AccessKey ID.
odps.access.key
Yes
AccessKey secret corresponding to the AccessKey ID.
odps.tunnel.quota
Yes
-
ScanParams parameters
Parameter
Required
Description
odps.split.policy
No
Sharding strategy used when scanning data. Valid values:
-
size (default): shards by data size. Default value: 256 MB.
-
row_offset: shards by row count.
NoteThe sharding strategy significantly affects data scan concurrency in the compute engine. If the default strategy does not meet your needs, choose the
row_offsetstrategy and adjust it using theodps.split.row.countparameter.odps.split.row.count
No
Maximum number of rows per shard. Default value: 4194304. Value range: greater than 0.
NoteConfigure this parameter only when
odps.split.policyis set torow_offset. -
-
CachingMetaParams parameters
Parameter
Required
Description
odps.cache.table.enable
No
Whether to cache table metadata. Valid values:
-
true (default): enables table metadata caching. This improves metadata retrieval and update speed and is suitable for scenarios involving frequent DDL operations, large multi-user environments, or complex query optimization.
-
false: disables table metadata caching. Because cache resources are limited and excessive reliance on caching may consume too much memory, disable caching for infrequent queries or tables whose metadata changes frequently.
odps.cache.table.expire
No
Time interval, in seconds (s), after which cached table metadata is automatically evicted. Default value: 86400 (24 hours). Value range: greater than or equal to 0.
odps.cache.table.size
No
Number of table metadata entries to cache. Default value: 1000.
odps.cache.partition.enable
No
Whether to cache metadata for all partitions under the target table. Valid values:
-
true (default): enables caching of metadata for all partitions under the target table.
-
false: disables caching. Metadata for all partitions is not cached.
NoteIf the table is not partitioned, the
odps.cache.partition.enableparameter has no effect.odps.cache.partition.expire
No
Time interval, in seconds (s), after which cached partition metadata is automatically evicted. Default value: 86400 (24 hours). Value range: greater than or equal to 0.
odps.cache.partition.size
No
Number of partitioned tables whose metadata can be cached. Default value: 1000.
odps.cache.table-name.enable
No
Whether to cache all table names in the MaxCompute project. Valid values:
-
true: enables caching of all table names in the MaxCompute project.
-
false (default): disables caching of all table names in the MaxCompute project.
odps.cache.table-name.expire
No
Time interval, in seconds (s), after which cached table name information in the MaxCompute project is automatically evicted. Default value: 86400 (24 hours). Value range: greater than or equal to 0.
-
Example
The following example creates a MaxCompute Catalog named odps_catalog, using the MaxCompute project mf_mc_bj as an example.
-- Create catalog
CREATE EXTERNAL CATALOG odps_catalog PROPERTIES(
"type"="odps",
"odps.access.id"="<yourAccessKeyId>",
"odps.access.key"="<yourAccessKeySecret>",
"odps.endpoint"="http://service.cn-beijing.maxcompute.aliyun.com/api",
"odps.tunnel.quota"="pay-as-you-go",
"odps.project"="mf_mc_bj"
);
Access MaxCompute
-
Switch to the target Catalog and MaxCompute project. This example uses
odps_catalogandmf_mc_bj.-
Syntax
-
Method 1
-- Specify Catalog SET CATALOG <catalog_name>; -- Specify MaxCompute project USE <project_name>; -
Method 2
USE <catalog_name>.<project_name>;
-
-
Example
SET CATALOG odps_catalog; USE mf_mc_bj;
-
-
Query data. This example uses the
srctable.
SELECT * FROM src LIMIT 10;
The result is as follows.
+------+-------+
| key | value |
+------+-------+
| 1 | 1 |
| 3 | 3 |
| 2 | 2 |
| 4 | 100 |
| 5 | 200 |
| 6 | 300 |
| 3 | 400 |
+------+-------+
Other operations
View MaxCompute Catalog
-
List all Catalogs in the current StarRocks cluster.
SHOW CATALOGS;The result is as follows.
+-----------------+----------+------------------------------------------------------------------+ | Catalog | Type | Comment | +-----------------+----------+------------------------------------------------------------------+ | default_catalog | Internal | An internal catalog contains this cluster‘s self-managed tables. | | odps_catalog | Odps | NULL | +-----------------+----------+------------------------------------------------------------------+ -
Show the CREATE statement for a specific Catalog.
SHOW CREATE CATALOG odps_catalog;The result is as follows.
+--------------+--------------------------------------------------------------------------+ | Catalog | Create Catalog | +--------------+--------------------------------------------------------------------------+ | odps_catalog | CREATE EXTERNAL CATALOG `odps_catalog` PROPERTIES ("odps.endpoint" = "http://service.cn-beijing.maxcompute.aliyun.com/api", "odps.access.id" = "<yourAccessKeyId>", "odps.access.key" = "<yourAccessKeySecret>", "odps.project" = "odps_project", "type" = "odps" )| +-----------------------------------------------------------------------------------------+
View MaxCompute table schema
-
Describe a MaxCompute table schema.
-
Syntax
DESC[RIBE] <catalog_name>.<database_name>.<table_name>; -
Parameter descriptions
-
catalog_name: required. Name of the Catalog.
-
database_name: required. Name of the MaxCompute project under the target Catalog.
-
table_name: required. Name of the table under the target MaxCompute project.
-
-
Example
DESC odps_catalog.mf_mc_bj.src;The result is as follows.
+-------+------+------+-------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+------+------+-------+---------+-------+ | id | INT | Yes | false | NULL | | | a | INT | Yes | false | NULL | | +-------+------+------+-------+---------+-------+
-
-
Show the MaxCompute table schema and its CREATE TABLE statement.
-
Syntax
SHOW CREATE TABLE <catalog_name>.<database_name>.<table_name>; -
Parameter descriptions
-
catalog_name: required. Name of the Catalog.
-
database_name: required. Name of the MaxCompute project under the target Catalog.
-
table_name: required. Name of the table under the target MaxCompute project.
-
-
Example
SHOW CREATE TABLE odps_catalog.mf_mc_bj.src;The result is as follows.
+-------+--------------------------------------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------------------------------------+ | src | CREATE TABLE `src` ( `id` int(11) DEFAULT NULL, `a` int(11) DEFAULT NULL ) | +-------+--------------------------------------------------------------------------------+
-
Delete a MaxCompute Catalog
Run the following command to delete a MaxCompute Catalog.
DROP CATALOG odps_catalog;
Cost-Based Optimizer (CBO) statistics collection
The current version of StarRocks does not support automatic collection of MaxCompute table statistics. Run the following command to manually create a statistics collection task.
-- Replace <table_name> with the MaxCompute table name
ANALYZE TABLE <table_name>;
StarRocks CBO can perform cost estimation based on various statistics and select the execution plan with the lowest cost from tens of thousands of execution plans, improving the efficiency and performance of complex queries. For more information, see CBO statistics.
Manually refresh metadata cache
By default, StarRocks caches MaxCompute metadata to improve query performance. After changing a table schema or updating table data, manually refresh the table's metadata to ensure StarRocks retrieves the latest metadata immediately.
-- Replace <table_name> with the MaxCompute table name
REFRESH EXTERNAL TABLE <table_name>;
FAQ
Error when reading data: Your slot quota is exceeded.
-
Problem Analysis
This error occurs when the data read volume exceeds the Data Transmission Service quota. During reads, StarRocks splits the table into multiple shards and distributes them to BE nodes, which use thread pools to read data concurrently. The number of slots occupied during the read is typically:
Max(number of shards, number of BEs × BE thread pool concurrency). -
Solution
-
Purchase a dedicated Data Transmission Service resource group with higher concurrency. For instructions, see Purchase and use dedicated Data Transmission Service resource groups.
-
Adjust the number of shards by configuring
ScanParamsparameters. If you use therow_offsetsharding strategy, also increase theodps.split.row.countparameter. -
Modify the StarRocks BE thread pool parameters
scanner_thread_pool_queue_sizeandscanner_thread_pool_thread_num. For details, see Introduction to StarRocks.
-
Appendix: Type mapping
|
MaxCompute field type |
StarRocks field type |
|
BOOLEAN |
BOOLEAN |
|
TINYINT |
TINYINT |
|
SMALLINT |
SMALLINT |
|
INT |
INT |
|
BIGINT |
BIGINT |
|
FLOAT |
FLOAT |
|
DOUBLE |
DOUBLE |
|
DECIMAL(p, s) |
DECIMAL(p, s) |
|
STRING |
VARCHAR(1073741824) |
|
VARCHAR(n) |
VARCHAR(n) |
|
CHAR(n) |
CHAR(n) |
|
JSON |
VARCHAR(1073741824) |
|
BINARY |
VARBINARY |
|
DATE |
DATE |
|
DATETIME |
DATETIME |
|
TIMESTAMP |
DATETIME Important
StarRocks does not have a TIMESTAMP type. When reading TIMESTAMP data, it is treated as DATETIME, which results in loss of precision. |
|
ARRAY |
ARRAY |
|
MAP |
MAP |
|
STRUCT |
STRUCT |