Overview of Open Storage

The Storage API (also called Open Storage) gives third-party compute engines direct read access to MaxCompute's underlying storage in an efficient, low-latency, and secure way. Engines such as Apache Spark, StarRocks, Presto, and Apache Flink connect through a connector, which avoids data exports and reduces access latency.

How it works

A third-party compute engine connects to MaxCompute through a connector that calls the Storage API. The API exposes MaxCompute tables with standard table semantics, so engines read columnar data directly from storage without moving data out of MaxCompute.

Key capabilities:

  • High throughput: Supports efficient columnar reads, predicate pushdown, and the Apache Arrow data format.

  • Secure and user-friendly: Enforces project isolation, access control, and data encryption at the storage layer, without exposing storage internals to the caller.

  • Ecosystem integration: Spark on EMR and StarRocks support dedicated connectors for direct MaxCompute access.

Use cases

The Storage API suits workloads that require multi-engine access to the same data without duplication:

  • Cross-engine analytics: Run Spark, StarRocks, or Presto queries directly on MaxCompute tables without exporting data first.

  • Flexible framework switching: Switch between compute frameworks for different processing needs while keeping data in one place.

Limitations

Supported and unsupported table types

Third-party engines can read the following table types through the Storage API:

  • Standard tables

  • Partitioned tables

  • Clustered tables

  • Delta Tables

  • Materialized views

The following table types are not supported:

  • External tables

  • Logical views

Unsupported data types

Reading data of the JSON type is not supported.
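
The table-type and data-type restrictions above can be checked on the client side before a read is attempted. The following is a minimal sketch only: the `TableKind` enum and both helper methods are hypothetical names introduced for illustration and are not part of any MaxCompute SDK, and column types are passed in as plain strings.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StorageApiPreflight {
    // Hypothetical classification of MaxCompute objects, taken from the lists above.
    enum TableKind {
        STANDARD, PARTITIONED, CLUSTERED, DELTA, MATERIALIZED_VIEW, EXTERNAL, LOGICAL_VIEW
    }

    // Table types that the Storage API cannot read.
    private static final Set<TableKind> UNSUPPORTED =
            EnumSet.of(TableKind.EXTERNAL, TableKind.LOGICAL_VIEW);

    static boolean isReadable(TableKind kind) {
        return !UNSUPPORTED.contains(kind);
    }

    // Returns the names of columns whose declared type is JSON (case-insensitive).
    static List<String> unsupportedColumns(Map<String, String> columnTypes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : columnTypes.entrySet()) {
            if ("JSON".equalsIgnoreCase(e.getValue().trim())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "BIGINT");
        schema.put("payload", "JSON");
        System.out.println(isReadable(TableKind.DELTA));    // true
        System.out.println(isReadable(TableKind.EXTERNAL)); // false
        System.out.println(unsupportedColumns(schema));     // [payload]
    }
}
```

A connector could use such a check to skip JSON columns in its projection list, or to fail early with a clear message instead of a read error.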

Throughput limits (pay-as-you-go)

| Limit | Value |
| --- | --- |
| Concurrent requests per tenant | 1,000 |
| Transmission rate per concurrent request | 10 MB/s |
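
Taken together, the two pay-as-you-go limits imply a theoretical aggregate ceiling of 1,000 × 10 MB/s = 10,000 MB/s per tenant. The sketch below turns that arithmetic into a lower bound on transfer time. It is an idealized estimate: it assumes that every concurrent request sustains the full 10 MB/s, which real workloads may not reach, and the class and method names are illustrative.

```java
final class ThroughputEstimate {
    static final int MAX_CONCURRENT_REQUESTS = 1_000;      // per tenant
    static final double MB_PER_SECOND_PER_REQUEST = 10.0;  // per concurrent request

    // Upper bound on aggregate throughput for the given number of concurrent requests.
    static double maxAggregateMbPerSecond(int concurrency) {
        if (concurrency < 1) {
            throw new IllegalArgumentException("concurrency must be >= 1");
        }
        int effective = Math.min(concurrency, MAX_CONCURRENT_REQUESTS);
        return effective * MB_PER_SECOND_PER_REQUEST;
    }

    // Lower bound on the time needed to move dataMb megabytes.
    static double minTransferSeconds(double dataMb, int concurrency) {
        return dataMb / maxAggregateMbPerSecond(concurrency);
    }

    public static void main(String[] args) {
        System.out.println(maxAggregateMbPerSecond(1_000));      // 10000.0
        System.out.println(minTransferSeconds(1_000_000, 100));  // 1000.0
    }
}
```

For example, moving 1,000,000 MB with 100 concurrent requests takes at least 1,000 seconds under these assumptions.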

Data transmission resources

To run data transmission tasks, use exclusive resource groups for Data Transmission Service (DTS). DTS resource groups use the subscription billing method, and charges are based on the number of concurrent instances purchased.

| Resource | Billing | Supported regions | Setup guide |
| --- | --- | --- | --- |
| Subscription fees for exclusive data transmission resources | Subscription, charged per concurrent instance | China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), China (Chengdu), China (Hong Kong), East China 1 Finance Cloud, Indonesia (Jakarta), Singapore, US (Silicon Valley), US (Virginia), Japan (Tokyo), Germany (Frankfurt) | Purchase and use an exclusive resource group for Data Transmission Service |
| Pay-as-you-go Storage API | This resource is pay-as-you-go. Each tenant receives a free monthly quota of 1 TB for data reads and writes. If the free quota is used up, you are charged for the logical size of data that is read or written. | China (Hangzhou), China (Beijing), China (Shanghai), China (Chengdu), China (Shenzhen), China (Ulanqab) | Using Open Storage (pay-as-you-go) |
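
The free-quota rule for the pay-as-you-go Storage API amounts to simple arithmetic: only the logical data volume beyond 1 TB per month is billable. The sketch below makes two assumptions that the text does not confirm: reads and writes draw on the same 1 TB quota, and 1 TB is interpreted as 1024^4 bytes. No unit price appears in this document, so the example stops at the billable volume.

```java
final class FreeQuotaEstimate {
    // Assumption: 1 TB is interpreted as 1024^4 bytes.
    static final long FREE_QUOTA_BYTES = 1024L * 1024 * 1024 * 1024;

    // Returns the logical bytes that exceed the monthly free quota.
    // Assumption: reads and writes share a single quota.
    static long billableBytes(long readBytes, long writtenBytes) {
        if (readBytes < 0 || writtenBytes < 0) {
            throw new IllegalArgumentException("byte counts must be non-negative");
        }
        long total = Math.addExact(readBytes, writtenBytes);
        return Math.max(0L, total - FREE_QUOTA_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(billableBytes(FREE_QUOTA_BYTES / 2, 0L));  // 0
        System.out.println(billableBytes(FREE_QUOTA_BYTES, 512L));    // 512
    }
}
```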

To monitor usage, go to the Resource Observation page. For more information, see Resource Observation and pay-as-you-go Storage API resources.

Use the pay-as-you-go Storage API

  1. Turn on the Storage API Switch.

    1. Log on to the MaxCompute console, and select a region in the upper-left corner.

    2. In the navigation pane on the left, choose Manage Configurations > Tenants.

    3. On the Tenants page, click the Tenant Property tab.

      On the Tenant Property tab, turn on the Storage API Switch.

      The name of the pay-as-you-go Storage API resource is pay-as-you-go. For more information, see Tenant Properties.

  2. Grant permissions

    By default, no accounts, including Alibaba Cloud accounts or roles, have the permissions to specify a quota at the job level. You must grant the required permissions.

    1. Add a role.

      1. Log on to the MaxCompute console, and select a region in the upper-left corner.

      2. In the navigation pane on the left, choose Manage Configurations > Tenants.

      3. On the Tenants page, click the Roles tab.

      4. On the Roles tab, click Add Role. In the Add Role dialog box, enter a Role Name and Policy Content, and then click OK to create the role.

        {
            "Statement": [
                {
                    "Action": [
                        "odps:List",
                        "odps:Usage"
                    ],
                    "Effect": "Allow",
                    "Resource": [
                        "acs:odps:*:regions/*/quotas/pay-as-you-go"
                    ]
                }
            ],
            "Version": "1"
        }

        Parameter description:

        • Action: The operation permission that you want to grant. You can specify multiple operations in a single authorization statement. If you specify multiple operations, separate them with commas (,). For more information about the valid values, see MaxCompute permissions. For more information about the parameters in a policy document, see Basic elements of an access policy.

        • Resource: The scope of the authorized resource. The format is ["acs:odps:Tenant/${tenant_id}:regions/${region_id}/quotas/${quota_name}"].

          ["acs:odps:*:regions/*/quotas/pay-as-you-go"] specifies the pay-as-you-go quota for the Storage API in all regions of the current tenant.

    2. Grant the role to the account that you use to specify a quota at the job level.

      By default, an Alibaba Cloud account or a RAM user who has the Super_Administrator role at the tenant level can grant permissions.

      The procedure depends on the type of account that you grant the role to.

      • Grant permissions to an Alibaba Cloud account.

        Run the following command to grant permissions to an Alibaba Cloud account.

        -- Add an Alibaba Cloud account to the tenant and grant a role to the Alibaba Cloud account.
        Add tenant user <Aliyun$xxxx>;
        Grant tenant role <role_name> to user <Aliyun$xxxx>;
        -- View the permissions of a role or user in the tenant.
        Show grants for tenant role <role_name>;
        Show grants for tenant user <user_name>;
        Show principals for tenant [role] <role_name>;

      • Grant permissions to a RAM user.

        1. Log on to the MaxCompute console, and select a region in the upper-left corner.

        2. In the navigation pane on the left, choose Manage Configurations > Tenants.

        3. On the Tenants page, click the Users tab.

        4. On the Users tab, find the RAM user and open the Edit Role dialog box. Select the roles to assign to the user from the Available Roles area, move them to the Added Roles area, and then click OK.

  3. Use pay-as-you-go Storage API resources

    When a third-party engine accesses MaxCompute, you can set the quota name to pay-as-you-go. The following example uses the Java SDK.

    import com.aliyun.odps.Odps;
    import com.aliyun.odps.account.Account;
    import com.aliyun.odps.account.AliyunAccount;
    import com.aliyun.odps.table.enviroment.Credentials;
    import com.aliyun.odps.table.enviroment.EnvironmentSettings;

    public class PayAsYouGoQuotaExample {
        public static void main(String[] args) {
            // Read the AccessKey pair from environment variables (or a configuration file). Do not hard-code it.
            // An AccessKey pair of an Alibaba Cloud account has permissions on all API operations, which is high risk.
            // We recommend that you create a RAM user in the RAM console for API calls and routine O&M.
            String accessId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
            String accessKey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
            // The name of the pay-as-you-go quota that is used to access MaxCompute.
            String quotaName = "pay-as-you-go";
            // The name of the MaxCompute project.
            String project = "<project>";
            // The endpoint of the MaxCompute service. Only VPC endpoints are supported.
            String endpoint = "<endpoint>";
            // Create an Odps object to connect to the MaxCompute service.
            Account account = new AliyunAccount(accessId, accessKey);
            Odps odps = new Odps(account);
            odps.setDefaultProject(project);
            odps.setEndpoint(endpoint);
            // Build the Storage API environment settings and pass in the quota name.
            Credentials credentials = Credentials.newBuilder()
                    .withAccount(odps.getAccount()).withAppAccount(odps.getAppAccount()).build();
            EnvironmentSettings settings = EnvironmentSettings.newBuilder()
                    .withCredentials(credentials).withServiceEndpoint(odps.getEndpoint())
                    .withQuotaName(quotaName).build();
        }
    }

    Note

    Configure an endpoint based on the region and network connectivity that you selected when you created the MaxCompute project. For more information about the endpoints for each region and network type, see Endpoints.
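
The Resource format described in the Grant permissions step can also be assembled programmatically, which helps when the same role policy is generated for several tenants or regions. The following is a minimal sketch: the class and method names are hypothetical, the tenant ID `123` and region ID `cn-hangzhou` are sample values, and the inputs are assumed to be plain identifiers that need no JSON escaping.

```java
import java.util.Objects;

final class QuotaPolicyBuilder {
    // Builds the Resource string. Pass null for tenantId or regionId to use the "*" wildcard.
    static String resource(String tenantId, String regionId, String quotaName) {
        Objects.requireNonNull(quotaName, "quotaName");
        String tenant = (tenantId == null) ? "*" : "Tenant/" + tenantId;
        String region = (regionId == null) ? "*" : regionId;
        return "acs:odps:" + tenant + ":regions/" + region + "/quotas/" + quotaName;
    }

    // Builds a policy document that allows odps:List and odps:Usage on one resource.
    static String policy(String resource) {
        return "{\n"
                + "  \"Statement\": [{\n"
                + "    \"Action\": [\"odps:List\", \"odps:Usage\"],\n"
                + "    \"Effect\": \"Allow\",\n"
                + "    \"Resource\": [\"" + resource + "\"]\n"
                + "  }],\n"
                + "  \"Version\": \"1\"\n"
                + "}";
    }

    public static void main(String[] args) {
        // Same scope as the sample policy: all regions of the current tenant.
        System.out.println(policy(resource(null, null, "pay-as-you-go")));
        // A narrower scope with sample (hypothetical) tenant and region IDs.
        System.out.println(resource("123", "cn-hangzhou", "pay-as-you-go"));
    }
}
```

Restricting the Resource to one tenant and one region, rather than using wildcards, keeps the grant as narrow as the job requires.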

Arrow data type mapping

The Storage API uses Apache Arrow as the in-memory data format for transmission. Data written through the Storage API is stored without additional processing. For example, duplicate keys in MAP columns are retained as-is.

Warning

TIMESTAMP and INTERVAL_DAY_TIME types are subject to precision loss. TIMESTAMP values beyond the Arrow TimestampType range have the high-precision part truncated. INTERVAL_DAY_TIME (nanosecond precision) is truncated to milliseconds.
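
The two precision limits in this warning can be reproduced with the standard `java.time` classes. The sketch below is illustrative and does not call the Storage API. It shows what truncating a nanosecond-precision interval to milliseconds discards, and it computes the largest instant that fits in a signed 64-bit count of nanoseconds since the Unix epoch, which is the storage width of an Arrow timestamp. How the service treats values beyond that bound is not modeled here, and the interval example covers non-negative values only.

```java
import java.time.Duration;
import java.time.Instant;

final class ArrowPrecisionDemo {
    // Drops the sub-millisecond part of a non-negative duration.
    static Duration truncateToMillis(Duration d) {
        return Duration.ofMillis(d.toMillis());
    }

    // Largest instant representable as signed 64-bit nanoseconds since the Unix epoch.
    static Instant maxInt64NanoInstant() {
        return Instant.ofEpochSecond(0L, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(1, 234_567_891);
        System.out.println(interval);                    // PT1.234567891S
        System.out.println(truncateToMillis(interval));  // PT1.234S
        System.out.println(maxInt64NanoInstant());       // 2262-04-11T23:47:16.854775807Z
    }
}
```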

The following table maps MaxCompute types to their Arrow equivalents.

| MaxCompute type | Arrow type | Notes |
| --- | --- | --- |
| TINYINT | Int8Type | |
| SMALLINT | Int16Type | |
| INT | Int32Type | |
| BIGINT | Int64Type | |
| FLOAT | FloatType | |
| DOUBLE | DoubleType | |
| BOOLEAN | BooleanType | |
| DECIMAL | Decimal128Type | Read: converted to decimal(38,18); overflow throws an exception. Write: Arrow decimal(precision,scale) maps to DECIMAL(38,18); precision and scale must match. |
| DECIMAL(precision, scale) | Decimal128Type | |
| STRING | StringType | |
| BINARY | BinaryType | |
| VARCHAR | StringType | |
| CHAR | StringType | |
| DATETIME | TimestampType | Time unit: milliseconds. Timezone: UTC. |
| TIMESTAMP | TimestampType | Time unit: nanoseconds. Timezone: UTC. Supports a wider value range; data beyond the Arrow precision range is truncated. |
| DATE | Date32Type | |
| INTERVAL_DAY_TIME | DayTimeIntervalType | MaxCompute precision: nanoseconds. Arrow precision: milliseconds. The nanosecond part is truncated on read and write. |
| INTERVAL_YEAR_MONTH | MonthIntervalType | |
| ARRAY | ListType | |
| MAP | MapType | Duplicate keys are retained on write. On query, the SQL engine applies "last write wins" (example: writing {'a': 1, 'b': 2, 'a': 3} returns {'a': 3, 'b': 2}). |
| STRUCT | StructType | |
| JSON | StringType | Not supported: Reading JSON data through the Storage API is not supported. |
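
The "last write wins" behavior noted for MAP columns can be illustrated with a plain Java map. The sketch below only mirrors the documented query result for the example entries. It is not how the SQL engine is implemented, and the class and method names are illustrative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapLastWriteWins {
    // Collapses a sequence of key-value pairs so that a later pair overwrites an earlier one.
    static <K, V> Map<K, V> resolve(List<Map.Entry<K, V>> entries) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The written MAP value keeps the duplicate key 'a'.
        List<Map.Entry<String, Integer>> written = new ArrayList<>();
        written.add(new SimpleEntry<>("a", 1));
        written.add(new SimpleEntry<>("b", 2));
        written.add(new SimpleEntry<>("a", 3));
        System.out.println(resolve(written)); // {a=3, b=2}
    }
}
```

If downstream consumers read the Arrow data directly rather than through SQL, they may see all three entries, so deduplicating before the write avoids engine-dependent results.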

What's next

    Access MaxCompute using a connector:

    Access MaxCompute using an SDK: