External tables overview

更新时间:
复制 MD 格式

MaxCompute external tables let you query and analyze data stored in external storage systems such as OSS.

How it works

MaxCompute SQL processes exabyte-scale offline data. As big data workloads expand, data increasingly resides in data lakes, real-time data warehouses, NoSQL databases, and other systems.

MaxCompute supports both data warehouse and data lake scenarios with its distributed storage and compute architecture. Two modes are available for processing external data:

  • Mode 1: Import then compute

    Import data into MaxCompute. Structured data is stored as tables for SQL queries or third-party engine access. Unstructured data can also be stored and processed in MaxCompute for higher I/O efficiency.

  • Mode 2: Compute in place

    Use MaxCompute SQL as a compute engine to query data directly in external storage, such as a data lake.

MaxCompute provides strong data warehouse governance and integrates with external storage systems. External tables are the primary mechanism for accessing external data.

  • Definition and mechanism

    • A DDL statement defines the table name, schema, properties, permissions, location, and protocol for accessing external data. This metadata is stored in MaxCompute.

    • SQL queries use this metadata to connect to external data sources, retrieve or update metadata, and read, compute, or write data.

  • Use cases

    External tables enable direct computation on external data, external data source management within MaxCompute governance, and batch processing of structured or unstructured lake data, data exchange, and real-time data warehouse archiving.

Billing

  1. Storage fees

    • No storage fees are incurred.

    • External table data remains in the external system and is not copied to MaxCompute. MaxCompute does not charge storage fees for external tables. Check the billing rules of your external data source for its storage costs.

  2. Compute fees
    Compute fees follow MaxCompute compute resource billing rules:

    • Subscription or reserved CU resources:

      • External table compute fees are included in the prepaid compute resource fees.

    • Pay-as-you-go:

      • Billed: only the volume of data scanned by compute jobs accessing OSS and OTS is metered.

      • Not yet billed: access to HDFS, Hologres, RDS, HBase, Lindorm, and similar services (whether through external tables or Lakehouse 2.0 external schemas) is not metered and does not incur compute fees.

  3. Network fees

    Connecting to external tables over the public MaxCompute endpoint incurs public network traffic and download fees. 计费项与计费方式.

  4. External data source fees

    External data sources may charge compute, access, or data transfer fees. Check the billing rules of the corresponding service.

Limits

  • Tunnel and Tunnel SDK currently do not support external tables. Upload data to MaxCompute internal tables through Tunnel, or upload to OSS through OSS Python SDK and map the data with an external table in MaxCompute.

  • External tables support writing data to external data sources, but write capabilities and consistency are subject to limitations of the external system:

    • Hologres: MaxCompute accesses Hologres metadata over JDBC and cannot guarantee write transaction atomicity. The MaxCompute SQL engine can read Hologres Pangu data directly but cannot write to it. Due to the complexity of distributed multi-process parallel writes, MaxCompute does not support INSERT OVERWRITE for Hologres, and a failed job may result in partial data writes.

    • HDFS: Also based on HMS. HDFS writes have a small probability of inconsistency.

    • OSS: OSS external table writes use .odps metadata files to reduce the probability of incomplete writes. If you disable .odps metadata files for engine compatibility, writes also have a small probability of being incomplete.

  • When you use INSERT OVERWRITE to write external data, MaxCompute writes the new data first, then deletes existing data in the table or partition during the DDL commit phase, and replaces it with the new data. Deleted data cannot be rolled back. Back up your data before writing, and validate the results after the write completes. If issues occur, re-run the full write.

References

Supported external tables

MaxCompute supports external tables for OSS, Hologres, RDS, and other services:

External table examples

The following examples show how to process unstructured data with MaxCompute external tables:

FAQ