What is Alibaba Cloud MaxCompute-MaxCompute(MaxCompute)-阿里云帮助中心

Why MaxCompute

Traditional data platforms require you to provision and manage clusters, plan capacity in advance, and handle infrastructure scaling manually. MaxCompute eliminates all of that:

No hardware or software to install, configure, or manage
No cluster capacity planning — resources scale independently with your workload
No compute costs when idle — pay-as-you-go billing only charges for what you use

Storage and compute scale independently, so you can connect all your data assets in one place and eliminate data silos without over-provisioning for either dimension.

What you can build

MaxCompute provides a single platform for the following workloads:

Data warehousing and BI: Run SQL-based analytics on datasets at TB, PB, and EB scale. Integrate with Quick BI for dashboards and reports, or connect via Java Database Connectivity (JDBC) to any compatible BI tool.
Near real-time analytics: Ingest streaming data directly into the warehouse and run high-concurrency queries at second-level latency. Deep integration with Hologres delivers 10x or greater query acceleration without moving data.
Data lake analytics: Query data in OSS or Hadoop Distributed File System (HDFS) directly, using external tables or Apache Spark. Run joint analysis across your data lake and data warehouse in a single service.
AI and machine learning: Train models on MaxCompute data using Platform for AI (PAI), run Spark-ML pipelines, or bring in third-party Python machine learning libraries — all within the same compute environment.
Data engineering: Build ETL/ELT pipelines with DataWorks, manage workflows, and govern data across projects. Ingest from major streaming sources, including Apache Flink and Apache Kafka, with native connectors.

Core features

Serverless compute

MaxCompute is an out-of-the-box service accessed through APIs. A pre-provisioned shared resource pool handles peak workloads on demand — no cluster setup, no idle capacity costs.

Independent storage and compute scaling

Storage and compute resources scale independently. This lets you handle sudden traffic spikes without over-provisioning, and grow your data assets without being constrained by compute costs.

Columnar storage with 5x compression

The MaxCompute storage engine uses columnar storage, which typically achieves a 5x compression ratio compared to row-based formats. This significantly reduces storage costs at scale.

Unified batch and streaming

MaxCompute supports both batch and real-time data ingestion in a single service. Streaming data written to MaxCompute is available for analysis without a separate pipeline to a different system.

Lakehouse architecture

Map external data in OSS or HDFS to external tables and query it alongside warehouse data in a unified interface. For more information, see Lakehouse of MaxCompute.

SQL and multi-engine support

MaxCompute provides a native SQL compute engine for standard SQL workloads. For Spark and Mars workloads, the CUPID computing platform runs third-party engine tasks with full integration into MaxCompute's storage, permissions, and resource system.

Security and compliance

MaxCompute includes over 20 security features that meet Level 3 standards for classified information security protection. Coverage spans infrastructure, data centers, networks, power supply, platform security, permission management, and privacy protection — combining capabilities from both open-source big data systems and managed databases.

Video introduction

Product architecture

The following figure shows the MaxCompute architecture.

MaxCompute is designed around a layered architecture that separates storage, compute, and service concerns — so each layer can scale and evolve independently.

Module	Description
Storage engine	The MaxCompute storage engine stores tables and resources using columnar storage with a typical 5x compression ratio. External tables provide direct read access to data in OSS, Tablestore, and RDS without copying data into MaxCompute.
Compute engine	Two compute engines handle different workload types: the MaxCompute SQL engine runs SQL tasks directly (see Overview of MaxCompute SQL); the CUPID computing platform runs Apache Spark and Mars tasks with full access to MaxCompute storage and permissions.
Cloud service layer	Task queues with configurable resource allocations and priorities control how jobs execute. A scheduling system manages compute resource allocation across all queues. Multi-layered data protection — including project-level isolation, access control, and data encryption — is enforced at this layer.
Metadata and security	Information Schema provides tenant-level metadata and historical usage logs covering resource consumption, run duration, and data processed. This supports job optimization and capacity planning. A comprehensive security system adds access control, data encryption, and dynamic data masking. See Security features.
User interfaces	Tunnel — a data transmission service with shared and dedicated cluster options. Restful API and SDKs — Restful API, SDK for Java, and PyODPS. [JDBC driver](https://www.alibabacloud.com/help/en/document_detail/143408.html#concept-2338749) — for connecting standard BI and SQL tools. Connectors — for Apache Flink, Apache Spark, and Apache Kafka.
TopConsole	The MaxCompute console for project management, quota management, and tenant management. Includes job operations and maintenance (O&M), resource monitoring, materialized views, and cost analytics and optimization. See Resource management and use.
Data ecosystem support	Deep integration with DataWorks for one-stop data development, analytics, and governance. Also supports data lake connectivity, data integration, and development with third-party engines.

Ecosystem integrations

MaxCompute integrates with the following Alibaba Cloud services:

DataWorks — Build and schedule ETL/ELT pipelines, manage data governance policies, and develop SQL or Spark jobs in a web-based IDE, all with native MaxCompute project support.
Platform for AI (PAI) — Train machine learning models directly on MaxCompute data using built-in algorithm components, without exporting data to a separate environment.
Hologres — Deeply integrates with MaxCompute as a real-time data warehouse, supporting queries on associated external tables and direct reads from the storage layer, achieving over 5x higher query efficiency than other external table types. Accelerate interactive queries on MaxCompute data with 10x or greater performance. Hologres also supports batch import of MaxCompute metadata, eliminating the need to manually create external tables.
Quick BIFor more information about the Alibaba Cloud services that integrate with MaxCompute, see Supported Alibaba Cloud services. — Create reports and interactive dashboards directly on MaxCompute data with no data movement.

Use cases

MaxCompute has been validated at scale within Alibaba Group and is suited for:

Data warehousing and Business Intelligence (BI) analytics for large internet companies
Website log analysis
E-commerce transaction analysis
User behavior and interest analysis

Product advantages

Easy to use: High-performance storage and compute optimized for data warehousing. Pre-integrated with services and supports standard SQL for simple development. Built-in management and security capabilities. Fully managed with a pay-as-you-go model; you incur no compute costs when not in use.
Elasticity that matches business growth: With decoupled storage and compute, resources scale independently and dynamically. This on-demand elasticity meets sudden business growth without upfront capacity planning.
Support for various analytics scenarios: Supports an open data ecosystem, providing a unified platform for data warehousing, BI, Near-real-time Analytics, Data Lake Analysis, and Machine Learning.For more information about its development history, product awards, and customer use cases, see Development history and Customer use cases.
Open platform: Provides open APIs and a rich ecosystem, offering flexibility for data and application migration and custom development. Flexibly combines with open-source and commercial products like Airflow and Tableau to build a wide range of data applications.

What is MaxCompute?