Conventions for data calls between data warehouse layers

After you layer the data warehouse, you must define conventions for data calls between the layers.

Data warehouse layering separates raw ingestion from transformation and serving — but only if each layer accesses data through the correct path. Bypassing layers creates redundant processing, inconsistent outputs, and downstream jobs that are hard to maintain. The rules below define which layer reads from which, and why each constraint exists.

Principles for using data at different layers

Layer overview

Layer	Full name	Role
ODS	Operational data store (ODS) layer	Raw data ingested from source systems
CDM	Common data model (CDM) layer	Cleansing, integration, and shared metric computation (contains DWD and DWS sub-layers)
DWD	Data warehouse detail (DWD) layer	Detail-grain fact and dimension tables within CDM
DWS	Data warehouse service (DWS) layer	Pre-aggregated, coarse-grained summary tables within CDM
ADS	Application data service (ADS) layer	Serving layer that delivers data to applications and BI tools

Data access rules

ADS → CDM (required path)

The ADS layer reads from the CDM layer. If CDM already provides the required data, ADS jobs must not reprocess ODS data themselves. The CDM layer should proactively collect the data requirements of the ADS layer and build them into common data. The ADS layer, in turn, should cooperate with the CDM layer so that the pool of common data keeps growing.

Not allowed: ADS jobs cannot read ODS layer data directly. If no CDM processing exists for a given ODS dataset, access it through a CDM view — not by pointing ADS directly at ODS tables.
Required: CDM views used as the access bridge must be encapsulated in periodically triggered nodes. This makes them maintainable and ensures they participate in the standard job dependency graph.
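
The ADS rule above can be checked mechanically. The following is a minimal sketch, assuming that job and table names carry their layer as a prefix (for example `ods_`, `dwd_`, `ads_`). That naming convention, the function names, and the sample job names are all hypothetical and are not defined in this topic.

```python
from typing import Dict, List, Tuple

def layer_of(name: str) -> str:
    """Return the layer prefix of a name such as 'dwd_orders' (assumed convention)."""
    return name.split("_", 1)[0].lower()

def ads_violations(job_inputs: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List (job, table) pairs where an ADS job reads an ODS table directly."""
    violations = []
    for job, inputs in job_inputs.items():
        if layer_of(job) != "ads":
            continue  # only ADS jobs are subject to this rule
        for table in inputs:
            if layer_of(table) == "ods":
                violations.append((job, table))
    return violations

# Hypothetical job -> input tables mapping.
jobs = {
    "ads_sales_report": ["dws_sales_1d", "ods_orders"],  # reads ODS directly
    "ads_user_profile": ["dwd_user_detail"],
    "dwd_orders": ["ods_orders"],  # a CDM job reading ODS is not flagged
}
violations = ads_violations(jobs)
```

Each reported pair points at an ADS job that should read through a CDM table or CDM view instead.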

Use ODS data with care: do not replicate it without a clear need, and do not create redundant subsets of it.

CDM dependency depth

We recommend keeping CDM job dependency chains to 10 levels or fewer. Deep chains increase the blast radius of failures and make root-cause analysis harder.
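
To see whether a set of CDM jobs stays within the recommended depth, you can compute the longest upstream chain for each job. The sketch below assumes that the dependency graph is acyclic and counts a job with no upstream CDM job as level 1. Both the counting convention and the job names are assumptions made for illustration.

```python
from typing import Dict, List

MAX_DEPTH = 10  # recommended upper bound from the text above

def chain_depths(upstream: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each job to the number of levels in its longest upstream chain.

    Assumes the graph is acyclic; a job without upstream jobs is level 1.
    """
    memo: Dict[str, int] = {}

    def depth(job: str) -> int:
        if job not in memo:
            parents = upstream.get(job, [])
            memo[job] = 1 + max((depth(p) for p in parents), default=0)
        return memo[job]

    return {job: depth(job) for job in upstream}

# Hypothetical linear chain of 12 CDM jobs: cdm_job_1 <- cdm_job_2 <- ... <- cdm_job_12.
upstream = {
    f"cdm_job_{i}": ([f"cdm_job_{i - 1}"] if i > 1 else [])
    for i in range(1, 13)
}
depths = chain_depths(upstream)
too_deep = sorted(job for job, d in depths.items() if d > MAX_DEPTH)
```

In this example the last two jobs of the chain exceed the recommended depth and are candidates for restructuring.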

Job-to-table cardinality

Generally, each scheduled job produces one output table. This keeps the dependency model simple: one job, one owner, one output.

If multiple jobs write to the same output table, each to different partitions, create a virtual node in DataWorks that depends on all contributing jobs. Downstream jobs then depend on this virtual node rather than on the individual upstream jobs.
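
The virtual-node pattern can be modeled as a small dependency map. This is a plain-Python illustration of the idea, not the DataWorks API: the helper names, the `vt_` prefix, and the job names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

def add_virtual_node(deps: Dict[str, List[str]], table: str,
                     writers: Iterable[str]) -> str:
    """Register a no-op node that depends on every job writing to `table`."""
    node = f"vt_{table}"  # hypothetical naming for the virtual node
    deps[node] = sorted(writers)
    return node

def is_ready(deps: Dict[str, List[str]], node: str, finished: Set[str]) -> bool:
    """A node can run once all of its upstream nodes have finished."""
    return all(up in finished for up in deps.get(node, []))

deps: Dict[str, List[str]] = {}

# Two jobs write different partitions of the same table.
vnode = add_virtual_node(deps, "dwd_orders", ["load_orders_cn", "load_orders_us"])

# The downstream job depends only on the virtual node.
deps["dws_orders_1d"] = [vnode]

partial = is_ready(deps, vnode, {"load_orders_cn"})
complete = is_ready(deps, vnode, {"load_orders_cn", "load_orders_us"})
```

Because the downstream job references a single node, adding or removing a partition writer changes only the virtual node's upstream list.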

DWS → DWD priority (within CDM)

Within the CDM layer, DWS jobs build on DWD data. When a pre-aggregated, coarse-grained table already covers the need, use it in preference to recomputing from large volumes of DWD detail data.

Accumulating snapshot fact tables → transaction fact tables

Within CDM, build accumulating snapshot fact tables preferentially from transaction fact tables. This preserves consistency: the snapshots apply the same business logic as the underlying transactions.

Optimize DWS to reduce ADS → DWD dependency

When ADS jobs access DWD directly, it signals a gap in DWS coverage. Extend DWS to cover those aggregation needs so that ADS can read from coarse-grained DWS data instead of computing directly against DWD.
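
One way to find such gaps is to count how many ADS jobs read each DWD table directly. The sketch below assumes layer-prefixed names (`ads_`, `dwd_`, `dws_`). The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def dws_gap_candidates(job_inputs: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank DWD tables by the number of ADS jobs that read them directly."""
    counts: Counter = Counter()
    for job, inputs in job_inputs.items():
        if job.startswith("ads_"):
            counts.update(t for t in inputs if t.startswith("dwd_"))
    return counts.most_common()

# Hypothetical job -> input tables mapping.
jobs = {
    "ads_sales_report": ["dwd_orders", "dws_sales_1d"],
    "ads_user_funnel": ["dwd_orders", "dwd_users"],
    "dws_orders_1d": ["dwd_orders"],  # DWS reading DWD is expected, not a gap
}
candidates = dws_gap_candidates(jobs)
```

DWD tables at the top of the ranking are the most likely places where a new or extended DWS table would remove direct ADS dependencies.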