After you layer the data warehouse, define conventions for how each layer reads data from the others.
Data warehouse layering separates raw ingestion from transformation and serving — but only if each layer accesses data through the correct path. Bypassing layers creates redundant processing, inconsistent outputs, and downstream jobs that are hard to maintain. The rules below define which layer reads from which, and why each constraint exists.
Principles for using data at different layers
Layer overview
| Layer | Full name | Role |
|---|---|---|
| ODS | Operational data store (ODS) layer | Raw data ingested from source systems |
| CDM | Common data model (CDM) layer | Cleansing, integration, and shared metric computation (contains DWD and DWS sub-layers) |
| DWD | Data warehouse detail (DWD) layer | Detail-grain fact and dimension tables within CDM |
| DWS | Data warehouse service (DWS) layer | Pre-aggregated, coarse-grained summary tables within CDM |
| ADS | Application data service (ADS) layer | Serving layer that delivers data to applications and BI tools |
Data access rules
ADS → CDM (required path)
The ADS layer reads from the CDM layer. If CDM already provides the required data, ADS jobs must not reprocess ODS data themselves. The CDM layer should proactively gather data construction requirements from the ADS layer and consolidate shared logic into common data. In turn, the ADS layer should work with the CDM layer to keep extending that common data.
Not allowed: ADS jobs cannot read ODS layer data directly. If no CDM processing exists for a given ODS dataset, expose it through a CDM view instead of pointing ADS jobs at ODS tables.
Required: CDM views used as the access bridge must be encapsulated in periodically triggered nodes. This makes them maintainable and ensures they participate in the standard job dependency graph.
Use ODS data deliberately: do not copy it without a clear need, and do not create redundant subsets of it in other layers.
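The ADS → ODS rule above can be checked mechanically against job lineage. The following is a minimal sketch, assuming that lineage is available as `(reader_table, source_table)` pairs and that table names carry a layer prefix such as `ods_` or `ads_`. Both the prefix convention and every table name here are illustrative assumptions, not part of DataWorks.

```python
from typing import Iterable, List, Tuple


def layer_of(table: str) -> str:
    """Infer the layer from the table-name prefix (an assumed naming convention)."""
    return table.split("_", 1)[0].upper()


def find_ads_to_ods_reads(edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Return a message for every ADS table that reads an ODS table directly."""
    violations = []
    for reader, source in edges:
        if layer_of(reader) == "ADS" and layer_of(source) == "ODS":
            violations.append(f"{reader} reads {source}: route this through CDM")
    return violations


# Hypothetical lineage: (reader_table, source_table)
edges = [
    ("ads_sales_report", "dws_trade_1d"),   # allowed: ADS reads CDM
    ("ads_user_funnel", "ods_app_log"),     # not allowed: ADS reads ODS
    ("dwd_trade_order", "ods_trade_order"), # CDM reading ODS is not flagged
]
violations = find_ads_to_ods_reads(edges)
for message in violations:
    print(message)
```

A check like this could run before a job is published, so that a direct ODS read is caught while a CDM view can still be introduced.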
CDM dependency depth
We recommend keeping CDM job dependency chains to 10 levels or fewer. Deep chains increase the blast radius of failures and make root-cause analysis harder.
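One way to monitor this recommendation is to compute the longest upstream chain for each job. The sketch below assumes the dependency graph is acyclic and is given as a mapping from each job to its direct upstream jobs. Counting a job with no upstreams as depth 1 is also an assumption, and the job names are made up.

```python
from functools import lru_cache
from typing import Dict, List

MAX_DEPTH = 10  # recommended limit from the guideline above


def chain_depths(upstreams: Dict[str, List[str]]) -> Dict[str, int]:
    """Depth of a job = number of jobs on its longest upstream chain, itself included.

    Assumes the graph has no cycles; a cycle would recurse without end.
    Only jobs that appear as keys are reported.
    """
    @lru_cache(maxsize=None)
    def depth(job: str) -> int:
        parents = upstreams.get(job, [])
        return 1 + max((depth(p) for p in parents), default=0)

    return {job: depth(job) for job in upstreams}


# Hypothetical chain of 11 CDM jobs, each depending on the previous one.
upstreams = {
    f"cdm_job_{i}": ([f"cdm_job_{i - 1}"] if i > 0 else [])
    for i in range(11)
}
depths = chain_depths(upstreams)
too_deep = [job for job, d in depths.items() if d > MAX_DEPTH]
print(too_deep)
```

Here only the last job in the chain exceeds the limit, which points at the place where the chain could be shortened or split.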
Job-to-table cardinality
Generally, each scheduled job produces one output table. This keeps the dependency model simple: one job, one owner, one output.
If multiple jobs write to the same output table, each to different partitions, create a virtual node in DataWorks that depends on all contributing jobs. Downstream jobs then generally depend on this virtual node rather than on the individual upstream jobs.
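The virtual-node pattern can be illustrated with a small dependency model. This is a sketch in plain Python, not DataWorks API code: the `vn_` naming, the job names, and the input shapes (`writers` maps a job to its output table, `readers` maps a downstream job to the tables it reads) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def add_virtual_nodes(
    writers: Dict[str, str], readers: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """Return node -> upstream nodes, inserting a virtual node per multi-writer table."""
    jobs_by_table: Dict[str, Set[str]] = defaultdict(set)
    for job, table in writers.items():
        jobs_by_table[table].add(job)

    deps: Dict[str, Set[str]] = defaultdict(set)
    gate_for: Dict[str, str] = {}
    for table, jobs in jobs_by_table.items():
        if len(jobs) > 1:
            gate = f"vn_{table}"      # hypothetical virtual node name
            deps[gate] = set(jobs)    # the virtual node waits for every writer
            gate_for[table] = gate

    for job, tables in readers.items():
        for table in tables:
            if table in gate_for:
                deps[job].add(gate_for[table])  # depend on the virtual node only
            else:
                deps[job].update(jobs_by_table.get(table, set()))
    return dict(deps)


writers = {
    "load_orders_cn": "dwd_orders",  # writes one set of partitions
    "load_orders_us": "dwd_orders",  # writes another set of partitions
    "load_users": "dwd_users",
}
readers = {"agg_orders_1d": ["dwd_orders", "dwd_users"]}
deps = add_virtual_nodes(writers, readers)
print(deps)
```

The downstream job ends up with one dependency per table, so adding a third writer for `dwd_orders` changes only the virtual node's upstream list.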
DWS → DWD priority (within CDM)
Within the CDM layer, the DWS layer reads from the DWD layer. Where a suitable pre-aggregated, coarse-grained table already exists, DWS jobs should build on it rather than recompute the same result from large DWD datasets.
Accumulating snapshot fact tables → transaction fact tables
Within CDM, accumulating snapshot fact tables should preferentially be built from transaction fact tables. This preserves consistency: the milestones and measures in each snapshot are derived from the same events and business logic as the underlying transactions, rather than being recomputed separately.
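To make the relationship concrete, the sketch below folds transaction-grain rows into an accumulating snapshot with one row per order. The order process, its milestones (`created`, `paid`, `shipped`), and the column names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

MILESTONES = ("created", "paid", "shipped")  # assumed steps of the order process


def build_snapshot(
    transactions: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Fold (order_id, event, timestamp) rows into one snapshot row per order."""
    snapshot: Dict[str, Dict[str, Optional[str]]] = {}
    for order_id, event, ts in transactions:
        # Start each order with every milestone unset.
        row = snapshot.setdefault(order_id, {f"{m}_at": None for m in MILESTONES})
        if event in MILESTONES:
            row[f"{event}_at"] = ts
    return snapshot


# Hypothetical rows from a transaction fact table.
transactions = [
    ("o1", "created", "2024-01-01 10:00"),
    ("o1", "paid", "2024-01-01 10:05"),
    ("o2", "created", "2024-01-02 09:00"),
    ("o1", "shipped", "2024-01-03 08:00"),
]
snapshot = build_snapshot(transactions)
print(snapshot["o1"])
print(snapshot["o2"])
```

Because every milestone timestamp comes from a transaction row, the snapshot cannot disagree with the transaction fact table about when an event happened.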
Optimize DWS to reduce ADS → DWD dependency
When ADS jobs access DWD directly, it signals a gap in DWS coverage. Extend DWS to cover those aggregation needs so that ADS can read from coarse-grained DWS data instead of computing directly against DWD.
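A simple way to find these gaps is to count how many ADS tables read each DWD table directly. The sketch below is a hypothetical heuristic, assuming lineage is available as `(reader_table, source_table)` pairs and that layers are identifiable by the `ads_` and `dwd_` name prefixes. The table names are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dws_gap_candidates(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count direct ADS readers per DWD table, most-read table first."""
    unique_edges = set(edges)  # ignore duplicate lineage records
    counts = Counter(
        source
        for reader, source in unique_edges
        if reader.startswith("ads_") and source.startswith("dwd_")
    )
    return counts.most_common()


# Hypothetical lineage: (reader_table, source_table)
edges = [
    ("ads_gmv_board", "dwd_trade_order"),
    ("ads_refund_board", "dwd_trade_order"),
    ("ads_login_report", "dwd_user_login"),
    ("ads_sales_report", "dws_trade_1d"),  # already served by DWS, not counted
]
candidates = dws_gap_candidates(edges)
print(candidates)
```

DWD tables with the most direct ADS readers are reasonable first candidates for a new or extended DWS summary table, although the count alone does not say which aggregations are needed.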