A COMMON TABLE EXPRESSION (CTE) is a temporarily named result set that simplifies SQL. MaxCompute supports standard SQL CTEs to improve the readability and execution efficiency of SQL statements. This topic describes the features, syntax, and examples of CTEs.
Overview
A CTE can be considered a temporary result set defined within the execution scope of a single DML statement. Similar to a derived table, a CTE is not stored as an object and persists only for the duration of the query. Using CTEs during development improves SQL readability and simplifies the maintenance of complex queries.
A CTE is a statement-level clause expression that begins with WITH, followed by the expression name. MaxCompute supports two forms of CTE:
NON RECURSIVE CTE: A CTE that does not reference itself and does not iterate. Use this to simplify queries that reuse the same subquery logic.
RECURSIVE CTE: A CTE that can reference itself iteratively, enabling recursive query capabilities in SQL. Typically used to traverse hierarchical data such as organizational charts.
Materialized CTE: When defining a CTE, use the MATERIALIZE HINT in the SELECT statement to cache the CTE result in a temporary table. Subsequent references read from the cache, avoiding memory limit issues in deeply nested CTE scenarios and improving performance.
NON RECURSIVE CTE
Syntax
WITH
<cte_name> [(col_name [, col_name] ...)] AS (
<cte_query>
)
[, <cte_name> [(col_name [, col_name] ...)] AS (
<cte_query2>
)
, ...]Parameters
Parameter | Required | Description |
| Yes | The name of the CTE. Must be unique within the |
| No | Output column names for the CTE. If omitted, column names are inherited from the |
| Yes | A |
Example
The following query uses a UNION ALL to combine two JOIN operations. Both joins share the same left-hand subquery, which must be duplicated without a CTE:
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL) a
JOIN (
SELECT * FROM src2 WHERE value > 0) b
ON a.key = b.key
) c
UNION ALL
SELECT * FROM (
SELECT a.key, b.value
FROM (
SELECT * FROM src WHERE key IS NOT NULL) a
LEFT OUTER JOIN (
SELECT * FROM src3 WHERE value > 0) b
ON a.key = b.key AND b.key IS NOT NULL
) d;Rewriting with a CTE removes the duplication. The subquery a is defined once and reused by both joins:
WITH
a AS (SELECT * FROM src WHERE key IS NOT NULL),
b AS (SELECT * FROM src2 WHERE value > 0),
c AS (SELECT * FROM src3 WHERE value > 0),
d AS (SELECT a.key, b.value FROM a JOIN b ON a.key = b.key),
e AS (SELECT a.key, c.value FROM a LEFT OUTER JOIN c ON a.key = c.key AND c.key IS NOT NULL)
INSERT OVERWRITE TABLE srcp PARTITION (p='abc')
SELECT * FROM d UNION ALL SELECT * FROM e;RECURSIVE CTE
Syntax
WITH RECURSIVE <cte_name> [(col_name [, col_name] ...)] AS (
<initial_part> UNION ALL <recursive_part>
)
SELECT ... FROM ...;Parameters
Parameter | Required | Description |
| Yes | The recursive CTE clause must start with |
| Yes | The name of the CTE. Must be unique within the current |
| No | Output column names. If omitted, column names are inferred from |
| Yes | A |
| Yes | A |
| Yes | Connects |
Limitations
Recursive CTEs cannot appear in
IN,EXISTS, or scalar subqueries.The default maximum number of iterations is 10. Increase the limit by setting
odps.sql.rcte.max.iterate.num(maximum value: 100).Intermediate results are not saved between iterations. If the task fails, execution restarts from the beginning. For long-running recursive computations, either limit the number of iterations or store intermediate results in a temporary table.
Recursive CTEs are not supported in Query Acceleration (MaxQA/MCQA) mode.
In MCQA mode, if
interactive_auto_rerun=trueis set, the task falls back to normal mode. Otherwise, the task fails.In MaxQA mode, auto-fallback is not supported. The job fails directly and must be manually submitted to a batch processing quota group for retry.
Examples
Example 1: Define a recursive CTE named cte_name.
-- Method 1: Explicitly specify output column names WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L -- initial_part: iteration 0 UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5 -- recursive_part: references previous iteration ) SELECT * FROM cte_name ORDER BY a LIMIT 100; -- Method 2: Infer column names from initial_part WITH RECURSIVE cte_name AS ( SELECT 1L AS a, 1L AS b UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5 ) SELECT * FROM cte_name ORDER BY a LIMIT 100;NoteIn recursive_part, set a termination condition to avoid an infinite loop. In this example,
WHERE a + 1 <= 5serves as the termination criterion. If the WHERE condition is not met, the dataset generated in the current iteration is empty, and the iteration stops.If output column names are not explicitly specified, the system supports automatic inference. For example, in Method 2, the output column names of initial_part are used as the output column names of the recursive CTE.
Result:
+------------+------------+ | a | b | +------------+------------+ | 1 | 1 | | 2 | 2 | | 3 | 3 | | 4 | 4 | | 5 | 5 | +------------+------------+Example 2: Recursive CTE in a subquery (compile error)
Recursive CTEs are not allowed inside
IN,EXISTS, or scalar subqueries. The following query fails to compile:WITH RECURSIVE cte_name(a, b) AS ( SELECT 1L, 1L UNION ALL SELECT a+1, b+1 FROM cte_name WHERE a+1 <= 5) SELECT x, x IN (SELECT a FROM cte_name) FROM VALUES (1L), (2L) AS t(x);Error:
FAILED: ODPS-0130071:[5,31] Semantic analysis exception - using Recursive-CTE cte_name in scalar/in/exists sub-query is not allowed, please check your query, the query text location is from [line 5, column 13] to [line 5, column 40]Example 3: Traverse an organizational hierarchy
Create an
employeestable and insert data:CREATE TABLE employees(name STRING, boss_name STRING); INSERT INTO TABLE employees VALUES ('zhang_3', null), ('li_4', 'zhang_3'), ('wang_5', 'zhang_3'), ('zhao_6', 'li_4'), ('qian_7', 'wang_5');Define a recursive CTE named
company_hierarchywith three output columns: the employee's name, their manager's name, and their level in the hierarchy:WITH RECURSIVE company_hierarchy(name, boss_name, level) AS ( SELECT name, boss_name, 0L FROM employees WHERE boss_name IS NULL UNION ALL SELECT e.name, e.boss_name, h.level + 1 FROM employees e JOIN company_hierarchy h ON e.boss_name = h.name ) SELECT * FROM company_hierarchy ORDER BY level, boss_name, name LIMIT 1000;Execution proceeds as follows:
Iteration 0 (
initial_part): Selects employees whereboss_name IS NULL, assigninglevel = 0. Result:('zhang_3', NULL, 0).Iteration 1 (
recursive_part): Joinsemployeeswith the working table (iteration 0). The conditione.boss_name = h.namefinds employees managed byzhang_3. Result:li_4andwang_5atlevel = 1.Iteration 2: Finds employees managed by
li_4orwang_5. Result:zhao_6andqian_7atlevel = 2.Iteration 3: No employees have
zhao_6orqian_7as managers. The working table is empty, and iteration stops.
Result:
+---------+-----------+------------+ | name | boss_name | level | +---------+-----------+------------+ | zhang_3 | NULL | 0 | | li_4 | zhang_3 | 1 | | wang_5 | zhang_3 | 1 | | zhao_6 | li_4 | 2 | | qian_7 | wang_5 | 2 | +---------+-----------+------------+Example 4: Cyclic data causes an infinite loop
Insert a record into the employees table from Example 3 where an employee is their own manager.
INSERT INTO TABLE employees VALUES('qian_7', 'qian_7');This record declares that the manager of
qian_7isqian_7themselves. Running the recursive CTE defined earlier results in an infinite loop. The system limits the maximum number of iterations. The query eventually fails.Error:
FAILED: ODPS-0010000:System internal error - recursive-cte: company_hierarchy exceed max iterate number 10
Materialized CTE
For non-recursive CTEs, MaxCompute expands all CTEs inline when generating an execution plan. For example:
WITH v1 AS (SELECT SIN(1.0) AS a)
SELECT a FROM v1 UNION ALL SELECT a FROM v1;Results:
+------------+
| a |
+------------+
| 0.8414709848078965 |
| 0.8414709848078965 |
+------------+This is equivalent to running SIN(1.0) twice:
SELECT a FROM (SELECT SIN(1.0) AS a)
UNION ALL
SELECT a FROM (SELECT SIN(1.0) AS a);In complex scenarios with deeply nested CTEs, if all CTEs are expanded into the most basic leaf nodes, the result is a very large syntax tree. This may cause failures due to excessive syntax tree nodes during execution plan generation and may lead to memory limit issues. For example:
WITH
v1 AS (SELECT 1L AS a, 2L AS b, 3L AS c),
v2 AS (SELECT * FROM v1 UNION ALL SELECT * FROM v1 UNION ALL SELECT * FROM v1),
v3 AS (SELECT * FROM v2 UNION ALL SELECT * FROM v2 UNION ALL SELECT * FROM v2),
v4 AS (SELECT * FROM v3 UNION ALL SELECT * FROM v3 UNION ALL SELECT * FROM v3),
v5 AS (SELECT * FROM v4 UNION ALL SELECT * FROM v4 UNION ALL SELECT * FROM v4),
v6 AS (SELECT * FROM v5 UNION ALL SELECT * FROM v5 UNION ALL SELECT * FROM v5),
v7 AS (SELECT * FROM v6 UNION ALL SELECT * FROM v6 UNION ALL SELECT * FROM v6),
v8 AS (SELECT * FROM v7 UNION ALL SELECT * FROM v7 UNION ALL SELECT * FROM v7),
v9 AS (SELECT * FROM v8 UNION ALL SELECT * FROM v8 UNION ALL SELECT * FROM v8)
SELECT * FROM v9;To address this issue, MaxCompute provides the materialized CTE capability, which caches CTE computation results for reference by SQL outside the WITH clause without full expansion. This mechanism effectively avoids memory limit issues caused by nested CTE expansion and improves CTE statement performance.
Usage example
Add the /*+ MATERIALIZE */ hint to the top-level SELECT of a non-recursive CTE to cache its result in a temporary table. Subsequent references read from the cache instead of re-running the query:
WITH v1 AS (SELECT /*+ MATERIALIZE */ SIN(1.0) AS a)
SELECT a FROM v1 UNION ALL SELECT a FROM v1;
-- Result:
+------------+
| a |
+------------+
| 0.8414709848078965 |
| 0.8414709848078965 |
+------------+When the hint is effective, the Job Details tab in LogView shows multiple Fuxi Jobs, confirming that the intermediate result was stored.
Limitations
The MATERIALIZE hint must be applied to the top-level SELECT statement of a non-recursive CTE. Recursive CTEs do not need the MATERIALIZE hint.
Incorrect example: In the following CTE, the top-level statement is UNION rather than SELECT, so the MATERIALIZE hint does not take effect.
WITH v1 AS ( SELECT /*+ MATERIALIZE */ SIN(1.0) AS a UNION ALL SELECT /*+ MATERIALIZE */ SIN(1.0) AS a) SELECT a FROM v1 UNION ALL SELECT a FROM v1;Correct example: Rewrite the incorrect example above by wrapping it in a subquery.
WITH v1 AS (SELECT /*+ MATERIALIZE */ * FROM (SELECT SIN(1.0) AS a UNION ALL SELECT SIN(1.0) AS a) ) SELECT a FROM v1 UNION ALL SELECT a FROM v1;
If the CTE uses a non-deterministic function such as
RANDor a non-deterministic Java/Python UDF, materializing the CTE caches a single evaluation of the function. Subsequent references return the cached value, which changes the semantic behavior of queries that expect independent random values per call.Materialized CTEs are not supported in Query Acceleration (MCQA). If the task runs in MCQA mode with
interactive_auto_rerun=true, it falls back to normal mode. Otherwise, the task fails.