Logical table DQL operations
Logical tables support DQL (Data Query Language) operations including standard queries and a special syntax for querying fields across associated tables.
Limits
-
The SELECT statement can return a maximum of 10,000 rows and the result set must not exceed 10MB. However, when used as a subquery, the SELECT statement is not subject to this limitation and will return all results to the parent query.
-
By default, the SELECT statement is restricted from performing a full table scan on partitioned tables.
SELECT standard query
You can use standard SELECT statements to query logical tables and retrieve data that meets specific criteria.
[with <cte>[, ...] ]
select [all | distinct] <select_expr>[, <except_expr>)][, <replace_expr>] ...
from <ld_data_unit>.<logical_table_name>
[where <where_condition>]
[group by {<col_list>|rollup(<col_list>)}]
[having <having_condition>]
[order by <order_condition>]
[distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
[limit <number>]
[window <window_clause>]
The data unit name is required and cannot be omitted.
Example explanation:
After creating the logical table 'user' in Dataphin, run the following command to retrieve data from it.
SELECT * FROM ld_test.dim_user WHERE ds>0;
SELECT special query (WHERE clause)
Dataphin provides a specialized syntax for querying fields from associated logical tables, simplifying SQL and reducing code complexity.
select <table_alias>.<column_list>
,<table_alias>.<dim_role>[.<dim_role>].<dim_columnn_list>
from <ld_data_unit>.<logical_table_name> <table_alias>
where <where_condition>
Command explanation:
-
table_alias: The alias for the logical table being queried. Required.
-
dim_role: The role name of the associated logical dimension table within the primary table.
-
column_list: The list of fields from the primary table.
-
dim_column_list: The list of fields from the associated logical dimension table.
-
ld_data_unit: The identifier for the data unit.
Example explanation:
In the following example, the order logical fact table ld_test.fct_crt_sale_order_id is linked with the customer logical dimension table ld_test.dim_customer and the address logical dimension table ld_test.dim_address. Additionally, the address logical dimension table ld_test.dim_address is connected to the city logical dimension table ld_test.dim_region_lvl3, which is a hierarchy dimension table with the role name 'dim_city' in 'dim_address'.
To query the customer name and delivery city for a specific order (order ID=123), use the following command:
select crt_sale_order_id
,ord.dim_customer.user_name
,ord.dim_address.dim_city.region_lvl3_name as city_name
from ld_test.fct_crt_sale_order_di ord
where ds = '20230101'
and crt_sale_order_id = 123
This syntax is equivalent to the following standard SQL statement:
select crt_sale_order_id
,cust.user_name
,city.region_lvl3_name as city_name
from (select *
from ld_test.fct_crt_sale_order_di
where ds = '20230101'
and crt_sale_order_id = 123) ord
left outer join
(select *
from ld_test.dim_customer
where ds = '20230101') cust
on (ord.customer_id = cust.customer_id)
left outer join
(select *
from ld_test.dim_address
where ds = '20230101') addr
on (ord.delivery_address_id = addr.address_id)
left outer join
(select *
from ld_test.dim_region_lvl3 --original city hierarchy dimension table name, role name is dim_city
where ds = '20230101') city
on (addr.city_id = city.region_lvl3_id)