Lindorm Distributed Processing System (LDPS) is the compute engine for Lindorm. It lets you run SQL statements to access data across multiple Lindorm engine services, including LindormTable, LindormTSDB, and LindormSearch. Before writing queries in LDPS, understand the data hierarchy and the available catalog data sources.
Data hierarchy
LDPS follows the Apache Spark SQL three-tier hierarchy. Each tier contains the next:
Catalog: The top-level container. A catalog identifies a data source and contains one or more namespaces.
Namespace: A logical grouping within a catalog, equivalent to a database or schema. A namespace contains one or more tables.
Table: A data object within a namespace, corresponding to a table in the underlying database.
To reference a table in a query, use the fully qualified three-part name <catalog>.<namespace>.<table>. For example, to query a table named tableX in a namespace named DB1 within the lindorm_table catalog:
SELECT fieldA FROM lindorm_table.DB1.tableX;
Available catalogs
LDPS automatically creates catalogs based on the engine services enabled on your Lindorm instance. Run SHOW CATALOGS to list all available catalogs.
| Catalog | Description | Reference |
|---|---|---|
| spark_catalog | The default LDPS catalog. Use Hive Metastore to manage its metadata. | Access data in Hive, Spark SQL reference |
| lindorm_table | Read from and write to LindormTable. | Access data in LindormTable |
| lindorm_cdc | Read and write Lindorm Change Data Capture (CDC) data. | Access data in a Lindorm CDC data source |
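To find out which names are valid at each tier, you can walk the hierarchy from the top down. The following is a minimal sketch. It assumes that LDPS accepts the standard Spark SQL SHOW NAMESPACES and SHOW TABLES statements, and it reuses the example namespace DB1 from above. Substitute the catalogs and namespaces that exist on your instance.

```sql
-- List the catalogs created for the engine services enabled on the instance.
SHOW CATALOGS;

-- List the namespaces in one catalog.
-- Assumption: the standard Spark SQL SHOW NAMESPACES statement is supported.
SHOW NAMESPACES IN lindorm_table;

-- List the tables in one namespace of that catalog (DB1 is the example namespace).
SHOW TABLES IN lindorm_table.DB1;
```

Each statement returns the names to use at the next tier, so the three results together give you the parts of a fully qualified table name.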
Query syntax
Use either of the following approaches to query a table.
Option 1: Fully qualified name
Prefix the table reference with the catalog and namespace:
SELECT fieldA FROM lindorm_table.DB1.tableX;
Option 2: USE statement
Set the active catalog and namespace first, then reference the table directly:
USE lindorm_table.DB1;
SELECT fieldA FROM tableX;
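The two options can be combined in one session. The sketch below assumes that LDPS follows standard Spark SQL behavior, where a fully qualified name is resolved regardless of the active catalog and namespace. The namespace DB2 and the table tableY are hypothetical names used only for illustration.

```sql
-- Set the active catalog and namespace.
USE lindorm_table.DB1;

-- Confirm the active context.
-- Assumption: the standard Spark SQL SHOW CURRENT NAMESPACE statement is supported.
SHOW CURRENT NAMESPACE;

-- An unqualified name resolves against the active catalog and namespace.
SELECT fieldA FROM tableX;

-- A fully qualified name ignores the active context.
-- DB2 and tableY are hypothetical.
SELECT fieldA FROM lindorm_table.DB2.tableY;
```

Setting the context with USE keeps repeated queries against one namespace short, while fully qualified names remain the safer choice in scripts that touch more than one catalog.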