Integration with data lakehouse development tools

AnalyticDB for MySQL integrates with DataWorks and DMS, so you can write, test, and schedule data lakehouse workflows entirely in your browser. Both tools provide an online integrated development environment (IDE) with Notebook-based development. When you move a job from development to production, each tool connects directly to a scheduling system: DataWorks scheduling or DMS Airflow.

Choose an engine and tool

AnalyticDB for MySQL provides two compute engines:

  • Spark — for Python (PySpark) and SQL (Spark SQL) workloads requiring distributed compute at scale

  • XIHE — a MySQL-compatible engine for interactive SQL analytics

Use the table below to find the integration path that fits your engine and preferred development tool.

Job development and scheduling

| Compute engine | Integration tool | Steps | References |
| --- | --- | --- | --- |
| Spark (PySpark) | DataWorks | 1. Create a Job resource group.<br>2. Use a DataWorks Notebook to develop a Python or SQL workflow.<br>3. Publish the Notebook to DataWorks for scheduling. | Build a data lakehouse for analytics using DataWorks and AnalyticDB Spark |
| Spark (PySpark) | DMS | 1. Create a Job resource group.<br>2. Use a DMS Notebook to develop a Python or SQL workflow.<br>3. Publish the Notebook to DMS Airflow for scheduling. | Quickly build an open data lakehouse for analytics using AnalyticDB Spark<br>Build a data lakehouse workflow using AnalyticDB Spark and DMS Airflow |
| Spark (Spark SQL) | DataWorks | 1. Create a Spark Interactive resource group.<br>2. In DataWorks, create an AnalyticDB Spark SQL node.<br>3. Define the SQL processing logic and schedule the node. | ADB Spark SQL node |
| Spark (Spark SQL) | DMS | 1. Create a Spark Interactive resource group.<br>2. In DMS, create an Airflow instance in the same VPC as the AnalyticDB for MySQL cluster.<br>3. In the Airflow directed acyclic graph (DAG) code, define a DMSAnalyticDBSparkSqlOperator task. | DMSAnalyticDBSparkSqlOperator |
| XIHE | DataWorks | 1. In DataWorks, create an AnalyticDB for MySQL node.<br>2. Define the SQL processing logic and schedule the node. | ADB for MySQL node |
| XIHE | DMS | 1. In DMS, create an Airflow instance in the same VPC as the AnalyticDB for MySQL cluster.<br>2. In the Airflow DAG code, define a DMSSqlOperator task. | DMSSqlOperator |
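Both DMS paths in the table above schedule work through an Airflow DAG. In a DAG, a task starts only after all of its upstream tasks have finished. The following is a minimal stdlib sketch, not Airflow code. It does not call DMSSqlOperator or DMSAnalyticDBSparkSqlOperator. It only illustrates how a scheduler derives run stages from a DAG's dependencies, using Python's `graphlib.TopologicalSorter` (Python 3.9+). The task names are hypothetical.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# In a real Airflow DAG these edges would be declared between operator tasks.
dependencies = {
    "ingest_raw": set(),
    "clean_orders": {"ingest_raw"},
    "clean_users": {"ingest_raw"},
    "build_report": {"clean_orders", "clean_users"},
}


def parallel_stages(deps):
    """Group tasks into stages; tasks in the same stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstream tasks are done
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished to unlock downstream tasks
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(parallel_stages(dependencies), start=1):
        print(f"stage {i}: {', '.join(stage)}")
```

Here `clean_orders` and `clean_users` land in the same stage because neither depends on the other. A scheduler can therefore run them concurrently once `ingest_raw` succeeds.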

Data governance

| Feature | Description | References |
| --- | --- | --- |
| Data lineage | Trace how data flows between tables across ETL tasks implemented with DataWorks SQL. | Data lineage analysis |
| Data Quality | Configure monitoring rules in DataWorks that check the quality of AnalyticDB for MySQL data and trigger alerts when a rule is violated. | Data Quality |
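Lineage follows from which tables each ETL task reads and which tables it writes. The following is a hypothetical stdlib sketch, not the DataWorks lineage implementation. It assumes the read and write sets of each task have already been extracted and uses made-up task and table names. It walks those links backward to find every upstream table that a given table is derived from.

```python
from collections import deque

# Hypothetical ETL tasks: task name -> (tables read, tables written).
TASKS = {
    "load_ods": ({"src_orders"}, {"ods_orders"}),
    "build_dwd": ({"ods_orders", "dim_users"}, {"dwd_orders"}),
    "build_ads": ({"dwd_orders"}, {"ads_daily_sales"}),
}


def upstream_tables(table, tasks):
    """Return every table that `table` is derived from, directly or indirectly."""
    # Map each written table to the tables read by the task(s) that write it.
    sources = {}
    for reads, writes in tasks.values():
        for out in writes:
            sources.setdefault(out, set()).update(reads)

    # Breadth-first walk from the target table back toward its sources.
    seen = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for parent in sources.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream_tables("ads_daily_sales", TASKS)))
    # → ['dim_users', 'dwd_orders', 'ods_orders', 'src_orders']
```

Reversing the mapping (reads to writes) gives downstream impact instead. This answers the question of which tables are affected if a source table changes.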
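A common kind of monitoring rule compares a metric of the latest data, such as its row count, against a threshold and raises an alert when the check fails. The following is a minimal sketch of that idea under stated assumptions. The `RowCountRule` class, its threshold, and the `alert` callback are hypothetical. This is not the DataWorks Data Quality API, and it only evaluates the counts you pass in.

```python
from dataclasses import dataclass


@dataclass
class RowCountRule:
    """Hypothetical rule: fail if today's row count drops too far below yesterday's."""
    table: str
    max_drop_ratio: float  # e.g. 0.3 means "alert if rows fell by more than 30%"

    def check(self, yesterday: int, today: int) -> bool:
        """Return True when the rule passes."""
        if today == 0:
            return False  # an empty partition always fails
        if yesterday == 0:
            return True  # no baseline to compare against
        drop = (yesterday - today) / yesterday
        return drop <= self.max_drop_ratio


def evaluate(rule, yesterday, today, alert=print):
    """Run the rule and call `alert` with a message if it fails."""
    passed = rule.check(yesterday, today)
    if not passed:
        alert(f"[data quality] {rule.table}: row count {yesterday} -> {today} violates rule")
    return passed


if __name__ == "__main__":
    rule = RowCountRule(table="dwd_orders", max_drop_ratio=0.3)
    evaluate(rule, yesterday=10_000, today=9_500)  # 5% drop: passes, no alert
    evaluate(rule, yesterday=10_000, today=5_000)  # 50% drop: fails, prints an alert
```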