Use a DLF catalog

In EMR Serverless Spark, you can view the databases and tables in the data catalogs bound to a workspace, and you can add existing catalogs to the workspace.

Limitations

This feature is only supported in EMR versions esr-4.3.0, esr-3.3.0, esr-2.7.0, and later.
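The version rule above can be checked programmatically before you submit jobs. The following is a minimal sketch, not an official utility: the function name supports_dlf_catalog is hypothetical, and it assumes that version strings begin with esr-X.Y.Z and that "and later" applies within each major line (2.x from 2.7.0, 3.x from 3.3.0, 4.x from 4.3.0), with newer major lines assumed to be supported.

```python
import re

# Minimum supported release per major line, taken from the limitation above.
MIN_SUPPORTED = {2: (2, 7, 0), 3: (3, 3, 0), 4: (4, 3, 0)}


def supports_dlf_catalog(version: str) -> bool:
    """Return True if an esr-X.Y.Z version meets the documented minimum.

    Hypothetical helper: the per-major-line reading of "and later" is an
    assumption, not something the documentation states explicitly.
    """
    match = re.match(r"esr-(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized engine version: {version!r}")
    parsed = tuple(int(part) for part in match.groups())
    major = parsed[0]
    if major in MIN_SUPPORTED:
        # Tuples compare element by element, so (3, 4, 1) >= (3, 3, 0).
        return parsed >= MIN_SUPPORTED[major]
    # Assumption: major lines newer than the listed ones include the feature.
    return major > max(MIN_SUPPORTED)
```

For example, supports_dlf_catalog("esr-3.4.1") passes the check, while supports_dlf_catalog("esr-4.2.0") does not.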

Add a DLF catalog

After you bind a DLF catalog to your Serverless Spark workspace, submitted jobs can access it by default.

Note

After you bind a DLF catalog to a Serverless Spark workspace, both Livy Gateway and Kyuubi Gateway natively use it as the default data catalog.
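To confirm that a submitted job can see the bound catalog, you can list its databases and tables from inside the job. The following PySpark sketch assumes that the bound DLF catalog is the session's default catalog, so no extra catalog configuration is set. The helper names list_catalog_contents and spark_runner and the application name are illustrative, not part of any EMR API.

```python
from typing import Callable, Dict, List


def list_catalog_contents(run_sql: Callable[[str], List[tuple]]) -> Dict[str, List[str]]:
    """Return {database: [table, ...]} for the session's default catalog."""
    contents: Dict[str, List[str]] = {}
    for row in run_sql("SHOW DATABASES"):
        database = row[0]
        tables = run_sql(f"SHOW TABLES IN `{database}`")
        # In Spark 3, SHOW TABLES rows are (namespace, tableName, isTemporary).
        contents[database] = [table[1] for table in tables]
    return contents


def spark_runner() -> Callable[[str], List[tuple]]:
    """Build a run_sql callable backed by a real SparkSession (needs pyspark)."""
    # Imported lazily so list_catalog_contents can be exercised without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dlf-catalog-check").getOrCreate()
    return lambda query: [tuple(r) for r in spark.sql(query).collect()]
```

In a job, call list_catalog_contents(spark_runner()) and print the result. If the workspace is bound as described above, the output should match the databases and tables shown on the Data Catalog page.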

During workspace creation

To create a Serverless Spark workspace, see Create a workspace.

When you create the workspace, enable Use DLF as Metadata Service. In the DLF Data Catalog section, select the data catalog to bind. The Execution Role defaults to AliyunEMRSparkJobRunDefaultRole.

To an existing workspace

  1. Go to the Data Catalog page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, click Catalog in the left-side navigation pane.

      Note

      The Data Catalog page shows the databases and tables in the DLF data catalog you selected when creating the workspace.

  2. Click Add Catalog.

  3. In the Add Catalog dialog box, configure the settings and click Add.

    • DLF Catalog: A metadata management service for managing and querying metadata in a data lake. Select an existing DLF data catalog or create a new one to access metadata in your data lake.

      To create a new DLF data catalog, click Create Catalog. You are redirected to the Data Lake Formation console. For details, see Metadata management.

Related documentation

Manage data catalogs