Enable Kerberos authentication


You can configure and enable Kerberos authentication in a Serverless Spark workspace. Once enabled, clients must use Kerberos authentication when submitting Spark tasks. This enhances task execution security.

Prerequisites

  • You have created a principal and uploaded its exported keytab file to OSS.

    If you use an EMR on ECS cluster, see Basic Kerberos usage for more information.

  • A Serverless Spark workspace has been created. For more information, see Manage workspaces.

Limitations

  • A workspace can be bound to only one Kerberos cluster.

  • Kerberos authentication is supported only for Spark batch jobs.

Procedure

Step 1: Prepare the network

Before you configure Kerberos authentication, you must set up the network to ensure connectivity between Serverless Spark and your VPC. For more information, see Establish network connectivity between EMR Serverless Spark and other VPCs.

Note

When adding a security group rule, we recommend opening UDP port 88, which is the default port for the Kerberos service.

Step 2: Configure Kerberos authentication

  1. Navigate to the Kerberos Authentication page.

    1. Log on to the EMR console.

    2. In the left navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. On the EMR Serverless Spark page, choose Security Center > Kerberos Authentication in the left navigation pane.

  2. Click Attach Kerberos.

  3. On the Attach Kerberos page, configure the following parameters and click OK.

    • Kerberos Name: Enter a custom name.

    • Normal Network Connection: Select the network connection that you created.

    • Kerberos krb5.conf: Provide the content of the krb5.conf file as follows.

    1. Enter the content of the krb5.conf file.

      The krb5.conf file is typically located at /etc/krb5.conf on the Kerberos server. Obtain the file content based on your environment:

      • If you use the Kerberos service of an EMR Datalake cluster, perform the following steps to obtain the content:

        1. Log on to the master node of the EMR cluster. For more information, see Log on to a cluster.

        2. Run the following command to view and manually copy the content of the /etc/krb5.conf file.

          cat /etc/krb5.conf
        3. Paste the content into the Kerberos krb5.conf field.

      • For other EMR clusters or self-managed Kerberos services, replace each hostname in the file with the corresponding private IP address in the VPC.

    2. (Optional) Depending on the network protocol, you might need to add configurations to the krb5.conf file.

      • If you opened UDP port 88 when you configured the network connection in Step 1: Prepare the network, no additional configuration is required.

      • If you use the TCP protocol for the network connection configured in Step 1: Prepare the network, you must add udp_preference_limit = 1 to the [libdefaults] section. This setting makes the Kerberos client contact the KDC over TCP rather than UDP.

  4. In the Actions column, click Enable Authentication.

  5. In the dialog box, click OK.
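
The krb5.conf adjustments described in Step 2 can be checked locally before you paste the content into the console. The following Python sketch is illustrative only. The helper names forces_tcp and replace_hosts, the realm EXAMPLE.COM, the hostname kdc.example.internal, and the IP address 192.168.0.10 are hypothetical, and the assumption that hostnames appear on kdc and admin_server lines may not hold for every krb5.conf file.

```python
import re

SECTION_RE = re.compile(r"^\[(\w+)\]$")
HOST_KEYS = {"kdc", "admin_server"}  # assumption: the keys whose values carry a hostname

# Hypothetical krb5.conf content used only for this example.
SAMPLE = """\
[libdefaults]
    default_realm = EXAMPLE.COM
    udp_preference_limit = 1

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.internal:88
        admin_server = kdc.example.internal:749
    }
"""


def forces_tcp(krb5_text: str) -> bool:
    """Return True if the [libdefaults] section sets udp_preference_limit = 1."""
    section = None
    for raw in krb5_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            section = match.group(1)  # remember which section we are in
            continue
        if section == "libdefaults" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "udp_preference_limit" and value == "1":
                return True
    return False


def replace_hosts(krb5_text: str, host_to_ip: dict) -> str:
    """Replace hostnames on kdc/admin_server lines with private IPs, keeping any port."""
    out = []
    for raw in krb5_text.splitlines():
        key, sep, value = raw.partition("=")
        if sep and key.strip() in HOST_KEYS:
            host, colon, port = value.strip().partition(":")
            host = host_to_ip.get(host, host)  # unknown hosts are left unchanged
            raw = f"{key}{sep} {host}{colon}{port}"
        out.append(raw)
    return "\n".join(out)


if __name__ == "__main__":
    print("TCP forced:", forces_tcp(SAMPLE))
    print(replace_hosts(SAMPLE, {"kdc.example.internal": "192.168.0.10"}))
```

If forces_tcp returns False and your network connection allows only TCP, add the setting to [libdefaults] before saving the configuration.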

Step 3: Submit a Spark batch job

After you enable Kerberos authentication, you must provide client credentials when you submit a Spark batch job. If you submit a job without the required configurations, the job fails with an error indicating that spark.kerberos.keytab and spark.kerberos.principal are not configured.

  1. Create a Spark batch job. For more information, see PySpark Quick Start.

  2. On the new development tab, add the following configurations and then click Run.

    • Normal Network Connection: Select the name of the network connection that you created in Step 1.

    • Spark Configuration: Configure the following parameters.

    spark.files oss://<bucketname>/path/test.keytab
    spark.kerberos.keytab test.keytab
    spark.kerberos.principal <username>@<REALM>

    The parameters are described as follows:

    • spark.files: The complete OSS path of the uploaded keytab file.

    • spark.kerberos.keytab: The file name of the keytab file, without the OSS path. In the preceding example, the value is test.keytab.

    • spark.kerberos.principal: The principal in the keytab file used for authentication with the Kerberos service. You can run the klist -kt <keytab_file> command to view the principal name in the target keytab file.

    If you need to connect to a Kerberos-enabled Hive Metastore to obtain metadata, add the following information to the Spark Configuration section.

    spark.hive.metastore.sasl.enabled true
    spark.hive.metastore.kerberos.principal hive/<hostname>@<REALM>

    For the spark.hive.metastore.kerberos.principal parameter, specify the principal from the keytab file used by the Hive Metastore. To find the path of the keytab file, go to the EMR on ECS console, find the Hive service, and navigate to the Configure page. On the hive-site.xml tab, check the value of the hive.metastore.kerberos.keytab.file parameter. You can then run the klist -kt <path_to_hive_metastore_keytab_file> command to view the principal.

    The value of the spark.hive.metastore.kerberos.principal parameter must be in one of the following formats:

    • The typical format is hive/<hostname>@<REALM>. In this format, <hostname> is the fully qualified domain name (FQDN) of the Hive Metastore node. You can run the hostname -f command to obtain the FQDN. <REALM> is the realm of the Key Distribution Center (KDC).

    • If the Hive Metastore connection address uses a hostname, you can simplify the format to hive/_HOST@<REALM>. When connecting, Spark automatically replaces _HOST with the hostname in the Hive Metastore connection address. This format is required if you need to configure multiple Hive Metastores.

  3. Once the job is running, go to the Execution Records section and click Details in the Actions column of the job.

  4. On the Application page in Job History, you can view the log information.
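
The Spark Configuration entries from this step can also be generated and sanity-checked with a short script. The following Python sketch is only an illustration under stated assumptions: the function names, the bucket my-bucket, the realm EXAMPLE.COM, and the metastore address thrift://ms-1.example.internal:9083 are hypothetical, the principal check accepts only the primary@REALM and primary/instance@REALM forms, and resolve_host_placeholder merely mimics the _HOST substitution described above rather than reproducing Spark's own implementation.

```python
import posixpath
import re
from urllib.parse import urlparse

# Accepts primary@REALM or primary/instance@REALM (a simplifying assumption).
PRINCIPAL_RE = re.compile(r"^[^@/\s]+(/[^@/\s]+)?@[^@/\s]+$")


def kerberos_spark_conf(keytab_oss_path: str, principal: str) -> dict:
    """Build the three client properties for one keytab and principal."""
    if not keytab_oss_path.startswith("oss://"):
        raise ValueError("spark.files must be the complete OSS path of the keytab")
    if not PRINCIPAL_RE.match(principal):
        raise ValueError(f"unexpected principal format: {principal!r}")
    return {
        "spark.files": keytab_oss_path,
        # Only the file name, matching the example value test.keytab.
        "spark.kerberos.keytab": posixpath.basename(keytab_oss_path),
        "spark.kerberos.principal": principal,
    }


def hive_metastore_conf(realm: str, hostname: str = "_HOST") -> dict:
    """Build the Hive Metastore properties; hostname defaults to the _HOST placeholder."""
    return {
        "spark.hive.metastore.sasl.enabled": "true",
        "spark.hive.metastore.kerberos.principal": f"hive/{hostname}@{realm}",
    }


def resolve_host_placeholder(principal: str, metastore_uri: str) -> str:
    """Illustrate the _HOST substitution using the host part of the metastore address."""
    host = urlparse(metastore_uri).hostname
    if host is None:
        raise ValueError(f"no hostname in metastore address: {metastore_uri!r}")
    return principal.replace("_HOST", host)


def render(conf: dict) -> str:
    """Render properties as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = kerberos_spark_conf("oss://my-bucket/path/test.keytab", "test@EXAMPLE.COM")
    conf.update(hive_metastore_conf("EXAMPLE.COM"))
    print(render(conf))
```

Generating the values from one keytab path keeps spark.files and spark.kerberos.keytab consistent, which avoids a mismatch between the uploaded file and the name that the job looks up.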

Step 4 (Optional): Connect to a Kerberos-enabled Hive Metastore

If your workspace data catalog connects to a Kerberos-enabled Hive Metastore, you must specify the keytab file path and principal name when adding the external Hive Metastore.

  • Metastore Service Address: The address must match the service address defined in the metastore service's principal. The address is typically a hostname.

  • Kerberos keytab file path: The path to the Kerberos keytab file.

  • Kerberos principal: The principal in the keytab file used for authentication. To find the principal name, run the klist -kt <keytab_file> command.
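
The klist -kt lookup mentioned in this topic can be scripted when you need the principal names programmatically. The following Python sketch is a minimal illustration, assuming the MIT Kerberos output layout in which entries follow a dashed separator row and the principal is the last column. The function names and the sample output are hypothetical, and other Kerberos implementations may print a different layout.

```python
import subprocess

# Hypothetical output in the MIT Kerberos layout, used only for this example.
SAMPLE_OUTPUT = """\
Keytab name: FILE:test.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 01/01/2024 00:00:00 test@EXAMPLE.COM
   2 01/01/2024 00:00:00 test@EXAMPLE.COM
"""


def principals_from_klist(output: str) -> list:
    """Return the distinct principals in `klist -kt` output, in first-seen order."""
    principals = []
    in_table = False
    for line in output.splitlines():
        if line.startswith("----"):
            in_table = True  # entries follow the dashed separator row
            continue
        if in_table and line.strip():
            principal = line.split()[-1]  # the principal is the last column
            if principal not in principals:
                principals.append(principal)
    return principals


def list_keytab_principals(keytab_path: str) -> list:
    """Run `klist -kt <keytab_path>` and parse its output (requires klist on PATH)."""
    result = subprocess.run(
        ["klist", "-kt", keytab_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if klist fails
    )
    return principals_from_klist(result.stdout)


if __name__ == "__main__":
    print(principals_from_klist(SAMPLE_OUTPUT))
```

A keytab usually holds one entry per encryption type for the same principal, which is why the parser removes duplicates.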