Managing files


Before you run a job, you can upload resources such as files and JAR packages, or add file directories, to EMR Serverless Spark. Your jobs can then access these resources at runtime.

Managed file directory vs. integrated file directory

  • Managed file directory: Upload local files from the console to a service-managed storage space. You can then directly access these files during job execution.

  • Integrated file directory: Mount external file systems, such as OSS, NAS, or CPFS, to your notebook sessions and jobs for direct access to the files within those directories.

Limitations

Managed file directory

  • The maximum size for a single uploaded file is 500 MB.
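The size limit can be checked locally before an upload is attempted. The following is a minimal sketch, not part of the product: the function name `can_upload` is hypothetical, and it assumes that "500 MB" means 500 × 1024 × 1024 bytes, which this page does not specify.

```python
import os

# Assumption: "500 MB" is read as 500 * 1024 * 1024 bytes. The documentation
# does not say whether the limit is decimal or binary.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024


def can_upload(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is within the single-file size limit."""
    return os.path.getsize(path) <= limit
```

A file that fails this check would need to be placed in an integrated file directory (for example, OSS) instead of the managed file directory.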

Integrated file directory

  • A single workspace supports a maximum of 10 integrated file directories.

  • You cannot mount integrated file directories for jobs submitted through Kyuubi Gateway or Spark clusters.

  • A CPFS integrated file directory cannot be mounted together with other types of integrated file directories (OSS or NAS). For example, you can mount multiple OSS and NAS directories at the same time, but you cannot mount an OSS directory and a CPFS directory together.
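The combination rules above can be expressed as a small check. This is an illustrative sketch only: the names `mount_problems` and `can_add_directory` are hypothetical helpers, not an EMR Serverless Spark API, and the type labels `"OSS"`, `"NAS"`, and `"CPFS"` are chosen for the example.

```python
from typing import List

MAX_DIRS_PER_WORKSPACE = 10  # per-workspace limit stated above
KNOWN_KINDS = {"OSS", "NAS", "CPFS"}


def mount_problems(kinds: List[str]) -> List[str]:
    """Return the rule violations for directory types mounted together."""
    problems = []
    unknown = sorted(set(kinds) - KNOWN_KINDS)
    if unknown:
        problems.append(f"unknown directory type(s): {', '.join(unknown)}")
    # CPFS cannot be combined with OSS or NAS. The documentation does not say
    # whether several CPFS directories can be combined; this sketch allows it.
    if "CPFS" in kinds and (set(kinds) & {"OSS", "NAS"}):
        problems.append("CPFS cannot be mounted together with OSS or NAS")
    return problems


def can_add_directory(current_count: int) -> bool:
    """Return True if the workspace is still below the directory limit."""
    return current_count < MAX_DIRS_PER_WORKSPACE
```

For example, `mount_problems(["OSS", "NAS", "OSS"])` returns an empty list, while `mount_problems(["OSS", "CPFS"])` reports the CPFS conflict.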

Manage the managed file directory

Upload a file

  1. Go to the Artifacts page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, select EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. In the left-side navigation pane of the EMR Serverless Spark page, click Artifacts.

  2. On the Managed File Directory tab, click Upload File.

  3. In the Upload File dialog box, click the upload area to select a local file, or drag a file into the area.

Manage files and folders

On the Managed File Directory tab, you can perform the following operations on existing files and folders:

  • For files, you can perform the following operations:

    • Download: Download the file to your local machine.

    • Copy Address: Copy the file's access path.

    • Delete: Delete the file.

  • For folders, you can perform the following operations: New Folder, Rename Folder, and Delete.

Manage the integrated file directory

Note

After you add an integrated file directory, members with file editing permissions in the workspace can edit files and folders in an integrated OSS file directory from the Artifacts page. Members with data development permissions can read from and write to these files and folders in their data development jobs.
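As an illustration of reading from and writing to a mounted directory in a job, the sketch below uses ordinary Python file APIs. It assumes that the mounted path behaves like a regular local directory, which this page implies but does not state explicitly. The function name `append_line` and the path `/mnt/demo` are hypothetical.

```python
from pathlib import Path


def append_line(mount_root: str, relative_name: str, line: str) -> str:
    """Append one line to a file under the mounted directory.

    Returns the full content of the file after the write.
    """
    target = Path(mount_root) / relative_name
    # Create intermediate folders if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return target.read_text(encoding="utf-8")
```

In a job, a call might look like `append_line("/mnt/demo", "logs/run.txt", "job started")`. The write fails if the directory was added with the Read-only option enabled or if the member lacks the required permissions.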

Add a file directory

  1. Go to the Artifacts page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, select EMR Serverless > Spark.

    3. On the Spark page, click the name of the target workspace.

    4. In the left-side navigation pane of the EMR Serverless Spark page, click Artifacts.

  2. On the Integrated File Directory tab, click Create Folder.

  3. In the Create File Directory dialog box, configure the following parameters and click OK.

    OSS

    The following table describes the parameters for OSS.

    | Parameter | Description |
    | --- | --- |
    | Name | The name of the file directory. |
    | OSS Path | Select an OSS storage path. Ensure that the workspace's execution role has permissions to access this path. |
    | Mounted Path | You can customize this path, but it must be under the /mnt path. |

    General-purpose NAS

    The following table describes the parameters for General-purpose NAS.

    | Parameter | Description |
    | --- | --- |
    | Name | The name of the file directory. |
    | File system | Select a General-purpose NAS file system. Ensure that the workspace's execution role has permissions to access the NAS file system. |
    | Mount on | Configure a mount point to access the NAS file system. |
    | File System Path | Specify an existing storage path in the NAS file system. If you leave this parameter empty, the root directory is mounted by default. |
    | Mounted Path | You can customize this path, but it must be under the /nas path. |

    CPFS for Lingjun

    The following table describes the parameters for CPFS for Lingjun.

    | Parameter | Description |
    | --- | --- |
    | Name | The name of the file directory. |
    | File system | Select a CPFS for AI Computing file system. Ensure that the workspace's execution role has permissions to access the CPFS file system. |
    | Mount on | Configure a mount point to access the CPFS file system. |
    | Mounted Path | You can customize this path, but it must be under the /cpfs path. |
    | Read-only | This option is disabled by default. When enabled, the file directory becomes read-only. |
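The Mounted Path rule for each directory type can be checked before you fill in the dialog box. The following sketch is illustrative: `is_valid_mounted_path` is a hypothetical helper, and it assumes that "under" means strictly below the prefix, so `/mnt` itself is rejected. The console may apply additional rules that are not described here.

```python
import posixpath

# Required parent path for each directory type, as listed in the tables above.
REQUIRED_PREFIX = {"OSS": "/mnt", "NAS": "/nas", "CPFS": "/cpfs"}


def is_valid_mounted_path(kind: str, mounted_path: str) -> bool:
    """Return True if `mounted_path` lies strictly under the prefix for `kind`."""
    prefix = REQUIRED_PREFIX[kind]
    # Normalize first so that ".." segments cannot escape the prefix.
    normalized = posixpath.normpath(mounted_path)
    return normalized.startswith(prefix + "/")
```

For example, `is_valid_mounted_path("OSS", "/mnt/data")` is accepted, while `is_valid_mounted_path("NAS", "/mnt/data")` is rejected because NAS directories must be mounted under /nas.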

Delete a file directory

Deleting a file directory only removes its association with the corresponding OSS, NAS, or CPFS path. This action does not delete the files at the external path.

  1. On the Integrated File Directory tab, find the directory that you want to delete and click Delete in the Actions column.

  2. Click OK.

Next steps