Before you run a job, you can upload resources such as files and JAR packages, or add file directories, to EMR Serverless Spark. Your jobs can then access these resources directly at runtime.
Managed file directory vs. integrated file directory
Managed file directory: Upload local files from the console to a service-managed storage space. You can then directly access these files during job execution.
Integrated file directory: Mount external file systems, such as OSS, NAS, or CPFS, to your notebook sessions and jobs for direct access to the files within those directories.
Limitations
Managed file directory
The maximum size for a single uploaded file is 500 MB.
Integrated file directory
A single workspace supports a maximum of 10 integrated file directories.
You cannot mount integrated file directories for jobs submitted through Kyuubi Gateway or Spark clusters.
A CPFS integrated file directory cannot be mounted together with other types of integrated file directories (OSS or NAS). For example, you can mount multiple OSS and NAS directories at the same time, but you cannot mount an OSS directory and a CPFS directory together.
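The limits above can be checked before you upload a file or choose which directories to mount. The following is a minimal sketch, not part of the EMR Serverless Spark API. The function names are hypothetical, and it assumes that "500 MB" means 500 × 1024 × 1024 bytes. The source does not say whether several CPFS directories can be mounted together, so the sketch only rejects CPFS mixed with OSS or NAS.

```python
from typing import Iterable

# Assumption: "500 MB" is interpreted as 500 MiB.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024
# A single workspace supports at most 10 integrated file directories.
MAX_INTEGRATED_DIRS = 10
KNOWN_TYPES = {"OSS", "NAS", "CPFS"}


def upload_size_ok(size_bytes: int) -> bool:
    """Return True if a single file fits within the managed directory limit."""
    return 0 <= size_bytes <= MAX_UPLOAD_BYTES


def directory_count_ok(count: int) -> bool:
    """Return True if the workspace stays within the integrated directory limit."""
    return 0 <= count <= MAX_INTEGRATED_DIRS


def mount_combination_ok(types: Iterable[str]) -> bool:
    """Return True if the directory types can be mounted together.

    CPFS must not be combined with OSS or NAS. OSS and NAS can be mixed freely.
    """
    kinds = {t.upper() for t in types}
    unknown = kinds - KNOWN_TYPES
    if unknown:
        raise ValueError(f"unknown directory type(s): {sorted(unknown)}")
    return not ("CPFS" in kinds and len(kinds) > 1)
```

For example, `mount_combination_ok(["OSS", "NAS", "OSS"])` passes, while `mount_combination_ok(["OSS", "CPFS"])` does not.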
Manage the managed file directory
Upload a file
Go to the Artifacts page.
Log on to the EMR console.
In the left-side navigation pane, select .
On the Spark page, click the name of the target workspace.
In the left-side navigation pane of the EMR Serverless Spark page, click Artifacts.
On the Managed File Directory tab, click Upload File.
In the Upload File dialog box, click the upload area to select a local file, or drag a file into the area.
Manage files and folders
On the Managed File Directory tab, you can perform the following operations on existing files and folders.
For files:
Download: Download the file to your local machine.
Copy Address: Copy the file's access path.
Delete: Delete the file.
For folders: New Folder, Rename Folder, and Delete.
Manage the integrated file directory
After you add an integrated file directory, members with file editing permissions in the workspace can edit files and folders in an integrated OSS file directory from the Artifacts page. Members with data development permissions can read from and write to these files and folders in their data development jobs.
Add a file directory
Go to the Artifacts page.
Log on to the EMR console.
In the left-side navigation pane, select .
On the Spark page, click the name of the target workspace.
In the left-side navigation pane of the EMR Serverless Spark page, click Artifacts.
On the Integrated File Directory tab, click Create Folder.
In the Create File Directory dialog box, configure the following parameters and click OK.
OSS
The following table describes the parameters for OSS.
| Parameter | Description |
| --- | --- |
| Name | The name of the file directory. |
| OSS Path | Select an OSS storage path. Ensure that the workspace's execution role has permissions to access this path. |
| Mounted Path | You can customize this path, but it must be under the /mnt path. |

General-purpose NAS
The following table describes the parameters for General-purpose NAS.
| Parameter | Description |
| --- | --- |
| Name | The name of the file directory. |
| File system | Select a General-purpose NAS file system. Ensure that the workspace's execution role has permissions to access the NAS file system. |
| Mount on | Configure a mount point to access the NAS file system. |
| File System Path | Specify an existing storage path in the NAS file system. If you leave this parameter empty, the root directory is mounted by default. |
| Mounted Path | You can customize this path, but it must be under the /nas path. |

CPFS for Lingjun
The following table describes the parameters for CPFS for Lingjun.
| Parameter | Description |
| --- | --- |
| Name | The name of the file directory. |
| File system | Select a CPFS for AI Computing file system. Ensure that the workspace's execution role has permissions to access the CPFS file system. |
| Mount on | Configure a mount point to access the CPFS file system. |
| Mounted Path | You can customize this path, but it must be under the /cpfs path. |
| Read-only | This option is disabled by default. When enabled, the file directory becomes read-only. |
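The Mounted Path rule for each directory type can be checked before you fill in the dialog box. The following is a rough sketch under stated assumptions: the helper name `mounted_path_ok` is hypothetical, the prefixes are taken from the parameter descriptions above, and "under" is interpreted as strictly below the prefix, so the prefix itself is rejected. The console may apply additional rules that are not covered here.

```python
from pathlib import PurePosixPath

# Required parent path per directory type, as described in the parameter tables.
REQUIRED_PREFIX = {
    "OSS": PurePosixPath("/mnt"),
    "NAS": PurePosixPath("/nas"),
    "CPFS": PurePosixPath("/cpfs"),
}


def mounted_path_ok(fs_type: str, mounted_path: str) -> bool:
    """Return True if mounted_path lies strictly below the prefix for fs_type."""
    prefix = REQUIRED_PREFIX[fs_type.upper()]
    path = PurePosixPath(mounted_path)
    # Reject parent-directory segments, which could escape the prefix.
    if ".." in path.parts:
        return False
    # `parents` lists every ancestor of the path, excluding the path itself.
    return prefix in path.parents
```

With this check, `/mnt/datasets/raw` is accepted for OSS, while `/mnt`, `/mntdata`, and `/nas/datasets` are rejected.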
Delete a file directory
Deleting a file directory only removes its association with the corresponding OSS, NAS, or CPFS path. This action does not delete the files at the external path.
On the Integrated File Directory tab, find the directory that you want to delete and click Delete in the Actions column.
Click OK.
Next steps
Use the managed file directory: After you upload files, you can use them as dependencies or data sources in your batch or streaming tasks.
Use the integrated file directory: After you add a directory, you can mount it when you configure a notebook session, a batch or streaming task, or a Livy Gateway task.
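As an illustration of direct access to a mounted directory, the sketch below reads and writes files with ordinary file operations. It assumes that the Mounted Path behaves like a local directory inside the session or job. The function names and the example path `/mnt/my-data` are hypothetical. If the Read-only option is enabled for the directory, expect the write to fail.

```python
from pathlib import Path
from typing import List


def save_text(mount_root: str, relative_name: str, text: str) -> Path:
    """Write text to a file below the mounted directory and return its path."""
    target = Path(mount_root) / relative_name
    # Create intermediate folders if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


def list_files(mount_root: str) -> List[str]:
    """Return the relative paths of all files below the mounted directory, sorted."""
    root = Path(mount_root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

In a job, a call might look like `save_text("/mnt/my-data", "out/result.txt", "done")`, followed by `list_files("/mnt/my-data")` to confirm that the file is visible.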