Configure Python/Shell task datasets
Usage limits
You must enable the unstructured data feature before you can add datasets.
Only Python and Shell tasks in Basic projects support adding datasets. You can add up to 5 datasets to each task.
Procedure
In the top navigation bar of the Dataphin homepage, choose Development > Data Development.
In the top navigation bar of the Development page, select a project. In Dev-Prod mode, also select an environment.
In the navigation pane on the left, choose Data Processing > Compute Task.
In the compute task list, click the target Python or Shell task to open its task tab.
Click Properties in the right sidebar to open the Properties panel. In the Dataset section, click Add Dataset and configure the following parameters.
Dataset: Select a file dataset or hybrid dataset in the current project.
Version: Select any version of the selected dataset.
Note: You cannot select the same dataset version more than once.
Mount Path: This field is automatically populated with the mount path of the selected dataset version. You can modify this path. The mount path must start with /mnt/data/. When you reference multiple datasets, their mount paths cannot be nested. For example, you cannot use two paths if one is a subdirectory of the other, such as /mnt/data/A and /mnt/data/A/B.
Read-only: By default, this option is disabled, providing read-write access. Enable this option to make the dataset read-only.
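The mount path rules above can be checked locally before you fill in the Properties panel. The following is a minimal sketch, not part of Dataphin: the function name check_mount_paths and its messages are hypothetical, and it assumes that a valid path must sit strictly below /mnt/data/ and that two identical paths count as nested. It does not resolve ".." segments.

```python
from pathlib import PurePosixPath

MOUNT_ROOT = PurePosixPath("/mnt/data")
MAX_DATASETS = 5  # per-task limit stated under "Usage limits"


def check_mount_paths(paths):
    """Return a list of problems found; an empty list means no rule was violated.

    Hypothetical helper for illustration only; Dataphin performs its own checks.
    """
    problems = []
    if len(paths) > MAX_DATASETS:
        problems.append(f"too many datasets: {len(paths)} > {MAX_DATASETS}")

    normalized = [PurePosixPath(p) for p in paths]

    # Rule 1: every mount path must start with /mnt/data/.
    # Assumption: the path must be strictly below the root, not the root itself.
    for p in normalized:
        if MOUNT_ROOT not in p.parents:
            problems.append(f"{p} does not start with {MOUNT_ROOT}/")

    # Rule 2: no mount path may be a subdirectory of (or equal to) another.
    for i, a in enumerate(normalized):
        for b in normalized[i + 1:]:
            if a == b or a in b.parents or b in a.parents:
                problems.append(f"{a} and {b} are nested")
    return problems
```

For example, check_mount_paths(["/mnt/data/A", "/mnt/data/A/B"]) reports one nesting problem, while check_mount_paths(["/mnt/data/A", "/mnt/data/B"]) returns an empty list.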
You can view or delete the added datasets.
View: Click the View icon to open the dataset editing page. On this page, you can view the dataset information, modify its basic information, and edit its versions. For more details on editing, see Dataset.
Delete: Click the Delete icon to remove the dataset.
When you reference a dataset, Dataphin mounts the dataset path that matches the task's runtime environment (the development path in a development environment, or the production path in a production environment) to the specified mount path. Your code can then treat the mount path as a local path.
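Because the mount path behaves like a local directory, a Python task can read it with ordinary file APIs. The following is a minimal sketch: the path /mnt/data/my_dataset and the helper list_dataset_files are hypothetical, so replace the path with the value shown in the Mount Path field of your task.

```python
from pathlib import Path

# Hypothetical mount path; use the value configured in the Mount Path field.
MOUNT_PATH = Path("/mnt/data/my_dataset")


def list_dataset_files(mount_path):
    """Return the relative paths of all regular files under mount_path, sorted."""
    root = Path(mount_path)
    if not root.is_dir():
        # The dataset is not mounted here, or the path is misspelled.
        raise FileNotFoundError(f"no dataset directory at {root}")
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


# In a task, call the helper with the configured mount path, for example:
# for name in list_dataset_files(MOUNT_PATH):
#     print(name)
```

The same code runs unchanged in development and production, since only the data mounted behind the path differs between the two environments.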