Manage dataflow tasks
This topic describes how to create and manage CPFS dataflow tasks and view task reports in the NAS console.
Prerequisites
-
A CPFS fileset has been created. For more information, see Create a fileset.
-
A dataflow has been created. For more information, see Create a dataflow.
Tasks
-
Task types
-
Based on the data operations they perform, tasks are classified into three types: import, export, and evict.
Type
Description
Import
Imports data from source storage to a CPFS file system.
-
Import type: You can import two types of data: Metadata and Data (MetaAndData).
-
Metadata: Imports only the metadata of files.
-
Data: Imports both the metadata and data of files.
-
-
Import path: The path of a file in the OSS bucket. A dataflow task imports a file to the fileset based on its path in the OSS bucket.
-
If an imported file or directory does not have POSIX metadata attributes, the default owner is root and the default permission is 0770.
Export
Exports a specified directory or file from a dataflow fileset to an OSS bucket.
-
Export path: The path of a file or directory in the CPFS file system. A dataflow task exports a file to the bucket based on its path in the fileset.
-
Empty directories, hard links, and symbolic links cannot be exported to OSS.
-
Metadata export: You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported to OSS.
Warning: CPFS exports metadata to the custom metadata of an OSS bucket, under names of the form x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.
Evict
Releases the data of a file on a CPFS file system. After a file is evicted, only its metadata is kept on the CPFS file system. You can still see the file, but its data blocks are cleared and no longer use storage space. When you access the file, its data is loaded on demand from the source storage, such as OSS.
Note: Before you evict a file, make sure that the latest version of the file is available in the OSS bucket.
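The export warning above can be turned into a small guard for scripts that edit OSS custom metadata. The following is a minimal sketch, not part of CPFS or the OSS SDK. It assumes that every CPFS-managed key starts with the prefix `x-oss-meta-afm-`, which is inferred from the documented pattern x-oss-meta-afm-xxx. The function names are hypothetical.

```python
from __future__ import annotations

# Assumption: the prefix is inferred from the documented pattern "x-oss-meta-afm-xxx".
CPFS_META_PREFIX = "x-oss-meta-afm-"


def is_cpfs_managed(header_name: str) -> bool:
    """Return True if the header name looks like CPFS-exported metadata."""
    return header_name.lower().startswith(CPFS_META_PREFIX)


def safe_metadata_update(existing: dict[str, str],
                         changes: dict[str, str | None]) -> dict[str, str]:
    """Apply changes to a metadata dict (a value of None deletes the key).

    Raises ValueError instead of touching any CPFS-managed key.
    """
    blocked = sorted(key for key in changes if is_cpfs_managed(key))
    if blocked:
        raise ValueError(f"refusing to modify CPFS-managed metadata: {blocked}")
    merged = dict(existing)
    for key, value in changes.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

A script could call `safe_metadata_update` before it writes metadata back to an object, so that an accidental change to a CPFS-managed key fails early.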
-
-
Based on the initiator, tasks are classified as user tasks or system tasks.
Type
Description
User task
A dataflow task that you create in the console or by calling the CreateDataFlowTask API operation.
-
You can query user tasks on the pane in the console.
-
When a user task is complete, a task report is generated and saved to the .dataflow_report directory in the CPFS file system.
System task
A task that is automatically generated by CPFS after you enable Automatic Metadata Update. This task synchronizes updated file metadata from an OSS bucket to CPFS.
-
CPFS generates system tasks at the configured Metadata Refresh Interval (in minutes) to synchronize updated file metadata from the OSS bucket.
-
You can query system tasks on the pane in the console.
-
System tasks do not generate task reports.
-
-
-
Task execution scope
The scope of a task can be a directory or a specified file list (EntryList). If the scope is a directory, the task traverses all files in the directory tree.
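The two scope types can be illustrated with a local sketch. The code below is not how CPFS implements task scoping. It only mimics the described behavior on an ordinary directory tree: a directory scope expands to every file under that directory, and a file list scope covers exactly the listed entries. The function name `files_in_scope` and its parameters are hypothetical.

```python
import os
from typing import Iterable, List, Optional


def files_in_scope(root: str,
                   directory: Optional[str] = None,
                   entry_list: Optional[Iterable[str]] = None) -> List[str]:
    """Return the paths (relative to root) that a task with this scope would cover."""
    if (directory is None) == (entry_list is None):
        raise ValueError("specify exactly one of directory or entry_list")

    if entry_list is not None:
        # File list scope: only the listed files, no traversal.
        return [entry.lstrip("/") for entry in entry_list]

    # Directory scope: traverse the whole directory tree.
    base = os.path.join(root, directory.strip("/"))
    found = []
    for current, _dirs, names in os.walk(base):
        for name in names:
            rel = os.path.relpath(os.path.join(current, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

For a large tree where only a few files matter, a file list scope avoids the traversal that a directory scope implies.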
Create a dataflow task
-
Log on to the NAS console.
-
In the left-side navigation pane, choose File System > File System List.
-
In the top navigation bar, select a region.
-
On the File System List page, click the name of the file system.
-
On the details page of the file system, click Dataflow.
-
On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.
-
In the Task Management pane, click Create Job.
-
In the Create Job pane, create a task of the required type and configure its parameters.
Import data
Parameter
Description
Data Type
Select the type of data to import.
-
Data: Imports both the data blocks and metadata of files.
-
Metadata: Imports only the metadata of files.
If you import only file metadata, you can query only the filenames. When you access the file, its data is loaded from the source storage on demand.
Specify OSS Object Prefix Subdirectory
Select the directory or file list for the dataflow task.
-
Import Objects from OSS: The specified OSS directory must start and end with a forward slash (/).
-
Import Listed Objects: Each line in the file specifies the path of a file in the OSS bucket. Directories are not supported.
Export data
-
Empty directories, hard links, and symbolic links cannot be exported to an OSS bucket.
-
You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported to OSS.
-
CPFS exports metadata to the custom metadata of an OSS bucket. The metadata is named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.
Parameter
Description
Specify CPFS Subdirectory
Select the directory or file list for the dataflow task.
-
Export Files from CPFS: The directory must start and end with a forward slash (/) and must be the path of the directory in the CPFS file system.
-
Export Listed Files: Each line in the file specifies the path of a file in the CPFS file system. Directories are not supported.
-
Evict data
Parameter
Description
Delete File
Select the directory or file list for the dataflow task.
-
Delete Files from CPFS: The directory must start and end with a forward slash (/).
-
Delete Listed Files: Each line in the file specifies the path of a file in the CPFS file system. Directories are not supported.
-
-
Review the configuration and click OK.
Note: While a user-created dataflow task is running, automatic updates for that dataflow are suspended.
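The path rules in the parameter tables above can be checked before you submit a task. The following is a minimal sketch with hypothetical helper names. It assumes that blank lines in a list file can be ignored and that a trailing forward slash is a good enough sign that an entry is a directory. Neither assumption is stated in this topic.

```python
from typing import List


def check_directory(path: str) -> None:
    """Raise ValueError unless the directory starts and ends with '/'."""
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError(f"directory must start and end with '/': {path!r}")


def check_entry_list(text: str) -> List[str]:
    """Return the file paths in a list file, one per line.

    Blank lines are skipped (assumption). Entries that end with '/' are
    treated as directories (heuristic) and rejected, because list files
    support only file paths.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue
        if entry.endswith("/"):
            raise ValueError(
                f"line {number}: directories are not supported: {entry!r}")
        entries.append(entry)
    return entries
```

For example, `check_directory("/data/train/")` passes, and `check_directory("data/train")` raises an error.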
View a task report
-
Log on to the NAS console.
-
In the left-side navigation pane, choose File System > File System List.
-
In the top navigation bar, select a region.
-
On the File System List page, click the name of the file system.
-
On the details page of the file system, click Dataflow.
-
On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.
-
In the Task Management pane, find the task for which you want to view the report and click in the Actions column.
-
Obtain the full path of the task report and download it.
Note:
-
Task reports are generated only for user tasks. System tasks do not generate task reports.
-
You can view the task report after the user task is complete. The report is saved to the .dataflow_report directory in the CPFS file system.
The following code provides a sample task report:
SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
FILE,path,status,size
SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
FILE,test1/object1,cached,131072
FILE,test1/object2,cached,131072
Category
Field
Description
Task statistics (SUMMARY)
dataflowId
The dataflow ID.
taskId
The task ID.
userId
The user ID.
fsId
The file system ID.
startDate
The time when the task started, in seconds since the UNIX epoch.
endDate
The time when the task ended, in seconds since the UNIX epoch.
total
The total number of files that the task processed.
succ
The number of files that were successfully processed.
skip
The number of files that were skipped, for example, files that had already been imported when an import task ran.
failed
The number of files that failed to be processed.
throughput_MBps
The average throughput during task execution, in MB/s.
File information (FILE)
path
The path of the file in the fileset.
status
The file status.
-
cached: The file is imported or exported.
-
uncached: The file is not imported.
-
dirty: The file was modified in the CPFS file system and has not been exported.
-
NA: The file does not exist.
size
The file size, in bytes.
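A downloaded report can be read with a few lines of Python. The sketch below parses the sample report shown above. It assumes that the first SUMMARY row and the first FILE row are header rows, as in the sample, and that fields are separated by commas. The function name `parse_report` is hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

SAMPLE = """\
SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
FILE,path,status,size
SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
FILE,test1/object1,cached,131072
FILE,test1/object2,cached,131072
"""


def parse_report(text):
    """Return (summary, files) where each record is a dict of field -> string."""
    headers = {}      # record type ("SUMMARY" or "FILE") -> column names
    summary = None
    files = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        kind, values = row[0], row[1:]
        if kind not in headers:
            # Assumption: the first row of each type is its header row.
            headers[kind] = values
            continue
        record = dict(zip(headers[kind], values))
        if kind == "SUMMARY":
            summary = record
        elif kind == "FILE":
            files.append(record)
    return summary, files


summary, files = parse_report(SAMPLE)

# startDate and endDate are seconds since the UNIX epoch.
started = datetime.fromtimestamp(int(summary["startDate"]), tz=timezone.utc)
duration_s = int(summary["endDate"]) - int(summary["startDate"])

# succ + skip + failed should add up to total.
counted = sum(int(summary[key]) for key in ("succ", "skip", "failed"))
consistent = counted == int(summary["total"])

print(started.isoformat(), duration_s, consistent, len(files))
```

Filtering `files` by the `status` field, for example `dirty` or `NA`, is a simple way to find the files that need attention after a task.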
Related operations
| Actions | Description | Steps |
| View a task | You can view the configuration and running status of a dataflow task in the console. | |
| Cancel a task | You can cancel a running dataflow task in the console. | |
| Copy a task | You can rerun a completed task by copying it. | |