Manage dataflow tasks

This topic describes how to create and manage CPFS dataflow tasks and view task reports in the NAS console.

Prerequisites

Tasks

  • Task types

    • Based on the data operations they perform, tasks are classified into three types: import, export, and evict.

      Type

      Description

      Import

      Imports data from source storage to a CPFS file system.

      • Import type: Two import types are supported: Metadata and Data (MetaAndData).

        • Metadata: Imports only the metadata of files.

        • Data: Imports both the metadata and data of files.

      • Import path: The path of a file in the OSS bucket. A dataflow task imports a file to the fileset based on its path in the OSS bucket.

      • If an imported file or directory does not have POSIX metadata attributes, the default owner is root and the default permission is 0770.

      Export

      Exports a specified directory or file from a dataflow fileset to an OSS bucket.

      • Export path: The path of a file or directory in the CPFS file system. A dataflow task exports a file to the bucket based on its path in the fileset.

      • Empty directories, hard links, and symbolic links cannot be exported to OSS.

      • Metadata export: You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported to OSS.

        Warning

        CPFS stores the exported metadata as custom metadata on the objects in the OSS bucket. The metadata keys are named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.

      Evict

      Releases the data of a file on a CPFS file system. After a file is evicted, only its metadata is kept on the CPFS file system. You can still see the file, but its data blocks are cleared and no longer use storage space. When you access the file, its data is loaded on demand from the source storage, such as OSS.

      Note

      Before you evict a file, make sure that the latest version of the file is available in the OSS bucket.

    • Based on the initiator, tasks are classified as user tasks or system tasks.

      Type

      Description

      User task

      A dataflow task that you create in the console or by calling the CreateDataFlowTask API operation.

      • You can query user tasks on the Dataflow > Task Management pane in the console.

      • When a user task is complete, a task report is generated and saved to the .dataflow_report directory in the CPFS file system.

      System task

      A task that is automatically generated by CPFS after you enable Automatic Metadata Update. This task synchronizes updated file metadata from an OSS bucket to CPFS.

      • System tasks are automatically generated at the specified Metadata Refresh Interval (minutes) to synchronize updated file metadata from the OSS bucket.

      • You can query system tasks on the Dataflow > Task Management pane in the console.

      • System tasks do not generate task reports.

  • Task execution scope

    The scope of a task can be a directory or a specified file list (EntryList). If the scope is a directory, the task traverses all files in the directory tree.
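
The warning about exported metadata can be enforced in your own tooling. The following Python sketch separates CPFS-managed metadata keys from other custom metadata and rejects changes to the managed keys. It assumes that every managed key starts with the prefix x-oss-meta-afm- (the xxx in the key name is a placeholder). The function names and the sample keys are hypothetical and are not part of any CPFS or OSS SDK.

```python
# Assumption: CPFS-managed keys share the prefix below. The "xxx" in the
# documented key name is a placeholder, so any suffix used with this code
# is hypothetical.
AFM_PREFIX = "x-oss-meta-afm-"


def split_object_metadata(metadata):
    """Split object metadata into CPFS-managed keys and user-editable keys."""
    managed, editable = {}, {}
    for key, value in metadata.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        target = managed if key.lower().startswith(AFM_PREFIX) else editable
        target[key] = value
    return managed, editable


def apply_user_changes(metadata, changes):
    """Return updated metadata, refusing any change to a CPFS-managed key."""
    blocked = [k for k in changes if k.lower().startswith(AFM_PREFIX)]
    if blocked:
        raise ValueError(f"refusing to modify CPFS-managed metadata: {blocked}")
    updated = dict(metadata)
    updated.update(changes)
    return updated
```

A script that rewrites object metadata could call apply_user_changes before it sends an update, so that an accidental change to a managed key fails early instead of corrupting file system metadata.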

Create a dataflow task

  1. Log on to the NAS console.

  2. In the left-side navigation pane, choose File System > File System List.

  3. In the top navigation bar, select a region.

  4. On the File System List page, click the name of the file system.

  5. On the details page of the file system, click Dataflow.

  6. On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.

  7. In the Task Management pane, click Create Job.

  8. In the Create Job pane, create a task of the required type and configure its parameters.

    Import data

    Parameter

    Description

    Data Type

    Select the type of data to import.

    • Data: Imports both the data blocks and metadata of files.

    • Metadata: Imports only the metadata of files.

      If you import only metadata, you can query the file names, but the file data is not yet stored in the CPFS file system. When you access a file, its data is loaded from the source storage on demand.

    Specify OSS Object Prefix Subdirectory

    Select the directory or file list for the dataflow task.

    • Import Objects from OSS: The specified OSS directory must start and end with a forward slash (/).

    • Import Listed Objects: Each line in the list file specifies the path of one file in the OSS bucket. Directories are not supported.

    Export data

    • Empty directories, hard links, and symbolic links cannot be exported to an OSS bucket.

    • You can export the CreateTime, ModifyTime, Ownership, and Permission attributes of a file to an OSS bucket. However, the ChangeTime attribute is not exported to OSS.

    • CPFS stores the exported metadata as custom metadata on the objects in the OSS bucket. The metadata keys are named x-oss-meta-afm-xxx. Do not delete or modify this metadata. Otherwise, file system metadata errors may occur.

      Parameter

      Description

      Specify CPFS Subdirectory

      Select the directory or file list for the dataflow task.

      • Export Files from CPFS: The directory must start and end with a forward slash (/) and must be the path of the directory in the CPFS file system.

      • Export Listed Files: Each line in the list file specifies the path of one file in the CPFS file system. Directories are not supported.

    Evict data

    Parameter

    Description

    Delete File

    Select the directory or file list for the dataflow task.

    • Delete Files from CPFS: The directory must start and end with a forward slash (/).

    • Delete Listed Files: Each line in the list file specifies the path of one file in the CPFS file system. Directories are not supported.

  9. Review the configuration and click OK.

    Note

    While a user task is running, automatic metadata updates for that dataflow are suspended.
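
The path rules for the task scope can be checked before you submit a task. The following Python sketch validates a directory scope and builds the text of a file list. It assumes that a trailing forward slash marks a directory entry. The exact list file format expected by the console is not described here beyond one path per line, and the function names are hypothetical.

```python
def is_valid_directory_scope(path):
    """Return True if the directory path starts and ends with '/'."""
    return path.startswith("/") and path.endswith("/")


def build_entry_list(paths):
    """Return file-list text with one file path per line.

    Blank entries are dropped. An entry that ends with '/' is treated as a
    directory (an assumption) and rejected, because file lists do not
    support directories.
    """
    cleaned = []
    for path in paths:
        path = path.strip()
        if not path:
            continue
        if path.endswith("/"):
            raise ValueError(f"directories are not supported in a file list: {path!r}")
        cleaned.append(path)
    if not cleaned:
        return ""
    return "\n".join(cleaned) + "\n"
```

For example, is_valid_directory_scope("/dataset/train/") returns True, while is_valid_directory_scope("dataset/train") returns False because the path lacks the leading and trailing forward slashes.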

View a task report

  1. Log on to the NAS console.

  2. In the left-side navigation pane, choose File System > File System List.

  3. In the top navigation bar, select a region.

  4. On the File System List page, click the name of the file system.

  5. On the details page of the file system, click Dataflow.

  6. On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.

  7. In the Task Management pane, find the task for which you want to view the report and click Report in the Actions column.

  8. Obtain the full path of the task report and download it.

    Note

    • Task reports are generated only for user tasks. System tasks do not generate task reports.

    • You can view the task report after the user task is complete. The report is saved to the .dataflow_report directory in the CPFS file system.

    The following example shows a sample task report. The first two lines are the headers of the SUMMARY and FILE records:

    SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
    FILE,path,status,size
    
    SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
    FILE,test1/object1,cached,131072
    FILE,test1/object2,cached,131072

    Category

    Field

    Description

    Task statistics (SUMMARY)

    dataflowId

    The dataflow ID.

    taskId

    The task ID.

    userId

    The user ID.

    fsId

    The file system ID.

    startDate

    The time when the task started, in seconds since the UNIX epoch.

    endDate

    The time when the task ended, in seconds since the UNIX epoch.

    total

    The total number of files that the task processed.

    succ

    The number of files that were successfully processed.

    skip

    The number of files that were skipped, such as files that were already imported when an import task ran.

    failed

    The number of files that failed to be processed.

    throughput_MBps

    The average throughput during task execution, in MB/s.

    File information (FILE)

    path

    The path of the file in the fileset.

    status

    The file status.

    • cached: The file has been imported or exported.

    • uncached: The file has not been imported.

    • dirty: The file was modified in the CPFS file system and has not been exported.

    • NA: The file does not exist.

    size

    The file size, in bytes.
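
A downloaded task report can be processed with a short script. The following Python sketch parses the SUMMARY and FILE records based on the sample report and the field descriptions in this topic. It assumes that the report is comma-separated text and that file paths do not contain commas. The function name parse_task_report is hypothetical.

```python
import csv
import io

SUMMARY_FIELDS = ["dataflowId", "taskId", "userId", "fsId", "startDate",
                  "endDate", "total", "succ", "skip", "failed",
                  "throughput_MBps"]
FILE_FIELDS = ["path", "status", "size"]
INT_FIELDS = ("startDate", "endDate", "total", "succ", "skip", "failed")


def parse_task_report(text):
    """Return (summary, files) parsed from the text of a task report."""
    summary, files = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank line between the headers and the records
        kind, values = row[0].strip(), row[1:]
        if kind == "SUMMARY" and values != SUMMARY_FIELDS:
            summary = dict(zip(SUMMARY_FIELDS, values))
            for name in INT_FIELDS:
                summary[name] = int(summary[name])
            summary["throughput_MBps"] = float(summary["throughput_MBps"])
        elif kind == "FILE" and values != FILE_FIELDS:
            entry = dict(zip(FILE_FIELDS, values))
            # Keep size as None if the report does not give a number.
            entry["size"] = int(entry["size"]) if entry["size"].isdigit() else None
            files.append(entry)
    return summary, files


SAMPLE_REPORT = """\
SUMMARY,dataflowId,taskId,userId,fsId,startDate,endDate,total,succ,skip,failed,throughput_MBps
FILE,path,status,size

SUMMARY,df-0001,task-0001,1001,cpfs-1234,1632477577,1632477677,18,10,1,7,0.01
FILE,test1/object1,cached,131072
FILE,test1/object2,cached,131072
"""

summary, files = parse_task_report(SAMPLE_REPORT)
duration_seconds = summary["endDate"] - summary["startDate"]
dirty_paths = [f["path"] for f in files if f["status"] == "dirty"]
```

Header rows are skipped by comparing them with the expected field names. The dirty_paths list shows one possible use: files with the dirty status have not been exported, so you might exclude them from an evict task.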

Related operations

Actions

Description

Steps

View a task

You can view the configuration and running status of a dataflow task in the console.

  1. On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.

  2. In the Task Management pane, view the details of the target task.

Cancel a task

You can cancel a running dataflow task in the console.

  1. On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.

  2. In the Task Management pane, find the target task and click Cancel in the Actions column.

  3. Confirm the task that you want to cancel and click OK.

Copy a task

You can rerun a completed task by copying it.

  1. On the Dataflow tab, find the dataflow that you want to manage and click Task Management in the Actions column.

  2. In the Task Management pane, find the target task and choose Copy in the Actions column.

  3. Confirm the task that you want to copy and click OK.