ALIYUN::PAI::Dataset

Creates a PAI dataset for machine learning data management.

Syntax

{
  "Type": "ALIYUN::PAI::Dataset",
  "Properties": {
    "Options": String,
    "Description": String,
    "Accessibility": String,
    "DatasetName": String,
    "SourceType": String,
    "SourceId": String,
    "DataSourceType": String,
    "WorkspaceId": String,
    "DataType": String,
    "Uri": String,
    "Property": String
  }
}

Properties

| Property Name | Type | Required | Update allowed | Description | Constraints |
| --- | --- | --- | --- | --- | --- |
| Options | String | No | Yes | An extension field. | When using the dataset with DLC, set mountPath to the default mount path. Example: { "mountPath": "/mnt/data/" } |
| Description | String | No | Yes | The description of the dataset. | Data used for annotation. |
| Accessibility | String | No | Yes | The dataset visibility. | Valid values: PRIVATE (default): visible only to you and the workspace administrator; PUBLIC: visible to all workspace users. |
| DatasetName | String | Yes | Yes | The name of the dataset. | Naming rules: starts with a letter, digit, or Chinese character; can contain underscores (_) and hyphens (-); length: 1–127 characters. |
| SourceType | String | No | No | The type of the data source. | Valid values: USER (default): user-created dataset; ITAG: iTag annotation platform; PAI_PUBLIC_DATASET: PAI public dataset. |
| SourceId | String | No | No | The ID of the data source. | Values: if SourceType is USER, specify a custom value; if SourceType is ITAG, set to the iTag task ID; if SourceType is PAI_PUBLIC_DATASET, leave empty. |
| DataSourceType | String | Yes | No | The type of the data source. | Valid values: NAS: Alibaba Cloud NAS; OSS: Alibaba Cloud Object Storage Service. |
| WorkspaceId | String | Yes | No | The workspace ID. | None |
| DataType | String | No | No | The type of the dataset. | Valid values: COMMON (default): general purpose; PIC: image; TEXT: text; VIDEO: video; AUDIO: audio. |
| Uri | String | Yes | No | The data URI. | For an OSS data source: oss://bucket.endpoint/object. For a NAS data source: General-purpose NAS: nas://<nasfisid>.region/subpath/to/dir/; CPFS 1.0: nas://<cpfs-fsid>.region/subpath/to/dir/; CPFS 2.0: nas://<cpfs-fsid>.region/<protocolserviceid>/. Note: CPFS 1.0 fsid format: CPFS-<8 ASCII characters>. CPFS 2.0 fsid format: CPFS-<16 ASCII characters>. |
| Property | String | Yes | No | The property of the dataset. | Valid values: FILE: a file; DIRECTORY: a directory. |
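
To make the optional properties more concrete, the following is a minimal sketch of a resource definition that sets them alongside the required ones. The logical ID ExampleDataset, the workspace ID, the bucket name, the endpoint, and the object path are all hypothetical placeholders, not values from this reference. Because Options is typed as String, the sketch assumes the JSON object is passed as a quoted string.

```yaml
ROSTemplateFormatVersion: '2015-09-01'
Resources:
  ExampleDataset:                      # hypothetical logical ID
    Type: ALIYUN::PAI::Dataset
    Properties:
      # Required properties (all values are placeholders).
      WorkspaceId: '123456'
      DatasetName: example_image_dataset
      DataSourceType: OSS
      # Follows the oss://bucket.endpoint/object format.
      Uri: oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/images/
      Property: DIRECTORY
      # Optional properties.
      DataType: PIC
      Accessibility: PRIVATE
      SourceType: USER
      Description: Images used for annotation.
      # Options is a String, so the JSON object is quoted.
      Options: '{"mountPath": "/mnt/data/"}'
```

Of the properties set here, only Options, Description, Accessibility, and DatasetName are marked as updatable; the others are marked "No" under Update allowed.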

Return values

Fn::GetAtt

  • Options: The extension field.

  • Description: The dataset description.

  • Accessibility: The dataset visibility.

  • SourceId: The source ID.

  • CreateTime: The creation time.

  • SourceType: The source type.

  • WorkspaceId: The workspace ID.

  • Uri: The data URI.

  • GmtModifiedTime: The last modification time.

  • DatasetId: The dataset ID.

  • OwnerId: The Alibaba Cloud account ID.

  • DatasetName: The dataset name.

  • UserId: The user ID.

  • DataSourceType: The data source type.

  • DataType: The data type.

  • Property: The dataset property.

Examples

YAML format

ROSTemplateFormatVersion: '2015-09-01'
Parameters:
  DataSourceType:
    AllowedValues:
    - OSS
    - NAS
    Description: 'The type of the data source. Valid values:
      - OSS: Alibaba Cloud Object Storage Service (OSS).
      - NAS: Alibaba Cloud NAS.'
    Type: String
  DatasetName:
    Description: 'The name of the dataset. The naming convention is as follows:
      - Must start with a lowercase letter, an uppercase letter, a digit, or a Chinese character.
      - Can contain underscores (_) or hyphens (-).
      - The length must be 1 to 127 characters.'
    Type: String
  Property:
    AllowedValues:
    - FILE
    - DIRECTORY
    Description: 'The property of the dataset. Valid values:
      - FILE: A file.
      - DIRECTORY: A directory.'
    Type: String
  Uri:
    Description: 'The URI configuration. The format varies based on the data source type. Examples:
      - If the data source type is OSS: ''oss://bucket.endpoint/object''
      - If the data source type is NAS:
      For a General-purpose NAS file system: ''nas://<nasfisid>.region/subpath/to/dir/''
      For CPFS 1.0: ''nas://<cpfs-fsid>.region/subpath/to/dir/''
      For CPFS 2.0: ''nas://<cpfs-fsid>.region/<protocolserviceid>/''
      Note: CPFS 1.0 and CPFS 2.0 are distinguished by the format of the fsid. The format for CPFS 1.0 is `CPFS-<8 ASCII characters>`. The format for CPFS 2.0 is `CPFS-<16 ASCII characters>`.'
    Type: String
  WorkspaceId:
    Description: 'The ID of the workspace where the dataset resides. For more information about how to obtain the workspace ID, see ListWorkspaces.
      If you do not specify this parameter, the default workspace is used. If the default workspace does not exist, an error is reported.'
    Type: String
Resources:
  ExtensionResource:
    Properties:
      DataSourceType:
        Ref: DataSourceType
      DatasetName:
        Ref: DatasetName
      Property:
        Ref: Property
      Uri:
        Ref: Uri
      WorkspaceId:
        Ref: WorkspaceId
    Type: ALIYUN::PAI::Dataset
Outputs:
  Accessibility:
    Description: The visibility of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Accessibility
  CreateTime:
    Description: The time when the resource was created.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - CreateTime
  DataSourceType:
    Description: The type of the data source.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DataSourceType
  DataType:
    Description: The type of the dataset. The default value is COMMON.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DataType
  DatasetId:
    Description: The ID of the resource.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DatasetId
  DatasetName:
    Description: The name of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - DatasetName
  Description:
    Description: The custom description of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Description
  GmtModifiedTime:
    Description: The time when the dataset was last updated.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - GmtModifiedTime
  Options:
    Description: The extension field, in the JSON string format.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Options
  OwnerId:
    Description: The ID of the Alibaba Cloud account.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - OwnerId
  Property:
    Description: The property of the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Property
  SourceId:
    Description: The ID of the data source.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - SourceId
  SourceType:
    Description: The type of the data source. The default value is USER.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - SourceType
  Uri:
    Description: The URI configuration.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - Uri
  UserId:
    Description: The ID of the user who owns the dataset.
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - UserId
  WorkspaceId:
    Description: 'The ID of the workspace where the dataset resides. For more information about how to obtain the workspace ID, see ListWorkspaces.'
    Value:
      Fn::GetAtt:
      - ExtensionResource
      - WorkspaceId
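
The DatasetName parameter in the template above states its length rule only in the description text. As a possible variant, the fragment below enforces the 1 to 127 character limit at parameter validation time. It is a sketch that assumes the standard ROS parameter attributes MinLength, MaxLength, and ConstraintDescription. The character rules (leading character, underscores, hyphens) are not enforced here.

```yaml
Parameters:
  DatasetName:
    Type: String
    Description: The name of the dataset.
    # Length rule from the Properties table: 1 to 127 characters.
    MinLength: 1
    MaxLength: 127
    # Message shown when the value fails validation.
    ConstraintDescription: Must be 1 to 127 characters long.
```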

JSON format

{
  "ROSTemplateFormatVersion": "2015-09-01",
  "Parameters": {
    "DataSourceType": {
      "AllowedValues": [
        "OSS",
        "NAS"
      ],
      "Description": "The type of the data source. Valid values:\n- OSS: Alibaba Cloud Object Storage Service (OSS).\n- NAS: Alibaba Cloud NAS.",
      "Type": "String"
    },
    "DatasetName": {
      "Description": "The name of the dataset. The naming convention is as follows:\n- Must start with a lowercase letter, an uppercase letter, a digit, or a Chinese character.\n- Can contain underscores (_) or hyphens (-).\n- The length must be 1 to 127 characters.",
      "Type": "String"
    },
    "Property": {
      "AllowedValues": [
        "FILE",
        "DIRECTORY"
      ],
      "Description": "The property of the dataset. Valid values:\n- FILE: A file.\n- DIRECTORY: A directory.",
      "Type": "String"
    },
    "Uri": {
      "Description": "The URI configuration. The format varies based on the data source type. Examples:\n- If the data source type is OSS: 'oss://bucket.endpoint/object'\n- If the data source type is NAS:\nFor a General-purpose NAS file system: 'nas://<nasfisid>.region/subpath/to/dir/'\nFor CPFS 1.0: 'nas://<cpfs-fsid>.region/subpath/to/dir/'\nFor CPFS 2.0: 'nas://<cpfs-fsid>.region/<protocolserviceid>/'\nNote: CPFS 1.0 and CPFS 2.0 are distinguished by the format of the fsid. The format for CPFS 1.0 is `CPFS-<8 ASCII characters>`. The format for CPFS 2.0 is `CPFS-<16 ASCII characters>`.",
      "Type": "String"
    },
    "WorkspaceId": {
      "Description": "The ID of the workspace where the dataset resides. For more information about how to obtain the workspace ID, see ListWorkspaces.\nIf you do not specify this parameter, the default workspace is used. If the default workspace does not exist, an error is reported.",
      "Type": "String"
    }
  },
  "Resources": {
    "ExtensionResource": {
      "Properties": {
        "DataSourceType": {
          "Ref": "DataSourceType"
        },
        "DatasetName": {
          "Ref": "DatasetName"
        },
        "Property": {
          "Ref": "Property"
        },
        "Uri": {
          "Ref": "Uri"
        },
        "WorkspaceId": {
          "Ref": "WorkspaceId"
        }
      },
      "Type": "ALIYUN::PAI::Dataset"
    }
  },
  "Outputs": {
    "Accessibility": {
      "Description": "The visibility of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Accessibility"
        ]
      }
    },
    "CreateTime": {
      "Description": "The time when the resource was created.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "CreateTime"
        ]
      }
    },
    "DataSourceType": {
      "Description": "The type of the data source.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DataSourceType"
        ]
      }
    },
    "DataType": {
      "Description": "The type of the dataset. The default value is COMMON.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DataType"
        ]
      }
    },
    "DatasetId": {
      "Description": "The ID of the resource.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DatasetId"
        ]
      }
    },
    "DatasetName": {
      "Description": "The name of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "DatasetName"
        ]
      }
    },
    "Description": {
      "Description": "The custom description of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Description"
        ]
      }
    },
    "GmtModifiedTime": {
      "Description": "The time when the dataset was last updated.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "GmtModifiedTime"
        ]
      }
    },
    "Options": {
      "Description": "The extension field, in the JSON string format.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Options"
        ]
      }
    },
    "OwnerId": {
      "Description": "The ID of the Alibaba Cloud account.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "OwnerId"
        ]
      }
    },
    "Property": {
      "Description": "The property of the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Property"
        ]
      }
    },
    "SourceId": {
      "Description": "The ID of the data source.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "SourceId"
        ]
      }
    },
    "SourceType": {
      "Description": "The type of the data source. The default value is USER.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "SourceType"
        ]
      }
    },
    "Uri": {
      "Description": "The URI configuration.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "Uri"
        ]
      }
    },
    "UserId": {
      "Description": "The ID of the user who owns the dataset.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "UserId"
        ]
      }
    },
    "WorkspaceId": {
      "Description": "The ID of the workspace where the dataset resides. For more information about how to obtain the workspace ID, see ListWorkspaces.",
      "Value": {
        "Fn::GetAtt": [
          "ExtensionResource",
          "WorkspaceId"
        ]
      }
    }
  }
}