Create and use MaxCompute tables

更新时间:
复制 MD 格式

You can create and manage MaxCompute tables using either MaxCompute SQL statements or the visual tools provided in DataWorks. This topic focuses on the visual method, which is more convenient than using SQL.

Usage notes

  • Principles for table operations

    All operations on MaxCompute tables must follow basic MaxCompute requirements. For example, you cannot delete fields after a table is created. For more information, see Usage limits for MaxCompute table operations.

  • Changes to physical table properties

    You can change the physical properties of a MaxCompute table only by running MaxCompute SQL commands. For more information, see Table operations. Due to permission controls on DataWorks, a RAM user may encounter errors when operating on tables in some scenarios. For more information, see Details of MaxCompute data permission control.

  • Metadata update latency

    Metadata operations may have a delay. If a table that you created, updated, or deleted does not appear as expected (for example, a deleted table is still visible), go to Data Map and refresh the table metadata to manually synchronize the result.

Prerequisites

A MaxCompute computing resource is bound to your workspace.

Go to DataStudio

Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

Create a MaxCompute table

Entry points

You can create a table in the Table Management directory in DataWorks or create a table for a specific business process.

  • Create from Table Management

    In the left-side navigation pane of DataStudio, choose Table Management and click the 新建 icon to create a table. If Table Management is not displayed in the left-side navigation pane, see Customize the UI to add it.

  • Create from a business process

    DataWorks uses business processes to organize and manage code, and you can create tables relevant to them. If you do not have a business process, create one first. For more information, see Business processes. If a business process exists, you can right-click the business process and choose Create Table > MaxCompute > Table.

    Important

    If you cannot find the button, make sure that a MaxCompute computing resource is bound to the workspace.

Configure basic table information

After you create a table, the table editing page appears. On this page, DataWorks supports two methods for configuring table information: DDL mode and GUI mode.

  • Use DDL mode to configure the table

    On the table editing page, click the ddl icon and use a standard MaxCompute table creation statement to generate the table schema. After you use Data Definition Language (DDL) to generate the schema, DataWorks automatically populates the configurations on the page. You can use this method if you are accustomed to writing code for tasks.

    Important
    • The table name is defined before you go to the table editing page. You cannot change the table name in the DDL statement. Otherwise, an error is reported.

    • DDL mode allows you to define only the physical properties of a table using commands. After the DDL statement runs and populates the page, you can edit the table's business properties on the table editing page.

  • Use the GUI to configure the table

    Configure the table properties based on the instructions on the table editing page. You can use this method if you want to perform convenient visual operations. The following example describes how to configure table information in GUI mode.

    1. Configure General. The following table describes the main parameters.

      Parameter

      Description

      Display Name

      The display name of the table.

      Theme

      In DataWorks, a theme functions like a folder, defining a table's level-1 and level-2 location. You can classify tables based on business purposes and add the tables of the same type to the same folder.

      Note

      Level-1 and level-2 themes are presented as folders on the table management page in DataWorks to facilitate table management. You can quickly find the current table by theme on the Table Management page. If no themes are available, you can create one. For more information, see Define table themes.

    2. Physical Model. The following table describes the parameters.

      Parameter

      Description

      Layer

      Specifies the data warehouse layer to which the table belongs. Layers help define and manage the data warehouse structure. The layers can be divided into an Operational Data Store (ODS) layer, a Dimension (DIM) layer, a Data Warehouse Detail (DWD) layer, a Data Warehouse Summary (DWS) layer, and an Application Data Service (ADS) layer. You can add the table to a proper layer based on its business purpose.

      Note

      You can also customize layers. For more information, see Manage layers.

      Physical category

      The physical category of the table. This parameter is used to classify tables in more detail from a business perspective. The categories include Basic Business Layer, Advanced Business Layer, and Others.

      Note

      You can also customize physical categories. For more information, see Manage categories.

      Lifecycle

      The lifecycle of the table.

      • For more information about the lifecycle of a MaxCompute table, see lifecycle.

      • For more information about lifecycle operations for a MaxCompute table, see lifecycle operations.

      Partition Type

      Specifies whether the table is a partitioned table or a non-partitioned table. For more information about partitioned and non-partitioned tables in MaxCompute, see partition.

      Table Type

      Specifies whether the table is an internal table or an external table. Internal tables contain data managed by MaxCompute, which typically results in faster queries. External tables reference data in other locations, allowing you to query the data without importing it, which saves storage space. For more information about internal and external tables in MaxCompute, see Table.

    3. Schema. The Design the table schema section contains areas for field definition and partition configuration. The field definition area provides the Add Field, Move Up, and Move Down buttons. The columns in this area include Field Name, Display Name, Length/Settings, Description, and Actions (Save and Cancel). In the partition configuration area, you can click Add Partition to add a partition key and set its Date Partition Format, such as yyyymmdd, and Date Partition Granularity. The following table describes the main parameters.

      Parameter

      Description

      Field Type

      The data type of the field. You can select only data types supported by MaxCompute from the drop-down list. For more information about MaxCompute data types, see Data type editions.

      Field Security Level

      The security level of the field. This parameter is available only when label-based access control is enabled for the MaxCompute project. For more information about field security levels in MaxCompute, see Label-based access control.

      Primary Key

      Specifies the business primary key. Note that MaxCompute tables do not enforce primary keys; this parameter is for business-level management and data modeling purposes only.

Commit and publish the table

After defining the table schema, you must commit the table to the development environment and then publish it to the production environment.

Note

The process of committing and publishing MaxCompute tables varies based on the workspace mode.

  • In a workspace that runs in basic mode, you need to commit tables only to the production environment.

  • In a workspace that runs in standard mode, only users who are assigned the Workspace Administrator or O&M Personnel role can publish tables to the production environment. To allow users to publish tables to the production environment, you must grant them the required permissions. For more information about how to grant permissions, see Add workspace members and manage their roles and permissions.

Actions

Description

Load from Development Environment

Loads the development table's information onto the current page.

Important

You can perform this operation only after the table is committed to the development environment. After you perform this operation, the information about the table created in the development environment overwrites the table information on the current page.

Commit to Development Environment

Commits the table to the development environment of DataWorks. This operation creates the table in the MaxCompute project that is associated with the computing resource of the development environment. After the table is committed, you can perform the following operations:

Load from Production Environment

Loads the information about the production table from the production environment to the current page.

Important

You can perform this operation only after the table is committed to the production environment. After you perform this operation, the information about the table created in the production environment overwrites the table information on the current page.

Commit to Production Environment

Commits the table to the production environment of DataWorks. This operation creates the table in the MaxCompute project that corresponds to the MaxCompute computing resource added to the production environment. After the table is committed, you can perform the following operations:

Write to and export from tables

You can write data to and export data from MaxCompute tables as described in the following sections.

Write data

You can write data to a MaxCompute table using a data integration task or a MaxCompute node task. In addition, DataWorks allows you to use the Upload Data feature to import local data into a MaxCompute table.

Export data

After you query data in DataStudio, you can click Download on the query result page to export the data to your local computer. By default, you can download a maximum of 10,000 records at a time. Tenant administrators and tenant security administrators can go to the Data Query and Analysis Control page in the Security Center to set the maximum number of records that can be downloaded at a time.

Important

To export more than 10,000 records, use the MaxCompute client and run the Tunnel command.

Query data in a MaxCompute table

You can use an ODPS SQL node or the ad hoc query feature to run SQL statements and query data in a MaxCompute table.

Default permissions

Workspaces in basic mode lack fine-grained permission control and environment isolation. The following examples assume a workspace in standard mode.

After a RAM user is added to a workspace as a member, the RAM user is granted the following preset data access permissions:

Permission type

Description

Permissions on MaxCompute development projects

DataWorks maps a workspace-level preset role to a MaxCompute engine role in the development environment. By default, a RAM user with this role has permissions on the corresponding development project but has no permissions on the production project.

Permissions on MaxCompute production projects

RAM users with a scheduling access identity have high-level permissions on MaxCompute projects in the production environment. Other RAM users have no production permissions by default. To operate on a production table, go to the Security Center to apply for permissions.

DataWorks provides a default approval process and allows administrators to customize approval processes.

For more information about data access control for MaxCompute, see Details of MaxCompute data permission control.

Data access behavior

MaxCompute supports cross-project table queries. Therefore, you can specify a project name on the DataStudio page to query production data across projects in a DataWorks workspace. The following table describes the methods for accessing tables across projects and the accounts used for execution on different pages.

Note
  • You can view the binding information of computing resources in Data Development to learn about the MaxCompute computing resources added to different environments and the execution accounts configured for the environments. For more information, see Manage computing resources.

  • By default, in a standard mode workspace, tasks in the development environment run using the personal identity of the task executor. In the production environment, tasks run using the scheduling access identity, which is a designated Alibaba Cloud account. For more information, see Bind a MaxCompute computing resource.

Code example

Development environment

Production environment

Access a development table in a development project:

select col1 from projectname_dev.tablename;

The task executor accesses the development table using their personal Alibaba Cloud account.

  • If a RAM user runs the task, the personal Alibaba Cloud account of the RAM user is used to access the development table.

  • If an Alibaba Cloud account runs the task, the Alibaba Cloud account is used to access the development table.

The scheduling access identity is used to access the development table.

Access a production table in a production project:

select col1 from projectname.tablename;

The task executor accesses the production table using their personal Alibaba Cloud account.

Note

Due to production data security control, a personal Alibaba Cloud account does not have permissions to access production tables by default. You must go to the Security Center to apply for the permissions. DataWorks provides a default approval process and allows administrators to customize approval processes.

The scheduling access identity is used to access the production table.

Run a statement in an environment, such as the development environment, to access a table in the corresponding environment, such as a development table:

select col1 from tablename;

When the statement is run in the development environment, the personal Alibaba Cloud account of the task executor is used to access the destination table in the development engine.

When the statement is run in the production environment, the scheduling access identity is used to access the destination table in the production engine.

View MaxCompute data assets

You can view MaxCompute data assets as described in the following sections.

View production tables in a tenant

In the Tenant Tables module of Data Development, you can view all production tables in the current region that belong to your Alibaba Cloud account.

View metadata

You can go to DataMap to view the details and metadata of MaxCompute tables.

View table lineage

On the table details page in DataMap, you can view the upstream and downstream lineage of the table.

Note

To locate the tasks that operate on the table in a workspace, you can use the code search feature.

Manage tables in batches

You can manage MaxCompute tables in batches as described in the following sections.

Delete tables in batches

You can go to My Data in DataMap to batch-delete tables that you own.

Note

Due to production data security control, you cannot run commands to directly delete production tables. To delete a table that you do not own, go to the Security Center to apply for table permissions.

Change table owners in batches

You can go to My Data in DataMap to change the owners of tables that you own in batches.

Modify table lifecycles in batches

You can go to My Data in DataMap to modify the lifecycles of tables that you own in batches.

Note

You can modify the lifecycle of a single table in Table Management. For more information, see Create a MaxCompute table.