You can create and manage MaxCompute tables using either MaxCompute SQL statements or the visual tools provided in DataWorks. This topic focuses on the visual method, which is more convenient than using SQL.
Usage notes
-
Principles for table operations
All operations on MaxCompute tables must follow basic MaxCompute requirements. For example, you cannot delete fields after a table is created. For more information, see Usage limits for MaxCompute table operations.
-
Changes to physical table properties
You can change the physical properties of a MaxCompute table only by running MaxCompute SQL commands. For more information, see Table operations. Due to permission controls on DataWorks, a RAM user may encounter errors when operating on tables in some scenarios. For more information, see Details of MaxCompute data permission control.
-
Metadata update latency
Metadata operations may have a delay. If a table that you created, updated, or deleted does not appear as expected (for example, a deleted table is still visible), go to Data Map and refresh the table metadata to manually synchronize the result.
Prerequisites
A MaxCompute computing resource is bound to your workspace.
Go to DataStudio
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
Create a MaxCompute table
Entry points
You can create a table in the Table Management directory in DataWorks or create a table for a specific business process.
-
Create from Table Management
In the left-side navigation pane of DataStudio, choose Table Management and click the
icon to create a table. If Table Management is not displayed in the left-side navigation pane, see Customize the UI to add it. -
Create from a business process
DataWorks uses business processes to organize and manage code, and you can create tables relevant to them. If you do not have a business process, create one first. For more information, see Business processes. If a business process exists, you can right-click the business process and choose .
ImportantIf you cannot find the button, make sure that a MaxCompute computing resource is bound to the workspace.
Configure basic table information
After you create a table, the table editing page appears. On this page, DataWorks supports two methods for configuring table information: DDL mode and GUI mode.
-
Use DDL mode to configure the table
On the table editing page, click the
icon and use a standard MaxCompute table creation statement to generate the table schema. After you use Data Definition Language (DDL) to generate the schema, DataWorks automatically populates the configurations on the page. You can use this method if you are accustomed to writing code for tasks.Important-
The table name is defined before you go to the table editing page. You cannot change the table name in the DDL statement. Otherwise, an error is reported.
-
DDL mode allows you to define only the physical properties of a table using commands. After the DDL statement runs and populates the page, you can edit the table's business properties on the table editing page.
-
-
Use the GUI to configure the table
Configure the table properties based on the instructions on the table editing page. You can use this method if you want to perform convenient visual operations. The following example describes how to configure table information in GUI mode.
-
Configure General. The following table describes the main parameters.
Parameter
Description
Display Name
The display name of the table.
Theme
In DataWorks, a theme functions like a folder, defining a table's level-1 and level-2 location. You can classify tables based on business purposes and add the tables of the same type to the same folder.
NoteLevel-1 and level-2 themes are presented as folders on the table management page in DataWorks to facilitate table management. You can quickly find the current table by theme on the Table Management page. If no themes are available, you can create one. For more information, see Define table themes.
-
Physical Model. The following table describes the parameters.
Parameter
Description
Layer
Specifies the data warehouse layer to which the table belongs. Layers help define and manage the data warehouse structure. The layers can be divided into an Operational Data Store (ODS) layer, a Dimension (DIM) layer, a Data Warehouse Detail (DWD) layer, a Data Warehouse Summary (DWS) layer, and an Application Data Service (ADS) layer. You can add the table to a proper layer based on its business purpose.
NoteYou can also customize layers. For more information, see Manage layers.
Physical category
The physical category of the table. This parameter is used to classify tables in more detail from a business perspective. The categories include Basic Business Layer, Advanced Business Layer, and Others.
NoteYou can also customize physical categories. For more information, see Manage categories.
Lifecycle
The lifecycle of the table.
-
For more information about the lifecycle of a MaxCompute table, see lifecycle.
-
For more information about lifecycle operations for a MaxCompute table, see lifecycle operations.
Partition Type
Specifies whether the table is a partitioned table or a non-partitioned table. For more information about partitioned and non-partitioned tables in MaxCompute, see partition.
Table Type
Specifies whether the table is an internal table or an external table. Internal tables contain data managed by MaxCompute, which typically results in faster queries. External tables reference data in other locations, allowing you to query the data without importing it, which saves storage space. For more information about internal and external tables in MaxCompute, see Table.
-
-
Schema. The Design the table schema section contains areas for field definition and partition configuration. The field definition area provides the Add Field, Move Up, and Move Down buttons. The columns in this area include Field Name, Display Name, Length/Settings, Description, and Actions (Save and Cancel). In the partition configuration area, you can click Add Partition to add a partition key and set its Date Partition Format, such as
yyyymmdd, and Date Partition Granularity. The following table describes the main parameters.Parameter
Description
Field Type
The data type of the field. You can select only data types supported by MaxCompute from the drop-down list. For more information about MaxCompute data types, see Data type editions.
Field Security Level
The security level of the field. This parameter is available only when label-based access control is enabled for the MaxCompute project. For more information about field security levels in MaxCompute, see Label-based access control.
Primary Key
Specifies the business primary key. Note that MaxCompute tables do not enforce primary keys; this parameter is for business-level management and data modeling purposes only.
-
Commit and publish the table
After defining the table schema, you must commit the table to the development environment and then publish it to the production environment.
The process of committing and publishing MaxCompute tables varies based on the workspace mode.
-
In a workspace that runs in basic mode, you need to commit tables only to the production environment.
-
In a workspace that runs in standard mode, only users who are assigned the Workspace Administrator or O&M Personnel role can publish tables to the production environment. To allow users to publish tables to the production environment, you must grant them the required permissions. For more information about how to grant permissions, see Add workspace members and manage their roles and permissions.
|
Actions |
Description |
|
Load from Development Environment |
Loads the development table's information onto the current page. Important
You can perform this operation only after the table is committed to the development environment. After you perform this operation, the information about the table created in the development environment overwrites the table information on the current page. |
|
Commit to Development Environment |
Commits the table to the development environment of DataWorks. This operation creates the table in the MaxCompute project that is associated with the computing resource of the development environment. After the table is committed, you can perform the following operations:
|
|
Load from Production Environment |
Loads the information about the production table from the production environment to the current page. Important
You can perform this operation only after the table is committed to the production environment. After you perform this operation, the information about the table created in the production environment overwrites the table information on the current page. |
|
Commit to Production Environment |
Commits the table to the production environment of DataWorks. This operation creates the table in the MaxCompute project that corresponds to the MaxCompute computing resource added to the production environment. After the table is committed, you can perform the following operations:
|
Write to and export from tables
You can write data to and export data from MaxCompute tables as described in the following sections.
Write data
You can write data to a MaxCompute table using a data integration task or a MaxCompute node task. In addition, DataWorks allows you to use the Upload Data feature to import local data into a MaxCompute table.
Export data
After you query data in DataStudio, you can click Download on the query result page to export the data to your local computer. By default, you can download a maximum of 10,000 records at a time. Tenant administrators and tenant security administrators can go to the Data Query and Analysis Control page in the Security Center to set the maximum number of records that can be downloaded at a time.
To export more than 10,000 records, use the MaxCompute client and run the Tunnel command.
Query data in a MaxCompute table
You can use an ODPS SQL node or the ad hoc query feature to run SQL statements and query data in a MaxCompute table.
Default permissions
Workspaces in basic mode lack fine-grained permission control and environment isolation. The following examples assume a workspace in standard mode.
After a RAM user is added to a workspace as a member, the RAM user is granted the following preset data access permissions:
|
Permission type |
Description |
|
Permissions on MaxCompute development projects |
DataWorks maps a workspace-level preset role to a MaxCompute engine role in the development environment. By default, a RAM user with this role has permissions on the corresponding development project but has no permissions on the production project. |
|
Permissions on MaxCompute production projects |
RAM users with a scheduling access identity have high-level permissions on MaxCompute projects in the production environment. Other RAM users have no production permissions by default. To operate on a production table, go to the Security Center to apply for permissions. DataWorks provides a default approval process and allows administrators to customize approval processes. |
For more information about data access control for MaxCompute, see Details of MaxCompute data permission control.
Data access behavior
MaxCompute supports cross-project table queries. Therefore, you can specify a project name on the DataStudio page to query production data across projects in a DataWorks workspace. The following table describes the methods for accessing tables across projects and the accounts used for execution on different pages.
-
You can view the binding information of computing resources in Data Development to learn about the MaxCompute computing resources added to different environments and the execution accounts configured for the environments. For more information, see Manage computing resources.
-
By default, in a standard mode workspace, tasks in the development environment run using the personal identity of the task executor. In the production environment, tasks run using the scheduling access identity, which is a designated Alibaba Cloud account. For more information, see Bind a MaxCompute computing resource.
|
Code example |
Development environment |
Production environment |
|
Access a development table in a development project:
|
The task executor accesses the development table using their personal Alibaba Cloud account.
|
The scheduling access identity is used to access the development table. |
|
Access a production table in a production project:
|
The task executor accesses the production table using their personal Alibaba Cloud account. Note
Due to production data security control, a personal Alibaba Cloud account does not have permissions to access production tables by default. You must go to the Security Center to apply for the permissions. DataWorks provides a default approval process and allows administrators to customize approval processes. |
The scheduling access identity is used to access the production table. |
|
Run a statement in an environment, such as the development environment, to access a table in the corresponding environment, such as a development table:
|
When the statement is run in the development environment, the personal Alibaba Cloud account of the task executor is used to access the destination table in the development engine. |
When the statement is run in the production environment, the scheduling access identity is used to access the destination table in the production engine. |
View MaxCompute data assets
You can view MaxCompute data assets as described in the following sections.
View production tables in a tenant
In the Tenant Tables module of Data Development, you can view all production tables in the current region that belong to your Alibaba Cloud account.
View metadata
You can go to DataMap to view the details and metadata of MaxCompute tables.
View table lineage
On the table details page in DataMap, you can view the upstream and downstream lineage of the table.
To locate the tasks that operate on the table in a workspace, you can use the code search feature.
Manage tables in batches
You can manage MaxCompute tables in batches as described in the following sections.
Delete tables in batches
You can go to My Data in DataMap to batch-delete tables that you own.
Due to production data security control, you cannot run commands to directly delete production tables. To delete a table that you do not own, go to the Security Center to apply for table permissions.
Change table owners in batches
You can go to My Data in DataMap to change the owners of tables that you own in batches.
Modify table lifecycles in batches
You can go to My Data in DataMap to modify the lifecycles of tables that you own in batches.
You can modify the lifecycle of a single table in Table Management. For more information, see Create a MaxCompute table.