This topic shows you how to read data from a partitioned table with PyODPS.
Prerequisites
Ensure the following requirements are met:
-
MaxCompute is activated.
-
DataWorks is activated.
-
You have a workflow in DataWorks. For more information, see Create a workflow.
Procedure
This example uses a DataWorks workspace in standard mode. When you create the workspace, do not select Join Public Preview of DataStudio . Workspaces in public preview are not compatible with this example.
-
Prepare the test data.
-
Create tables and upload data. For instructions, see Create tables and upload data.
The table schema and source data are as follows.
-
The following DDL statement creates the partitioned table user_detail.
CREATE TABLE IF NOT EXISTS user_detail ( userid BIGINT COMMENT 'user ID', job STRING COMMENT 'job type', education STRING COMMENT 'education level' ) COMMENT 'User information table' PARTITIONED BY (dt STRING COMMENT 'date',region STRING COMMENT 'region'); -
The following DDL statement creates the source data table user_detail_ods.
CREATE TABLE IF NOT EXISTS user_detail_ods ( userid BIGINT COMMENT 'user ID', job STRING COMMENT 'job type', education STRING COMMENT 'education level', dt STRING COMMENT 'date', region STRING COMMENT 'region' ); -
Save the following test data as a user_detail.txt file and upload the file to the user_detail_ods table.
0001,Internet,Bachelor,20190715,beijing 0002,Education,junior college,20190716,beijing 0003,Finance,Master,20190715,shandong 0004,Internet,Master,20190715,beijing
-
-
Write the data from the source data table
user_detail_odsto the partitioned tableuser_detail.-
Log on to the DataWorks console.
-
In the left-side navigation pane, click Workspaces.
-
Find the target workspace. In the Actions column, go to .
-
Right-click the workflow and choose .
-
Enter a name for the node and click Confirm.
-
In the editor for the ODPS SQL node, enter the following code.
INSERT OVERWRITE TABLE user_detail PARTITION (dt, region) SELECT userid, job, education, dt, region FROM user_detail_ods; -
Click Run.
-
-
-
Use PyODPS to read data from the partitioned table.
-
Log on to the DataWorks console.
-
In the left-side navigation pane, click Workspaces.
-
Find the target workspace. In the Actions column, go to .
-
On the Data Development page, right-click the created workflow and choose .
-
Enter a name for the node and click Confirm.
-
In the PyODPS 2 node, enter the following code.
import sys from odps import ODPS reload(sys) print('dt=' + args['dt']) # Set the default system encoding to UTF-8. sys.setdefaultencoding('utf8') # Get the table object. t = o.get_table('user_detail') # Check whether a specific partition exists. print t.exist_partition('dt=20190715,region=beijing') # List all partitions in the table. for partition in t.partitions: print partition.name # Query data by using one of the following three methods. # Method 1: Use open_reader() as a context manager. # The reader is automatically closed when the 'with' block is exited, which ensures proper resource cleanup. with t.open_reader(partition='dt=20190715,region=beijing') as reader1: count = reader1.count print("Query data from the partitioned table by using Method 1:") for record in reader1: print record[0],record[1],record[2] # Method 2: Use open_reader() without a context manager. # This method allows you to access records by column name. print("Query data from the partitioned table by using Method 2:") reader2 = t.open_reader(partition='dt=20190715,region=beijing') for record in reader2: print record["userid"],record["job"],record["education"] # Method 3: Use read_table() on the ODPS object. # This is the most concise option for simple read operations. print("Query data from the partitioned table by using Method 3:") for record in o.read_table('user_detail', partition='dt=20190715,region=beijing'): print record["userid"],record["job"],record["education"] -
Click Run with Parameters.
-
In the Parameters dialog box, set the parameters and click Run.
The parameters are as follows:
-
Resource Group Name: Select Default Resource Group.
-
dt: Enter 20190715.

-
-
View the run results on the Runtime Log tab.
Executing user script with PyODPS 0.8.0 dt=20190715 True dt='20190715',region='beijing' dt='20190715',region='shandong' dt='20190716',region='beijing' Query data from the partitioned table by using Method 1: 4 Internet master 1 Internet bachelor Query data from the partitioned table by using Method 2: 4 Internet master 1 Internet bachelor Query data from the partitioned table by using Method 3: 4 Internet master 1 Internet bachelor
-