This topic describes how to pass parameters in a PyODPS node in DataWorks.
Prerequisites
The following are required:
-
MaxCompute is activated.
-
DataWorks is activated.
-
A workflow is created in DataWorks. For more information, see Create a workflow.
Procedure
This example uses a DataWorks workspace in basic mode. When you create the workspace, do not select the Participate in Public Preview of Data Studio option. This example does not apply to workspaces in public preview.
-
Prepare test data.
-
Create a table and upload data. For more information, see Create a table and upload data.
The following are the table schemas and source data.
-
The table creation statement for the partitioned table user_detail is as follows.
CREATE TABLE IF NOT EXISTS user_detail
(
    userid    BIGINT COMMENT 'User ID',
    job       STRING COMMENT 'Job type',
    education STRING COMMENT 'Education'
) COMMENT 'User information table'
PARTITIONED BY (dt STRING COMMENT 'Date', region STRING COMMENT 'Region');
-
The statement to create the source table user_detail_ods is as follows.
CREATE TABLE IF NOT EXISTS user_detail_ods
(
    userid    BIGINT COMMENT 'User ID',
    job       STRING COMMENT 'Job type',
    education STRING COMMENT 'Education',
    dt        STRING COMMENT 'Date',
    region    STRING COMMENT 'Region'
);
-
Save the test data to the user_detail.txt file. Upload this file to the user_detail_ods table.
0001,Internet,Bachelor,20190715,beijing
0002,Education,Associate Degree,20190716,beijing
0003,Finance,Master,20190715,shandong
0004,Internet,Master,20190715,beijing
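As a minimal sketch, the sample rows above can be written to user_detail.txt with Python's csv module before you upload the file. The `ROWS` list and the `write_test_data` helper are illustrative names and not part of DataWorks. The sketch assumes the upload expects plain comma-separated text without a header row.

```python
import csv

# Sample rows from this topic: userid, job, education, dt, region.
ROWS = [
    ("0001", "Internet", "Bachelor", "20190715", "beijing"),
    ("0002", "Education", "Associate Degree", "20190716", "beijing"),
    ("0003", "Finance", "Master", "20190715", "shandong"),
    ("0004", "Internet", "Master", "20190715", "beijing"),
]


def write_test_data(path="user_detail.txt"):
    """Write the sample rows as comma-separated lines without a header."""
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(ROWS)
    return path
```

Calling `write_test_data()` creates user_detail.txt in the current directory. The column order matches the user_detail_ods schema.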
-
Write data from the source table user_detail_ods to the partitioned table user_detail.
-
Log on to the DataWorks console.
-
In the left-side navigation pane, click Workspace.
-
Find the target workspace and in the Actions column, click .
-
Right-click the workflow and choose .
-
Enter a node name and click OK.
-
In the ODPS SQL node, enter the following code.
INSERT OVERWRITE TABLE user_detail PARTITION (dt, region)
SELECT userid, job, education, dt, region
FROM user_detail_ods;
-
Click Run to write the data.
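The INSERT OVERWRITE statement names the partition columns without values, so each row is routed to a partition according to its own dt and region values. The following sketch imitates that grouping in plain Python to show which partitions the sample data should produce. It is a conceptual illustration only and does not describe how MaxCompute implements the write. `ODS_ROWS` and `group_by_partition` are hypothetical names, and the user IDs are shown as integers on the assumption that the BIGINT column drops the leading zeros.

```python
from collections import defaultdict

# Rows of user_detail_ods after the upload (userid assumed to load as an integer).
ODS_ROWS = [
    (1, "Internet", "Bachelor", "20190715", "beijing"),
    (2, "Education", "Associate Degree", "20190716", "beijing"),
    (3, "Finance", "Master", "20190715", "shandong"),
    (4, "Internet", "Master", "20190715", "beijing"),
]


def group_by_partition(rows):
    """Group (userid, job, education) tuples by their 'dt=...,region=...' spec."""
    partitions = defaultdict(list)
    for userid, job, education, dt, region in rows:
        spec = "dt={},region={}".format(dt, region)
        partitions[spec].append((userid, job, education))
    return dict(partitions)


for spec, records in sorted(group_by_partition(ODS_ROWS).items()):
    print(spec, records)
```

With the sample data, three partitions result, and dt=20190715,region=beijing holds the records for users 1 and 4.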
-
Use PyODPS to pass parameters.
-
Log on to the DataWorks console.
-
In the left-side navigation pane, click Workspace.
-
Find the target workspace and in the Actions column, click .
-
On the Data Development page, right-click the workflow that you created and choose .
-
Enter a node name and click OK.
-
In the PyODPS 2 node, enter the following code to pass parameters.
import sys
reload(sys)
print('dt=' + args['dt'])
# Change the default system encoding.
sys.setdefaultencoding('utf8')
# Get the table.
t = o.get_table('user_detail')
# Receive the passed partition parameter.
with t.open_reader(partition='dt=' + args['dt'] + ',region=beijing') as reader1:
    count = reader1.count
    print("Querying data from the partitioned table:")
    for record in reader1:
        print record[0],record[1],record[2]
-
Click Run with Parameters.
-
In the Parameters dialog box, configure the parameters and click Run.
Configure the following parameters:
-
Resource Group Name: Select Shared Resource Group.
-
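dt: Set the value to 20190715, so that the node receives dt=20190715. The node code adds the dt= prefix itself when it builds the partition specification.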
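The node code concatenates 'dt=' with args['dt'], so the script expects the bare date. As a hedged sketch, a small helper could accept either 20190715 or dt=20190715 and reject anything that is not a yyyymmdd date before the partition is opened. `normalize_dt` and `partition_spec` are hypothetical helpers introduced here for illustration. They are not part of PyODPS or DataWorks.

```python
from datetime import datetime


def normalize_dt(value):
    """Return the bare yyyymmdd string for '20190715' or 'dt=20190715'."""
    text = value.strip()
    if text.startswith("dt="):
        text = text[len("dt="):]
    if len(text) != 8 or not text.isdigit():
        raise ValueError("expected a yyyymmdd date, got %r" % value)
    # Raises ValueError if the digits do not form a real calendar date.
    datetime.strptime(text, "%Y%m%d")
    return text


def partition_spec(dt_value, region="beijing"):
    """Build the partition string in the form used by the node code."""
    return "dt={},region={}".format(normalize_dt(dt_value), region)
```

In a node, `partition_spec(args['dt'])` could then replace the inline string concatenation.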
-
View the run results in the Runtime Log.
Executing user script with PyODPS 0.8.0
dt=20190715
Querying data from the partitioned table:
4 Internet Master
1 Internet Bachelor
xxx xxx xxx xxx
INFO ===================================================================
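The node code in this topic targets a PyODPS 2 node, which runs Python 2. As a sketch only, the same logic could look as follows on Python 3, for example in a PyODPS 3 node. There, `reload(sys)` and `sys.setdefaultencoding` are unavailable and unnecessary, and `print` is a function. Wrapping the logic in a function named `print_partition` is an illustrative choice. `o` and `args` are assumed to be the objects that DataWorks injects into the node.

```python
def print_partition(o, args, region="beijing"):
    """Print every record of one user_detail partition and return the row count."""
    print("dt=" + args["dt"])
    # Get the table.
    t = o.get_table("user_detail")
    # Build the partition spec from the passed parameter.
    spec = "dt=" + args["dt"] + ",region=" + region
    with t.open_reader(partition=spec) as reader:
        count = reader.count
        print("Querying data from the partitioned table:")
        for record in reader:
            print(record[0], record[1], record[2])
    return count


# Inside a DataWorks PyODPS node, where `o` and `args` already exist:
# print_partition(o, args)
```

Passing `o` and `args` as arguments keeps the function testable outside DataWorks with stand-in objects.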