PyODPS parameter passing

This topic describes how to pass parameters in a PyODPS node in DataWorks.

Prerequisites

The following are required:

Procedure

Note

This example uses a DataWorks workspace in basic mode. When you create the workspace, do not select the Participate in Public Preview of Data Studio option. This example does not apply to workspaces in public preview.

  1. Prepare test data.

    1. Create a table and upload data. For more information, see Create a table and upload data.

      The following are the table schemas and source data.

      • The table creation statement for the partitioned table user_detail is as follows.

        CREATE TABLE IF NOT EXISTS user_detail
        (
          userid    BIGINT COMMENT 'User ID',
          job       STRING COMMENT 'Job type',
          education STRING COMMENT 'Education'
        ) COMMENT 'User information table'
        PARTITIONED BY (dt STRING COMMENT 'Date', region STRING COMMENT 'Region');
      • The statement to create the source table user_detail_ods is as follows.

        CREATE TABLE IF NOT EXISTS user_detail_ods
        (
          userid    BIGINT COMMENT 'User ID',
          job       STRING COMMENT 'Job type',
          education STRING COMMENT 'Education',
          dt STRING COMMENT 'Date',
          region STRING COMMENT 'Region'
        );
      • Save the following test data to a file named user_detail.txt, and then upload the file to the user_detail_ods table.

        0001,Internet,Bachelor,20190715,beijing
        0002,Education,Associate Degree,20190716,beijing
        0003,Finance,Master,20190715,shandong
        0004,Internet,Master,20190715,beijing
    2. Write data from the source data table user_detail_ods to the partitioned table user_detail.

      1. Log on to the DataWorks console.

      2. In the left-side navigation pane, click Workspace.

      3. Find the target workspace and in the Actions column, click Shortcuts > Data Development.

      4. Right-click the workflow and choose Create Node > ODPS SQL.

      5. Enter a node name and click OK.

      6. In the ODPS SQL node, enter the following code.

        INSERT OVERWRITE TABLE user_detail PARTITION (dt, region) 
        SELECT userid, job, education, dt, region FROM user_detail_ods;
      7. Click Run to write the data.
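
As a rough illustration of what the dynamic-partition INSERT statement above does, the following Python sketch groups the four sample rows by their (dt, region) values. It is a local simulation only, not MaxCompute code. The names SAMPLE_ROWS and group_by_partition are hypothetical and introduced for this example, and the user IDs are written as integers on the assumption that they are stored in the BIGINT column without leading zeros.

```python
from collections import defaultdict

# The four sample rows from user_detail.txt: userid, job, education, dt, region.
SAMPLE_ROWS = [
    (1, "Internet", "Bachelor", "20190715", "beijing"),
    (2, "Education", "Associate Degree", "20190716", "beijing"),
    (3, "Finance", "Master", "20190715", "shandong"),
    (4, "Internet", "Master", "20190715", "beijing"),
]


def group_by_partition(rows):
    """Group rows by their (dt, region) values, mimicking a dynamic-partition insert."""
    partitions = defaultdict(list)
    for userid, job, education, dt, region in rows:
        # The partition spec uses the same "dt=...,region=..." form as the PyODPS code.
        spec = "dt={},region={}".format(dt, region)
        partitions[spec].append((userid, job, education))
    return dict(partitions)


if __name__ == "__main__":
    for spec, rows in sorted(group_by_partition(SAMPLE_ROWS).items()):
        print(spec, rows)
```

With the sample data, this grouping produces three partitions. The partition dt=20190715,region=beijing holds user IDs 1 and 4.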

  2. Use PyODPS to pass parameters.

    1. Log on to the DataWorks console.

    2. In the left-side navigation pane, click Workspace.

    3. Find the target workspace and in the Actions column, click Shortcuts > Data Development.

    4. On the Data Development page, right-click the workflow that you created and choose Create Node > PyODPS 2.

    5. Enter a node name and click OK.

    6. In the PyODPS 2 node, enter the following code to pass parameters.

      import sys
      reload(sys)
      # Change the default system encoding.
      sys.setdefaultencoding('utf8')
      # Print the parameter that was passed to the node.
      print('dt=' + args['dt'])
      # Get the table.
      t = o.get_table('user_detail')
      print("Querying data from the partitioned table:")
      # Build the partition spec from the passed parameter and read the records while the reader is open.
      with t.open_reader(partition='dt=' + args['dt'] + ',region=beijing') as reader1:
          count = reader1.count
          for record in reader1:
              print record[0],record[1],record[2]
    7. Click Run with Parameters.

    8. In the Parameters dialog box, configure the parameters and click Run.

      Configure the following parameters:

      • Resource Group Name: Select Shared Resource Group.

      • dt: Set to dt=20190715. The node code then reads the value 20190715 from args['dt'].

    9. View the run results in the Runtime Log.

      Executing user script with PyODPS 0.8.0
      dt=20190715
      Querying data from the partitioned table:
      4 Internet  Master
      1 Internet  Bachelor
      xxx xxx xxx xxx INFO ===================================================================
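
The parameter-passing logic of the node can also be sketched as a small function that receives o and args explicitly, which makes it possible to try the logic outside DataWorks. This is a minimal sketch in Python 3 syntax, not the code of the node above. The function read_partition and the FakeODPS and FakeTable classes are hypothetical names introduced for this example; the fakes only imitate the get_table and open_reader calls used in this topic and are not part of PyODPS.

```python
from contextlib import contextmanager


def read_partition(o, args, region="beijing"):
    """Return (userid, job, education) tuples from one user_detail partition.

    `o` and `args` are the objects that a PyODPS node uses as globals. They are
    passed in explicitly here so that the function can run outside DataWorks.
    """
    # Build the partition spec from the passed parameter.
    spec = "dt={},region={}".format(args["dt"], region)
    table = o.get_table("user_detail")
    rows = []
    # Iterate over the records while the reader is still open.
    with table.open_reader(partition=spec) as reader:
        for record in reader:
            rows.append((record[0], record[1], record[2]))
    return rows


# Hypothetical stand-ins for local testing only (not part of PyODPS).
class FakeTable(object):
    def __init__(self, partitions):
        self._partitions = partitions

    @contextmanager
    def open_reader(self, partition=None):
        # Yield the rows stored under the requested partition spec, if any.
        yield iter(self._partitions.get(partition, []))


class FakeODPS(object):
    def __init__(self, partitions):
        self._table = FakeTable(partitions)

    def get_table(self, name):
        return self._table


if __name__ == "__main__":
    fake = FakeODPS({
        "dt=20190715,region=beijing": [
            (1, "Internet", "Bachelor"),
            (4, "Internet", "Master"),
        ],
    })
    for row in read_partition(fake, {"dt": "20190715"}):
        print(row)
```

In a DataWorks node, the equivalent call would be read_partition(o, args), assuming the node provides o and args as globals in the same way as the code in this topic.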