Sorting with PyODPS

This topic describes how to sort data with PyODPS.

Prerequisites

Complete the following operations in advance:

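Procedure

  1. Create a table and import data.

    1. Download the Iris dataset iris.data and rename it iris.csv.

    2. Create a table named pyodps_iris and upload the iris.csv dataset to it. For details, see Create a table and upload data.

      The table creation statement is as follows.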

      CREATE TABLE IF NOT EXISTS pyodps_iris
      (
          sepallength  DOUBLE COMMENT 'sepal length (cm)',
          sepalwidth   DOUBLE COMMENT 'sepal width (cm)',
          petallength  DOUBLE COMMENT 'petal length (cm)',
          petalwidth   DOUBLE COMMENT 'petal width (cm)',
          name         STRING COMMENT 'species'
      );
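
  2. Log on to the DataWorks console.

  3. In the left-side navigation pane, click Workspaces.

  4. Find your workspace and choose Shortcuts > Data Development in the Actions column.

  5. On the Data Development page, right-click the workflow you created and choose Create Node > MaxCompute > PyODPS 2.

  6. In the Create Node dialog box, enter a node name and click Confirm.

  7. On the PyODPS node, enter the following code to sort the data.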

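    from odps.df import DataFrame

    # In a DataWorks PyODPS node, the global variable o is the ODPS entry object
    iris = DataFrame(o.get_table('pyodps_iris'))

    # Sort in ascending order (the default)
    print(iris.sort('sepalwidth').head(5))

    # Two ways to sort in descending order:

    # 1. Set ascending=False
    print(iris.sort('sepalwidth', ascending=False).head(5))

    # 2. Prefix the column with a minus sign
    print(iris.sort(-iris.sepalwidth).head(5))

    # Sort by multiple fields (all ascending)
    print(iris.sort(['sepalwidth', 'petallength']).head(5))

    # When the fields need different sort orders, pass a list of Boolean values
    # to ascending; the list must contain one value per sort field
    print(iris.sort(['sepalwidth', 'petallength'], ascending=[True, False]).head(5))

    # Equivalent: negate the field that should be sorted in descending order
    print(iris.sort(['sepalwidth', -iris.petallength]).head(5))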
  8. Click Run.

  9. View the results in the Runtime Log.

    The complete output is as follows.

    Sql compiled:
    CREATE TABLE tmp_pyodps_d1b06785_dc18_4288_ad34_de860de1be08 LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth
    LIMIT 10000
    Instance ID: 20191010061554817gwml0lim
    
       sepallength  sepalwidth  petallength  petalwidth             name
    0          5.0         2.0          3.5         1.0  Iris-versicolor
    1          6.0         2.2          5.0         1.5   Iris-virginica
    2          6.2         2.2          4.5         1.5  Iris-versicolor
    3          6.0         2.2          4.0         1.0  Iris-versicolor
    4          5.5         2.3          4.0         1.3  Iris-versicolor
    Sql compiled:
    CREATE TABLE tmp_pyodps_3cb90bb2_fb95_43fb_ae84_f2b5a27d72dc LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth DESC
    LIMIT 10000
    Instance ID: 20191010061601287gs086792
    
    
       sepallength  sepalwidth  petallength  petalwidth         name
    0          5.7         4.4          1.5         0.4  Iris-setosa
    1          5.5         4.2          1.4         0.2  Iris-setosa
    2          5.2         4.1          1.5         0.1  Iris-setosa
    3          5.8         4.0          1.2         0.2  Iris-setosa
    4          5.4         3.9          1.3         0.4  Iris-setosa
    Sql compiled:
    CREATE TABLE tmp_pyodps_97b080bb_e014_48e8_a310_4b45fcd6a2ed LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth DESC
    LIMIT 10000
    Instance ID: 20191010061606927g6emz192
    
       sepallength  sepalwidth  petallength  petalwidth         name
    0          5.7         4.4          1.5         0.4  Iris-setosa
    1          5.5         4.2          1.4         0.2  Iris-setosa
    2          5.2         4.1          1.5         0.1  Iris-setosa
    3          5.8         4.0          1.2         0.2  Iris-setosa
    4          5.4         3.9          1.3         0.4  Iris-setosa
    Sql compiled:
    CREATE TABLE tmp_pyodps_6fe37b6e_6705_4052_b733_211eb9bd16ac LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth, petallength
    LIMIT 10000
    Instance ID: 20191010061611714gn586792
    
       sepallength  sepalwidth  petallength  petalwidth             name
    0          5.0         2.0          3.5         1.0  Iris-versicolor
    1          6.0         2.2          4.0         1.0  Iris-versicolor
    2          6.2         2.2          4.5         1.5  Iris-versicolor
    3          6.0         2.2          5.0         1.5   Iris-virginica
    4          4.5         2.3          1.3         0.3      Iris-setosa
    Sql compiled:
    CREATE TABLE tmp_pyodps_a52c805c_94a1_4a75_a6af_4fc9ed06ae68 LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth, petallength DESC
    LIMIT 10000
    Instance ID: 20191010061616553gw3m9592
    
       sepallength  sepalwidth  petallength  petalwidth             name
    0          5.0         2.0          3.5         1.0  Iris-versicolor
    1          6.0         2.2          5.0         1.5   Iris-virginica
    2          6.2         2.2          4.5         1.5  Iris-versicolor
    3          6.0         2.2          4.0         1.0  Iris-versicolor
    4          6.3         2.3          4.4         1.3  Iris-versicolor
    Sql compiled:
    CREATE TABLE tmp_pyodps_aac5538e_9b40_4078_b3c6_852b99c663c1 LIFECYCLE 1 AS
    SELECT *
    FROM WB_BestPractice_dev.`pyodps_iris` t1
    ORDER BY sepalwidth, petallength DESC
    LIMIT 10000
    Instance ID: 20191010061621329gvmkc292
    
    
       sepallength  sepalwidth  petallength  petalwidth             name
    0          5.0         2.0          3.5         1.0  Iris-versicolor
    1          6.0         2.2          5.0         1.5   Iris-virginica
    2          6.2         2.2          4.5         1.5  Iris-versicolor
    3          6.0         2.2          4.0         1.0  Iris-versicolor
    4          6.3         2.3          4.4         1.3  Iris-versicolor
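To check the ordering rules used in the node code without a MaxCompute connection, the following is a local sketch in plain Python 3. It only imitates the results: PyODPS itself compiles sort() into SQL and runs it in MaxCompute. The names IrisRow and ROWS are hypothetical, and the sample rows are copied from the output above rather than read from the table.

```python
from typing import List, NamedTuple


class IrisRow(NamedTuple):
    # Mirrors the columns of the pyodps_iris table (hypothetical local type)
    sepallength: float
    sepalwidth: float
    petallength: float
    petalwidth: float
    name: str


# A few sample rows copied from the output above (not the full dataset)
ROWS: List[IrisRow] = [
    IrisRow(5.0, 2.0, 3.5, 1.0, "Iris-versicolor"),
    IrisRow(6.0, 2.2, 5.0, 1.5, "Iris-virginica"),
    IrisRow(6.2, 2.2, 4.5, 1.5, "Iris-versicolor"),
    IrisRow(6.0, 2.2, 4.0, 1.0, "Iris-versicolor"),
    IrisRow(5.7, 4.4, 1.5, 0.4, "Iris-setosa"),
]

# Like iris.sort('sepalwidth'): ascending by one column
asc = sorted(ROWS, key=lambda r: r.sepalwidth)

# Like iris.sort('sepalwidth', ascending=False) or iris.sort(-iris.sepalwidth)
desc = sorted(ROWS, key=lambda r: r.sepalwidth, reverse=True)

# Like iris.sort(['sepalwidth', 'petallength'], ascending=[True, False])
# or iris.sort(['sepalwidth', -iris.petallength]):
# negating a numeric key reverses the order for that key only
mixed = sorted(ROWS, key=lambda r: (r.sepalwidth, -r.petallength))

for r in mixed:
    print(r.sepalwidth, r.petallength, r.name)
```

The negation trick works here only because the keys are numeric. Python's sorted() is also stable, so tied rows keep their input order; an SQL ORDER BY clause gives no such guarantee, so the order of tied rows in the MaxCompute output should not be relied on.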
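The log shows that each sort() call is compiled into an ORDER BY clause, with DESC appended to every field sorted in descending order. The hypothetical helper build_order_by below (not a PyODPS API) sketches that mapping for the argument forms used in the node code. It reproduces the clauses printed above but is not PyODPS's own SQL compiler, and it handles only field names, not negated column expressions.

```python
from typing import List, Sequence, Union


def build_order_by(fields: Sequence[str],
                   ascending: Union[bool, Sequence[bool]] = True) -> str:
    """Hypothetical helper: render sort fields as an ORDER BY clause."""
    if isinstance(ascending, bool):
        # A single flag applies to every field, like ascending=False in sort()
        flags: List[bool] = [ascending] * len(fields)
    else:
        flags = list(ascending)
        # As with sort(), the list needs exactly one flag per sort field
        if len(flags) != len(fields):
            raise ValueError("ascending must have one flag per sort field")
    # Ascending is the SQL default, so only descending fields get a suffix
    parts = [f if asc else f + " DESC" for f, asc in zip(fields, flags)]
    return "ORDER BY " + ", ".join(parts)


print(build_order_by(["sepalwidth"]))
print(build_order_by(["sepalwidth"], ascending=False))
print(build_order_by(["sepalwidth", "petallength"], [True, False]))
```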
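Step 1 assumes that every line of iris.csv fits the pyodps_iris schema: four DOUBLE measurements followed by a STRING species name. The following is a minimal pre-upload check in plain Python 3, assuming the file is comma-separated with no header row. The function parse_iris_lines is hypothetical, and the two sample lines only follow the iris.data format; they are not read from your copy of the file.

```python
import csv
from typing import Iterable, List, Tuple

# Column types of pyodps_iris: four DOUBLE columns, then one STRING column
IrisRecord = Tuple[float, float, float, float, str]


def parse_iris_lines(lines: Iterable[str]) -> List[IrisRecord]:
    """Hypothetical check: parse iris.csv lines into typed records.

    Blank lines are skipped; a row with the wrong number of fields or a
    non-numeric measurement raises ValueError.
    """
    records: List[IrisRecord] = []
    for row in csv.reader(lines):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip empty lines, in case the file ends with one
        if len(row) != 5:
            raise ValueError("expected 5 fields, got %d: %r" % (len(row), row))
        sl, sw, pl, pw = (float(v) for v in row[:4])
        records.append((sl, sw, pl, pw, row[4].strip()))
    return records


sample = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "",
]
print(parse_iris_lines(sample))
```

To check the real file, pass it an open file object, for example parse_iris_lines(open('iris.csv', newline='')).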