本文介绍如何将TPC-H 100GB测试数据分别导入AnalyticDB for MySQL、MySQL、Presto、Spark以及Impala中。

下表列出了TPC-H测试数据集中的表数据条数。

表名 数据条数
customer 15000000
lineitem 600037902
nation 25
orders 150000000
part 20000000
partsupp 80000000
region 5
supplier 1000000

AnalyticDB for MySQL/MySQL

  • 在AnalyticDB for MySQL和MySQL中,使用LOAD DATA导入dbgen生成的文件。
    LOAD DATA LOCAL INFILE 'customer.tbl' INTO TABLE CUSTOMER 
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'orders.tbl'   INTO TABLE ORDERS   
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'lineitem.tbl' INTO TABLE LINEITEM 
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'nation.tbl'   INTO TABLE NATION   
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'partsupp.tbl' INTO TABLE PARTSUPP 
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL NFILE 'part.tbl'     INTO TABLE PART     
    FIELDS TERMINATED BY '|'   LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'region.tbl'   INTO TABLE REGION   
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
    LOAD DATA LOCAL INFILE 'supplier.tbl' INTO TABLE SUPPLIER 
    FIELDS TERMINATED BY '|'  LINES TERMINATED BY '\r\n';
  • 在AnalyticDB for MySQL中,还可以通过OSS外表方式导入测试数据,请参见将OSS数据导入AnalyticDB for MySQL

Presto/Spark/Impala

在Presto、Spark以及Impala中,可以通过外表方式导入测试数据。

  • CUSTOMER表
    insert overwrite table customer
    select
        c_custkey,
            c_name,
            c_address,
            c_nationkey,
            c_phone,
            c_acctbal,
            c_mktsegment,
            c_comment
    from ${source_db}.customer;
  • LINEITEM表
    insert overwrite table lineitem
    select
        l_orderkey,
        l_partkey,
        l_suppkey,
        l_linenumber,
        l_quantity,
        l_extendedprice,
        l_discount,
        l_tax,
        l_returnflag,
        l_linestatus,
        l_shipdate,
        l_commitdate,
        l_receiptdate,
        l_shipinstruct,
        l_shipmode,
        l_comment
    from ${source_db}.lineitem;
  • NATION表
    insert overwrite table nation
    select
            n_nationkey,
            n_name,
            n_regionkey,
            n_comment
    from ${source_db}.nation;
  • ORDERS表
    insert overwrite table orders
    select
            o_orderkey,
            o_custkey,
            o_orderstatus,
            o_totalprice,
            o_orderdate,
            o_orderpriority,
            o_clerk,
            o_shippriority,
            o_comment
    from ${source_db}.orders;
  • PART表
    insert overwrite table part
    select
            p_partkey,
            p_name,
            p_mfgr,
            p_brand,
            p_type,
            p_size,
            p_container,
            p_retailprice,
            p_comment
    from ${source_db}.part;
  • PARTSUPP表
    insert overwrite table partsupp
    select
            ps_partkey,
            ps_suppkey,
            ps_availqty,
            ps_supplycost,
            ps_comment
    from ${source_db}.partsupp;
  • REGION表
    insert overwrite table region
    select
            r_regionkey,
            r_name,
            r_comment
    from ${source_db}.region;
  • SUPPLIER表
    insert overwrite table supplier
    select
        s_suppkey,
        s_name,
        s_address,
        s_nationkey,
        s_phone,
        s_acctbal,
        s_comment
    from ${source_db}.supplier;