导入测试数据

更新时间:

本文介绍如何构建TPC-DS测试数据,并将TPC-DS 1000GB测试数据分别导入AnalyticDB for MySQL中。

说明

本文的TPC-DS的实现基于TPC-DS的基准测试,并不能与已发布的TPC-DS基准测试结果相比较,本文中的测试并不符合TPC-DS基准测试的所有要求。

下表列出了TPC-DS测试数据集中的表数据条数。

表名

数据条数

store_sales

2,879,987,999

catalog_sales

1,439,980,416

web_sales

720,000,376

store_returns

287,999,764

catalog_returns

143,996,756

inventory

783,000,000

web_returns

71,997,522

customer

12,000,000

customer_address

6,000,000

item

300,000

customer_demographics

1,920,800

date_dim

73,049

time_dim

86,400

catalog_page

30,000

web_page

3,000

store

1,002

promotion

1,500

household_demographics

7,200

web_site

54

call_center

42

reason

65

warehouse

20

ship_mode

20

income_band

20

OSS外表导入(推荐)

重要

以下流程仅适用于企业版、基础版及湖仓版集群。

  1. 创建外部数据库

    CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;
  2. 创建二十四张外表。

    说明

    AnalyticDB for MySQL提供了TPC-DS测试数据存放的OSS路径。您需要根据集群所在地域,替换LOCATION参数指定的OSS路径。

    不同地域所对应的OSS路径

    地域

    路径

    华东1(杭州)

    oss://dataset-cn-hangzhou-external/TPCDS/1TB

    华北3(张家口)

    oss://dataset-cn-zhangjiakou-external/TPCDS/1TB

    华北2(北京)

    oss://dataset-cn-beijing-external/TPCDS/1TB

    华东2(上海)

    oss://dataset-cn-shanghai-external/TPCDS/1TB

    华南1(深圳)

    oss://dataset-cn-shenzhen-external/TPCDS/1TB

    华北1(青岛)

    oss://dataset-cn-qingdao-external/TPCDS/1TB

    华南3(广州)

    oss://dataset-cn-guangzhou-external/TPCDS/1TB

    中国香港

    oss://dataset-cn-hongkong-external/TPCDS/1TB

    新加坡

    oss://dataset-ap-southeast-1-external/TPCDS/1TB

    马来西亚(吉隆坡)

    oss://dataset-ap-southeast-3-external/TPCDS/1TB

    日本(东京)

    oss://dataset-ap-northeast-1-external/TPCDS/1TB

    印度尼西亚(雅加达)

    oss://dataset-ap-southeast-5-external/TPCDS/1TB

    德国(法兰克福)

    oss://dataset-eu-central-1-external/TPCDS/1TB

    美国(硅谷)

    oss://dataset-us-west-1-external/TPCDS/1TB/

    英国(伦敦)

    oss://dataset-eu-west-1-external/TPCDS/1TB

    美国(弗吉尼亚)

    oss://dataset-us-east-1-external/TPCDS/1TB

    CREATE EXTERNAl TABLE external_tpcds.call_center
    (
      cc_call_center_sk             BIGINT not null,
      cc_call_center_id             CHAR(16) not null,
      cc_rec_start_date             DATE,
      cc_rec_end_date               DATE,
      cc_closed_date_sk             BIGINT,
      cc_open_date_sk               BIGINT,
      cc_name                       VARCHAR(50),
      cc_class                      VARCHAR(50),
      cc_employees                  INT,
      cc_sq_ft                      INT,
      cc_hours                      CHAR(20),
      cc_manager                    VARCHAR(40),
      cc_mkt_id                     INT,
      cc_mkt_class                  CHAR(50),
      cc_mkt_desc                   VARCHAR(100),
      cc_market_manager             VARCHAR(40),
      cc_division                   INT,
      cc_division_name              VARCHAR(50),
      cc_company                    INT,
      cc_company_name               CHAR(50),
      cc_street_number              CHAR(10),
      cc_street_name                VARCHAR(60),
      cc_street_type                CHAR(15),
      cc_suite_number               CHAR(10),
      cc_city                       VARCHAR(60),
      cc_county                     VARCHAR(30),
      cc_state                      CHAR(2),
      cc_zip                        CHAR(10),
      cc_country                    VARCHAR(20),
      cc_gmt_offset                 DECIMAL(5,2),
      cc_tax_percentage             DECIMAL(5,2),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center';
    
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_page
    (
      cp_catalog_page_sk     BIGINT not null,
      cp_catalog_page_id     VARCHAR(16) not null,
      cp_start_date_sk   BIGINT,
      cp_end_date_sk     BIGINT,
      cp_department      VARCHAR(50),
      cp_catalog_number  INT,
      cp_catalog_page_number INT,
      cp_description     VARCHAR(100),
      cp_type        VARCHAR(100),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_returns
    (
      cr_returned_date_sk        BIGINT,
      cr_returned_time_sk        BIGINT,
      cr_item_sk             BIGINT not null,
      cr_refunded_customer_sk    BIGINT,
      cr_refunded_cdemo_sk       BIGINT,
      cr_refunded_hdemo_sk       BIGINT,
      cr_refunded_addr_sk        BIGINT,
      cr_returning_customer_sk   BIGINT,
      cr_returning_cdemo_sk      BIGINT,
      cr_returning_hdemo_sk      BIGINT,
      cr_returning_addr_sk       BIGINT,
      cr_call_center_sk      BIGINT,
      cr_catalog_page_sk         BIGINT ,
      cr_ship_mode_sk        BIGINT ,
      cr_warehouse_sk        BIGINT ,
      cr_reason_sk           BIGINT ,
      cr_order_number        BIGINT not null,
      cr_return_quantity         INT,
      cr_return_amount       DECIMAL(7,2),
      cr_return_tax          DECIMAL(7,2),
      cr_return_amt_inc_tax      DECIMAL(7,2),
      cr_fee             DECIMAL(7,2),
      cr_return_ship_cost        DECIMAL(7,2),
      cr_refunded_cash       DECIMAL(7,2),
      cr_reversed_charge         DECIMAL(7,2),
      cr_store_credit        DECIMAL(7,2),
      cr_net_loss            DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_sales
    (
      cs_sold_date_sk           BIGINT,
      cs_sold_time_sk           BIGINT,
      cs_ship_date_sk           BIGINT,
      cs_bill_customer_sk       BIGINT,
      cs_bill_cdemo_sk          BIGINT,
      cs_bill_hdemo_sk          BIGINT,
      cs_bill_addr_sk           BIGINT,
      cs_ship_customer_sk       BIGINT,
      cs_ship_cdemo_sk          BIGINT,
      cs_ship_hdemo_sk          BIGINT,
      cs_ship_addr_sk           BIGINT,
      cs_call_center_sk         BIGINT,
      cs_catalog_page_sk        BIGINT,
      cs_ship_mode_sk           BIGINT,
      cs_warehouse_sk           BIGINT,
      cs_item_sk                BIGINT not null,
      cs_promo_sk               BIGINT,
      cs_order_number           BIGINT not null,
      cs_quantity               INT,
      cs_wholesale_cost         DECIMAL(7,2),
      cs_list_price             DECIMAL(7,2),
      cs_sales_price            DECIMAL(7,2),
      cs_ext_discount_amt       DECIMAL(7,2),
      cs_ext_sales_price        DECIMAL(7,2),
      cs_ext_wholesale_cost     DECIMAL(7,2),
      cs_ext_list_price         DECIMAL(7,2),
      cs_ext_tax                DECIMAL(7,2),
      cs_coupon_amt             DECIMAL(7,2),
      cs_ext_ship_cost          DECIMAL(7,2),
      cs_net_paid               DECIMAL(7,2),
      cs_net_paid_inc_tax       DECIMAL(7,2),
      cs_net_paid_inc_ship      DECIMAL(7,2),
      cs_net_paid_inc_ship_tax  DECIMAL(7,2),
      cs_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.customer
    (
      c_customer_sk         BIGINT NOT NULL,
      c_customer_id         CHAR(16) NOT NULL,
      c_current_cdemo_sk        BIGINT,
      c_current_hdemo_sk        BIGINT,
      c_current_addr_sk         BIGINT,
      c_first_shipto_date_sk    BIGINT,
      c_first_sales_date_sk     BIGINT,
      c_salutation          CHAR(10),
      c_first_name          CHAR(20),
      c_last_name           CHAR(30),
      c_preferred_cust_flag     char(1),
      c_birth_day           INT,
      c_birth_month         INT,
      c_birth_year          INT,
      c_birth_country       VARCHAR(20),
      c_login           CHAR(13),
      c_email_address       CHAR(50),
      c_last_review_date_sk     BIGINT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_address
    (
      ca_address_sk      BIGINT NOT NULL,
      ca_address_id      VARCHAR(16) NOT NULL,
      ca_street_number   VARCHAR(10),
      ca_street_name     VARCHAR(60),
      ca_street_type     VARCHAR(15),
      ca_suite_number    VARCHAR(10),
      ca_city        VARCHAR(60),
      ca_county      VARCHAR(30),
      ca_state       VARCHAR(2),
      ca_zip         VARCHAR(10),
      ca_country         VARCHAR(20),
      ca_gmt_offset      DECIMAL(5,2),
      ca_location_type   VARCHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_demographics
    (
      cd_demo_sk                BIGINT not null,
      cd_gender                 char(1),
      cd_marital_status         char(1),
      cd_education_status       char(20),
      cd_purchase_estimate      INT,
      cd_credit_rating          char(10),
      cd_dep_count              INT,
      cd_dep_employed_count     INT,
      cd_dep_college_count      INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.date_dim
    (
      d_date_sk                 BIGINT not null,
      d_date_id                 CHAR(16) not null,
      d_date                    DATE,
      d_month_seq               INT,
      d_week_seq                INT,
      d_quarter_seq             INT,
      d_year                    INT,
      d_dow                     INT,
      d_moy                     INT,
      d_dom                     INT,
      d_qoy                     INT,
      d_fy_year                 INT,
      d_fy_quarter_seq          INT,
      d_fy_week_seq             INT,
      d_day_name                CHAR(9),
      d_quarter_name            CHAR(6),
      d_holiday                 CHAR(1),
      d_weekend                 CHAR(1),
      d_following_holiday       CHAR(1),
      d_first_dom               INT,
      d_last_dom                INT,
      d_same_day_ly             INT,
      d_same_day_lq             INT,
      d_current_day             CHAR(1),
      d_current_week            CHAR(1),
      d_current_month           CHAR(1),
      d_current_quarter         CHAR(1),
      d_current_year            CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.household_demographics
    (
      hd_demo_sk                BIGINT not null,
      hd_income_band_sk         BIGINT,
      hd_buy_potential          CHAR(15),
      hd_dep_count              INT,
      hd_vehicle_count          INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.income_band
    (
      ib_income_band_sk         BIGINT not null,
      ib_lower_bound            INT,
      ib_upper_bound            INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band';
    
    CREATE EXTERNAl TABLE external_tpcds.inventory
    (
      inv_date_sk               BIGINT not null,
      inv_item_sk               BIGINT not null,
      inv_warehouse_sk          BIGINT not null,
      inv_quantity_on_hand      INT,
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory';
    
    
    CREATE EXTERNAl TABLE external_tpcds.item
    (
      i_item_sk                 BIGINT not null,
      i_item_id                 CHAR(16) not null,
      i_rec_start_date          DATE,
      i_rec_end_date            DATE,
      i_item_desc               VARCHAR(200),
      i_current_price           DECIMAL(7,2),
      i_wholesale_cost          DECIMAL(7,2),
      i_brand_id                INT,
      i_brand                   CHAR(50),
      i_class_id                INT,
      i_class                   CHAR(50),
      i_category_id             INT,
      i_category                CHAR(50),
      i_manufact_id             INT,
      i_manufact                CHAR(50),
      i_size                    CHAR(20),
      i_formulation             CHAR(20),
      i_color                   CHAR(20),
      i_units                   CHAR(10),
      i_container               CHAR(10),
      i_manager_id              INT,
      i_product_name            char(50),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/item';
    
    CREATE EXTERNAl TABLE external_tpcds.promotion
    (
      p_promo_sk                BIGINT not null,
      p_promo_id                CHAR(16) not null,
      p_start_date_sk           BIGINT,
      p_end_date_sk             BIGINT,
      p_item_sk                 BIGINT,
      p_cost                    DECIMAL(15,2),
      p_response_target         INT,
      p_promo_name              CHAR(50),
      p_channel_dmail           CHAR(1),
      p_channel_email           CHAR(1),
      p_channel_catalog         CHAR(1),
      p_channel_tv              CHAR(1),
      p_channel_radio           CHAR(1),
      p_channel_press           CHAR(1),
      p_channel_event           CHAR(1),
      p_channel_demo            CHAR(1),
      p_channel_details         VARCHAR(100),
      p_purpose                 CHAR(15),
      p_discount_active         CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion';
    
    CREATE EXTERNAl TABLE external_tpcds.reason
    (
      r_reason_sk     BIGINT not null,
      r_reason_id     CHAR(16) not null,
      r_reason_desc   CHAR(100),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/reason';
    
    CREATE EXTERNAl TABLE external_tpcds.ship_mode
    (
      sm_ship_mode_sk           BIGINT,
      sm_ship_mode_id           CHAR(16) not null,
      sm_type                   CHAR(30),
      sm_code                   CHAR(10),
      sm_carrier                CHAR(20),
      sm_contract               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode';
    
    CREATE EXTERNAl TABLE external_tpcds.store_returns
    (
      sr_returned_date_sk       BIGINT,
      sr_return_time_sk         BIGINT,
      sr_item_sk                BIGINT not null,
      sr_customer_sk            BIGINT,
      sr_cdemo_sk               BIGINT,
      sr_hdemo_sk               BIGINT,
      sr_addr_sk                BIGINT,
      sr_store_sk               BIGINT,
      sr_reason_sk              BIGINT,
      sr_ticket_number          BIGINT not null,
      sr_return_quantity        INT,
      sr_return_amt             DECIMAL(7,2),
      sr_return_tax             DECIMAL(7,2),
      sr_return_amt_inc_tax     DECIMAL(7,2),
      sr_fee                    DECIMAL(7,2),
      sr_return_ship_cost       DECIMAL(7,2),
      sr_refunded_cash          DECIMAL(7,2),
      sr_reversed_charge        DECIMAL(7,2),
      sr_store_credit           DECIMAL(7,2),
      sr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.store_sales
    (
      ss_sold_date_sk           BIGINT,
      ss_sold_time_sk           BIGINT,
      ss_item_sk                BIGINT not null,
      ss_customer_sk            BIGINT,
      ss_cdemo_sk               BIGINT,
      ss_hdemo_sk               BIGINT,
      ss_addr_sk                BIGINT,
      ss_store_sk               BIGINT,
      ss_promo_sk               BIGINT,
      ss_ticket_number          BIGINT not null,
      ss_quantity               INT,
      ss_wholesale_cost         DECIMAL(7,2),
      ss_list_price             DECIMAL(7,2),
      ss_sales_price            DECIMAL(7,2),
      ss_ext_discount_amt       DECIMAL(7,2),
      ss_ext_sales_price        DECIMAL(7,2),
      ss_ext_wholesale_cost     DECIMAL(7,2),
      ss_ext_list_price         DECIMAL(7,2),
      ss_ext_tax                DECIMAL(7,2),
      ss_coupon_amt             DECIMAL(7,2),
      ss_net_paid               DECIMAL(7,2),
      ss_net_paid_inc_tax       DECIMAL(7,2),
      ss_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.store
    (
      s_store_sk                BIGINT not null,
      s_store_id                CHAR(16) not null,
      s_rec_start_date          DATE,
      s_rec_end_date            DATE,
      s_closed_date_sk          BIGINT,
      s_store_name              VARCHAR(50),
      s_number_employees        INT,
      s_floor_space             INT,
      s_hours                   CHAR(20),
      s_manager                 VARCHAR(40),
      s_market_id               INT,
      s_geography_class         VARCHAR(100),
      s_market_desc             VARCHAR(100),
      s_market_manager          VARCHAR(40),
      s_division_id             INT,
      s_division_name           VARCHAR(50),
      s_company_id              INT,
      s_company_name            VARCHAR(50),
      s_street_number           VARCHAR(10),
      s_street_name             VARCHAR(60),
      s_street_type             CHAR(15),
      s_suite_number            CHAR(10),
      s_city                    VARCHAR(60),
      s_county                  VARCHAR(30),
      s_state                   CHAR(2),
      s_zip                     CHAR(10),
      s_country                 VARCHAR(20),
      s_gmt_offset              DECIMAL(5,2),
      s_tax_percentage          DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store';
    
    CREATE EXTERNAl TABLE external_tpcds.time_dim
    (
      t_time_sk                 BIGINT not null,
      t_time_id                 CHAR(16) not null,
      t_time                    INT,
      t_hour                    INT,
      t_minute                  INT,
      t_second                  INT,
      t_am_pm                   CHAR(2),
      t_shift                   CHAR(20),
      t_sub_shift               CHAR(20),
      t_meal_time               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.warehouse
    (
      w_warehouse_sk            BIGINT not null,
      w_warehouse_id            CHAR(16) not null,
      w_warehouse_name          VARCHAR(20),
      w_warehouse_sq_ft         INT,
      w_street_number           CHAR(10),
      w_street_name             VARCHAR(60),
      w_street_type             CHAR(15),
      w_suite_number            CHAR(10),
      w_city                    VARCHAR(60),
      w_county                  VARCHAR(30),
      w_state                   CHAR(2),
      w_zip                     CHAR(10),
      w_country                 VARCHAR(20),
      w_gmt_offset              DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse';
    
    CREATE EXTERNAl TABLE external_tpcds.web_page
    (
      wp_web_page_sk            BIGINT not null,
      wp_web_page_id            CHAR(16) not null,
      wp_rec_start_date         DATE,
      wp_rec_end_date           DATE,
      wp_creation_date_sk       BIGINT,
      wp_access_date_sk         BIGINT,
      wp_autogen_flag           CHAR(1),
      wp_customer_sk            BIGINT,
      wp_url                    VARCHAR(100),
      wp_type                   CHAR(50),
      wp_char_count             INT,
      wp_link_count             INT,
      wp_image_count            INT,
      wp_max_ad_count           INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page';
    
    CREATE EXTERNAl TABLE external_tpcds.web_returns
    (
      wr_returned_date_sk       BIGINT,
      wr_returned_time_sk       BIGINT,
      wr_item_sk                BIGINT not null,
      wr_refunded_customer_sk   BIGINT,
      wr_refunded_cdemo_sk      BIGINT,
      wr_refunded_hdemo_sk      BIGINT,
      wr_refunded_addr_sk       BIGINT,
      wr_returning_customer_sk  BIGINT,
      wr_returning_cdemo_sk     BIGINT,
      wr_returning_hdemo_sk     BIGINT,
      wr_returning_addr_sk      BIGINT,
      wr_web_page_sk            BIGINT,
      wr_reason_sk              BIGINT,
      wr_order_number           BIGINT not null,
      wr_return_quantity        INT,
      wr_return_amt             DECIMAL(7,2),
      wr_return_tax             DECIMAL(7,2),
      wr_return_amt_inc_tax     DECIMAL(7,2),
      wr_fee                    DECIMAL(7,2),
      wr_return_ship_cost       DECIMAL(7,2),
      wr_refunded_cash          DECIMAL(7,2),
      wr_reversed_charge        DECIMAL(7,2),
      wr_account_credit         DECIMAL(7,2),
      wr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.web_sales
    (
      ws_sold_date_sk           BIGINT,
      ws_sold_time_sk           BIGINT,
      ws_ship_date_sk           BIGINT,
      ws_item_sk                BIGINT not null,
      ws_bill_customer_sk       BIGINT,
      ws_bill_cdemo_sk          BIGINT,
      ws_bill_hdemo_sk          BIGINT,
      ws_bill_addr_sk           BIGINT,
      ws_ship_customer_sk       BIGINT,
      ws_ship_cdemo_sk          BIGINT,
      ws_ship_hdemo_sk          BIGINT,
      ws_ship_addr_sk           BIGINT,
      ws_web_page_sk            BIGINT,
      ws_web_site_sk            BIGINT,
      ws_ship_mode_sk           BIGINT,
      ws_warehouse_sk           BIGINT,
      ws_promo_sk               BIGINT,
      ws_order_number           BIGINT not null,
      ws_quantity               INT,
      ws_wholesale_cost         DECIMAL(7,2),
      ws_list_price             DECIMAL(7,2),
      ws_sales_price            DECIMAL(7,2),
      ws_ext_discount_amt       DECIMAL(7,2),
      ws_ext_sales_price        DECIMAL(7,2),
      ws_ext_wholesale_cost     DECIMAL(7,2),
      ws_ext_list_price         DECIMAL(7,2),
      ws_ext_tax                DECIMAL(7,2),
      ws_coupon_amt             DECIMAL(7,2),
      ws_ext_ship_cost          DECIMAL(7,2),
      ws_net_paid               DECIMAL(7,2),
      ws_net_paid_inc_tax       DECIMAL(7,2),
      ws_net_paid_inc_ship      DECIMAL(7,2),
      ws_net_paid_inc_ship_tax  DECIMAL(7,2),
      ws_net_profit             DECIMAL(7,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.web_site
    (
      web_site_sk               BIGINT not null,
      web_site_id               CHAR(16) not null,
      web_rec_start_date        DATE,
      web_rec_end_date          DATE,
      web_name                  VARCHAR(50),
      web_open_date_sk          BIGINT,
      web_close_date_sk         BIGINT,
      web_class                 VARCHAR(50),
      web_manager               VARCHAR(40),
      web_mkt_id                INT,
      web_mkt_class             VARCHAR(50),
      web_mkt_desc              VARCHAR(100),
      web_market_manager        VARCHAR(40),
      web_company_id            INT,
      web_company_name          CHAR(50),
      web_street_number         CHAR(10),
      web_street_name           VARCHAR(60),
      web_street_type           CHAR(15),
      web_suite_number          CHAR(10),
      web_city                  VARCHAR(60),
      web_county                VARCHAR(30),
      web_state                 CHAR(2),
      web_zip                   CHAR(10),
      web_country               VARCHAR(20),
      web_gmt_offset            DECIMAL(5,2),
      web_tax_percentage        DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';
  3. 将外表中的数据导入构建测试表文档所创建的内表中。

    INSERT OVERWRITE INTO promotion
    SELECT *
    FROM external_tpcds.promotion;
    
    INSERT INTO web_site
    SELECT *
    FROM external_tpcds.web_site;
    
    INSERT OVERWRITE INTO web_sales
    SELECT *
    FROM external_tpcds.web_sales;
    
    INSERT OVERWRITE INTO web_returns
    SELECT *
    FROM external_tpcds.web_returns;
    
    INSERT OVERWRITE INTO web_page
    SELECT *
    FROM external_tpcds.web_page;
    
    INSERT INTO warehouse
    SELECT *
    FROM external_tpcds.warehouse;
    
    INSERT OVERWRITE INTO time_dim
    SELECT *
    FROM external_tpcds.time_dim;
    
    INSERT OVERWRITE INTO store_sales
    SELECT *
    FROM external_tpcds.store_sales;
    
    INSERT OVERWRITE INTO store_returns
    SELECT *
    FROM external_tpcds.store_returns;
    
    INSERT INTO store
    SELECT *
    FROM external_tpcds.store;
    
    
    INSERT OVERWRITE INTO household_demographics
    SELECT *
    FROM external_tpcds.household_demographics;
    
    INSERT INTO ship_mode
    SELECT *
    FROM external_tpcds.ship_mode;
    
    INSERT INTO reason
    SELECT *
    FROM external_tpcds.reason;
    
    INSERT INTO call_center
    SELECT *
    FROM external_tpcds.call_center;
    
    INSERT OVERWRITE INTO item
    SELECT *
    FROM external_tpcds.item;
    
    INSERT OVERWRITE INTO inventory
    SELECT *
    FROM external_tpcds.inventory;
    
    INSERT INTO income_band
    SELECT *
    FROM external_tpcds.income_band;
    
    INSERT INTO date_dim
    SELECT *
    FROM external_tpcds.date_dim;
    
    INSERT OVERWRITE INTO customer_demographics
    SELECT *
    FROM external_tpcds.customer_demographics;
    
    INSERT OVERWRITE INTO customer_address
    SELECT *
    FROM external_tpcds.customer_address;
    
    INSERT OVERWRITE INTO customer
    SELECT *
    FROM external_tpcds.customer;
    
    INSERT OVERWRITE INTO catalog_sales
    SELECT *
    FROM external_tpcds.catalog_sales;
    
    INSERT OVERWRITE INTO catalog_returns
    SELECT *
    FROM external_tpcds.catalog_returns;
    
    INSERT OVERWRITE INTO catalog_page
    SELECT *
    FROM external_tpcds.catalog_page;
  4. 收集统计信息。

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    说明

    查询优化器(Query Optimizer)将查询转换为执行计划,并交给执行引擎执行。执行计划的质量会影响查询的性能。统计信息作为查询优化器的输入,可以帮助查询优化器生成高质量的执行计划。因此,在导入数据后,您需收集所有表的直方图信息,以获得最佳性能。若您想了解统计信息的更多内容,请参见统计信息

LOAD DATA导入

  1. 构建数据。

    1. TPC官网下载TPC-DS标准的数据生成工具DSDGEN,编译后生成二进制可执行文件dsdgen。

    2. 创建存放数据文件的目录。

      mkdir data1tb
    3. 构建测试数据。

      ./dsdgen -sc 1000 -dir data1tb -TERMINATE N

      参数说明如下:

      参数

      说明

      示例

      -sc

      测试数据量的大小。10表示10GB,1000表示1000GB(1TB)。

      1000

      -dir

      生成的数据文件写入的目录。

      data1tb

      -TERMINATE

      每行最后是否加字段分隔符。取值如下:

      • N:每行最后不加字段分隔符。

      • Y:每行最后添加字段分隔符。比如分隔符|。

      N

      -PARALLEL

      一共分成几个chunk。

      一条语句只能生成一个chunk。因此设置了几个,就要执行几次。

      5

      -CHILD

      当前命令生成第几个chunk。

      1

      例如,要构建1TB的测试数据,5个chunk来运行。

      mkdir data1tb_5
      
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5

      dsdgen命令生成的测试数据文件是文本格式,字段分隔符默认是管道符 |,一行一条数据记录。

      call_center.dat
      catalog_page.dat
      catalog_returns.dat
      catalog_sales.dat
      customer_address.dat
      customer.dat
      customer_demographics.dat
      date_dim.dat
      dbgen_version.dat
      household_demographics.dat
      income_band.dat
      inventory.dat
      item.dat
      promotion.dat
      reason.dat
      ship_mode.dat
      store.dat
      store_returns.dat
      store_sales.dat
      time_dim.dat
      warehouse.dat
      web_page.dat
      web_returns.dat
      web_sales.dat
      web_site.dat

      更多DSDGEN的使用方法,请参见tpc-ds

  2. 对数据做兼容加工处理。

    如果表的字段没有设置默认值,生成的字段值默认是NULL。

    使用管道符“|”作为字段分隔符的时候,“a,NULL,c,d,NULL”导出到文本文件是“a||c|d|”的格式,在使用LOAD DATA的方法导入AnalyticDB的时候,会产生报错提示导入失败,所以需要对NULL值做一些处理。

    • 把NULL替换为0,INT、BIGINT、VARCHAR、date字段类型的都会替换

      • #!/bin/bash
        # 用0来替换第一个字段的NULL值,把^|替换成0|
        # 用0来替换中间字段的NULL值, 把||替换成|0|
        # 用0来替换最后一个字段的NULL值,把|$替换成|0
        for s_f in `ls *dat`
        do
            echo "$s_f"
            i=1
            while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
            do 
                echo $i
                sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
                ((i++))
            done
        done
    • 把date字段的0值替换成 0000-00-00

      • 
        1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa....
        16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001....
        16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001..
        
        for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
        do
        # 处理第一、第二个date都是NULL的
        sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f
        
        # 处理第二个date是NULL的
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f
        
        # 处理第一个date是NULL的
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f
        
        done
  3. 使用LOAD DATA LOCAL INFILE的方式把dsdgen生成的数据文本导入到AnalyticDB for MySQL

    说明
    • 如果是在Linux环境运行生成的数据文本,每行的结束符是'\n'。

    • 如果是在Windows环境运行生成的数据文本,每行的结束符是'\r\n'。

    LOAD DATA LOCAL INFILE 'call_center.dat' INTO
    TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
    TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_returns.dat'
    INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_sales.dat'
    INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer_address.dat'
    INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer.dat' INTO
    TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
    BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
    TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'dbgen_version.dat'
    INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'household_demographics.dat' INTO TABLE household_demographics FIELDS
    TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'income_band.dat' INTO
    TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'inventory.dat' INTO
    TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
    item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'promotion.dat' INTO
    TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
    reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
    TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
    store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_returns.dat'
    INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
    TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
    TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
    TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_page.dat' INTO
    TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
    TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
    TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_site.dat' INTO
    TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
  4. 收集统计信息。

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    说明

    查询优化器(Query Optimizer)将查询转换为执行计划,并交给执行引擎执行。执行计划的质量会影响查询的性能。统计信息作为查询优化器的输入,可以帮助查询优化器生成高质量的执行计划。因此,在导入数据后,您需收集所有表的直方图信息,以获得最佳性能。若您想了解统计信息的更多内容,请参见统计信息