本文介绍如何构建TPC-DS测试数据,并将TPC-DS 1000GB测试数据分别导入AnalyticDB for MySQL中。
本文的TPC-DS的实现基于TPC-DS的基准测试,并不能与已发布的TPC-DS基准测试结果相比较,本文中的测试并不符合TPC-DS基准测试的所有要求。
下表列出了TPC-DS测试数据集中的表数据条数。
表名 | 数据条数 |
store_sales | 2,879,987,999 |
catalog_sales | 1,439,980,416 |
web_sales | 720,000,376 |
store_returns | 287,999,764 |
catalog_returns | 143,996,756 |
inventory | 783,000,000 |
web_returns | 71,997,522 |
customer | 12,000,000 |
customer_address | 6,000,000 |
item | 300,000 |
customer_demographics | 1,920,800 |
date_dim | 73,049 |
time_dim | 86,400 |
catalog_page | 30,000 |
web_page | 3,000 |
store | 1,002 |
promotion | 1,500 |
household_demographics | 7,200 |
web_site | 54 |
call_center | 42 |
reason | 65 |
warehouse | 20 |
ship_mode | 20 |
income_band | 20 |
OSS外表导入(推荐)
以下流程仅适用于企业版、基础版及湖仓版集群。
创建外部数据库
CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;
创建二十四张外表。
说明AnalyticDB for MySQL提供了TPC-DS测试数据存放的OSS路径。您需要根据集群所在地域,替换
LOCATION
参数指定的OSS路径。CREATE EXTERNAl TABLE external_tpcds.call_center ( cc_call_center_sk BIGINT not null, cc_call_center_id CHAR(16) not null, cc_rec_start_date DATE, cc_rec_end_date DATE, cc_closed_date_sk BIGINT, cc_open_date_sk BIGINT, cc_name VARCHAR(50), cc_class VARCHAR(50), cc_employees INT, cc_sq_ft INT, cc_hours CHAR(20), cc_manager VARCHAR(40), cc_mkt_id INT, cc_mkt_class CHAR(50), cc_mkt_desc VARCHAR(100), cc_market_manager VARCHAR(40), cc_division INT, cc_division_name VARCHAR(50), cc_company INT, cc_company_name CHAR(50), cc_street_number CHAR(10), cc_street_name VARCHAR(60), cc_street_type CHAR(15), cc_suite_number CHAR(10), cc_city VARCHAR(60), cc_county VARCHAR(30), cc_state CHAR(2), cc_zip CHAR(10), cc_country VARCHAR(20), cc_gmt_offset DECIMAL(5,2), cc_tax_percentage DECIMAL(5,2), dummy varchar )ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center'; CREATE EXTERNAl TABLE external_tpcds.catalog_page ( cp_catalog_page_sk BIGINT not null, cp_catalog_page_id VARCHAR(16) not null, cp_start_date_sk BIGINT, cp_end_date_sk BIGINT, cp_department VARCHAR(50), cp_catalog_number INT, cp_catalog_page_number INT, cp_description VARCHAR(100), cp_type VARCHAR(100), dummy varchar )ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page'; CREATE EXTERNAl TABLE external_tpcds.catalog_returns ( cr_returned_date_sk BIGINT, cr_returned_time_sk BIGINT, cr_item_sk BIGINT not null, cr_refunded_customer_sk BIGINT, cr_refunded_cdemo_sk BIGINT, cr_refunded_hdemo_sk BIGINT, cr_refunded_addr_sk BIGINT, cr_returning_customer_sk BIGINT, cr_returning_cdemo_sk BIGINT, cr_returning_hdemo_sk BIGINT, cr_returning_addr_sk BIGINT, cr_call_center_sk BIGINT, cr_catalog_page_sk BIGINT , cr_ship_mode_sk BIGINT , cr_warehouse_sk BIGINT , cr_reason_sk BIGINT , cr_order_number BIGINT not null, cr_return_quantity INT, cr_return_amount DECIMAL(7,2), cr_return_tax DECIMAL(7,2), cr_return_amt_inc_tax DECIMAL(7,2), cr_fee DECIMAL(7,2), cr_return_ship_cost DECIMAL(7,2), cr_refunded_cash DECIMAL(7,2), cr_reversed_charge DECIMAL(7,2), cr_store_credit DECIMAL(7,2), cr_net_loss DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns'; CREATE EXTERNAl TABLE external_tpcds.catalog_sales ( cs_sold_date_sk BIGINT, cs_sold_time_sk BIGINT, cs_ship_date_sk BIGINT, cs_bill_customer_sk BIGINT, cs_bill_cdemo_sk BIGINT, cs_bill_hdemo_sk BIGINT, cs_bill_addr_sk BIGINT, cs_ship_customer_sk BIGINT, cs_ship_cdemo_sk BIGINT, cs_ship_hdemo_sk BIGINT, cs_ship_addr_sk BIGINT, cs_call_center_sk BIGINT, cs_catalog_page_sk BIGINT, cs_ship_mode_sk BIGINT, cs_warehouse_sk BIGINT, cs_item_sk BIGINT not null, cs_promo_sk BIGINT, cs_order_number BIGINT not null, cs_quantity INT, cs_wholesale_cost DECIMAL(7,2), cs_list_price DECIMAL(7,2), cs_sales_price DECIMAL(7,2), cs_ext_discount_amt DECIMAL(7,2), cs_ext_sales_price DECIMAL(7,2), cs_ext_wholesale_cost DECIMAL(7,2), cs_ext_list_price DECIMAL(7,2), cs_ext_tax DECIMAL(7,2), cs_coupon_amt DECIMAL(7,2), cs_ext_ship_cost DECIMAL(7,2), cs_net_paid DECIMAL(7,2), cs_net_paid_inc_tax DECIMAL(7,2), cs_net_paid_inc_ship DECIMAL(7,2), cs_net_paid_inc_ship_tax DECIMAL(7,2), cs_net_profit DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales'; CREATE EXTERNAl TABLE external_tpcds.customer ( c_customer_sk BIGINT NOT NULL, c_customer_id CHAR(16) NOT NULL, c_current_cdemo_sk BIGINT, c_current_hdemo_sk BIGINT, c_current_addr_sk BIGINT, c_first_shipto_date_sk BIGINT, c_first_sales_date_sk BIGINT, c_salutation CHAR(10), c_first_name CHAR(20), c_last_name CHAR(30), c_preferred_cust_flag char(1), c_birth_day INT, c_birth_month INT, c_birth_year INT, c_birth_country VARCHAR(20), c_login CHAR(13), c_email_address CHAR(50), c_last_review_date_sk BIGINT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer'; CREATE EXTERNAl TABLE external_tpcds.customer_address ( ca_address_sk BIGINT NOT NULL, ca_address_id VARCHAR(16) NOT NULL, ca_street_number VARCHAR(10), ca_street_name VARCHAR(60), ca_street_type VARCHAR(15), ca_suite_number VARCHAR(10), ca_city VARCHAR(60), ca_county VARCHAR(30), ca_state VARCHAR(2), ca_zip VARCHAR(10), ca_country VARCHAR(20), ca_gmt_offset DECIMAL(5,2), ca_location_type VARCHAR(20), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address'; CREATE EXTERNAl TABLE external_tpcds.customer_demographics ( cd_demo_sk BIGINT not null, cd_gender char(1), cd_marital_status char(1), cd_education_status char(20), cd_purchase_estimate INT, cd_credit_rating char(10), cd_dep_count INT, cd_dep_employed_count INT, cd_dep_college_count INT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics'; CREATE EXTERNAl TABLE external_tpcds.date_dim ( d_date_sk BIGINT not null, d_date_id CHAR(16) not null, d_date DATE, d_month_seq INT, d_week_seq INT, d_quarter_seq INT, d_year INT, d_dow INT, d_moy INT, d_dom INT, d_qoy INT, d_fy_year INT, d_fy_quarter_seq INT, d_fy_week_seq INT, d_day_name CHAR(9), d_quarter_name CHAR(6), d_holiday CHAR(1), d_weekend CHAR(1), d_following_holiday CHAR(1), d_first_dom INT, d_last_dom INT, d_same_day_ly INT, d_same_day_lq INT, d_current_day CHAR(1), d_current_week CHAR(1), d_current_month CHAR(1), d_current_quarter CHAR(1), d_current_year CHAR(1), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim'; CREATE EXTERNAl TABLE external_tpcds.household_demographics ( hd_demo_sk BIGINT not null, hd_income_band_sk BIGINT, hd_buy_potential CHAR(15), hd_dep_count INT, hd_vehicle_count INT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics'; CREATE EXTERNAl TABLE external_tpcds.income_band ( ib_income_band_sk BIGINT not null, ib_lower_bound INT, ib_upper_bound INT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band'; CREATE EXTERNAl TABLE external_tpcds.inventory ( inv_date_sk BIGINT not null, inv_item_sk BIGINT not null, inv_warehouse_sk BIGINT not null, inv_quantity_on_hand INT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory'; CREATE EXTERNAl TABLE external_tpcds.item ( i_item_sk BIGINT not null, i_item_id CHAR(16) not null, i_rec_start_date DATE, i_rec_end_date DATE, i_item_desc VARCHAR(200), i_current_price DECIMAL(7,2), i_wholesale_cost DECIMAL(7,2), i_brand_id INT, i_brand CHAR(50), i_class_id INT, i_class CHAR(50), i_category_id INT, i_category CHAR(50), i_manufact_id INT, i_manufact CHAR(50), i_size CHAR(20), i_formulation CHAR(20), i_color CHAR(20), i_units CHAR(10), i_container CHAR(10), i_manager_id INT, i_product_name char(50), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/item'; CREATE EXTERNAl TABLE external_tpcds.promotion ( p_promo_sk BIGINT not null, p_promo_id CHAR(16) not null, p_start_date_sk BIGINT, p_end_date_sk BIGINT, p_item_sk BIGINT, p_cost DECIMAL(15,2), p_response_target INT, p_promo_name CHAR(50), p_channel_dmail CHAR(1), p_channel_email CHAR(1), p_channel_catalog CHAR(1), p_channel_tv CHAR(1), p_channel_radio CHAR(1), p_channel_press CHAR(1), p_channel_event CHAR(1), p_channel_demo CHAR(1), p_channel_details VARCHAR(100), p_purpose CHAR(15), p_discount_active CHAR(1), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion'; CREATE EXTERNAl TABLE external_tpcds.reason ( r_reason_sk BIGINT not null, r_reason_id CHAR(16) not null, r_reason_desc CHAR(100), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/reason'; CREATE EXTERNAl TABLE external_tpcds.ship_mode ( sm_ship_mode_sk BIGINT, sm_ship_mode_id CHAR(16) not null, sm_type CHAR(30), sm_code CHAR(10), sm_carrier CHAR(20), sm_contract CHAR(20), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode'; CREATE EXTERNAl TABLE external_tpcds.store_returns ( sr_returned_date_sk BIGINT, sr_return_time_sk BIGINT, sr_item_sk BIGINT not null, sr_customer_sk BIGINT, sr_cdemo_sk BIGINT, sr_hdemo_sk BIGINT, sr_addr_sk BIGINT, sr_store_sk BIGINT, sr_reason_sk BIGINT, sr_ticket_number BIGINT not null, sr_return_quantity INT, sr_return_amt DECIMAL(7,2), sr_return_tax DECIMAL(7,2), sr_return_amt_inc_tax DECIMAL(7,2), sr_fee DECIMAL(7,2), sr_return_ship_cost DECIMAL(7,2), sr_refunded_cash DECIMAL(7,2), sr_reversed_charge DECIMAL(7,2), sr_store_credit DECIMAL(7,2), sr_net_loss DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns'; CREATE EXTERNAl TABLE external_tpcds.store_sales ( ss_sold_date_sk BIGINT, ss_sold_time_sk BIGINT, ss_item_sk BIGINT not null, ss_customer_sk BIGINT, ss_cdemo_sk BIGINT, ss_hdemo_sk BIGINT, ss_addr_sk BIGINT, ss_store_sk BIGINT, ss_promo_sk BIGINT, ss_ticket_number BIGINT not null, ss_quantity INT, ss_wholesale_cost DECIMAL(7,2), ss_list_price DECIMAL(7,2), ss_sales_price DECIMAL(7,2), ss_ext_discount_amt DECIMAL(7,2), ss_ext_sales_price DECIMAL(7,2), ss_ext_wholesale_cost DECIMAL(7,2), ss_ext_list_price DECIMAL(7,2), ss_ext_tax DECIMAL(7,2), ss_coupon_amt DECIMAL(7,2), ss_net_paid DECIMAL(7,2), ss_net_paid_inc_tax DECIMAL(7,2), ss_net_profit DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales'; CREATE EXTERNAl TABLE external_tpcds.store ( s_store_sk BIGINT not null, s_store_id CHAR(16) not null, s_rec_start_date DATE, s_rec_end_date DATE, s_closed_date_sk BIGINT, s_store_name VARCHAR(50), s_number_employees INT, s_floor_space INT, s_hours CHAR(20), s_manager VARCHAR(40), s_market_id INT, s_geography_class VARCHAR(100), s_market_desc VARCHAR(100), s_market_manager VARCHAR(40), s_division_id INT, s_division_name VARCHAR(50), s_company_id INT, s_company_name VARCHAR(50), s_street_number VARCHAR(10), s_street_name VARCHAR(60), s_street_type CHAR(15), s_suite_number CHAR(10), s_city VARCHAR(60), s_county VARCHAR(30), s_state CHAR(2), s_zip CHAR(10), s_country VARCHAR(20), s_gmt_offset DECIMAL(5,2), s_tax_percentage DECIMAL(5,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store'; CREATE EXTERNAl TABLE external_tpcds.time_dim ( t_time_sk BIGINT not null, t_time_id CHAR(16) not null, t_time INT, t_hour INT, t_minute INT, t_second INT, t_am_pm CHAR(2), t_shift CHAR(20), t_sub_shift CHAR(20), t_meal_time CHAR(20), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim'; CREATE EXTERNAl TABLE external_tpcds.warehouse ( w_warehouse_sk BIGINT not null, w_warehouse_id CHAR(16) not null, w_warehouse_name VARCHAR(20), w_warehouse_sq_ft INT, w_street_number CHAR(10), w_street_name VARCHAR(60), w_street_type CHAR(15), w_suite_number CHAR(10), w_city VARCHAR(60), w_county VARCHAR(30), w_state CHAR(2), w_zip CHAR(10), w_country VARCHAR(20), w_gmt_offset DECIMAL(5,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse'; CREATE EXTERNAl TABLE external_tpcds.web_page ( wp_web_page_sk BIGINT not null, wp_web_page_id CHAR(16) not null, wp_rec_start_date DATE, wp_rec_end_date DATE, wp_creation_date_sk BIGINT, wp_access_date_sk BIGINT, wp_autogen_flag CHAR(1), wp_customer_sk BIGINT, wp_url VARCHAR(100), wp_type CHAR(50), wp_char_count INT, wp_link_count INT, wp_image_count INT, wp_max_ad_count INT, dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page'; CREATE EXTERNAl TABLE external_tpcds.web_returns ( wr_returned_date_sk BIGINT, wr_returned_time_sk BIGINT, wr_item_sk BIGINT not null, wr_refunded_customer_sk BIGINT, wr_refunded_cdemo_sk BIGINT, wr_refunded_hdemo_sk BIGINT, wr_refunded_addr_sk BIGINT, wr_returning_customer_sk BIGINT, wr_returning_cdemo_sk BIGINT, wr_returning_hdemo_sk BIGINT, wr_returning_addr_sk BIGINT, wr_web_page_sk BIGINT, wr_reason_sk BIGINT, wr_order_number BIGINT not null, wr_return_quantity INT, wr_return_amt DECIMAL(7,2), wr_return_tax DECIMAL(7,2), wr_return_amt_inc_tax DECIMAL(7,2), wr_fee DECIMAL(7,2), wr_return_ship_cost DECIMAL(7,2), wr_refunded_cash DECIMAL(7,2), wr_reversed_charge DECIMAL(7,2), wr_account_credit DECIMAL(7,2), wr_net_loss DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns'; CREATE EXTERNAl TABLE external_tpcds.web_sales ( ws_sold_date_sk BIGINT, ws_sold_time_sk BIGINT, ws_ship_date_sk BIGINT, ws_item_sk BIGINT not null, ws_bill_customer_sk BIGINT, ws_bill_cdemo_sk BIGINT, ws_bill_hdemo_sk BIGINT, ws_bill_addr_sk BIGINT, ws_ship_customer_sk BIGINT, ws_ship_cdemo_sk BIGINT, ws_ship_hdemo_sk BIGINT, ws_ship_addr_sk BIGINT, ws_web_page_sk BIGINT, ws_web_site_sk BIGINT, ws_ship_mode_sk BIGINT, ws_warehouse_sk BIGINT, ws_promo_sk BIGINT, ws_order_number BIGINT not null, ws_quantity INT, ws_wholesale_cost DECIMAL(7,2), ws_list_price DECIMAL(7,2), ws_sales_price DECIMAL(7,2), ws_ext_discount_amt DECIMAL(7,2), ws_ext_sales_price DECIMAL(7,2), ws_ext_wholesale_cost DECIMAL(7,2), ws_ext_list_price DECIMAL(7,2), ws_ext_tax DECIMAL(7,2), ws_coupon_amt DECIMAL(7,2), ws_ext_ship_cost DECIMAL(7,2), ws_net_paid DECIMAL(7,2), ws_net_paid_inc_tax DECIMAL(7,2), ws_net_paid_inc_ship DECIMAL(7,2), ws_net_paid_inc_ship_tax DECIMAL(7,2), ws_net_profit DECIMAL(7,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales'; CREATE EXTERNAl TABLE external_tpcds.web_site ( web_site_sk BIGINT not null, web_site_id CHAR(16) not null, web_rec_start_date DATE, web_rec_end_date DATE, web_name VARCHAR(50), web_open_date_sk BIGINT, web_close_date_sk BIGINT, web_class VARCHAR(50), web_manager VARCHAR(40), web_mkt_id INT, web_mkt_class VARCHAR(50), web_mkt_desc VARCHAR(100), web_market_manager VARCHAR(40), web_company_id INT, web_company_name CHAR(50), web_street_number CHAR(10), web_street_name VARCHAR(60), web_street_type CHAR(15), web_suite_number CHAR(10), web_city VARCHAR(60), web_county VARCHAR(30), web_state CHAR(2), web_zip CHAR(10), web_country VARCHAR(20), web_gmt_offset DECIMAL(5,2), web_tax_percentage DECIMAL(5,2), dummy varchar ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';
将外表中的数据导入构建测试表文档所创建的内表中。
INSERT OVERWRITE INTO promotion SELECT * FROM external_tpcds.promotion; INSERT INTO web_site SELECT * FROM external_tpcds.web_site; INSERT OVERWRITE INTO web_sales SELECT * FROM external_tpcds.web_sales; INSERT OVERWRITE INTO web_returns SELECT * FROM external_tpcds.web_returns; INSERT OVERWRITE INTO web_page SELECT * FROM external_tpcds.web_page; INSERT INTO warehouse SELECT * FROM external_tpcds.warehouse; INSERT OVERWRITE INTO time_dim SELECT * FROM external_tpcds.time_dim; INSERT OVERWRITE INTO store_sales SELECT * FROM external_tpcds.store_sales; INSERT OVERWRITE INTO store_returns SELECT * FROM external_tpcds.store_returns; INSERT INTO store SELECT * FROM external_tpcds.store; INSERT OVERWRITE INTO household_demographics SELECT * FROM external_tpcds.household_demographics; INSERT INTO ship_mode SELECT * FROM external_tpcds.ship_mode; INSERT INTO reason SELECT * FROM external_tpcds.reason; INSERT INTO call_center SELECT * FROM external_tpcds.call_center; INSERT OVERWRITE INTO item SELECT * FROM external_tpcds.item; INSERT OVERWRITE INTO inventory SELECT * FROM external_tpcds.inventory; INSERT INTO income_band SELECT * FROM external_tpcds.income_band; INSERT INTO date_dim SELECT * FROM external_tpcds.date_dim; INSERT OVERWRITE INTO customer_demographics SELECT * FROM external_tpcds.customer_demographics; INSERT OVERWRITE INTO customer_address SELECT * FROM external_tpcds.customer_address; INSERT OVERWRITE INTO customer SELECT * FROM external_tpcds.customer; INSERT OVERWRITE INTO catalog_sales SELECT * FROM external_tpcds.catalog_sales; INSERT OVERWRITE INTO catalog_returns SELECT * FROM external_tpcds.catalog_returns; INSERT OVERWRITE INTO catalog_page SELECT * FROM external_tpcds.catalog_page;
收集统计信息。
ANALYZE TABLE call_center UPDATE HISTOGRAM ; ANALYZE TABLE catalog_page UPDATE HISTOGRAM ; ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ; ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ; ANALYZE TABLE customer UPDATE HISTOGRAM ; ANALYZE TABLE customer_address UPDATE HISTOGRAM ; ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ; ANALYZE TABLE date_dim UPDATE HISTOGRAM ; ANALYZE TABLE household_demographics UPDATE HISTOGRAM ; ANALYZE TABLE income_band UPDATE HISTOGRAM ; ANALYZE TABLE inventory UPDATE HISTOGRAM ; ANALYZE TABLE item UPDATE HISTOGRAM ; ANALYZE TABLE promotion UPDATE HISTOGRAM ; ANALYZE TABLE reason UPDATE HISTOGRAM ; ANALYZE TABLE ship_mode UPDATE HISTOGRAM ; ANALYZE TABLE store UPDATE HISTOGRAM ; ANALYZE TABLE store_returns UPDATE HISTOGRAM ; ANALYZE TABLE store_sales UPDATE HISTOGRAM ; ANALYZE TABLE time_dim UPDATE HISTOGRAM ; ANALYZE TABLE warehouse UPDATE HISTOGRAM ; ANALYZE TABLE web_page UPDATE HISTOGRAM ; ANALYZE TABLE web_returns UPDATE HISTOGRAM ; ANALYZE TABLE web_sales UPDATE HISTOGRAM ; ANALYZE TABLE web_site UPDATE HISTOGRAM ;
说明查询优化器(Query Optimizer)将查询转换为执行计划,并交给执行引擎执行。执行计划的质量会影响查询的性能。统计信息作为查询优化器的输入,可以帮助查询优化器生成高质量的执行计划。因此,在导入数据后,您需收集所有表的直方图信息,以获得最佳性能。若您想了解统计信息的更多内容,请参见统计信息。
LOAD DATA导入
构建数据。
从TPC官网下载TPC-DS标准的数据生成工具DSDGEN,编译后生成二进制可执行文件dsdgen。
创建存放数据文件的目录。
mkdir data1tb
构建测试数据。
./dsdgen -sc 1000 -dir data1tb -TERMINATE N
参数说明如下:
参数
说明
示例
-sc
测试数据量的大小。10表示10GB,1000表示1000GB(1TB)。
1000
-dir
生成的数据文件写入的目录。
data1tb
-TERMINATE
每行最后是否加字段分隔符。取值如下:
N:每行最后不加字段分隔符。
Y:每行最后添加字段分隔符。比如分隔符|。
N
-PARALLEL
一共分成几个chunk。
一条语句只能生成一个chunk。因此设置了几个,就要执行几次。
5
-CHILD
当前命令生成第几个chunk。
1
例如,要构建1TB的测试数据,5个chunk来运行。
mkdir data1tb_5 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5
dsdgen命令生成的测试数据文件是文本格式,字段分隔符默认是管道符 |,一行一条数据记录。
call_center.dat catalog_page.dat catalog_returns.dat catalog_sales.dat customer_address.dat customer.dat customer_demographics.dat date_dim.dat dbgen_version.dat household_demographics.dat income_band.dat inventory.dat item.dat promotion.dat reason.dat ship_mode.dat store.dat store_returns.dat store_sales.dat time_dim.dat warehouse.dat web_page.dat web_returns.dat web_sales.dat web_site.dat
更多DSDGEN的使用方法,请参见tpc-ds。
对数据做兼容加工处理。
如果表的字段没有设置默认值,生成的字段值默认是NULL。
使用管道符“|”作为字段分隔符的时候,“a,NULL,c,d,NULL”导出到文本文件是“a||c|d|”的格式,在使用LOAD DATA的方法导入AnalyticDB的时候,会产生报错提示导入失败,所以需要对NULL值做一些处理。
把NULL替换为0,INT、BIGINT、VARCHAR、date字段类型的都会替换
#!/bin/bash # 用0来替换第一个字段的NULL值,把^|替换成0| # 用0来替换中间字段的NULL值, 把||替换成|0| # 用0来替换最后一个字段的NULL值,把|$替换成|0 for s_f in `ls *dat` do echo "$s_f" i=1 while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ] do echo $i sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f ((i++)) done done
把date字段的0值替换成 0000-00-00
1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa.... 16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001.... 16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001.. for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat do # 处理第一、第二个date都是NULL的 sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f # 处理第二个date是NULL的 sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f # 处理第一个date是NULL的 sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f done
使用LOAD DATA LOCAL INFILE的方式把dsdgen生成的数据文本导入到AnalyticDB for MySQL。
说明如果是在Linux环境运行生成的数据文本,每行的结束符是'\n'。
如果是在Windows环境运行生成的数据文本,每行的结束符是'\r\n'。
LOAD DATA LOCAL INFILE 'call_center.dat' INTO TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'catalog_returns.dat' INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'catalog_sales.dat' INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'customer_address.dat' INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'customer.dat' INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'date_dim.dat' INTO TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'dbgen_version.dat' INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'household_demographics.dat' INTO TABLE household_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'income_band.dat' INTO TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'inventory.dat' INTO TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'promotion.dat' INTO TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'store_returns.dat' INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'store_sales.dat' INTO TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'time_dim.dat' INTO TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'warehouse.dat' INTO TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'web_page.dat' INTO TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'web_returns.dat' INTO TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'web_sales.dat' INTO TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'; LOAD DATA LOCAL INFILE 'web_site.dat' INTO TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
收集统计信息。
ANALYZE TABLE call_center UPDATE HISTOGRAM ; ANALYZE TABLE catalog_page UPDATE HISTOGRAM ; ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ; ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ; ANALYZE TABLE customer UPDATE HISTOGRAM ; ANALYZE TABLE customer_address UPDATE HISTOGRAM ; ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ; ANALYZE TABLE date_dim UPDATE HISTOGRAM ; ANALYZE TABLE household_demographics UPDATE HISTOGRAM ; ANALYZE TABLE income_band UPDATE HISTOGRAM ; ANALYZE TABLE inventory UPDATE HISTOGRAM ; ANALYZE TABLE item UPDATE HISTOGRAM ; ANALYZE TABLE promotion UPDATE HISTOGRAM ; ANALYZE TABLE reason UPDATE HISTOGRAM ; ANALYZE TABLE ship_mode UPDATE HISTOGRAM ; ANALYZE TABLE store UPDATE HISTOGRAM ; ANALYZE TABLE store_returns UPDATE HISTOGRAM ; ANALYZE TABLE store_sales UPDATE HISTOGRAM ; ANALYZE TABLE time_dim UPDATE HISTOGRAM ; ANALYZE TABLE warehouse UPDATE HISTOGRAM ; ANALYZE TABLE web_page UPDATE HISTOGRAM ; ANALYZE TABLE web_returns UPDATE HISTOGRAM ; ANALYZE TABLE web_sales UPDATE HISTOGRAM ; ANALYZE TABLE web_site UPDATE HISTOGRAM ;
说明查询优化器(Query Optimizer)将查询转换为执行计划,并交给执行引擎执行。执行计划的质量会影响查询的性能。统计信息作为查询优化器的输入,可以帮助查询优化器生成高质量的执行计划。因此,在导入数据后,您需收集所有表的直方图信息,以获得最佳性能。若您想了解统计信息的更多内容,请参见统计信息。