本文介绍如何将TPC-DS测试数据导入AnalyticDB MySQL中。
对数据做兼容加工处理
如果表的字段没有设置默认值,生成的字段值默认是NULL。
使用管道符“|”作为字段分隔符的时候,“a,NULL,c,d,NULL”导出到文本文件是“a||c|d|”的格式,在使用LOAD DATA的方法导入AnalyticDB的时候,会产生报错提示导入失败,所以需要对NULL值做一些处理。
- 把NULL替换为0,int、bigint、varchar、date字段类型的都会替换
#!/bin/bash # 用0来替换第一个字段的NULL值,把^|替换成0| # 用0来替换中间字段的NULL值, 把||替换成|0| # 用0来替换最后一个字段的NULL值,把|$替换成|0 for s_f in `ls *dat` do echo "$s_f" i=1 while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ] do echo $i sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f ((i++)) done done
- 把date字段的0值替换成 0000-00-00
1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa.... 16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001.... 16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001.. for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat do # 处理第一、第二个date都是NULL的 sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f # 处理第二个date是NULL的 sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f # 处理第一个date是NULL的 sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f done
导入数据到AnalyticDB MySQL
使用LOAD DATA LOCAL INFILE的方式把dsdgen生成的数据文本导入到AnalyticDB MySQL。
说明
- 如果是在Linux环境运行生成的数据文本,每行的结束符是'\n'。
- 如果是在Windows环境运行生成的数据文本,每行的结束符是'\r\n'。
LOAD DATA LOCAL INFILE 'call_center.dat' INTO
TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat' INTO
TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'household_demographics.dat' INTO TABLE household_demographics FIELDS
TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat' INTO
TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat' INTO
TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat' INTO
TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
在AnalyticDB MySQL中,还可以通过OSS外表方式导入测试数据,请参见通过外表导入OSS数据。
下表列出了TPC-DS测试数据集中的表数据条数。
表名 | 数据条数 |
---|---|
store_sales | 2,879,987,999 |
catalog_sales | 1,439,980,416 |
web_sales | 720,000,376 |
store_returns | 287,999,764 |
catalog_returns | 143,996,756 |
inventory | 783,000,000 |
web_returns | 71,997,522 |
customer | 12,000,000 |
customer_address | 6,000,000 |
item | 300,000 |
customer_demographics | 1,920,800 |
date_dim | 73,049 |
time_dim | 86,400 |
catalog_page | 30,000 |
web_page | 3,000 |
store | 1,002 |
promotion | 1,500 |
household_demographics | 7,200 |
web_site | 54 |
call_center | 42 |
reason | 65 |
warehouse | 20 |
ship_mode | 20 |
income_band | 20 |