SSB Benchmark性能测试
本文介绍如何使用Star Schema Benchmark(SSB)对云原生数据仓库 AnalyticDB PostgreSQL 版进行性能测试,并提供测试结果供参考。
SSB简介
Star Schema Benchmark(SSB)是学术界和工业界广泛应用的一种星型模型测试集。该测试集为比较各种OLAP产品的基础性能指标提供了便利。SSB通过将星型模型扁平化为一个宽表(Flat Table),转变为一种单表测试基准,适用于评估列存引擎的聚合查询性能。更多信息,请参见SSB论文。
测试说明
-
本文以100 GB数据规模为例进行测试,推荐使用8个segment节点(每节点2C16G)。
-
如需测试1 TB数据规模,推荐使用48个segment节点(每节点2C16G),通过更多的计算节点发挥MPP架构优势。
测试结果
基于SSB 100 GB数据集,运行13条标准测试SQL,测试结果如下:
|
Query |
执行时间(秒) |
|
Q1.1 |
0.17 |
|
Q1.2 |
0.03 |
|
Q1.3 |
0.20 |
|
Q2.1 |
0.69 |
|
Q2.2 |
0.51 |
|
Q2.3 |
0.38 |
|
Q3.1 |
0.97 |
|
Q3.2 |
0.74 |
|
Q3.3 |
0.42 |
|
Q3.4 |
0.04 |
|
Q4.1 |
1.11 |
|
Q4.2 |
0.39 |
|
Q4.3 |
0.26 |
|
合计 |
5.91 |
前提条件
测试步骤
步骤一:创建实例
-
创建用于测试的AnalyticDB PostgreSQL版实例。具体操作,请参见创建实例。
本文中100 GB测试所使用的实例规格如下:
配置项
配置值
实例资源类型
存储弹性模式
实例系列
高可用
引擎版本
7.0标准版
节点规格(Segment)
2C16G
节点数量
8个
存储磁盘类型
ESSD云盘PL1
节点存储容量
200 GB
数据库内核版本
v7.4.2.1及以上
-
将ECS实例的IP地址添加到AnalyticDB PostgreSQL版实例的白名单中。
如测试100 GB,推荐8个节点(每节点2C16G);如测试1 TB,推荐48个节点(每节点2C16G)。
推荐:鉴于不同可用区资源限制,现阶段测试推荐张家口系列、杭州可用区J、北京可用区I这几个资源相对充足的可用区进行测试。
实例创建成功后,可控制台点击升级到最新版本v7.4.2.1及以上。
步骤二:生成测试数据
-
下载SSB官方数据生成工具并编译。
git clone https://github.com/vadimtk/ssb-dbgen.git cd ssb-dbgen make -
生成SSB 100 GB数据。
重要-
-s参数为数据量,以GB为单位。 -
-C参数为文件块的数量,强烈建议设置为集群Segment(计算节点)数量的整数倍。例如100 GB测试推荐设置为16,1 TB测试推荐设置为96。
在ssb-dbgen目录下执行以下脚本并行生成数据:
--生成100GB SSB Benchmark for((i=1;i<=16;i++)); do ./dbgen -s 100 -S $i -C 16 -f -T l & if [ $(($i%16)) -eq 0 ] # 一次并行生成16个子文件 then wait echo $i fi done ./dbgen -s 100 -T p ./dbgen -s 100 -T c ./dbgen -s 100 -T s -
-
将生成的数据上传至OSS。
如果您是第一次使用OSS,请参见命令行工具ossutil 1.0完成ossutil命令行工具的安装与配置,然后执行以下命令将数据上传至OSS:
./ossutil64 cp -r <测试数据所在的路径> <OSS Bucket路径> --exclude "*.tbl*"示例:
./ossutil64 cp -r /mnt/dbqgen/ oss://adbpg --exclude "*.tbl*"
步骤三:创建数据表
SSB测试最终使用的Flat表是由多张基础表JOIN生成的。首先创建基础表。
CREATE TABLE customer
(
C_CUSTKEY int,
C_NAME text,
C_ADDRESS text,
C_CITY text,
C_NATION text,
C_REGION text,
C_PHONE text,
C_MKTSEGMENT text
)
using beam
with(compresstype=auto, compresslevel=1)
DISTRIBUTED BY(C_CUSTKEY)
ORDER BY (C_CUSTKEY);
CREATE TABLE lineorder
(
LO_ORDERKEY bigint,
LO_LINENUMBER int,
LO_CUSTKEY int,
LO_PARTKEY int,
LO_SUPPKEY int,
LO_ORDERDATE Date,
LO_ORDERPRIORITY text,
LO_SHIPPRIORITY int,
LO_QUANTITY int,
LO_EXTENDEDPRICE int,
LO_ORDTOTALPRICE int,
LO_DISCOUNT int,
LO_REVENUE int,
LO_SUPPLYCOST int,
LO_TAX int,
LO_COMMITDATE Date,
LO_SHIPMODE text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(LO_ORDERKEY)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE part
(
P_PARTKEY int,
P_NAME text,
P_MFGR text,
P_CATEGORY text,
P_BRAND text,
P_COLOR text,
P_TYPE text,
P_SIZE int,
P_CONTAINER text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(P_PARTKEY)
ORDER BY (P_PARTKEY);
CREATE TABLE supplier
(
S_SUPPKEY int,
S_NAME text,
S_ADDRESS text,
S_CITY text,
S_NATION text,
S_REGION text,
S_PHONE text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(S_SUPPKEY)
ORDER BY (S_SUPPKEY);
步骤四:创建OSS外表并导入基础数据
创建OSS外表用于导入OSS上的测试数据。更多信息,请参见使用OSS Foreign Table进行数据湖分析。
CREATE SERVER oss_serv
FOREIGN DATA WRAPPER oss_fdw
OPTIONS (
endpoint 'oss-xxx.aliyuncs.com',
bucket 'xxxx'
);
CREATE USER MAPPING FOR PUBLIC SERVER oss_serv OPTIONS ( id 'xxxx', key 'xxx' );
CREATE FOREIGN TABLE ext_customer
(
C_CUSTKEY int,
C_NAME text,
C_ADDRESS text,
C_CITY text,
C_NATION text,
C_REGION text,
C_PHONE text,
C_MKTSEGMENT text
)
server oss_serv
options (
prefix 'ssb100g/customer_',
format 'csv',
DELIMITER '|'
);
CREATE FOREIGN TABLE ext_lineorder
(
LO_ORDERKEY bigint,
LO_LINENUMBER int,
LO_CUSTKEY int,
LO_PARTKEY int,
LO_SUPPKEY int,
LO_ORDERDATE Date,
LO_ORDERPRIORITY text,
LO_SHIPPRIORITY int,
LO_QUANTITY int,
LO_EXTENDEDPRICE int,
LO_ORDTOTALPRICE int,
LO_DISCOUNT int,
LO_REVENUE int,
LO_SUPPLYCOST int,
LO_TAX int,
LO_COMMITDATE Date,
LO_SHIPMODE text
)
server oss_serv
options (
prefix 'ssb100g/lineorder_',
format 'csv',
DELIMITER '|'
);
CREATE FOREIGN TABLE ext_part
(
P_PARTKEY int,
P_NAME text,
P_MFGR text,
P_CATEGORY text,
P_BRAND text,
P_COLOR text,
P_TYPE text,
P_SIZE int,
P_CONTAINER text
)
server oss_serv
options (
prefix 'ssb100g/part_',
format 'csv',
DELIMITER '|'
);
CREATE FOREIGN TABLE ext_supplier
(
S_SUPPKEY int,
S_NAME text,
S_ADDRESS text,
S_CITY text,
S_NATION text,
S_REGION text,
S_PHONE text
)
server oss_serv
options (
prefix 'ssb100g/supplier_',
format 'csv',
DELIMITER '|'
);
--导入数据
insert into part select * from ext_part;
insert into supplier select * from ext_supplier;
insert into customer select * from ext_customer;
insert into lineorder select * from ext_lineorder;
--Drop external table
drop external table ext_part;
drop external table ext_supplier;
drop external table ext_customer;
drop external table ext_lineorder;
步骤五:生成Flat宽表
将4张基础表通过JOIN生成SSB Flat宽表lineorder_flat。
-
创建
lineorder_flat表并导入数据。CREATE TABLE lineorder_flat ( LO_ORDERKEY bigint not null, LO_LINENUMBER int not null, LO_CUSTKEY int not null, LO_PARTKEY int not null, LO_SUPPKEY int not null, LO_ORDERDATE Date not null, LO_ORDERPRIORITY text not null, LO_SHIPPRIORITY int not null, LO_QUANTITY int not null, LO_EXTENDEDPRICE int not null, LO_ORDTOTALPRICE int not null, LO_DISCOUNT int not null, LO_REVENUE int not null, LO_SUPPLYCOST int not null, LO_TAX int not null, LO_COMMITDATE Date not null, LO_SHIPMODE text not null, C_NAME text not null, C_ADDRESS text not null, C_CITY text not null, C_NATION text not null, C_REGION text not null, C_PHONE text not null, C_MKTSEGMENT text not null, S_NAME text not null, S_ADDRESS text not null, S_CITY text not null, S_NATION text not null, S_REGION text not null, S_PHONE text not null, P_NAME text not null, P_MFGR text not null, P_CATEGORY text not null, P_BRAND text not null, P_COLOR text not null, P_TYPE text not null, P_SIZE int not null, P_CONTAINER text not null ) using beam with(compresstype=auto, compresslevel=1) DISTRIBUTED BY(LO_ORDERKEY) ORDER BY(LO_ORDERDATE, LO_ORDERKEY); INSERT INTO lineorder_flat SELECT l.LO_ORDERKEY AS LO_ORDERKEY, l.LO_LINENUMBER AS LO_LINENUMBER, l.LO_CUSTKEY AS LO_CUSTKEY, l.LO_PARTKEY AS LO_PARTKEY, l.LO_SUPPKEY AS LO_SUPPKEY, l.LO_ORDERDATE AS LO_ORDERDATE, l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY, l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY, l.LO_QUANTITY AS LO_QUANTITY, l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE, l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE, l.LO_DISCOUNT AS LO_DISCOUNT, l.LO_REVENUE AS LO_REVENUE, l.LO_SUPPLYCOST AS LO_SUPPLYCOST, l.LO_TAX AS LO_TAX, l.LO_COMMITDATE AS LO_COMMITDATE, l.LO_SHIPMODE AS LO_SHIPMODE, c.C_NAME AS C_NAME, c.C_ADDRESS AS C_ADDRESS, c.C_CITY AS C_CITY, c.C_NATION AS C_NATION, c.C_REGION AS C_REGION, c.C_PHONE AS C_PHONE, c.C_MKTSEGMENT AS C_MKTSEGMENT, s.S_NAME AS S_NAME, s.S_ADDRESS AS S_ADDRESS, s.S_CITY AS S_CITY, s.S_NATION AS S_NATION, s.S_REGION AS S_REGION, s.S_PHONE AS S_PHONE, p.P_NAME AS P_NAME, p.P_MFGR AS P_MFGR, p.P_CATEGORY AS P_CATEGORY, p.P_BRAND AS P_BRAND, p.P_COLOR AS P_COLOR, p.P_TYPE AS P_TYPE, p.P_SIZE AS P_SIZE, p.P_CONTAINER AS P_CONTAINER FROM lineorder AS l INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY; ANALYZE lineorder_flat; FLUSH lineorder_flat; -
收集统计信息并刷盘。
ANALYZE lineorder_flat; FLUSH lineorder_flat; -
等待后台任务完成。执行以下查询,当不返回结果时即可进行测试。
select datname, query, state from pg_stat_activity where state <> 'idle' and backend_type like '%autovacuum%';
100 GB数据集各表数据量参考如下:
|
表名 |
行数 |
|
customer |
3,000,000 |
|
lineorder |
600,037,902 |
|
part |
1,400,000 |
|
supplier |
200,000 |
|
lineorder_flat |
600,037,902 |
步骤六:执行开箱配置
ALTER DATABASE your_db set laser.enable_vectorized_engine to on;
ALTER DATABASE your_db set statement_mem = '16GB';
步骤七:运行测试
SSB基准测试包含13条标准SQL查询(附录:SSB测试SQL),分为4组(Q1、Q2、Q3、Q4)。
-
测试目录结构如下:
benchmark/ ├── benchmark.sh └── query/ ├── q1.sql ├── q2.sql ├── ... └── q13.sql-
将以下测试脚本保存为
benchmark/benchmark.sh。#!/bin/bash usage() { cat <<EOF Usage: $0 <database> <round> <host> Params: database 必填,数据库名 round 必填,执行轮数(正整数) host 必填,数据库主机地址,如 127.0.0.1 或 hostname Examples: # 在本机执行 adbpg,执行 3 轮 $0 tpcds_db 3 127.0.0.1 # 在远程主机执行 1 轮 $0 tpcds_db 1 10.0.0.5 EOF } # 参数: $1=database, $2=round, $3=host DATABASE="$1" ROUND="${2:-1}" HOST="$3" # 参数检查 if [ -z "$DATABASE" ] || [ -z "$ROUND" ] || [ -z "$HOST" ]; then usage exit 1 fi echo "Database: $DATABASE" echo "Round: $ROUND" echo "Host: $HOST" HOME_PATH=$(pwd) QUERY_PATH="$HOME_PATH/query/" LOG_PATH="$HOME_PATH/log/" mkdir -p "$LOG_PATH" timestamp=$(date +%Y%m%d_%H%M%S) log_file="$HOME_PATH/tpcds_query_rt_${timestamp}.csv" echo "query,cost,total_cost" > "$log_file" total_cost=0 for ((i=1; i<=99; i++)); do last_cost=0 for ((r=1; r<=ROUND; r++)); do echo "begin run TPC-DS Q${i} round ${r}/$ROUND on adbpg (db: $DATABASE, host: $HOST), $(date)" begin_time=$(date +%s.%N) psql -h "$HOST" -d "$DATABASE" -f "$QUERY_PATH/q$i.sql" > "$LOG_PATH/log${i}_r${r}.out" rc=$? end_time=$(date +%s.%N) last_cost=$(awk "BEGIN {print $end_time - $begin_time}") if [ $rc -ne 0 ] ; then printf "run TPC-DS Q%s round %s on adbpg fail, cost: %.2f, %s\n" "$i" "$r" "$last_cost" "$(date)" else printf "run TPC-DS Q%s round %s on adbpg succ, cost: %.2f, %s\n" "$i" "$r" "$last_cost" "$(date)" fi done cost=$last_cost total_cost=$(awk "BEGIN {print $total_cost + $cost}") printf "TPC-DS Q%s final (round %d): cost: %.2f, totalCost: %.2f, %s\n" "$i" "$ROUND" "$cost" "$total_cost" "$(date)" printf "Q%s,%.2f,%.2f\n" "$i" "$cost" "$total_cost" >> "$log_file" done -
将13条测试SQL(附录:SSB测试SQL)分别保存为
q1.sql~q13.sql,存放在benchmark/query/目录下。
-
-
进入benchmark目录,执行以下命令运行测试。
sh benchmark.sh <database> 2 <host>-
<database>:数据库名称。 -
2:执行轮数,取最后一轮的结果。 -
<host>:数据库主机地址。
-
附录:SSB测试SQL
以下为SSB 13条标准测试SQL。
-
Q1.1
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25; -
Q1.2
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35; -
Q1.3
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat WHERE extract(week from lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31' AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35; -
Q2.1
SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY year, p_brand ORDER BY year, p_brand LIMIT 100; -
Q2.2
SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA' GROUP BY year, p_brand ORDER BY year, p_brand LIMIT 100; -
Q2.3
SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE' GROUP BY year, p_brand ORDER BY year, p_brand LIMIT 100; -
Q3.1
SELECT c_nation, s_nation, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_nation, s_nation, year ORDER BY year ASC, revenue DESC LIMIT 100; -
Q3.2
SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC LIMIT 100; -
Q3.3
SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC LIMIT 100; -
Q3.4
SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31' GROUP BY c_city, s_city, year ORDER BY year ASC, revenue DESC LIMIT 100; -
Q4.1
SELECT EXTRACT(YEAR FROM lo_orderdate) AS YEAR, c_nation, SUM(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, c_nation ORDER BY YEAR ASC,c_nation ASC LIMIT 100; -
Q4.2
SELECT extract(year from lo_orderdate) AS year, s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2') GROUP BY year, s_nation, p_category ORDER BY year ASC, s_nation ASC, p_category ASC LIMIT 100; -
Q4.3
SELECT extract(year from lo_orderdate) AS year, s_city, p_brand, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14' GROUP BY year, s_city, p_brand ORDER BY year ASC, s_city ASC, p_brand ASC LIMIT 100;