SSB Benchmark性能测试

更新时间:
复制 MD 格式

本文介绍如何使用Star Schema Benchmark(SSB)对云原生数据仓库 AnalyticDB PostgreSQL 版进行性能测试,并提供测试结果供参考。

SSB简介

Star Schema Benchmark(SSB)是学术界和工业界广泛应用的一种星型模型测试集。该测试集为比较各种OLAP产品的基础性能指标提供了便利。SSB通过将星型模型扁平化为一个宽表(Flat Table),转变为一种单表测试基准,适用于评估列存引擎的聚合查询性能。更多信息,请参见SSB论文

测试说明

  • 本文以100 GB数据规模为例进行测试,推荐使用8segment节点(每节点2C16G)。

  • 如需测试1 TB数据规模,推荐使用48segment节点(每节点2C16G),通过更多的计算节点发挥MPP架构优势。

测试结果

基于SSB 100 GB数据集,运行13条标准测试SQL,测试结果如下:

Query

执行时间(秒)

Q1.1

0.17

Q1.2

0.03

Q1.3

0.20

Q2.1

0.69

Q2.2

0.51

Q2.3

0.38

Q3.1

0.97

Q3.2

0.74

Q3.3

0.42

Q3.4

0.04

Q4.1

1.11

Q4.2

0.39

Q4.3

0.26

合计

5.91

前提条件

测试步骤

步骤一:创建实例

  1. 创建用于测试的AnalyticDB PostgreSQL实例。具体操作,请参见创建实例

    本文中100 GB测试所使用的实例规格如下:

    配置项

    配置值

    实例资源类型

    存储弹性模式

    实例系列

    高可用

    引擎版本

    7.0标准版

    节点规格(Segment)

    2C16G

    节点数量

    8

    存储磁盘类型

    ESSD云盘PL1

    节点存储容量

    200 GB

    数据库内核版本

    v7.4.2.1及以上

  2. ECS实例的IP地址添加到AnalyticDB PostgreSQL实例的白名单中。

说明
  • 如测试100 GB,推荐8个节点(每节点2C16G);如测试1 TB,推荐48个节点(每节点2C16G)。

  • 推荐:鉴于不同可用区资源限制,现阶段测试推荐张家口系列、杭州可用区J、北京可用区I这几个资源相对充足的可用区进行测试。

  • 实例创建成功后,可控制台点击升级到最新版本v7.4.2.1及以上。

步骤二:生成测试数据

  1. 下载SSB官方数据生成工具并编译。

    git clone https://github.com/vadimtk/ssb-dbgen.git
    cd ssb-dbgen
    make
  2. 生成SSB 100 GB数据。

    重要
    • -s参数为数据量,以GB为单位。

    • -C参数为文件块的数量,强烈建议设置为集群Segment(计算节点)数量的整数倍。例如100 GB测试推荐设置为16,1 TB测试推荐设置为96。

    ssb-dbgen目录下执行以下脚本并行生成数据:

    --生成100GB SSB Benchmark
    for((i=1;i<=16;i++));
    do
    ./dbgen -s 100 -S $i -C 16 -f -T l &
    if [ $(($i%16)) -eq 0 ] # 一次并行生成16个子文件
    then
            wait
            echo $i
    fi
    done
    
    ./dbgen -s 100 -T p
    ./dbgen -s 100 -T c
    ./dbgen -s 100 -T s
  3. 将生成的数据上传至OSS。

    如果您是第一次使用OSS,请参见命令行工具ossutil 1.0完成ossutil命令行工具的安装与配置,然后执行以下命令将数据上传至OSS:

    ./ossutil64 cp -r <测试数据所在的路径> <OSS Bucket路径> --exclude "*.tbl*"

    示例:

    ./ossutil64 cp -r /mnt/dbqgen/ oss://adbpg --exclude "*.tbl*"

步骤三:创建数据表

SSB测试最终使用的Flat表是由多张基础表JOIN生成的。首先创建基础表。

CREATE TABLE customer
(
  C_CUSTKEY       int,
  C_NAME          text,
  C_ADDRESS       text,
  C_CITY          text,
  C_NATION        text,
  C_REGION        text,
  C_PHONE         text,
  C_MKTSEGMENT    text
)
using beam
with(compresstype=auto, compresslevel=1)
DISTRIBUTED BY(C_CUSTKEY)
ORDER BY (C_CUSTKEY);

CREATE TABLE lineorder
(
  LO_ORDERKEY             bigint,
  LO_LINENUMBER           int,
  LO_CUSTKEY              int,
  LO_PARTKEY              int,
  LO_SUPPKEY              int,
  LO_ORDERDATE            Date,
  LO_ORDERPRIORITY        text,
  LO_SHIPPRIORITY         int,
  LO_QUANTITY             int,
  LO_EXTENDEDPRICE        int,
  LO_ORDTOTALPRICE        int,
  LO_DISCOUNT             int,
  LO_REVENUE              int,
  LO_SUPPLYCOST           int,
  LO_TAX                  int,
  LO_COMMITDATE           Date,
  LO_SHIPMODE             text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(LO_ORDERKEY)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY);

CREATE TABLE part
(
  P_PARTKEY       int,
  P_NAME          text,
  P_MFGR          text,
  P_CATEGORY      text,
  P_BRAND         text,
  P_COLOR         text,
  P_TYPE          text,
  P_SIZE          int,
  P_CONTAINER     text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(P_PARTKEY)
ORDER BY (P_PARTKEY);

CREATE TABLE supplier
(
  S_SUPPKEY       int,
  S_NAME          text,
  S_ADDRESS       text,
  S_CITY          text,
  S_NATION        text,
  S_REGION        text,
  S_PHONE         text
)
using beam
with(compresstype=auto,compresslevel=1)
DISTRIBUTED BY(S_SUPPKEY)
ORDER BY (S_SUPPKEY);

步骤四:创建OSS外表并导入基础数据

创建OSS外表用于导入OSS上的测试数据。更多信息,请参见使用OSS Foreign Table进行数据湖分析

CREATE SERVER oss_serv
    FOREIGN DATA WRAPPER oss_fdw
    OPTIONS (
        endpoint 'oss-xxx.aliyuncs.com',
        bucket 'xxxx'
  );
CREATE USER MAPPING FOR PUBLIC  SERVER oss_serv OPTIONS ( id 'xxxx', key 'xxx' );

CREATE FOREIGN TABLE ext_customer
(
  C_CUSTKEY       int,
  C_NAME          text,
  C_ADDRESS       text,
  C_CITY          text,
  C_NATION        text,
  C_REGION        text,
  C_PHONE         text,
  C_MKTSEGMENT    text
)
server oss_serv
    options (
        prefix 'ssb100g/customer_',
        format 'csv',
        DELIMITER '|'
    );

CREATE FOREIGN TABLE ext_lineorder
(
  LO_ORDERKEY             bigint,
  LO_LINENUMBER           int,
  LO_CUSTKEY              int,
  LO_PARTKEY              int,
  LO_SUPPKEY              int,
  LO_ORDERDATE            Date,
  LO_ORDERPRIORITY        text,
  LO_SHIPPRIORITY         int,
  LO_QUANTITY             int,
  LO_EXTENDEDPRICE        int,
  LO_ORDTOTALPRICE        int,
  LO_DISCOUNT             int,
  LO_REVENUE              int,
  LO_SUPPLYCOST           int,
  LO_TAX                  int,
  LO_COMMITDATE           Date,
  LO_SHIPMODE             text
)
server oss_serv
    options (
        prefix 'ssb100g/lineorder_',
        format 'csv',
        DELIMITER '|'
    );

CREATE FOREIGN TABLE ext_part
(
  P_PARTKEY       int,
  P_NAME          text,
  P_MFGR          text,
  P_CATEGORY      text,
  P_BRAND         text,
  P_COLOR         text,
  P_TYPE          text,
  P_SIZE          int,
  P_CONTAINER     text
)
server oss_serv
    options (
        prefix 'ssb100g/part_',
        format 'csv',
        DELIMITER '|'
    );

CREATE FOREIGN TABLE ext_supplier
(
  S_SUPPKEY       int,
  S_NAME          text,
  S_ADDRESS       text,
  S_CITY          text,
  S_NATION        text,
  S_REGION        text,
  S_PHONE         text
)
server oss_serv
    options (
        prefix 'ssb100g/supplier_',
        format 'csv',
        DELIMITER '|'
    );

--导入数据
insert into part select * from ext_part;
insert into supplier select * from ext_supplier;
insert into customer select * from ext_customer;
insert into lineorder select * from ext_lineorder;

--Drop external table 
drop external table ext_part;
drop external table ext_supplier;
drop external table ext_customer;
drop external table ext_lineorder;

步骤五:生成Flat宽表

4张基础表通过JOIN生成SSB Flat宽表lineorder_flat

  1. 创建lineorder_flat表并导入数据。

    CREATE TABLE lineorder_flat
    (
      LO_ORDERKEY             bigint not null,
      LO_LINENUMBER           int not null,
      LO_CUSTKEY              int not null,
      LO_PARTKEY              int not null,
      LO_SUPPKEY              int not null,
      LO_ORDERDATE            Date not null,
      LO_ORDERPRIORITY        text not null,
      LO_SHIPPRIORITY         int not null,
      LO_QUANTITY             int not null,
      LO_EXTENDEDPRICE        int not null,
      LO_ORDTOTALPRICE        int not null,
      LO_DISCOUNT             int not null,
      LO_REVENUE              int not null,
      LO_SUPPLYCOST           int not null,
      LO_TAX                  int not null,
      LO_COMMITDATE           Date not null,
      LO_SHIPMODE             text not null,
      C_NAME                  text not null,
      C_ADDRESS               text not null,
      C_CITY                  text not null,
      C_NATION                text not null,
      C_REGION                text not null,
      C_PHONE                 text not null,
      C_MKTSEGMENT            text not null,
      S_NAME                  text not null,
      S_ADDRESS               text not null,
      S_CITY                  text not null,
      S_NATION                text not null,
      S_REGION                text not null,
      S_PHONE                 text not null,
      P_NAME                  text not null,
      P_MFGR                  text not null,
      P_CATEGORY              text not null,
      P_BRAND                 text not null,
      P_COLOR                 text not null,
      P_TYPE                  text not null,
      P_SIZE                  int not null,
      P_CONTAINER             text not null
    )
    using beam
    with(compresstype=auto, compresslevel=1)
    DISTRIBUTED BY(LO_ORDERKEY)
    ORDER BY(LO_ORDERDATE, LO_ORDERKEY);
    
    
    INSERT INTO lineorder_flat
    SELECT
        l.LO_ORDERKEY AS LO_ORDERKEY,
        l.LO_LINENUMBER AS LO_LINENUMBER,
        l.LO_CUSTKEY AS LO_CUSTKEY,
        l.LO_PARTKEY AS LO_PARTKEY,
        l.LO_SUPPKEY AS LO_SUPPKEY,
        l.LO_ORDERDATE AS LO_ORDERDATE,
        l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
        l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
        l.LO_QUANTITY AS LO_QUANTITY,
        l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
        l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
        l.LO_DISCOUNT AS LO_DISCOUNT,
        l.LO_REVENUE AS LO_REVENUE,
        l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
        l.LO_TAX AS LO_TAX,
        l.LO_COMMITDATE AS LO_COMMITDATE,
        l.LO_SHIPMODE AS LO_SHIPMODE,
        c.C_NAME AS C_NAME,
        c.C_ADDRESS AS C_ADDRESS,
        c.C_CITY AS C_CITY,
        c.C_NATION AS C_NATION,
        c.C_REGION AS C_REGION,
        c.C_PHONE AS C_PHONE,
        c.C_MKTSEGMENT AS C_MKTSEGMENT,
        s.S_NAME AS S_NAME,
        s.S_ADDRESS AS S_ADDRESS,
        s.S_CITY AS S_CITY,
        s.S_NATION AS S_NATION,
        s.S_REGION AS S_REGION,
        s.S_PHONE AS S_PHONE,
        p.P_NAME AS P_NAME,
        p.P_MFGR AS P_MFGR,
        p.P_CATEGORY AS P_CATEGORY,
        p.P_BRAND AS P_BRAND,
        p.P_COLOR AS P_COLOR,
        p.P_TYPE AS P_TYPE,
        p.P_SIZE AS P_SIZE,
        p.P_CONTAINER AS P_CONTAINER
    FROM lineorder AS l
    INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
    INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
    INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;
    
    ANALYZE lineorder_flat;
    FLUSH lineorder_flat;
  2. 收集统计信息并刷盘。

    ANALYZE lineorder_flat;
    FLUSH lineorder_flat;
  3. 等待后台任务完成。执行以下查询,当不返回结果时即可进行测试。

    select datname, query, state from pg_stat_activity 
      where state <> 'idle' and backend_type like '%autovacuum%';

100 GB数据集各表数据量参考如下:

表名

行数

customer

3,000,000

lineorder

600,037,902

part

1,400,000

supplier

200,000

lineorder_flat

600,037,902

步骤六:执行开箱配置

ALTER DATABASE your_db set laser.enable_vectorized_engine to on;
ALTER DATABASE your_db set statement_mem = '16GB';

步骤七:运行测试

SSB基准测试包含13条标准SQL查询(附录:SSB测试SQL),分为4组(Q1、Q2、Q3、Q4)。

  1. 测试目录结构如下:

    benchmark/
    ├── benchmark.sh
    └── query/
        ├── q1.sql
        ├── q2.sql
        ├── ...
        └── q13.sql
    • 将以下测试脚本保存为benchmark/benchmark.sh

      #!/bin/bash
      
      usage() {
        cat <<EOF
      Usage:
        $0 <database> <round> <host>
      
      Params:
        database  必填,数据库名
        round     必填,执行轮数(正整数)
        host      必填,数据库主机地址,如 127.0.0.1 或 hostname
      
      Examples:
        # 在本机执行 adbpg,执行 3 轮
        $0 tpcds_db 3 127.0.0.1
      
        # 在远程主机执行 1 轮
        $0 tpcds_db 1 10.0.0.5
      EOF
      }
      
      # 参数: $1=database, $2=round, $3=host
      DATABASE="$1"
      ROUND="${2:-1}"
      HOST="$3"
      
      # 参数检查
      if [ -z "$DATABASE" ] || [ -z "$ROUND" ] || [ -z "$HOST" ]; then
        usage
        exit 1
      fi
      
      echo "Database: $DATABASE"
      echo "Round: $ROUND"
      echo "Host: $HOST"
      
      HOME_PATH=$(pwd)
      QUERY_PATH="$HOME_PATH/query/"
      LOG_PATH="$HOME_PATH/log/"
      mkdir -p "$LOG_PATH"
      
      timestamp=$(date +%Y%m%d_%H%M%S)
      log_file="$HOME_PATH/tpcds_query_rt_${timestamp}.csv"
      echo "query,cost,total_cost" > "$log_file"
      
      total_cost=0
      for ((i=1; i<=99; i++)); do
          last_cost=0
          for ((r=1; r<=ROUND; r++)); do
              echo "begin run TPC-DS Q${i} round ${r}/$ROUND on adbpg (db: $DATABASE, host: $HOST), $(date)"
              begin_time=$(date +%s.%N)
      
              psql -h "$HOST" -d "$DATABASE" -f "$QUERY_PATH/q$i.sql" > "$LOG_PATH/log${i}_r${r}.out"
              rc=$?
      
              end_time=$(date +%s.%N)
              last_cost=$(awk "BEGIN {print $end_time - $begin_time}")
              if [ $rc -ne 0 ] ; then
                  printf "run TPC-DS Q%s round %s on adbpg fail, cost: %.2f, %s\n" "$i" "$r" "$last_cost" "$(date)"
              else
                  printf "run TPC-DS Q%s round %s on adbpg succ, cost: %.2f, %s\n" "$i" "$r" "$last_cost" "$(date)"
              fi
          done
          cost=$last_cost
          total_cost=$(awk "BEGIN {print $total_cost + $cost}")
          printf "TPC-DS Q%s final (round %d): cost: %.2f, totalCost: %.2f, %s\n" "$i" "$ROUND" "$cost" "$total_cost" "$(date)"
          printf "Q%s,%.2f,%.2f\n" "$i" "$cost" "$total_cost" >> "$log_file"
      done
    • 13条测试SQL(附录:SSB测试SQL)分别保存为q1.sql~q13.sql,存放在benchmark/query/目录下。

  2. 进入benchmark目录,执行以下命令运行测试。

    sh benchmark.sh <database> 2 <host>
    • <database>:数据库名称。

    • 2:执行轮数,取最后一轮的结果。

    • <host>:数据库主机地址。

附录:SSB测试SQL

以下为SSB 13条标准测试SQL。

  • Q1.1

    SELECT sum(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder_flat
    WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
  • Q1.2

    SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat
    WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
  • Q1.3

    SELECT sum(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder_flat
    WHERE extract(week from lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31' AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
  • Q2.1

    SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat
    WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
    GROUP BY year, p_brand
    ORDER BY year, p_brand
    LIMIT 100;
  • Q2.2

    SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat
    WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
    GROUP BY year, p_brand
    ORDER BY year, p_brand
    LIMIT 100;
  • Q2.3

    SELECT sum(lo_revenue), extract(year from lo_orderdate) AS year, p_brand FROM lineorder_flat
    WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
    GROUP BY year, p_brand
    ORDER BY year, p_brand
    LIMIT 100;
  • Q3.1

    SELECT c_nation, s_nation, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat
    WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
    GROUP BY c_nation, s_nation, year
    ORDER BY year ASC, revenue DESC
    LIMIT 100;
  • Q3.2

    SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue
    FROM lineorder_flat
    WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
    GROUP BY c_city, s_city, year
    ORDER BY year ASC, revenue DESC
    LIMIT 100;
  • Q3.3

    SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue
    FROM lineorder_flat
    WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
    GROUP BY c_city, s_city, year
    ORDER BY year ASC, revenue DESC
    LIMIT 100;
  • Q3.4

    SELECT c_city, s_city, extract(year from lo_orderdate) AS year, sum(lo_revenue) AS revenue
    FROM lineorder_flat
    WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
    GROUP BY c_city, s_city, year
    ORDER BY year ASC, revenue DESC
    LIMIT 100;
  • Q4.1

    SELECT EXTRACT(YEAR FROM lo_orderdate) AS YEAR, c_nation, SUM(lo_revenue - lo_supplycost) AS profit
    FROM lineorder_flat
    WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr IN ('MFGR#1', 'MFGR#2')
    GROUP BY YEAR, c_nation
    ORDER BY YEAR ASC,c_nation ASC LIMIT 100;
  • Q4.2

    SELECT extract(year from lo_orderdate) AS year, s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
    FROM lineorder_flat
    WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
    GROUP BY year, s_nation, p_category
    ORDER BY year ASC, s_nation ASC, p_category ASC
    LIMIT 100;
  • Q4.3

    SELECT extract(year from lo_orderdate) AS year, s_city, p_brand, sum(lo_revenue - lo_supplycost) AS profit
    FROM lineorder_flat
    WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
    GROUP BY year, s_city, p_brand
    ORDER BY year ASC, s_city ASC, p_brand ASC
    LIMIT 100;