云数据库 SelectDB 版旨在提供卓越的性能和便捷的数据分析服务,在宽表聚合、多表关联以及高并发点查等场景下均具有优异的性能表现。本文将详细介绍SelectDB在TPC-H标准测试上的测试方法和测试结果。
概述
TPC-H是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成,使用的数据具有广泛的行业相关性。该基准测试通过一系列的查询操作来评估数据库系统在处理复杂查询和数据挖掘任务时的性能。TPC-H报告的性能指标称为TPC-H每小时复合查询性能指标(QphH@Size),该指标反映了系统处理查询能力的多个方面。这些方面包括执行查询时所选择的数据库大小,由单个流提交查询时的查询处理能力,以及由多个并发用户提交查询时的查询吞吐量等。
本文的TPC-H的实现基于TPC-H的基准测试,并不符合TPC-H基准测试的所有要求。本测试结果不能等同于完全遵守TPC-H测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议使用实际业务数据进行进一步的测试。
准备工作
步骤一:准备目标实例
准备实例。
如果您已有目标实例,您需根据以下要求检查实例配置。
如果您还没有目标实例,您需创建实例。
本文中进行性能测试所用实例需满足以下要求。
内核版本大于等于4.0。
如果您已有目标实例,但版本小于4.0,您可升级实例达到版本要求。具体操作,请参见升级内核版本。
规格需高于或等于96核 384 GB。本文中的测试基于96核 384 GB的规格进行。
集群缓存空间需大于等于1200 GB。本文中的测试基于1200 GB的缓存空间进行。
修改
streaming_load_max_mb
参数至最大值。在测试过程中,工具通过Stream load方式将测试数据上传至SelectDB。由于本次测试数据量较大,已超过默认的Stream load最大导入限制10240MB,因此您需要将BE的参数streaming_load_max_mb调整至最大值10240000MB。如何修改参数,请参见参数配置。
创建待导入测试数据的目标库。
如果您已有目标库,跳过此操作。
连接实例,具体操作,请参见通过MySQL客户端连接云数据库SelectDB版实例。
建库。
本次测试的目标库为test_db。建表语句如下。
CREATE DATABASE test_db;
步骤二:准备测试服务器
下述安装依赖工具的脚本适用于操作系统为Linux的服务器。如果您的服务器操作系统非Linux,您需自行修改安装脚本,以使其适应您的操作系统。
注意事项
您的服务器需注意以下事项。
如果您准备在服务器使用Git下载TPC-H测试工具。您需开通公网。
新购ECS实例:需购买ECS时,公网 IP勾选分配公网 IPv4 地址。
已有ECS实例但未开通公网:ECS开通公网,请参见开通公网。
此次测试数据集生成的数据文件约1000GB,需确保服务器的内存足够大。
操作步骤
创建目标服务器。
如果您已有目标服务器,跳过此步骤。
如果您还没有目标服务器,您可自定义购买ECS实例,镜像选择Alibaba Cloud Linux。
安装测试中的依赖工具。
安装MySQL客户端。
yum install mysql
安装unzip。
yum install unzip
(可选)安装Git。
本次测试中,使用Git下载TPC-H工具。若已通过其他方式获取TPC-H工具并准备手动上传至服务器,可跳过此步骤。
yum install git
步骤三:确保网络互通
确保将安装TPC-H测试工具的目标服务器与SelectDB实例网络互通:
步骤四:了解本次测试数据集
本次测试TPC-H会生成1000GB的数据,并将其导入SelectDB,以进行SelectDB的性能测试。以下是关于测试数据集为1000GB数据表的相关说明。
TPC-H表名 | 行数 | 备注 |
TPC-H表名 | 行数 | 备注 |
REGION | 5 | 区域表 |
NATION | 25 | 国家表 |
SUPPLIER | 1000万 | 供应商表 |
PART | 2亿 | 零部件表 |
PARTSUPP | 8亿 | 零部件供应表 |
CUSTOMER | 1.5亿 | 客户表 |
ORDERS | 15亿 | 订单表 |
LINEITEM | 60亿 | 订单明细表 |
操作步骤
下述脚本适用于操作系统为Linux的服务器。如果您的服务器操作系统非Linux,您需自行修改安装脚本,以使其适应您的操作系统。
步骤一:登录目标服务器
如果您的服务是阿里云ECS,如何登录请参见连接ECS。
其他服务器如何登录,参见各自产品。
步骤二:下载安装TPC-H数据生成工具
下载工具。
本次测试使用Git下载工具,具体脚本如下。
git clone https://github.com/apache/doris.git && cd ./doris/tools/tpch-tools
您也可以通过tpch-tools链接下载工具后,手动上传至目标服务器。
编译工具。
执行以下脚本,编译工具。
sh bin/build-tpch-dbgen.sh
步骤三:生成TPC-H测试数据集
数据量越大,生成时间越长,具体取决于服务器的性能。
您需在测试工具安装目录下执行生成测试数据集的脚本。
语法如下。
sh bin/gen-tpch-data.sh -s <yourAimDataNum>
参数说明如下。
yourAimDataNum:
含义:需要使用TPC-H生成的数据大小。
单位:GB
此次测试为中规模测试,需要生成1000 GB(即 1 TB)的测试数据集。该步骤所需时间可能较长,建议您将该任务置于后台执行,语句如下。
nohup sh bin/gen-tpch-data.sh -s 1000 > gen-tpch-data.log 2>&1 &
执行结果会保存在测试工具安装目录下的gen-tpch-data.log文件中,可通过查看该文件获取执行过程是否正常。
测试数据结果集将保存在测试工具安装目录的bin目录下的tpch-data目录中,数据文件后缀为.tbl
。
步骤四:使用脚本为SelectDB创建用于测试的表
配置SelectDB实例信息。 准备
doris-cluster.conf
文件。在执行建表脚本前,需要在
doris-cluster.conf
文件中,配置SelectDB实例的相关信息。该文件位于测试工具安装目录tpch-tools/conf/
下。示例如下。Any of FE host export FE_HOST='selectdb-cn-****.selectdbfe.rds.aliyuncs.com' # http_port in fe.conf export FE_HTTP_PORT=8080 # query_port in fe.conf export FE_QUERY_PORT=9030 # Doris username export USER='admin' # Doris password export PASSWORD='****' # The database where TPC-H tables located export DB='test_db'
参数说明如下。
参数名称
参数描述
参数名称
参数描述
FE_HOST
SelectDB实例的访问地址。
您可以从SelectDB控制台的实例详情中的网络信息中获取实例的VPC地址或公网地址。
FE_HTTP_PORT
SelectDB实例的HTTP协议端口。
目前SelectDB该端口默认为8080。
具体您可以从SelectDB控制台的实例详情中的网络信息中获取实例的HTTP协议端口。
FE_QUERY_PORT
SelectDB实例的MySQL协议端口。
目前SelectDB该端口默认为9030.
具体您可以从SelectDB控制台的实例详情中的网络信息中获取实例的MySQL协议端口。
USER
SelectDB实例账号。
创建SelectDB实例后,系统会默认为您创建admin账号。
PASSWORD
SelectDB实例账号的密码。
如果您的USER配置的是admin账号,但您忘记了此账号的密码,您可以在控制台上重置实例admin密码。
DB
SelectDB实例中待导入数据的库名。
建表。
在测试工具安装目录下,执行以下脚本,创建用以测试的表。脚本执行完成后,SelectDB的目标库中,会生成上述步骤四:了解本次测试数据集中的表。
sh bin/create-tpch-tables.sh -s 1000
步骤五:导入数据至SelectDB
测试数据量越大,导入时间越长,具体取决于服务器的性能。
在测试工具安装目录下,执行以下脚本,完成TPC-H测试集所有数据的导入至SelectDB。
sh bin/load-tpch-data.sh
此次测试为中规模测试,需要将生成的1000 GB(即 1 TB)的测试数据集导入至SelectDB中。该步骤所需时间可能较长,建议您将该任务置于后台执行,语句如下。
nohup sh bin/load-tpch-data.sh > load-tpch-data.log 2>&1 &
执行结果保存到测试工具安装目录下的load-tpch-data.log文件中,可通过查看该文件获取执行过程是否正常。
步骤六:测试查询性能
批量测试查询SQL性能
测试数据量越大,批量测试时间越长,具体取决于服务器的性能。
您可以执行TPC-H测试SQL的相关脚本,批量执行测试集的SQL。
语法如下。
sh bin/run-tpch-queries.sh -s <yourAimDataNum>
参数说明如下。
yourAimDataNum:确保查询针对正确规模的数据集运行,与生成数据时的规模一致(即生成数据时若用
-s 1000
,运行查询时也需使用-s 1000
)。脚本执行完毕后,控制台窗口将显示测试集中每个SQL在SelectDB中的性能情况。
此次测试为中规模测试,需要测试查询的1000 GB(即 1 TB)的测试数据集。该步骤所需时间可能较长,建议您将该任务置于后台执行,语句如下。
nohup sh bin/run-tpch-queries.sh -s 1000 > run-tpch-queries.log 2>&1 &
批量测试的SQL详情请参见TPCH-Query-SQL。
目前SelectDB的查询优化器和统计信息功能仍有提升空间,所以我们在TPC-H中重写了一些查询以适应SelectDB的执行框架,但不影响结果的正确性。
查询性能结果将会保存到到测试工具安装目录下的run-tpch-queries.log文件中,可通过查看该文件获取查询过程是否正常的信息以及查询测试的结果。本文档针对1000GB数据的测试结果,请参见测试结果。
单条查询SQL性能测试
您也可以对SelectDB执行某个SQL的性能进行测试,具体操作如下:
连接SelectDB实例。具体操作,请参见通过DMS连接云数据库SelectDB版实例。
执行目标SQL。
您可以在TPC-H 测试查询语句中获取目标SQL,进行执行。
您也可以在本次测试时使用的SQL语句,选取某一条语句进行执行。
--Q1 select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; --Q2 select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100; --Q3 select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10; --Q4 select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; --Q5 select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc; --Q6 select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24; --Q7 select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year; --Q8 select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'ECONOMY ANODIZED STEEL' ) as all_nations group by o_year order by o_year; --Q9 select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' ) as profit group by nation, o_year order by nation, o_year desc; --Q10 select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20; --Q11 select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.000002 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' ) order by value desc; --Q12 select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' year group by l_shipmode order by l_shipmode; --Q13 select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders group by c_count order by custdist desc, c_count desc; --Q14 select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month; --Q15 select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey; --Q16 select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size; --Q17 select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); --Q18 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100; --Q19 select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#34' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); --Q20 select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name; --Q21 select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100; --Q22 select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone, 1, 2) as cntrycode, c_acctbal from customer where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;
测试结果
以下为在规格为96核384GB、集群缓存空间为1200GB的SelectDB实例上进行的TPC-H 1000GB查询性能测试结果。
Query | TPCH 1000GB(s) |
Query | TPCH 1000GB(s) |
Q1 | 15.24 |
Q2 | 0.27 |
Q3 | 5.13 |
Q4 | 1.99 |
Q5 | 8.65 |
Q6 | 0.23 |
Q7 | 2.05 |
Q8 | 4.93 |
Q9 | 19.52 |
Q10 | 5.3 |
Q11 | 0.84 |
Q12 | 0.67 |
Q13 | 10.39 |
Q14 | 0.72 |
Q15 | 1.78 |
Q16 | 1.37 |
Q17 | 1.03 |
Q18 | 18.23 |
Q19 | 2.43 |
Q20 | 0.62 |
Q21 | 6.68 |
Q22 | 4.2 |
合计 | 112.27 |
- 本页导读 (1)
- 概述
- 准备工作
- 步骤一:准备目标实例
- 步骤二:准备测试服务器
- 步骤三:确保网络互通
- 步骤四:了解本次测试数据集
- 操作步骤
- 步骤一:登录目标服务器
- 步骤二:下载安装TPC-H数据生成工具
- 步骤三:生成TPC-H测试数据集
- 步骤四:使用脚本为SelectDB创建用于测试的表
- 步骤五:导入数据至SelectDB
- 步骤六:测试查询性能
- 测试结果