言わずと知れたTPC-CがOLTPの性能指標となるのであれば、TPC-HはDSSの指標となるもの。
10年ほど前の新入社員のころにMySQL4.1で無理矢理TPC-Hのクエリを動かしたものの、結果が返ってこない(時間がかかりすぎる)ことがあったのですが、最近のバージョンではどうかと思い、久しぶりに試してみました。
IPAの報告書など、TPC-Hの実装であるDBT3を実施したレポートはネット上でもあったので参考にしながら自分で作ろうかと思ったのですが、HammerDBなるものがあることを知って、横着させていただきました。
HammerDBは数種類のDBMSに対して、TPC-CやTPC-Hに基づいた試験を行えるベンチマークツールのようです。
http://hammerora.sourceforge.net/
DBMSはMySQLやPostgreSQLだけでなく、OracleやSQL Serverにも対応してるなど、簡単に比較するのにも便利そうです。
Times Tenとかredisとかでも使えるってのが珍しい。
web向けのテストや、複数クライアントからの負荷がけなんかもできそうです。
マニュアルを見ると、MySQLは5.6向けではある模様
HammerDB version 2.16 has been built and tested against a MySQL 5.6 client installation. On Linux this
means that HammerDB will require a MySQL client library called libmysqlclient.so.18. T
では早速使ってみました。
ダウンロード後に実行するだけ。
$ chmod +x HammerDB-2.16-Linux-x86-64-Install HammerDB-2.16-Linux-x86-64-Install
$ ./HammerDB-2.16-Linux-x86-64-Install
This will install HammerDB on your computer. Continue? [n/Y] y
Where do you want to install HammerDB? [/home/XXXXX/HammerDB-2.16]
Installing HammerDB…
Installing Program Files…
Installation complete.
とSSHでつないでターミナルから実行したものの、tclベースのGUI利用前提だったもよう。
なので、gnomeのターミナルから同じことを実行しましたところ、GUIが起動されました。
さて、テストを行う前に、データロードする必要がありますので、まずはそこから。
Options > Benchmark
からMySQLのTPC-Hを選択。
次にTPC-Hのデータロード用に接続設定を行う。
Options > TPC-H Schema > Build and Driver を選択
各種情報を設定します。
rootのPassowrdは設定していなかったので、空欄にしてたら動きませんでした・・のでパスワードは必須。
試してはいないですが、Databaseは手動で作っておかないと動かないかも。
あと、Storage EngineはデフォルトだとMyISAMになっていたので、InnoDBに変更してます。
Scale Factorはお試しで1に設定してみました。
設定が終わったら、Create TPC Schemaと表示が出る、何かよくわからないアイコンをクリックすると
設定内容の再確認が出て、Yesをクリックするとロードが始まります。
CREATE TABLEから、インデックスの作成、データロードまで全自動でなかなか楽です。
ハードスペックにもよると思いますが、1.5時間程度はかかりました。
buffer_poolくらいしか設定変えてなかったので、チューニングすればもう少し早くなると思います。
ロードが完了すればいよいよ、クエリの実行です。
Options > TPC-H Schema > Load Driver Script を選択し、テスト用のスクリプトをロード?します。
次にトランザクション数を測定するためのカウンターなるものの設定です。
Options > TPC-H Schema > Transaction Counter を選択
ここも同様に環境に合わせてパスワードを設定します。
Refresh Rateを短くすると、たいしたものではないと思いますが、若干DBへのアクセス頻度が増えてしまうので、必要以上に短くしなくても良いかと。
最後に、クエリを実行するユーザの設定です。
また何やらよくわかりませんが、赤いアイコンをクリックすると、下のペインに、Virtual Userが登録されます。
複数のユーザからクエリを実行する場合は、ここに複数ユーザを登録するようです。
そして、実行ボタンらしいアイコンをクリックすることでクエリが流れ始めます。
このときに、また何か良くわからない右の方のアイコンをクリックすると、クエリの実行数が確認できます。
qphは「query per hour」の略で、その名の通り1時間換算のクエリ実行数です。
(追記)
********************************************************************
どうやら単純に1時間のクエリ数でも無いようで、こちらにサイトに紹介がありました。
[TPC-H] QphHはどうやって計算するのか知ってますか?
********************************************************************
一応processlistを確認
動いてるっぽいです。
以上でTPC-Hを元にクエリを実行することができました。
ただ、個人的にはqphよりも各クエリの実行時間が知りたいので、general_logを出してみました。
/usr/sbin/mysqld, Version: 5.6.16-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /data/mysql/mysql.sock
Time Id Command Argument
# Time: 140302 7:40:56
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 1.932741 Lock_time: 0.000281 Rows_sent: 100 Rows_examined: 648306
use tpch;
SET timestamp=1393713656;
select s_name, count(*) as numwait from SUPPLIER, LINEITEM l1, ORDERS, NATION where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = ‘F’ and l1.l_receiptdate > l1.l_commitdate and exists ( select * from LINEITEM l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey) and not exists ( select * from LINEITEM l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate) and s_nationkey = n_nationkey and n_name = ‘CHINA’ group by s_name order by numwait desc, s_name limit 100;
# Time: 140302 7:40:58
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 2.233532 Lock_time: 0.000180 Rows_sent: 10 Rows_examined: 2830403
SET timestamp=1393713658;
select l_orderkey, sum(l_extendedprice * (1 – l_discount)) as revenue, o_orderdate, o_shippriority from CUSTOMER, ORDERS, LINEITEM where c_mktsegment = ‘HOUSEHOLD’ and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date ‘1995-03-16’ and l_shipdate > date ‘1995-03-16’ group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
# Time: 140302 7:41:03
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 4.689517 Lock_time: 0.000191 Rows_sent: 8 Rows_examined: 7651587
SET timestamp=1393713663;
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from CUSTOMER, ORDERS, LINEITEM where o_orderkey in ( select l_orderkey from LINEITEM group by l_orderkey having sum(l_quantity) > 312) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
# Time: 140302 7:41:04
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.817588 Lock_time: 0.000205 Rows_sent: 4 Rows_examined: 560132
SET timestamp=1393713664;
select n_name, sum(l_extendedprice * (1 – l_discount)) as revenue from CUSTOMER, ORDERS, LINEITEM, SUPPLIER, NATION, REGION where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘MIDDLE EAST’ and o_orderdate >= date ‘1996-01-01’ and o_orderdate < date ‘1996-01-01’ + interval ‘1’ year group by n_name order by revenue desc;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.107374 Lock_time: 0.000203 Rows_sent: 1038 Rows_examined: 94397
SET timestamp=1393713664;
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from PARTSUPP, SUPPLIER, NATION where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = ‘ETHIOPIA’ group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001000000 from PARTSUPP, SUPPLIER, NATION where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = ‘ETHIOPIA’) order by value desc;
# Time: 140302 7:41:06
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 1.506114 Lock_time: 0.000250 Rows_sent: 4 Rows_examined: 909683
SET timestamp=1393713666;
select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 – l_discount) as volume from SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION n1, NATION n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = ‘ETHIOPIA’ and n2.n_name = ‘CHINA’) or (n1.n_name = ‘CHINA’ and n2.n_name = ‘ETHIOPIA’)) and l_shipdate between date ‘1995-01-01’ and date ‘1996-12-31’) shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
# Time: 140302 7:41:08
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 2.492662 Lock_time: 0.000152 Rows_sent: 1 Rows_examined: 6001515
SET timestamp=1393713668;
select sum(l_extendedprice * l_discount) as revenue from LINEITEM where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date ‘1995-01-01’ + interval ‘1’ year and l_discount between 0.06 – 0.01 and 0.06 + 0.01 and l_quantity < 24;
# Time: 140302 7:41:09
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.936907 Lock_time: 0.000212 Rows_sent: 82 Rows_examined: 257367
SET timestamp=1393713669;
select s_name, s_address from SUPPLIER, NATION where s_suppkey in ( select ps_suppkey from PARTSUPP where ps_partkey in ( select p_partkey from PART where p_name like ‘dark%’) and ps_availqty > ( select 0.5 * sum(l_quantity) from LINEITEM where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date ‘1994-01-01’ and l_shipdate < date ‘1994-01-01’ + interval ‘1’ year)) and s_nationkey = n_nationkey and n_name = ‘EGYPT’ order by s_name;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.395880 Lock_time: 0.000173 Rows_sent: 1 Rows_examined: 389326
SET timestamp=1393713669;
select sum(l_extendedprice) / 7.0 as avg_yearly from LINEITEM, PART where p_partkey = l_partkey and p_brand = ‘Brand#35’ and p_container = ‘JUMBO BOX’ and l_quantity < ( select 0.2 * avg(l_quantity) from LINEITEM where l_partkey = p_partkey);
# Time: 140302 7:41:12
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 2.623498 Lock_time: 0.000199 Rows_sent: 2 Rows_examined: 6032625
SET timestamp=1393713672;
select l_shipmode, sum(case when o_orderpriority = ‘1-URGENT’ or o_orderpriority = ‘2-HIGH’ then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> ‘1-URGENT’ and o_orderpriority <> ‘2-HIGH’ then 1 else 0 end) as low_line_count from ORDERS, LINEITEM where o_orderkey = l_orderkey and l_shipmode in (‘FOB’, ‘TRUCK’) and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date ‘1994-01-01’ and l_receiptdate < date ‘1994-01-01’ + interval ‘1’ year group by l_shipmode order by l_shipmode;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.345243 Lock_time: 0.000189 Rows_sent: 18261 Rows_examined: 395717
SET timestamp=1393713672;
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from PARTSUPP, PART where p_partkey = ps_partkey and p_brand <> ‘Brand#31’ and p_type not like ‘STANDARD BRUSHED%’ and p_size in (27, 8, 32, 37, 13, 46, 31, 5) and ps_suppkey not in ( select s_suppkey from SUPPLIER where s_comment like ‘%Customer%Complaints%’) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size;
# Time: 140302 7:41:14
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 1.955579 Lock_time: 0.000463 Rows_sent: 1 Rows_examined: 510249
SET timestamp=1393713674;
select s_suppkey, s_name, s_address, s_phone, total_revenue from SUPPLIER, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0) order by s_suppkey;
# Time: 140302 7:41:17
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 2.497492 Lock_time: 0.000180 Rows_sent: 32 Rows_examined: 1850065
SET timestamp=1393713677;
select c_count, count(*) as custdist from ( select c_custkey as c_custkey, count(o_orderkey) as c_count from CUSTOMER left outer join ORDERS on c_custkey = o_custkey and o_comment not like ‘%unusual%accounts%’ group by c_custkey) as c_orders group by c_count order by custdist desc, c_count desc;
# Time: 140302 7:41:18
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.956482 Lock_time: 0.000206 Rows_sent: 20 Rows_examined: 406925
SET timestamp=1393713678;
select c_custkey, c_name, sum(l_extendedprice * (1 – l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from CUSTOMER, ORDERS, LINEITEM, NATION where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date ‘1993-02-01’ and o_orderdate < date ‘1993-02-01’ + interval ‘3’ month and l_returnflag = ‘R’ and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.411825 Lock_time: 0.000232 Rows_sent: 100 Rows_examined: 336528
SET timestamp=1393713678;
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from PART, SUPPLIER, PARTSUPP, NATION, REGION where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 17 and p_type like ‘%STEEL’ and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘EUROPE’ and ps_supplycost = ( select min(ps_supplycost) from PARTSUPP, SUPPLIER, NATION, REGION where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘EUROPE’) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
# Time: 140302 7:41:20
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 1.675488 Lock_time: 0.000264 Rows_sent: 2 Rows_examined: 1786223
SET timestamp=1393713680;
select o_year, sum(case when NATION = ‘CHINA’ then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 – l_discount) as volume, n2.n_name as NATION from PART, SUPPLIER, LINEITEM, ORDERS, CUSTOMER, NATION n1, NATION n2, REGION where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = ‘ASIA’ and s_nationkey = n2.n_nationkey and o_orderdate between date ‘1995-01-01’ and date ‘1996-12-31’ and p_type = ‘PROMO ANODIZED COPPER’) all_nations group by o_year order by o_year;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.279161 Lock_time: 0.000174 Rows_sent: 1 Rows_examined: 149926
SET timestamp=1393713680;
select 100.00 * sum(case when p_type like ‘PROMO%’ then l_extendedprice * (1 – l_discount) else 0 end) / sum(l_extendedprice * (1 – l_discount)) as promo_revenue from LINEITEM, PART where l_partkey = p_partkey and l_shipdate >= date ‘1996-04-01’ and l_shipdate < date ‘1996-04-01’ + interval ‘1’ month;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.129972 Lock_time: 0.000217 Rows_sent: 1 Rows_examined: 215907
SET timestamp=1393713680;
select sum(l_extendedprice* (1 – l_discount)) as revenue from LINEITEM, PART where ( p_partkey = l_partkey and p_brand = ‘Brand#31’ and p_container in (‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’) and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 5 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’) or ( p_partkey = l_partkey and p_brand = ‘Brand#42’ and p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’) and l_quantity >= 19 and l_quantity <= 19 + 10 and p_size between 1 and 10 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’) or ( p_partkey = l_partkey and p_brand = ‘Brand#25’ and p_container in (‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’) and l_quantity >= 29 and l_quantity <= 29 + 10 and p_size between 1 and 15 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’);
# Time: 140302 7:41:24
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 3.396313 Lock_time: 0.000234 Rows_sent: 175 Rows_examined: 2575076
SET timestamp=1393713684;
select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 – l_discount) – ps_supplycost * l_quantity as amount from PART, SUPPLIER, LINEITEM, PARTSUPP, ORDERS, NATION where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like ‘%cornflower%’) profit group by nation, o_year order by nation, o_year desc;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.145015 Lock_time: 0.000234 Rows_sent: 7 Rows_examined: 305314
SET timestamp=1393713684;
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substr(c_phone, 1, 2) as cntrycode, c_acctbal from CUSTOMER where substr(c_phone, 1, 2) in (’32’, ’34’, ’11’, ’12’, ’13’, ’33’, ’28’) and c_acctbal > ( select avg(c_acctbal) from CUSTOMER where c_acctbal > 0.00 and substr(c_phone, 1, 2) in (’32’, ’34’, ’11’, ’12’, ’13’, ’33’, ’28’)) and not exists ( select * from ORDERS where o_custkey = c_custkey)) custsale group by cntrycode order by cntrycode;
# Time: 140302 7:41:41
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 16.831948 Lock_time: 0.000181 Rows_sent: 4 Rows_examined: 6001523
SET timestamp=1393713701;
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 – l_discount)) as sum_disc_price, sum(l_extendedprice * (1 – l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from LINEITEM where l_shipdate <= date ‘1998-12-01’ – interval ‘109’ day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
# User@Host: root[root] @ localhost [127.0.0.1] Id: 70
# Query_time: 0.454927 Lock_time: 0.000172 Rows_sent: 5 Rows_examined: 140720
SET timestamp=1393713701;
select o_orderpriority, count(*) as order_count from ORDERS where o_orderdate >= date ‘1993-03-01’ and o_orderdate < date ‘1993-03-01’ + interval ‘3’ month and exists ( select * from LINEITEM where l_orderkey = o_orderkey and l_commitdate < l_receiptdate) group by o_orderpriority order by o_orderpriority;
MySQLで確認する場合は、TPC-Hであればgeneral_logを有効にした方がわかりやすいと思います。general_logの書き込みはクエリ実行に比較すると非常に短いので、誤差レベルでしょう。。
データ量が少ないので、どのクエリも無難な時間で完了してますが、Scale Factorを大きくしたときにどうなるかは別途確認が必要です。
これで一通り、TPC-Hのクエリが確認できました。
TPC-Cは確認していませんが、同じ手順で大丈夫かと思います。
思ったよりもHammerDBが、便利そうなのでもう少し使ってみようかと。