1. 前言
中启乘数科技是一家专业的PostgreSQL和Greenplum数据库服务提供商,专注于数据库极致的性能,当前PostgreSQL 14beta1已经发布,其中一个激动人心的特性就是当数据库有海量连接时,其它连接的性能基本不下降。下面我们通过实际的测试来说明。
2. 测试环境和方法
本着不详细介绍测试方法和测试环境的测试都是耍流氓的精神,我们把测试环境和方法详细介绍如下:
我们准备在一台物理机器上准备两套环境,一套是PostgreSQL 13.3,另一套是PostgreSQL 14beta1。
物理机器的硬件为2路服务器,CPU为:Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz,内存为256GB。
我们使用initdb命令初始化数据库,然后修改最大连接数:
max_connections = 11000
其它参数都不修改,默认shared_buffer为128MB。
然后用pgbench造2千万行数据:
pgbench -i -s 200
PostgreSQL 14与PostgreSQL 13在造数据上花的时间差不多,稍微快一点:
PostgreSQL 14:
[pg14@pg01 ~]$ time pgbench -i -s 200dropping old tables...NOTICE: table "pgbench_accounts" does not exist, skipping...20000000 of 20000000 tuples (100%) done (elapsed 15.45 s, remaining 0.00 s)vacuuming...creating primary keys...done in 23.67 s (drop tables 0.00 s, create tables 0.06 s, client-side generate 15.54 s, vacuum 2.70 s, primary keys 5.37 s).real 0m23.674suser 0m5.027ssys 0m0.138s
PostgreSQL 13:
[pg13@pg01 ~]$ time pgbench -i -s 200dropping old tables...NOTICE: table "pgbench_accounts" does not exist, skipping.........20000000 of 20000000 tuples (100%) done (elapsed 17.23 s, remaining 0.00 s)vacuuming...creating primary keys...done in 26.41 s (drop tables 0.00 s, create tables 0.06 s, client-side generate 17.32 s, vacuum 2.69 s, primary keys 6.34 s).real 0m26.409suser 0m5.022ssys 0m0.117s
测试方法的思路是我们给数据库建9000个执行pg_sleep(2000)的空闲连接,然后在用pgbench做只读压力测试。
建9000个pg_sleep(2000)连接的方法为:
我们建一个sleep.sql的文件,内容如下:
select pg_sleep(2000);
然后执行下面的命令
pgbench -c 9000 -f sleep.sql
而我们压力测试的命令为:
pgbench -S -c 64 -j 96 -M prepared -T30 -P 2
3. 实际测试
3.1 开启9000个空闲连接的测试
测试PostgreSQL 13.3
先开9000个执行pg_sleep()的连接:
[pg13@pg01 ~]$ pgbench -c 9000 -f sleep.sqlstarting vacuum...end.
然后压力测试:
[pg13@pg01 ~]$ pgbench -S -c 128 -j 16 -M prepared -T30 -P 2starting vacuum...end.progress: 2.0 s, 259526.5 tps, lat 0.446 ms stddev 0.185progress: 4.0 s, 268625.8 tps, lat 0.437 ms stddev 0.131......progress: 30.0 s, 272168.1 tps, lat 0.462 ms stddev 0.135transaction type: <builtin: select only>scaling factor: 200query mode: preparednumber of clients: 128number of threads: 16duration: 30 snumber of transactions actually processed: 8120358latency average = 0.454 mslatency stddev = 0.151 mstps = 270446.442022 (including connections establishing)tps = 271185.174447 (excluding connections establishing)
测试PostgreSQL 14beta1
先开9000个执行pg_sleep()的连接:
[pg14@pg01 ~]$ pgbench -c 9000 -f sleep.sqlstarting vacuum...end.
然后压力测试:
[pg14@pg01 ~]$ pgbench -S -c 128 -j 16 -M prepared -T30 -P 2starting vacuum...end.progress: 2.0 s, 324460.7 tps, lat 0.335 ms stddev 0.475progress: 4.0 s, 384877.4 tps, lat 0.316 ms stddev 0.454progress: 6.0 s, 385626.7 tps, lat 0.316 ms stddev 0.467......progress: 28.0 s, 557425.8 tps, lat 0.213 ms stddev 0.152progress: 30.0 s, 557706.0 tps, lat 0.212 ms stddev 0.145pgbench (PostgreSQL) 14.0transaction type: <builtin: select only>scaling factor: 200query mode: preparednumber of clients: 128number of threads: 16duration: 30 snumber of transactions actually processed: 15378932latency average = 0.231 mslatency stddev = 0.256 msinitial connection time = 216.714 mstps = 515524.706846 (without initial connection time)
可以看到在PostgreSQL 13中只能到达27万,而在PostgreSQL 14中就可以达到51万了,有近翻倍的性能提升。
极端测试: 20000万个空闲连接
PostgreSQL 13.3
[pg13@pg01 ~]$ pgbench -S -c 128 -j 16 -M prepared -T30 -P 2starting vacuum...end.progress: 30.0 s, 146343.3 tps, lat 0.447 ms stddev 0.176transaction type: <builtin: select only>scaling factor: 200query mode: preparednumber of clients: 128number of threads: 16duration: 30 snumber of transactions actually processed: 4392927latency average = 0.447 mslatency stddev = 0.176 mstps = 146325.374803 (including connections establishing)tps = 155519.134507 (excluding connections establishing)
PostgreSQL 14beta1
[pg14@pg01 ~]$ pgbench -S -c 128 -j 16 -M prepared -T30 -P 2starting vacuum...end.progress: 2.0 s, 390343.4 tps, lat 0.234 ms stddev 0.196progress: 4.0 s, 542178.6 tps, lat 0.219 ms stddev 0.152progress: 6.0 s, 550231.9 tps, lat 0.216 ms stddev 0.145progress: 8.0 s, 550509.8 tps, lat 0.215 ms stddev 0.153progress: 10.0 s, 554279.3 tps, lat 0.214 ms stddev 0.144progress: 12.0 s, 552410.2 tps, lat 0.215 ms stddev 0.148progress: 14.0 s, 554254.9 tps, lat 0.214 ms stddev 0.147progress: 16.0 s, 554077.1 tps, lat 0.214 ms stddev 0.149progress: 18.0 s, 552176.4 tps, lat 0.215 ms stddev 0.148progress: 20.0 s, 554539.0 tps, lat 0.214 ms stddev 0.144progress: 22.0 s, 550789.9 tps, lat 0.215 ms stddev 0.154progress: 24.0 s, 554287.8 tps, lat 0.214 ms stddev 0.144progress: 26.0 s, 552759.7 tps, lat 0.215 ms stddev 0.149progress: 28.0 s, 554362.7 tps, lat 0.214 ms stddev 0.148pgbench (PostgreSQL) 14.0transaction type: <builtin: select only>scaling factor: 200query mode: preparednumber of clients: 128number of threads: 16duration: 30 snumber of transactions actually processed: 16240811latency average = 0.216 mslatency stddev = 0.160 msinitial connection time = 460.811 mstps = 548627.931145 (without initial connection time)
可以看到PostgreSQL 14还是可以达到50多万的TPS,而PostgreSQL 13.3就只能是14万了,差距巨大。
3.3 进一步的测试
我们不建空闲连接,直接不断的改变连接数,看看解结果是怎样的:
pgbench -S -c <连接数> -j 16 -M prepared -T30 -P 2
压测时只改变上面的-c的连接数。测试的过程这里就不写出来,直接给出结果:
| 连接数 | PostgreSQL 13.3 | PostgreSQL 14beta1 |
|---|---|---|
| 64 | 496841 | 513140 |
| 128 | 528556 | 552778 |
| 256 | 475030 | 497435 |
| 1000 | 347863 | 402369 |
| 5000 | 205857 | 310090 |
| 10000 | 124371 | 248701 |
4. 结论
PostgtreSQL 14可以支持大量的空闲连接,这些空闲连接对执行的SQL的性能影响很小,但在PostgreSQL 13,会有很大的影响。所以从PostgreSQL 14之后,数据库可以支持上万个空闲连接,以前为了支持很多连接需要上连接池,现在可以不用了。
5. 原理简介
PostgreSQL 14主要是优化了GetSnapshotData()的性能,原先是随连接数增加,这个函数的性能是线性下降,从PostgreSQL 14之后就不是了。这个优化总共提交了7个PATCH,如果想进一步了解的同学可以见:
- snapshot scalability: Don’t compute global horizons while building snapshots.
- snapshot scalability: Move PGXACT->xmin back to PGPROC.
- snapshot scalability: Move PGXACT->vacuumFlags to ProcGlobal->vacuumFlags.
- snapshot scalability: Move subxact info to ProcGlobal, remove PGXACT.
- snapshot scalability: Introduce dense array of in-progress xids.
- snapshot scalability: cache snapshots using a xact completion counter.
- Fix race condition in snapshot caching when 2PC is used.