1. The Problem

中启乘数科技 is a professional Greenplum database service provider. In one customer's environment we found that a small table left joined to a large partitioned table often did not use the index. Careful testing narrowed the behavior down to the following:

  1. The index is skipped only when the large table on the right side of the left join is a partitioned table and the join column is of type varchar.
  2. When a small table left joins an ordinary (non-partitioned) large table, the index can be used.
  3. Even when the large table is partitioned, the index can still be used if the join key is of type text or int (a sketch of the int case follows this list).
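The varchar and text cases are reproduced step by step in section 2 below. For the int case in finding 3, a variant of the same test schema can be used. The following is only a minimal sketch under our own assumptions: the names test3_big/test3_small and the numeric partition bounds are ours, not from the customer environment.

    -- Hypothetical int-key variant of the test schema; per finding 3, the
    -- planner can use the index here even though the table is partitioned.
    create table test3_big(
        id int,
        t int
    ) distributed by (id)
    partition by range(t)
    (
        partition p1 start (0) inclusive end (5000000) exclusive,
        partition p2 start (5000000) inclusive end (10000001) inclusive
    );
    insert into test3_big select seq, seq from generate_series(1, 10000000) as seq;
    create index idx_test3_big_id on test3_big(id);

    create table test3_small(
        id int,
        t int
    ) distributed by (id);
    insert into test3_small select seq*10000, seq*10000 from generate_series(1, 100) as seq;

    -- Under the same settings used in section 2 (optimizer off,
    -- enable_nestloop on), this left join is expected to take the
    -- per-partition index path:
    explain analyze select a.* from test3_small a left join test3_big b on a.id = b.id;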

2. Reproducing the Problem

Create the test tables:

    create table test_big(
        id varchar(32),
        t varchar(32)
    ) distributed by (id)
    partition by range(t)
    (
        partition p1 start ('0') inclusive end ('5') exclusive,
        partition p2 start ('5') inclusive end ('9999999999999999999') inclusive
    );
    insert into test_big select seq, seq from generate_series(1, 10000000) as seq;
    create index idx_test_big_id on test_big(id);

    create table test_small(
        id varchar(32),
        t varchar(32)
    ) distributed by (id);
    insert into test_small select seq*10000, seq*10000 from generate_series(1, 100) as seq;

Check the execution plan:

    postgres=# set Optimizer to off;
    SET
    postgres=# set enable_nestloop to on;
    SET
    postgres=# analyze test_big_1_prt_p1;
    ANALYZE
    postgres=# analyze test_big_1_prt_p2;
    ANALYZE
    postgres=# explain analyze select a.* from test_small a left join test_big b on a.id=b.id;
                                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------
     Gather Motion 2:1  (slice2; segments: 2)  (cost=4.25..350948.25 rows=100 width=12) (actual time=5428.774..5430.486 rows=100 loops=1)
       ->  Hash Right Join  (cost=4.25..350948.25 rows=50 width=12) (actual time=44.883..5427.152 rows=61 loops=1)
             Hash Cond: ((b.id)::text = (a.id)::text)
             Extra Text: (seg1)   Hash chain length 1.0 avg, 1 max, using 61 of 524288 buckets.
             ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..313443.00 rows=5000000 width=7) (actual time=2.339..4486.409 rows=5000055 loops=1)
                   Hash Key: b.id
                   ->  Append  (cost=0.00..113443.00 rows=5000000 width=7) (actual time=0.499..2188.017 rows=5000055 loops=1)
                         ->  Seq Scan on test_big_1_prt_p1 b  (cost=0.00..50419.45 rows=2222223 width=7) (actual time=0.498..828.350 rows=2223115 loops=1)
                         ->  Seq Scan on test_big_1_prt_p2 b_1  (cost=0.00..63023.55 rows=2777778 width=7) (actual time=0.234..527.056 rows=2778615 loops=1)
             ->  Hash  (cost=3.00..3.00 rows=50 width=12) (actual time=0.119..0.119 rows=61 loops=1)
                   ->  Seq Scan on test_small a  (cost=0.00..3.00 rows=50 width=12) (actual time=0.069..0.079 rows=61 loops=1)
     Planning time: 0.861 ms
       (slice0)    Executor memory: 127K bytes.
       (slice1)    Executor memory: 75K bytes avg x 2 workers, 75K bytes max (seg0).
       (slice2)    Executor memory: 4232K bytes avg x 2 workers, 4232K bytes max (seg0).  Work_mem: 3K bytes max.
     Memory used:  128000kB
     Optimizer: Postgres query optimizer
     Execution time: 5432.691 ms
    (18 rows)

The index is not used.
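One way to confirm that this is the planner failing to generate the index path, rather than merely costing it out, is to make sequential scans artificially expensive. enable_seqscan is a standard PostgreSQL/Greenplum planner GUC; this diagnostic step is our suggestion, not part of the original session:

    -- Hypothetical diagnostic: penalize seq scans so the planner takes an
    -- index path if one is available for this plan shape. If the plan still
    -- shows Seq Scan nodes, the index path is not being generated at all.
    set enable_seqscan to off;
    explain select a.* from test_small a left join test_big b on a.id = b.id;
    set enable_seqscan to on;  -- restore the default afterwards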

However, a plain inner join can use the index:

    postgres=# explain analyze select a.* from test_small a join test_big b on a.id=b.id;
                                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------
     Gather Motion 2:1  (slice2; segments: 2)  (cost=0.18..66891.00 rows=100 width=12) (actual time=84.616..99.975 rows=100 loops=1)
       ->  Nested Loop  (cost=0.18..66891.00 rows=50 width=12) (actual time=43.385..94.197 rows=61 loops=1)
             ->  Broadcast Motion 2:2  (slice1; segments: 2)  (cost=0.00..6.00 rows=100 width=12) (actual time=1.052..1.165 rows=100 loops=1)
                   ->  Seq Scan on test_small a  (cost=0.00..3.00 rows=50 width=12) (actual time=0.055..0.065 rows=61 loops=1)
             ->  Append  (cost=0.18..334.40 rows=1 width=7) (actual time=0.106..0.937 rows=1 loops=100)
                   ->  Index Only Scan using test_big_1_prt_p1_id_idx on test_big_1_prt_p1 b  (cost=0.18..167.20 rows=1 width=7) (actual time=0.049..0.587 rows=0 loops=100)
                         Index Cond: (id = (a.id)::text)
                         Heap Fetches: 0
                   ->  Index Only Scan using test_big_1_prt_p2_id_idx on test_big_1_prt_p2 b_1  (cost=0.18..167.20 rows=1 width=7) (actual time=0.056..0.345 rows=0 loops=100)
                         Index Cond: (id = (a.id)::text)
                         Heap Fetches: 0
     Planning time: 1.046 ms
       (slice0)    Executor memory: 200K bytes.
       (slice1)    Executor memory: 58K bytes avg x 2 workers, 58K bytes max (seg0).
       (slice2)    Executor memory: 280K bytes avg x 2 workers, 280K bytes max (seg0).
     Memory used:  128000kB
     Optimizer: Postgres query optimizer
     Execution time: 102.337 ms
    (18 rows)

If a partition is joined directly, the index is also used:

    postgres=# explain analyze select a.* from test_small a left join test_big_1_prt_p1 b on a.id=b.id;
                                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------
     Gather Motion 2:1  (slice1; segments: 2)  (cost=0.18..16724.00 rows=100 width=12) (actual time=4.751..5.025 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.18..16724.00 rows=50 width=12) (actual time=0.250..2.501 rows=61 loops=1)
             ->  Seq Scan on test_small a  (cost=0.00..3.00 rows=50 width=12) (actual time=0.056..0.074 rows=61 loops=1)
             ->  Index Only Scan using test_big_1_prt_p1_id_idx on test_big_1_prt_p1 b  (cost=0.18..167.20 rows=1 width=7) (actual time=0.002..0.038 rows=0 loops=61)
                   Index Cond: (id = (a.id)::text)
                   Heap Fetches: 0
     Planning time: 0.740 ms
       (slice0)    Executor memory: 96K bytes.
       (slice1)    Executor memory: 164K bytes avg x 2 workers, 164K bytes max (seg0).
     Memory used:  128000kB
     Optimizer: Postgres query optimizer
     Execution time: 7.444 ms
    (12 rows)

If the join key is of type text, the problem does not occur.

Create the test tables:

    create table test2_big(
        id text,
        t text
    ) distributed by (id)
    partition by range(t)
    (
        partition p1 start ('0') inclusive end ('5') exclusive,
        partition p2 start ('5') inclusive end ('9999999999999999999') inclusive
    );
    insert into test2_big select seq, seq from generate_series(1, 10000000) as seq;
    create index idx_test2_big_id on test2_big(id);

    create table test2_small(
        id text,
        t text
    ) distributed by (id);
    insert into test2_small select seq*10000, seq*10000 from generate_series(1, 100) as seq;

    postgres=# analyze test2_big_1_prt_p1;
    ANALYZE
    postgres=# analyze test2_big_1_prt_p2;
    ANALYZE

Check the execution plan:

    postgres=# set Optimizer to off;
    SET
    postgres=# set enable_nestloop to on;
    SET
    postgres=# explain analyze select a.* from test2_small a left join test2_big b on a.id=b.id;
                                                                  QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------------
     Gather Motion 2:1  (slice1; segments: 2)  (cost=0.18..33445.00 rows=100 width=12) (actual time=2.988..4.136 rows=100 loops=1)
       ->  Nested Loop Left Join  (cost=0.18..33445.00 rows=50 width=12) (actual time=0.113..1.983 rows=61 loops=1)
             ->  Seq Scan on test2_small a  (cost=0.00..3.00 rows=50 width=12) (actual time=0.011..0.030 rows=61 loops=1)
             ->  Append  (cost=0.18..334.40 rows=1 width=7) (actual time=0.002..0.031 rows=1 loops=61)
                   ->  Index Only Scan using test2_big_1_prt_p1_id_idx on test2_big_1_prt_p1 b  (cost=0.18..167.20 rows=1 width=7) (actual time=0.000..0.013 rows=0 loops=61)
                         Index Cond: (id = a.id)
                         Heap Fetches: 0
                   ->  Index Only Scan using test2_big_1_prt_p2_id_idx on test2_big_1_prt_p2 b_1  (cost=0.18..167.20 rows=1 width=7) (actual time=0.001..0.015 rows=1 loops=61)
                         Index Cond: (id = a.id)
                         Heap Fetches: 0
     Planning time: 0.810 ms
       (slice0)    Executor memory: 128K bytes.
       (slice1)    Executor memory: 268K bytes avg x 2 workers, 268K bytes max (seg0).
     Memory used:  128000kB
     Optimizer: Postgres query optimizer
     Execution time: 6.018 ms
    (16 rows)

3. Resolution

The problem has been located: it is a bug that requires a code change to fix. If you are interested in the details, email customer@csudata.com.
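Until a patched build is available, one possible workaround that follows from the findings in section 2 is to store the join key as text rather than varchar. This is only a minimal sketch under our own assumptions (the staging name test_big_new is ours, and it assumes the table can be rebuilt offline), not the official fix:

    -- Hypothetical workaround: rebuild the partitioned table with text keys,
    -- since text join keys do not trigger the bug (see section 2).
    create table test_big_new(
        id text,
        t text
    ) distributed by (id)
    partition by range(t)
    (
        partition p1 start ('0') inclusive end ('5') exclusive,
        partition p2 start ('5') inclusive end ('9999999999999999999') inclusive
    );
    insert into test_big_new select id::text, t::text from test_big;
    create index idx_test_big_new_id on test_big_new(id);
    -- Swap names once the copy is verified:
    -- alter table test_big rename to test_big_old;
    -- alter table test_big_new rename to test_big;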

Articles on this site may not be reposted without the author's permission; to request permission, email customer@csudata.com.