INSERT INTO my_table (fields, ...) VALUES (value_1, ...), (value_2, ...) ... is slow in Citus
0 votes | 1 answer | 54 views
Environment:
- host resources: 32 cores, 64 GB RAM, Docker 24.07
- OS: CentOS 7
- disk: 10 separate disk partitions
The 10 separate disk partitions are shown below:
df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 66M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/mapper/centos-root 168G 16G 153G 10% /
/dev/sde1 9.8T 107M 9.8T 1% /data03
/dev/sdj1 9.8T 33M 9.8T 1% /data08
/dev/sdd1 9.8T 105M 9.8T 1% /data02
/dev/sdl1 9.8T 33M 9.8T 1% /data10
/dev/sdb1 9.8T 1.4G 9.8T 1% /data00
/dev/sdh1 9.8T 33M 9.8T 1% /data06
/dev/sdi1 9.8T 33M 9.8T 1% /data07
/dev/sdf1 9.8T 107M 9.8T 1% /data04
/dev/sdc1 9.8T 107M 9.8T 1% /data01
/dev/sdg1 9.8T 33M 9.8T 1% /data05
/dev/sda1 1014M 226M 789M 23% /boot
tmpfs 6.3G 0 6.3G 0% /run/user/1010
I use the docker-compose.yml below to build a 3-node Citus cluster in Docker, plus a standalone PostgreSQL container named "pg-16". They all use the same image. The only difference between pg-16 and the citus-xxxx containers is that for pg-16 I remove
shared_preload_libraries = 'citus'
from postgresql.conf. I did not change any other settings in postgresql.conf; everything else is left at its default value.
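To confirm which instances actually load Citus, each one can be checked with a quick query (just a verification sketch; on the cluster nodes this should return 'citus', and on pg-16 it should be empty):

-- run on each instance
SHOW shared_preload_libraries;
-- optionally, check whether the citus extension is installed in the database
SELECT extname, extversion FROM pg_extension WHERE extname = 'citus';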
version: "3.7"
services:
  citus-coordinator:
    image: citusdata/citus:latest
    container_name: citus-coordinator
    hostname: citus-coordinator
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5440:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data01/postgresql:/var/lib/postgresql
      - /data01/postgresql/data:/var/lib/postgresql/data
  citus-worker1:
    image: citusdata/citus:latest
    container_name: citus-worker1
    hostname: citus-worker1
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5441:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data02/postgresql:/var/lib/postgresql
      - /data02/postgresql/data:/var/lib/postgresql/data
  citus-worker2:
    image: citusdata/citus:latest
    container_name: citus-worker2
    hostname: citus-worker2
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5442:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data03/postgresql:/var/lib/postgresql
      - /data03/postgresql/data:/var/lib/postgresql/data
  citus-worker3:
    image: citusdata/citus:latest
    container_name: citus-worker3
    hostname: citus-worker3
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5443:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data04/postgresql:/var/lib/postgresql
      - /data04/postgresql/data:/var/lib/postgresql/data
  pg-16:
    image: citusdata/citus:latest
    container_name: pg-16
    hostname: pg-16
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5433:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data00/postgresql:/var/lib/postgresql
      - /data00/postgresql/data:/var/lib/postgresql/data
As you can see in the docker-compose.yml, each Citus node (including the coordinator) uses its own disk partition, so one node's disk I/O is not shared with the others.
My table gm_data2 is defined as below:
create table if not exists gm_data2
(
    id                   bigint      not null,
    period_type          varchar(2)  not null,
    billing_determinates varchar(40),
    file_id              bigint      not null,
    region               varchar(2)  not null,
    ratio                varchar(8)  not null,
    date_created         timestamp   not null,
    last_updated         timestamp   not null,
    read_seq             varchar(40) not null,
    cyc_dt               varchar(3)  not null,
    cust_addr            varchar(40) not null,
    hes_code             varchar(30),
    seg_type             varchar(3)  not null,
    ctr_type             varchar(5)  not null,
    custno               varchar(20) not null,
    meterno              varchar(16) not null constraint gm_data2_pkey primary key,
    tou                  varchar(5),
    cust_name            varchar(30),
    publish_id           varchar(5),
    group_id             varchar(3),
    reg_ctr_vol          varchar(8),
    period_dt            varchar(3),
    service_group_id     varchar(4),
    read_dt              varchar(10),
    billing_mark         varchar(2),
    high_low_vol         integer
);
After the Citus containers are created, I complete the usual steps to configure the cluster.
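I have not copied the exact setup commands here; roughly, they are the standard Citus node-registration calls run on the coordinator, something like the sketch below (the shouldhaveshards step is an assumption on my part, since the shard listing further down shows shards placed on the coordinator as well):

-- run on the coordinator, connected to the dbbench database
SELECT citus_set_coordinator_host('citus-coordinator', 5432);
SELECT citus_add_node('citus-worker1', 5432);
SELECT citus_add_node('citus-worker2', 5432);
SELECT citus_add_node('citus-worker3', 5432);
-- assumed: allow the coordinator itself to hold shards
SELECT citus_set_node_property('citus-coordinator', 'shouldhaveshards', true);

Afterwards the workers show up as active: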
dbbench=> SELECT * FROM citus_get_active_worker_nodes();
node_name | node_port
---------------+-----------
citus-worker3 | 5432
citus-worker2 | 5432
citus-worker1 | 5432
dbbench=> SELECT create_distributed_table('gm_data2', 'meterno');
 create_distributed_table
---------------------------
(1 row)
dbbench=> SELECT rebalance_table_shards('gm_data2');
rebalance_table_shards
------------------------
(1 row)
SELECT count(1) FROM citus_shards;
count
-------
32
(1 row)
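The 32 shards simply reflect the default shard count, which I did not change; it can be confirmed on the coordinator with:

-- citus.shard_count defaults to 32, and create_distributed_table used that default
SHOW citus.shard_count;

The placements are spread over all four nodes: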
dbbench=> SELECT * FROM citus_shards;
table_name | shardid | shard_name | citus_table_type | colocation_id | nodename | nodeport | shard_size
------------+---------+-----------------+------------------+---------------+-------------------+----------+------------
gm_data2 | 102040 | gm_data2_102040 | distributed | 2 | citus-coordinator | 5432 | 565248
gm_data2 | 102041 | gm_data2_102041 | distributed | 2 | citus-worker1 | 5432 | 581632
gm_data2 | 102042 | gm_data2_102042 | distributed | 2 | citus-worker2 | 5432 | 557056
gm_data2 | 102043 | gm_data2_102043 | distributed | 2 | citus-worker3 | 5432 | 892928
gm_data2 | 102044 | gm_data2_102044 | distributed | 2 | citus-coordinator | 5432 | 917504
gm_data2 | 102045 | gm_data2_102045 | distributed | 2 | citus-worker1 | 5432 | 589824
gm_data2 | 102046 | gm_data2_102046 | distributed | 2 | citus-worker2 | 5432 | 933888
gm_data2 | 102047 | gm_data2_102047 | distributed | 2 | citus-worker3 | 5432 | 892928
gm_data2 | 102048 | gm_data2_102048 | distributed | 2 | citus-coordinator | 5432 | 892928
gm_data2 | 102049 | gm_data2_102049 | distributed | 2 | citus-worker1 | 5432 | 573440
gm_data2 | 102050 | gm_data2_102050 | distributed | 2 | citus-worker2 | 5432 | 933888
gm_data2 | 102051 | gm_data2_102051 | distributed | 2 | citus-worker3 | 5432 | 925696
gm_data2 | 102052 | gm_data2_102052 | distributed | 2 | citus-coordinator | 5432 | 892928
gm_data2 | 102053 | gm_data2_102053 | distributed | 2 | citus-worker1 | 5432 | 573440
gm_data2 | 102054 | gm_data2_102054 | distributed | 2 | citus-worker2 | 5432 | 917504
gm_data2 | 102055 | gm_data2_102055 | distributed | 2 | citus-worker3 | 5432 | 909312
gm_data2 | 102056 | gm_data2_102056 | distributed | 2 | citus-coordinator | 5432 | 950272
gm_data2 | 102057 | gm_data2_102057 | distributed | 2 | citus-worker1 | 5432 | 548864
gm_data2 | 102058 | gm_data2_102058 | distributed | 2 | citus-worker2 | 5432 | 909312
gm_data2 | 102059 | gm_data2_102059 | distributed | 2 | citus-worker3 | 5432 | 950272
gm_data2 | 102060 | gm_data2_102060 | distributed | 2 | citus-coordinator | 5432 | 917504
gm_data2 | 102061 | gm_data2_102061 | distributed | 2 | citus-worker1 | 5432 | 573440
gm_data2 | 102062 | gm_data2_102062 | distributed | 2 | citus-worker2 | 5432 | 901120
gm_data2 | 102063 | gm_data2_102063 | distributed | 2 | citus-worker3 | 5432 | 933888
gm_data2 | 102064 | gm_data2_102064 | distributed | 2 | citus-coordinator | 5432 | 917504
gm_data2 | 102065 | gm_data2_102065 | distributed | 2 | citus-worker1 | 5432 | 581632
gm_data2 | 102066 | gm_data2_102066 | distributed | 2 | citus-worker2 | 5432 | 917504
gm_data2 | 102067 | gm_data2_102067 | distributed | 2 | citus-worker3 | 5432 | 876544
gm_data2 | 102068 | gm_data2_102068 | distributed | 2 | citus-coordinator | 5432 | 917504
gm_data2 | 102069 | gm_data2_102069 | distributed | 2 | citus-worker1 | 5432 | 581632
gm_data2 | 102070 | gm_data2_102070 | distributed | 2 | citus-worker2 | 5432 | 884736
gm_data2 | 102071 | gm_data2_102071 | distributed | 2 | citus-worker3 | 5432 | 925696
(32 rows)
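For a more compact view of how shards and data are spread, the same citus_shards view can be aggregated per node (a small helper query, not part of my original steps):

SELECT nodename,
       count(*)                        AS shard_count,
       pg_size_pretty(sum(shard_size)) AS total_size
FROM citus_shards
GROUP BY nodename
ORDER BY nodename;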
Now I need to insert 100,000 records into gm_data2. There are two ways to insert these records.
- method 1 (batch size = 1): run the INSERT below once per record, i.e. 100,000 separate INSERT statements:
INSERT INTO gm_data2 (field11, field12, ...) VALUES (value1_1, value1_2, ...)
- method 2 (batch size = 100000): use one multi-row INSERT that loads all 100,000 records into gm_data2 in a single statement, as below (a concrete version with gm_data2's actual columns is sketched right after this list):
INSERT INTO gm_data2(field11, field12, ...) values
(value1_1, value1_2, ...),
(value2_1, value2_2, ...),
(value3_1, value3_2, ...),
.....
(value100000_1, value100000_2, ...)
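To make the multi-row form concrete, a small batch (batch size = 3) against gm_data2's not-null columns would look roughly like this; the values are made-up placeholders, not my real benchmark data:

INSERT INTO gm_data2
    (id, period_type, file_id, region, ratio, date_created, last_updated,
     read_seq, cyc_dt, cust_addr, seg_type, ctr_type, custno, meterno)
VALUES
    (1, '01', 1, 'R1', '1.0', now(), now(), 'SEQ0001', '001', 'addr 1', 'A01', 'CTR01', 'CUST0001', 'MTR0000000000001'),
    (2, '01', 1, 'R1', '1.0', now(), now(), 'SEQ0002', '001', 'addr 2', 'A01', 'CTR01', 'CUST0002', 'MTR0000000000002'),
    (3, '01', 1, 'R1', '1.0', now(), now(), 'SEQ0003', '001', 'addr 3', 'A01', 'CTR01', 'CUST0003', 'MTR0000000000003');

The benchmark uses the same statement shape, just with 1,000 / 10,000 / 100,000 value tuples per statement.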
Below are my test results; each cell is the total elapsed time in seconds.
| server | batch_size=1 | batch_size=1000 | batch_size=10000 | batch_size=100000 |
| -------------- | ------------ | --------------- | ---------------- | ----------------- |
| citus cluster | 62.752 | 26.746 | 35.002 | 40.899 |
| pg-16 | 28.908 | 7.876 | 8.375 | 8.909 |
My question is: why is the 3-node Citus cluster so much slower than the standalone pg-16 (roughly 2x at batch size 1 and 4-5x at the larger batch sizes), and how can I tune Citus to improve this?
Asked by wureka (159 rep)
Dec 11, 2023, 11:47 PM
Last activity: Jan 4, 2024, 01:09 PM