
INSERT INTO my_table (fields, ...) VALUES (value_1, ...), (value_2, ...), ... is slow in Citus

0 votes
1 answer
54 views
Environment:

- host resources: 32 cores, 64 GB RAM, Docker 24.07
- OS: CentOS 7
- disk: 10 separate disk partitions

The 10 separate disk partitions are shown by `df -h`:

```
filesystem               capacity  used  avail  used%  mount
devtmpfs                      32G     0    32G     0%  /dev
tmpfs                         32G     0    32G     0%  /dev/shm
tmpfs                         32G   66M    32G     1%  /run
tmpfs                         32G     0    32G     0%  /sys/fs/cgroup
/dev/mapper/centos-root      168G   16G   153G    10%  /
/dev/sde1                    9.8T  107M   9.8T     1%  /data03
/dev/sdj1                    9.8T   33M   9.8T     1%  /data08
/dev/sdd1                    9.8T  105M   9.8T     1%  /data02
/dev/sdl1                    9.8T   33M   9.8T     1%  /data10
/dev/sdb1                    9.8T  1.4G   9.8T     1%  /data00
/dev/sdh1                    9.8T   33M   9.8T     1%  /data06
/dev/sdi1                    9.8T   33M   9.8T     1%  /data07
/dev/sdf1                    9.8T  107M   9.8T     1%  /data04
/dev/sdc1                    9.8T  107M   9.8T     1%  /data01
/dev/sdg1                    9.8T   33M   9.8T     1%  /data05
/dev/sda1                   1014M  226M   789M    23%  /boot
tmpfs                        6.3G     0   6.3G     0%  /run/user/1010
```

I use the docker-compose.yml below to build a 3-node Citus cluster in Docker, plus a standalone PostgreSQL container named "pg-16". They all use the same image. The only difference between pg-16 and the citus-xxxx containers is that I removed shared_preload_libraries='citus' from postgresql.conf for pg-16. I did not change any other settings in postgresql.conf; everything else is at its default value.

```yaml
version: "3.7"
services:
  citus-coordinator:
    image: citusdata/citus:latest
    container_name: citus-coordinator
    hostname: citus-coordinator
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5440:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data01/postgresql:/var/lib/postgresql
      - /data01/postgresql/data:/var/lib/postgresql/data
  citus-worker1:
    image: citusdata/citus:latest
    container_name: citus-worker1
    hostname: citus-worker1
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5441:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data02/postgresql:/var/lib/postgresql
      - /data02/postgresql/data:/var/lib/postgresql/data
  citus-worker2:
    image: citusdata/citus:latest
    container_name: citus-worker2
    hostname: citus-worker2
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5442:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data03/postgresql:/var/lib/postgresql
      - /data03/postgresql/data:/var/lib/postgresql/data
  citus-worker3:
    image: citusdata/citus:latest
    container_name: citus-worker3
    hostname: citus-worker3
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5443:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data04/postgresql:/var/lib/postgresql
      - /data04/postgresql/data:/var/lib/postgresql/data
  pg-16:
    image: citusdata/citus:latest
    container_name: pg-16
    hostname: pg-16
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "16G"
    ports:
      - 5433:5432
    environment:
      POSTGRES_PASSWORD: 'postgres'
    volumes:
      - /data00/postgresql:/var/lib/postgresql
      - /data00/postgresql/data:/var/lib/postgresql/data
```

As you can see in my docker-compose.yml, each Citus node (including the coordinator) uses its own disk partition, so one node's disk I/O is not shared with the others.
My table gm_data2 is defined as follows:

```sql
create table if not exists gm_data2 (
    id                   bigint      not null,
    period_type          varchar(2)  not null,
    billing_determinates varchar(40),
    file_id              bigint      not null,
    region               varchar(2)  not null,
    ratio                varchar(8)  not null,
    date_created         timestamp   not null,
    last_updated         timestamp   not null,
    read_seq             varchar(40) not null,
    cyc_dt               varchar(3)  not null,
    cust_addr            varchar(40) not null,
    hes_code             varchar(30),
    seg_type             varchar(3)  not null,
    ctr_type             varchar(5)  not null,
    custno               varchar(20) not null,
    meterno              varchar(16) not null constraint gm_data2_pkey primary key,
    tou                  varchar(5),
    cust_name            varchar(30),
    publish_id           varchar(5),
    group_id             varchar(3),
    reg_ctr_vol          varchar(8),
    period_dt            varchar(3),
    service_group_id     varchar(4),
    read_dt              varchar(10),
    billing_mark         varchar(2),
    high_low_vol         integer
);
```

After the above Citus containers were created, I completed the required steps to configure the cluster:

```
dbbench=> SELECT * FROM citus_get_active_worker_nodes();
   node_name   | node_port
---------------+-----------
 citus-worker3 |      5432
 citus-worker2 |      5432
 citus-worker1 |      5432
(3 rows)

dbbench=> SELECT create_distributed_table('gm_data2', 'meterno');
 create_distributed_table
--------------------------

(1 row)

dbbench=> SELECT rebalance_table_shards('gm_data2');
 rebalance_table_shards
------------------------

(1 row)

dbbench=> SELECT count(1) FROM citus_shards;
 count
-------
    32
(1 row)

dbbench=> SELECT * FROM citus_shards;
 table_name | shardid |   shard_name    | citus_table_type | colocation_id |     nodename      | nodeport | shard_size
------------+---------+-----------------+------------------+---------------+-------------------+----------+------------
 gm_data2   |  102040 | gm_data2_102040 | distributed      |             2 | citus-coordinator |     5432 |     565248
 gm_data2   |  102041 | gm_data2_102041 | distributed      |             2 | citus-worker1     |     5432 |     581632
 gm_data2   |  102042 | gm_data2_102042 | distributed      |             2 | citus-worker2     |     5432 |     557056
 gm_data2   |  102043 | gm_data2_102043 | distributed      |             2 | citus-worker3     |     5432 |     892928
 gm_data2   |  102044 | gm_data2_102044 | distributed      |             2 | citus-coordinator |     5432 |     917504
 gm_data2   |  102045 | gm_data2_102045 | distributed      |             2 | citus-worker1     |     5432 |     589824
 gm_data2   |  102046 | gm_data2_102046 | distributed      |             2 | citus-worker2     |     5432 |     933888
 gm_data2   |  102047 | gm_data2_102047 | distributed      |             2 | citus-worker3     |     5432 |     892928
 gm_data2   |  102048 | gm_data2_102048 | distributed      |             2 | citus-coordinator |     5432 |     892928
 gm_data2   |  102049 | gm_data2_102049 | distributed      |             2 | citus-worker1     |     5432 |     573440
 gm_data2   |  102050 | gm_data2_102050 | distributed      |             2 | citus-worker2     |     5432 |     933888
 gm_data2   |  102051 | gm_data2_102051 | distributed      |             2 | citus-worker3     |     5432 |     925696
 gm_data2   |  102052 | gm_data2_102052 | distributed      |             2 | citus-coordinator |     5432 |     892928
 gm_data2   |  102053 | gm_data2_102053 | distributed      |             2 | citus-worker1     |     5432 |     573440
 gm_data2   |  102054 | gm_data2_102054 | distributed      |             2 | citus-worker2     |     5432 |     917504
 gm_data2   |  102055 | gm_data2_102055 | distributed      |             2 | citus-worker3     |     5432 |     909312
 gm_data2   |  102056 | gm_data2_102056 | distributed      |             2 | citus-coordinator |     5432 |     950272
 gm_data2   |  102057 | gm_data2_102057 | distributed      |             2 | citus-worker1     |     5432 |     548864
 gm_data2   |  102058 | gm_data2_102058 | distributed      |             2 | citus-worker2     |     5432 |     909312
 gm_data2   |  102059 | gm_data2_102059 | distributed      |             2 | citus-worker3     |     5432 |     950272
 gm_data2   |  102060 | gm_data2_102060 | distributed      |             2 | citus-coordinator |     5432 |     917504
 gm_data2   |  102061 | gm_data2_102061 | distributed      |             2 | citus-worker1     |     5432 |     573440
 gm_data2   |  102062 | gm_data2_102062 | distributed      |             2 | citus-worker2     |     5432 |     901120
 gm_data2   |  102063 | gm_data2_102063 | distributed      |             2 | citus-worker3     |     5432 |     933888
 gm_data2   |  102064 | gm_data2_102064 | distributed      |             2 | citus-coordinator |     5432 |     917504
 gm_data2   |  102065 | gm_data2_102065 | distributed      |             2 | citus-worker1     |     5432 |     581632
 gm_data2   |  102066 | gm_data2_102066 | distributed      |             2 | citus-worker2     |     5432 |     917504
 gm_data2   |  102067 | gm_data2_102067 | distributed      |             2 | citus-worker3     |     5432 |     876544
 gm_data2   |  102068 | gm_data2_102068 | distributed      |             2 | citus-coordinator |     5432 |     917504
 gm_data2   |  102069 | gm_data2_102069 | distributed      |             2 | citus-worker1     |     5432 |     581632
 gm_data2   |  102070 | gm_data2_102070 | distributed      |             2 | citus-worker2     |     5432 |     884736
 gm_data2   |  102071 | gm_data2_102071 | distributed      |             2 | citus-worker3     |     5432 |     925696
(32 rows)
```

(Note: in my original paste the column header for `create_distributed_table` was mistakenly shown as `rebalance_table_shards`.)

Now I need to insert 100,000 records into gm_data2. There are two ways to insert them:

- method 1 (batch size = 1): run the INSERT below once per record, i.e. 100,000 separate INSERT statements:

  ```sql
  INSERT INTO gm_data2 (field1, field2, ...) VALUES (value1_1, value1_2, ...);
  ```

- method 2 (batch size = 100,000): insert all 100,000 records into gm_data2 with one multi-row INSERT:

  ```sql
  INSERT INTO gm_data2 (field1, field2, ...) VALUES
      (value1_1, value1_2, ...),
      (value2_1, value2_2, ...),
      (value3_1, value3_2, ...),
      ...
      (value100000_1, value100000_2, ...);
  ```

Below are my test results; each cell is elapsed time in seconds:

| server        | batch_size=1 | batch_size=1000 | batch_size=10000 | batch_size=100000 |
| ------------- | ------------ | --------------- | ---------------- | ----------------- |
| citus cluster | 62.752       | 26.746          | 35.002           | 40.899            |
| pg-16         | 28.908       | 7.876           | 8.375            | 8.909             |

My question is: why is a 3-node Citus cluster far slower than a standalone pg-16, and how can I tune Citus to improve this performance?
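For reference, the multi-row INSERT in method 2 can be built client-side roughly like this. This is a minimal sketch, not the actual load script used in the benchmark; the `%s` placeholder style assumed here matches psycopg-family drivers, and the column subset is illustrative only:

```python
def build_batch_insert(table, columns, rows):
    """Build one multi-row INSERT statement with %s placeholders,
    plus the flattened parameter list to pass to a driver's execute()."""
    one_tuple = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([one_tuple] * len(rows))
    )
    params = [value for row in rows for value in row]  # row-major flattening
    return sql, params

# Example: a batch of 3 rows covering two gm_data2 columns
rows = [(1, "M001"), (2, "M002"), (3, "M003")]
sql, params = build_batch_insert("gm_data2", ["id", "meterno"], rows)
print(sql)
# INSERT INTO gm_data2 (id, meterno) VALUES (%s, %s), (%s, %s), (%s, %s)
print(len(params))  # 6
```

A batch of 100,000 rows is the same construction with a longer `rows` list; drivers also offer dedicated batch helpers (or COPY) that avoid building one giant statement string.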
Asked by wureka (159 rep)
Dec 11, 2023, 11:47 PM
Last activity: Jan 4, 2024, 01:09 PM