
Regularly updating a table with a huge number of rows

I have a table containing (in addition to some metadata) a score and a boolean flag:

```sql
CREATE TABLE scores (
    score_id SERIAL NOT NULL CONSTRAINT score_pkey PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    score    DOUBLE PRECISION NOT NULL,
    flag     BOOLEAN NOT NULL
);
```

There are multiple indexes for querying the data:

```sql
CREATE UNIQUE INDEX score_pkey ON scores (score_id);
CREATE INDEX ix_scores_user_id ON scores (user_id);
CREATE INDEX ix_scores_score_desc ON scores (score DESC);
CREATE INDEX ix_scores_flag ON scores (score_id) WHERE (flag = true);
```

The table currently contains around 120 million rows, and for most of them the score value is updated once a day. The flag column may be toggled at any point of the day, independently of the score updates; it only ever changes from its initial value false to true and never back to false. Scores are generally only updated for some users (currently around one third to one half of them) and only where the flag is false.

The scores are calculated on a worker machine and written back in batches of about 3000 rows, each batch in a single UPDATE query. There is no way of calculating the data on the database server. The query looks like this:

```sql
UPDATE scores AS s
SET    score = tmp.score
FROM   (VALUES
           (32373477, 0.5566874822853446),
           (32373478, 0.5243741268418393)
       ) AS tmp(id, score)
WHERE  tmp.id = s.score_id;
```

The large volume of updates causes problems for our database: the updates require a lot of disk I/O (index updates and row rewrites), increase the queue depth, and thus slow down other queries.

My primary goal is to reduce the I/O writes, speed up the update process, and reduce the load on the database. What are my options here?

It is possible for me to move the flag column to another table and do a complete rewrite of the score table. Might this be more performant than updating only a selected subset of rows (about one third is updated each day)?

Alternatively, would it be preferable to write all changed rows to a separate, index-less table and then update the primary table from it inside the database?
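The staging-table alternative from the last question could look roughly like the minimal sketch below. It assumes the `scores` table defined above already exists; the staging table name `score_updates` is hypothetical, and making it `UNLOGGED` is an added assumption (not part of the question), chosen because its contents can always be regenerated by the worker. It only illustrates the idea and is not a measured recommendation for this workload.

```sql
-- Hypothetical staging table for one run's recomputed scores.
-- UNLOGGED: not written to the WAL and truncated after a crash, which is
-- acceptable here because the worker can simply reload the data.
-- No indexes and no primary key, so loading it stays cheap.
CREATE UNLOGGED TABLE IF NOT EXISTS score_updates (
    score_id INTEGER NOT NULL,
    score    DOUBLE PRECISION NOT NULL
);

-- The worker loads its results here (in practice possibly via COPY);
-- these two rows are the sample values from the question.
INSERT INTO score_updates (score_id, score) VALUES
    (32373477, 0.5566874822853446),
    (32373478, 0.5243741268418393);

-- Apply all staged scores in one set-based statement inside the database,
-- then empty the staging table for the next run.
BEGIN;
UPDATE scores AS s
SET    score = u.score
FROM   score_updates AS u
WHERE  u.score_id = s.score_id;
TRUNCATE score_updates;
COMMIT;
```

Note that the final `UPDATE` still writes a new version of every matched row in `scores`, so this mainly changes how the data reaches the server and how many statements are issued, not the per-row write cost on the main table.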
Does anybody have experience with a similar problem and a solution they are able to share? The database server is an m4.large instance (2 cores, 8 GiB memory, 250 GiB storage, 750 IOPS) running PostgreSQL 9.6.6.
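For a sense of scale, here is a rough back-of-the-envelope estimate. It assumes that the "one third to one half" fraction applies to the 120 million rows and that the updates were spread evenly over 24 hours; neither assumption is stated exactly in the question.

$$
N_{\text{rows/day}} \approx \tfrac{1}{3}\cdot 1.2\times10^{8} \ \text{to}\ \tfrac{1}{2}\cdot 1.2\times10^{8} = 4\times10^{7} \ \text{to}\ 6\times10^{7}
$$

$$
N_{\text{batches/day}} = \frac{N_{\text{rows/day}}}{3000} \approx 13{,}300 \ \text{to}\ 20{,}000,
\qquad
\frac{N_{\text{rows/day}}}{86{,}400\ \text{s}} \approx 463 \ \text{to}\ 694\ \text{rows/s}
$$

This is only an order-of-magnitude comparison with the instance's 750 IOPS, since buffered writes, checkpoints, and WAL make the mapping from row updates to I/O operations far from one-to-one.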
Asked by Birne94 (371 rep)
Nov 1, 2018, 10:46 AM
Last activity: Jul 1, 2025, 10:07 PM