Sample Header Ad - 728x90

Can I optimize UPDATE statements using EXPLAIN?

1 vote
1 answer
265 views
I have a few UPDATE calls on my production system that are running slow and I've been instructed to try and optimize them. I'm reading into how to understand EXPLAIN with (ANALYZE, BUFFERS) as well to get more information and I'm trying to figure out why this one plan node is taking so long. I have a feeling that it might be dude to the number of records that are getting updated but I wonder if someone with more experience looking at EXPLAIN output might be able to shed some light on this mystery? Here is the UPDATE statement that is showing up in my metrics:
-- $1 and $2 are variables set in a Rails application
UPDATE "redacted_table_name"
   SET "some_fk_id" = $1
 WHERE "redacted_table_name"."some_fk_id" = $2
For context, this table has over **280 million** records. I wanted to try and do a full analysis on this table so I had the database cloned and ran a "safe" EXPLAIN (ANYALYZE, BUFFERS) which runs an actual update and sure enough it took quite some time to complete:
-- Do an actual UPDATE w/o changing any data
EXPLAIN (ANALYZE, BUFFERS)
UPDATE "redacted_table_name"
   SET "some_fk_id" = 1
 WHERE "redacted_table_name"."some_fk_id" = 1
QUERY PLAN:

Update on redacted_table_name  (cost=669.64..108899.13 rows=0 width=0) (actual time=104067.491..104067.492 rows=0 loops=1)
  Buffers: shared hit=3329124 read=290028 dirtied=276605 written=906
  I/O Timings: shared read=90426.849 write=26.727
  ->  Bitmap Heap Scan on redacted_table_name  (cost=669.64..108899.13 rows=30073 width=10) (actual time=213.507..14936.244 rows=137766 loops=1)
        Recheck Cond: (some_fk_id = 1)
        Rows Removed by Index Recheck: 5904106
        Heap Blocks: exact=39787 lossy=38580
        Buffers: shared hit=411 read=78433 dirtied=10
        I/O Timings: shared read=14094.250
        ->  Bitmap Index Scan on index_redacted_table_name_on_some_fk_id  (cost=0.00..662.12 rows=30073 width=0) (actual time=206.434..206.434 rows=137766 loops=1)
              Index Cond: (some_fk_id = 1)
              Buffers: shared read=477
              I/O Timings: shared read=193.942
Planning Time: 0.074 ms
Execution Time: 104067.523 ms
As you can see, I have an INDEX already on "redacted_table_name"."some_fk_id" which PG is taking advantage of but why is it doing a Bitmap Head Scan after the fact? It doesn't seem to be using the INDEX that its child node is using. Why is it taking so long there? Here is a similar update on the some_fk_id column that has the most records in the table (3 million records and the worst case scenario):
Update on redacted_table_name  (cost=0.00..5344540.00 rows=0 width=0) (actual time=436965.067..436965.068 rows=0 loops=1)
  Buffers: shared hit=139949860 read=4964122 dirtied=4521138 written=1928290
  I/O Timings: shared read=176626.661 write=8909.815
  ->  Seq Scan on redacted_table_name  (cost=0.00..5344540.00 rows=3890845 width=10) (actual time=0.026..36374.777 rows=3617975 loops=1)
        Filter: (some_fk_id = 93330)
        Rows Removed by Filter: 278327308
        Buffers: shared hit=207516 read=1612708 dirtied=52 written=420751
        I/O Timings: shared read=17983.171 write=2653.138
Planning:
  Buffers: shared hit=1
Planning Time: 0.078 ms
Execution Time: 436965.106 ms
The Seq Scan on redacted_table_name indicates to me that the index wasn't even used with this UPDATE. So what gives? **UPDATE:** Both on my prod and my cloned prod the work_mem is 4MB and on prod shared_buffers is 8047248kB but on the cloned prod that I ran these on the shared_buffers was 3983192kB. The prod clone was shutdown so I wasn't able to get the effective_cache_size but prod has 7966384kB for that setting. **UPDATE: #2** Below is the CREATE TABLE script along with all of the constraints and indexes that are on the table:
-- This table simply has a PK and four foreign keys
-- that reference other tables in the database.
CREATE TABLE IF NOT EXISTS public.redacted_table_name
(
    id bigint NOT NULL DEFAULT nextval('redacted_table_name_id_seq'::regclass),
    fk_table_3_id bigint,

    -- to stay with example above this is the key/column
    -- that's getting updated.
    some_fk_id bigint,
    fk_table_4_id bigint,
    fk_table_2_id bigint,
    fk_table_1_id bigint,
    CONSTRAINT redacted_table_name_pkey PRIMARY KEY (id),
    CONSTRAINT fk_rails_3fc9a63917 FOREIGN KEY (fk_table_4_id)
        REFERENCES public.fk_table_4 (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT fk_rails_4e272bcb96 FOREIGN KEY (fk_table_1_id)
        REFERENCES public.fk_table_1 (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT fk_rails_9ba08049bc FOREIGN KEY (some_fk_id)
        REFERENCES public.some_fk_table (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT fk_rails_d6e3c90d4d FOREIGN KEY (fk_table_3_id)
        REFERENCES public.fk_table_3 (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT fk_rails_f79e2fb3ae FOREIGN KEY (fk_table_2_id)
        REFERENCES public.fk_table_2 (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.redacted_table_name
    OWNER to pg_user;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_fk_table_4_id
    ON public.redacted_table_name USING btree
    (fk_table_4_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_some_fk_id
    ON public.redacted_table_name USING btree
    (some_fk_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_fk_table_1_id
    ON public.redacted_table_name USING btree
    (fk_table_1_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_fk_table_2_id
    ON public.redacted_table_name USING btree
    (fk_table_2_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_fk_table_3_id
    ON public.redacted_table_name USING btree
    (fk_table_3_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE INDEX IF NOT EXISTS index_redacted_table_name_on_fk_table_3_id_and_some_fk_id
    ON public.redacted_table_name USING btree
    (fk_table_3_id ASC NULLS LAST, some_fk_id ASC NULLS LAST)
    TABLESPACE pg_default;

CREATE UNIQUE INDEX IF NOT EXISTS unique_on_fk_table_3_fk_table_4
    ON public.redacted_table_name USING btree
    (fk_table_3_id ASC NULLS LAST, some_fk_id ASC NULLS LAST, fk_table_1_id ASC NULLS LAST)
    TABLESPACE pg_default;
**TL;DR** Is this just a slow SQL statement due to the number of rows it has to update or can I optimize this table somehow? If I'm not providing enough information, please let me know what you need and I'll provide it. Thanks again for any help!
Asked by aarona (113 rep)
Aug 23, 2024, 03:38 AM
Last activity: Aug 28, 2024, 04:52 PM