Optimize query by telling Postgres to scan records from the most recent to the oldest
I am using Postgres 12 and in my app I have a table that I am using to store specific events that contain information about things that happened outside of the system and related to some records in my DB.
The table looks like this:
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    eventable_type VARCHAR(255) NOT NULL,
    eventable_id BIGINT NOT NULL,
    type VARCHAR(255) NOT NULL,
    data JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX index_events_on_eventable ON events (eventable_type, eventable_id);
For example: a meeting is booked in Google Calendar. An event is created in this table with the details of what happened, and the record is associated with the internal representation of the meeting in the DB. The data attribute contains the details of the event, including a unique id, like:
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "created", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "updated", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "deleted", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "created", "GoogleId": "dsfsdf2343"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "updated", "GoogleId": "dsfsdf2343"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "deleted", "GoogleId": "dsfsdf2343"}'::jsonb);
I query the events table like:
SELECT * FROM events WHERE events.type = 'GoogleCalendarEvent' AND (data->>'GoogleId' = 'abcdef1234') LIMIT 1;
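For context, this is how one could check what the planner currently does with that query (a sketch; the exact plan depends on table statistics and configuration):

```sql
-- Illustrative only: inspect the plan and actual timings for the lookup.
-- With no index on data->>'GoogleId', expect a (possibly parallel)
-- sequential scan with a Filter on both conditions.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE events.type = 'GoogleCalendarEvent'
  AND (data->>'GoogleId' = 'abcdef1234')
LIMIT 1;
```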
In terms of cardinality of operations, **the number of writes is approximately 3 times the number of reads**. That is: we write more than we read. The table has around 3 million rows and is growing rapidly; about 300k rows are added every day.
At the moment we only store one other type of event in the table, let's call it GoogleEmailEvent. Filtering by GoogleCalendarEvent would return roughly 50% of the records in the table. Filtering by GoogleId would normally return fewer than 10 records, but we really only need 1 because they are all associated with the same "Eventable", as you can see in the example inserts.
I want to improve the execution time of the query. I have thought about:
- adding an index WHERE data->>'GoogleId' IS NOT NULL. But I am worried about slowing down writes.
- storing data->>'GoogleId' in a separate table together with the id of the event to allow for fast retrieval. How effective would this be? It would also slow down writes somewhat.
- indexing created_at and using it in the query to narrow down the candidate records somehow.
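As a sketch of the first option above (a partial expression index; the index name is mine, and the write overhead would need measuring):

```sql
-- A partial expression index covering only rows that carry a GoogleId.
-- The query must use the same expression, data->>'GoogleId', to match it;
-- the equality predicate implies IS NOT NULL, so the partial index applies.
CREATE INDEX index_events_on_google_id
    ON events ((data->>'GoogleId'))
    WHERE data->>'GoogleId' IS NOT NULL;

-- The existing lookup, unchanged, should then be able to use the index:
SELECT * FROM events
WHERE type = 'GoogleCalendarEvent'
  AND data->>'GoogleId' = 'abcdef1234'
LIMIT 1;
```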
Important detail:
The vast majority of the time (99% of the time or more) the matching event is one that was inserted recently (say, within the last 10 minutes). Can I take advantage of this detail to speed up the query? Would adding ORDER BY id DESC LIMIT 1 do the trick?
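For reference, the ORDER BY idea would look like this (a sketch; whether it helps depends on the chosen plan — walking the primary key backwards is cheap when a recent match exists, but can degenerate to scanning most of the table when no row matches):

```sql
-- Walk the primary key from newest to oldest and stop at the first match.
-- Fast in the common case where the matching row was inserted recently;
-- slow in the no-match case, since every row must be checked.
SELECT * FROM events
WHERE type = 'GoogleCalendarEvent'
  AND data->>'GoogleId' = 'abcdef1234'
ORDER BY id DESC
LIMIT 1;
```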
Asked by Perennialista
(113 rep)
Aug 17, 2024, 05:42 PM
Last activity: Aug 21, 2024, 09:52 AM