Sharding a Materialized View
0
votes
1
answer
233
views
I am a Postgres 15.3 AWS RDS instance and have a database that centers around one table, called
posts
that has 20 other relations to build the full object. The full query is about a second altogether and there are indexes on all the join keys. So in order to make reads fast we have a materialized view, mv_posts
that represents the full table with ~20 joins and subselects and everything the front end needs (see below).
Here is the math that scares me, I have about 5000 rows and building the materialized view takes 2050 ms. This is not scalable. I am considering the following implementation.
Create a new materialized view mv_new_posts
that keeps new posts created that day, and join it 1-1 to the mv_posts
materialized view in a normal view and use that to query the posts object. I would then have to truncate mv_new_posts
and regenerate mv_posts
at midnight every night. This feels ugly but would scale for everything except when a posts
row is updated, which would require regenerating the base view in some form.
I have considered the following other solutions:
- Partition Tables: The issue is that I dont have the row volume for this. I have not benchmarked it but do not beleive that with my small number of rows it would work.
- pg_ivm
but the restrictions on outer (i.e. left) joins and aggregation functions make this impossible.
- Rolling my own, i.e. creating a table and triggers so that a single insert/update on a dependent table results in a single row update in the denormalized table that would replace the materialized view. This I have done on a smaller scale but do not feel would be worth the work here. In particular it would be a maintenance nightmare.
The materialized view definition is below, in case there is an obvious optimization I have missed.
CREATE MATERIALIZED VIEW mv_posts AS(
SELECT
p.*,
(
setweight(to_tsvector('english', COALESCE(p.title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(p.description, '')), 'B') ||
setweight(to_tsvector('english', COALESCE(pc.email, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(d.name, '')), 'B')
) AS search_document,
CASE WHEN pt.slug = 'ping' THEN True ELSE False END AS ping,
ARRAY_REMOVE(vis.all_user_ids, NULL) AS all_visibility_user_ids,
ARRAY_REMOVE(vis.user_ids, NULL) AS visibility_user_ids,
ARRAY_REMOVE(vis.group_ids, NULL) AS visibility_group_ids,
ARRAY_REMOVE(vis.org_ids, NULL) AS visibility_org_ids,
CASE WHEN hm.id IS NULL THEN NULL ELSE ROW_TO_JSON(hm.*) END AS headline_file,
COALESCE(JSON_AGG(DISTINCT JSON_BUILD_OBJECT(
'id', pm.id,
'file_name', pm.file_name,
'content_type', pm.content_type,
'name', pm.name,
'orphan', pm.orphan,
'folder', pm.folder,
'extension', pm.extension,
'ordinal', pmj.ordinal)::JSONB
) FILTER (
WHERE NOT (pm.id::TEXT = 'null') ), '[]'::JSON
) AS file,
COALESCE(JSON_AGG(DISTINCT pmj.file_id) FILTER ( WHERE NOT (pmj.file_id IS NULL) ), '[]'::JSON) AS file_ids,
COALESCE(JSON_AGG(DISTINCT pu) FILTER ( WHERE NOT (pu.id::TEXT = 'null') ), '[]'::JSON) AS users,
COALESCE(JSON_AGG(DISTINCT pu.email) FILTER ( WHERE NOT (pu.id::TEXT = 'null') ), '[]'::JSON) AS user_emails,
pt.id AS post_type_id,
row_to_json(pt.*) AS post_type,
row_to_json(pit.*) AS incident_type,
row_to_json(pc.*) AS creator,
d.id AS org_id,
JSON_BUILD_OBJECT(
'id', d.id,
'name', d.name,
'admin_id', d.admin_id,'group_id', d.group_id,
'creator_id', d.creator_id,
'law_enforcement_partner', d.law_enforcement_partner,
'slug', d.slug,
'state', d.state,
'location', JSON_BUILD_OBJECT(
'id', dl.id,
'address', dl.address,
'city', dl.city,
'state_province', dl.state_province,
'postal_code', dl.postal_code,
'geojson', ST_AsGeoJSON(dl.geom)::JSON
)
) AS org,
ARRAY_REMOVE(ARRAY_AGG(pp.user_id),NULL) AS users_saved,
COALESCE(
JSON_AGG(
DISTINCT
JSON_BUILD_OBJECT(
'id', v.id,
'make', v.make,
'year', v.year,
'state', v.state,
'model', v.model,
'color', v.color,
'license_plate', v.license_plate,
'vin', v.vin,
'description', v.description,
'file', (
SELECT COALESCE(JSON_AGG(vm.*) FILTER ( WHERE NOT (vm.id::TEXT = 'null') ), '[]'::JSON)
FROM file vm
JOIN car_file_join vmj ON vmj.file_id = vm.id
WHERE vmj.car_id=v.id
)
)::JSONB
) FILTER ( WHERE NOT (v.id::TEXT = 'null') ), '[]'::JSON)
AS cars,COALESCE(
JSON_AGG(
DISTINCT
JSON_BUILD_OBJECT(
'id', psn.id,
'first_name', psn.first_name,
'last_name', psn.last_name,
'aliases', psn.aliases,
'description', psn.description,
'hair', psn.hair,
'build', psn.build,
'sex_id', psn.sex_id,
'age', psn.age,
'eye_color', psn.eye_color,
'height', psn.height,
'weight', psn.weight,
'race', psn.race,
'identifying_characteristics', psn.identifying_characteristics,
'country', psn.country,
'guy_type', row_to_json(psnt.*),
'location', row_to_json(psnl.*),
'file', (
SELECT COALESCE(JSON_AGG(psnm.*) FILTER ( WHERE NOT (psnm.id::TEXT = 'null') ), '[]'::JSON)
FROM file psnm
JOIN guy_file_join psnmj ON psnmj.file_id = psnm.id
WHERE psnmj.guy_id=psn.id
)
)::JSONB
) FILTER ( WHERE NOT (psn.id::TEXT = 'null') ), '[]'::JSON)
AS guys,COALESCE(
JSON_AGG(
DISTINCT
JSON_BUILD_OBJECT(
'id', l.id,
'address', l.address,
'city', l.city,
'state_province', l.state_province,
'postal_code', l.postal_code,
'geojson', ST_AsGeoJSON(l.geom)::JSON,
'location_string', l.location_string
)::JSONB
) FILTER ( WHERE NOT (l.id::TEXT = 'null') ), '[]'::JSON) AS locations,
COALESCE(
ARRAY_AGG(DISTINCT l.location_string) FILTER ( WHERE l.location_string IS NOT NULL ),
'{}'::TEXT[]
)AS location_strings,
ST_UNION( l.geom )::geography AS geom
FROM posts p
INNER JOIN visibility vis ON vis.post_id = p.id
LEFT JOIN file hm ON hm.id = p.headline_file_id
LEFT JOIN post_file_join pmj ON pmj.post_id = p.id
LEFT JOIN file pm ON pm.id=pmj.file_id
LEFT JOIN auth_users AS pc ON p.creator_id=pc.id
LEFT JOIN post_user_join puj ON puj.post_id=p.id
LEFT JOIN auth_users pu ON pu.id=puj.user_id
LEFT JOIN types pt ON pt.id=p.type_id
LEFT JOIN incident_types pit ON pit.id = p.incident_type_id
LEFT JOIN post_location_join plj ON plj.post_id = p.id
LEFT JOIN locations l ON plj.location_id = l.id
LEFT JOIN post_guy_join ppj ON ppj.post_id = p.id
LEFT JOIN guys psn ON psn.id = ppj.guy_id
LEFT JOIN guy_types psnt ON psnt.id = ppj.guy_type_id
LEFT JOIN locations psnl ON psnl.id= psn.current_location_id
LEFT JOIN post_car_join pvj ON pvj.post_id=p.id
LEFT JOIN cars v ON v.id = pvj.car_id
LEFT JOIN pinned_posts pp ON pp.post_id = p.id
LEFT JOIN orgs d ON pc.org_id = d.id
LEFT JOIN locations dl ON d.primary_location_id = dl.id
WHERE p.active
GROUP BY
p.id,
vis.all_user_ids,
vis.user_ids,
vis.org_ids,
vis.group_ids,
hm.id,
pc.id,
pt.id,
pit.id,
d.id,
dl.id);
Asked by Frank Conry
(123 rep)
Jan 28, 2024, 01:07 PM
Last activity: Jun 12, 2025, 04:09 AM
Last activity: Jun 12, 2025, 04:09 AM