
Updating table of versioned rows with historical records in PostgreSQL

I have a master table of versioned rows:

```sql
CREATE TABLE master (
  id           SERIAL PRIMARY KEY,
  rec_id       integer,
  val          text,
  valid_on     date[],
  valid_during daterange
);

INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES
  (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
  (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'),
  (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)');

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val | valid_on                | valid_during
----+--------+-----+-------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05} | [2015-01-01,infinity)
*/
```

The rec_id is the record's natural key, valid_on is an array of dates on which the record was valid, and valid_during is a date range describing the interval during which the record is valid. (The upper bound of valid_during is 'infinity' if there is no record with the same rec_id and a more recent valid_on value.)
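As a quick illustration of those range semantics (not part of the original question): daterange(lower, upper) defaults to '[)' bounds, so a range includes its lower date and excludes its upper date, which is why a version ending at 2015-01-06 no longer covers 2015-01-06 itself. The array form `range @> ALL (array)` is one way to check that every valid_on date of a row lies inside that row's valid_during. The query uses literals only, so it runs on its own:

```sql
-- daterange(lower, upper) uses '[)' bounds: lower included, upper excluded.
SELECT daterange('2015-01-01', '2015-01-06') @> '2015-01-05'::date AS contains_jan_5,   -- true
       daterange('2015-01-01', '2015-01-06') @> '2015-01-06'::date AS contains_jan_6,   -- false
       '[2015-01-01,infinity)'::daterange @> '2015-03-20'::date      AS open_ended,       -- true
       -- Every valid_on date of a row should fall inside that row's valid_during:
       '[2015-01-01,infinity)'::daterange
         @> ALL ('{2015-01-01,2015-01-05}'::date[])                  AS all_dates_inside; -- true
```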
Given a second table of updated records, along with new dates on which each record was valid:

```sql
CREATE TABLE updates (
  id       SERIAL PRIMARY KEY,
  rec_id   integer,
  val      text,
  valid_on date
);

INSERT INTO updates (rec_id, val, valid_on) VALUES
  (1, 'a', '2015-01-03'),  -- (1) same val for rec_id 1: just add the valid_on date
  (2, 'd', '2015-01-06'),  -- (2) different val for rec_id 2
  (3, 'e', '2015-01-03');  -- (3) different val for rec_id 3, with a new date
                           --     intersecting the old date range

SELECT * FROM updates;
/*
 id | rec_id | val | valid_on
----+--------+-----+------------
  1 |      1 | a   | 2015-01-03
  2 |      2 | d   | 2015-01-06
  3 |      3 | e   | 2015-01-03
*/
```

I would like to insert into / update the master table to end up with something like this:

```sql
-- The goal
SELECT rec_id, val, valid_on, valid_during FROM master ORDER BY rec_id, id;
/*
 rec_id | val | valid_on                           | valid_during
--------+-----+------------------------------------+-------------------------
      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
      2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
      3 | c   | {2015-01-01}                       | [2015-01-01,2015-01-03)
      3 | e   | {2015-01-03}                       | [2015-01-03,2015-01-05)
      3 | c   | {2015-01-05}                       | [2015-01-05,infinity)
*/
```

Specifically:

- If a new record's rec_id exists in the master table with the same val, but the new valid_on date is not yet in the master row's valid_on array, simply add the new date to that array (see rec_id 1).
- If a new record's rec_id exists with a different val, insert the new record into the master table. The old record's valid_during should then end on the new record's valid_on date (see rec_id 2).
- If the new record's valid_on date falls inside the old record's valid_during range and the old record also has a valid_on date after it, the old record should appear on both "sides" of the updated record (see rec_id 3).

I can get *most* of the way there.
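The three rules above can be told apart before anything is modified, by joining each update to the master row whose valid_during contains the update's date. The query below is an illustrative sketch, not the asker's code: it inlines the sample data as CTEs named after the question's tables so that it runs on its own, and it assumes every update's rec_id already exists in master (updates for brand-new rec_ids would drop out of the inner join).

```sql
-- Illustrative only: CTEs named after the question's tables, holding the
-- sample rows, so the query runs without creating anything.
WITH master (rec_id, val, valid_on, valid_during) AS (
  VALUES
    (1, 'a', '{2015-01-01,2015-01-05}'::date[], '[2015-01-01,infinity)'::daterange),
    (2, 'b', '{2015-01-01,2015-01-05}'::date[], '[2015-01-01,infinity)'::daterange),
    (3, 'c', '{2015-01-01,2015-01-05}'::date[], '[2015-01-01,infinity)'::daterange)
), updates (rec_id, val, valid_on) AS (
  VALUES
    (1, 'a', '2015-01-03'::date),
    (2, 'd', '2015-01-06'::date),
    (3, 'e', '2015-01-03'::date)
)
SELECT u.rec_id, u.val, u.valid_on,
       CASE
         -- same value: only the date needs appending
         WHEN m.val = u.val THEN 'rule 1: append date'
         -- old version also has a later date: it must reappear after the update
         WHEN u.valid_on < ANY (m.valid_on) THEN 'rule 3: split old row'
         -- otherwise the update simply starts a new version
         ELSE 'rule 2: new version'
       END AS action
FROM updates u
JOIN master m
  ON m.rec_id = u.rec_id
 AND m.valid_during @> u.valid_on  -- the version in force on the update's date
ORDER BY u.rec_id;
```

On the sample data this labels rec_id 1 as rule 1, rec_id 2 as rule 2 and rec_id 3 as rule 3, because only rec_id 3's old row has a valid_on date (2015-01-05) later than the update's date.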
The first case is straightforward: we just need to update the valid_on field in the master table (we'll worry about the valid_during field in a separate step):

```sql
UPDATE master m
SET    valid_on = m.valid_on || u.valid_on
FROM   updates u
WHERE  m.rec_id = u.rec_id
AND    m.val = u.val
AND    NOT m.valid_on @> ARRAY[u.valid_on];

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val | valid_on                           | valid_during
----+--------+-----+------------------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
*/
```

For case #2, we can do a simple insert:

```sql
INSERT INTO master (rec_id, val, valid_on)
SELECT u.rec_id, u.val, ARRAY[u.valid_on]
FROM   updates u
LEFT   JOIN master m ON u.rec_id = m.rec_id AND u.val = m.val
WHERE  m.id IS NULL;

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val | valid_on                           | valid_during
----+--------+-----+------------------------------------+-----------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  4 |      2 | d   | {2015-01-06}                       |
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,infinity)
  5 |      3 | e   | {2015-01-03}                       |
*/
```

Now we can correct the valid_during range in one pass by joining on a subquery that uses a window function to find the next valid date for a record with the same rec_id:
```sql
-- Helper function: returns the smallest element of a one-dimensional array
CREATE OR REPLACE FUNCTION arraymin(anyarray) RETURNS anyelement AS $$
  SELECT min($1[i])
  FROM generate_series(array_lower($1, 1), array_upper($1, 1)) g(i);
$$ language sql immutable strict;

UPDATE master m
SET    valid_during = daterange(arraymin(valid_on), new_valid_until)
FROM  (
  SELECT id,
         lead(arraymin(valid_on), 1, 'infinity'::date)
           OVER (PARTITION BY rec_id ORDER BY arraymin(valid_on)) AS new_valid_until
  FROM   master
) t
WHERE  m.id = t.id;

SELECT * FROM master ORDER BY rec_id, id;
/*
 id | rec_id | val | valid_on                           | valid_during
----+--------+-----+------------------------------------+-------------------------
  1 |      1 | a   | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity)
  2 |      2 | b   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-06)
  4 |      2 | d   | {2015-01-06}                       | [2015-01-06,infinity)
  3 |      3 | c   | {2015-01-01,2015-01-05}            | [2015-01-01,2015-01-03)
  5 |      3 | e   | {2015-01-03}                       | [2015-01-03,infinity)
*/
```

And here's where I'm stuck: rec_id 1 and 2 are exactly what I want, but rec_id 3 needs to be inserted again so that it appears valid on 2015-01-05. I can't seem to wrap my head around the array operation needed to perform that insert. Any thoughts on approaches that don't involve unnesting the master table? Or is that the only/best approach here? I'm using PostgreSQL 9.3 (but would happily upgrade to 9.4 if there's a graceful way to do this in the newer version).
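One possible way to handle the remaining case, offered as an illustrative sketch rather than a definitive answer: after the valid_during UPDATE, any row whose valid_on array holds dates outside its own valid_during range needs splitting, and such rows can be flagged without unnesting via `NOT valid_during @> ALL (valid_on)`. The sketch keeps the in-range dates on the existing row, moves the rest into a new row with the same rec_id and val, and then recomputes valid_during. It does call unnest(), but only on the arrays of the flagged rows. It runs against a hypothetical temp table master_demo seeded with the rec_id 3 rows from the last result, and it computes each row's earliest date inline rather than through arraymin().

```sql
-- Hypothetical scratch table mirroring "master" for rec_id 3 only, in the
-- state shown above after the valid_during UPDATE.
CREATE TEMP TABLE master_demo (
  id           SERIAL PRIMARY KEY,
  rec_id       integer,
  val          text,
  valid_on     date[],
  valid_during daterange
);
INSERT INTO master_demo (rec_id, val, valid_on, valid_during) VALUES
  (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,2015-01-03)'),
  (3, 'e', '{2015-01-03}',            '[2015-01-03,infinity)');

-- Step 1: rows holding dates outside their own range keep only the in-range
-- dates; the out-of-range dates move to a new row with the same rec_id/val.
WITH split AS (
  SELECT id, rec_id, val,
         ARRAY(SELECT t.d FROM unnest(valid_on) AS t(d)
               WHERE valid_during @> t.d ORDER BY t.d) AS keep_dates,
         ARRAY(SELECT t.d FROM unnest(valid_on) AS t(d)
               WHERE NOT valid_during @> t.d ORDER BY t.d) AS moved_dates
  FROM   master_demo
  WHERE  NOT valid_during @> ALL (valid_on)  -- flags rows without unnesting
), trimmed AS (
  UPDATE master_demo m
  SET    valid_on = s.keep_dates
  FROM   split s
  WHERE  m.id = s.id
)
INSERT INTO master_demo (rec_id, val, valid_on)
SELECT rec_id, val, moved_dates FROM split;

-- Step 2: recompute valid_during as in the question, with each row's
-- earliest date computed inline instead of via arraymin().
UPDATE master_demo m
SET    valid_during = daterange(t.first_day, t.next_day)
FROM  (
  SELECT id, first_day,
         lead(first_day, 1, 'infinity'::date)
           OVER (PARTITION BY rec_id ORDER BY first_day) AS next_day
  FROM  (
    SELECT id, rec_id,
           (SELECT min(u.x) FROM unnest(valid_on) AS u(x)) AS first_day
    FROM   master_demo
  ) f
) t
WHERE  m.id = t.id;

SELECT rec_id, val, valid_on, valid_during
FROM   master_demo
ORDER  BY rec_id, lower(valid_during);
```

This should leave three rec_id 3 rows: c {2015-01-01} [2015-01-01,2015-01-03), e {2015-01-03} [2015-01-03,2015-01-05) and c {2015-01-05} [2015-01-05,infinity), matching the goal table. If the moved dates themselves straddle a later version, the new row fails the check again, so repeating both steps until Step 1 inserts no rows should converge. Data-modifying statements in WITH are available from PostgreSQL 9.1, so this should not require 9.4.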
Asked by danpelota (133 rep)
Mar 20, 2015, 09:19 PM
Last activity: Mar 5, 2022, 01:11 AM