
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
1 answer
70 views
How can I version my knowledge database?
I have a large number of documents (1 million or more). I need to implement snapshot-style versioning, but I can't figure it out. It should be possible to take a snapshot from any point (like git?). For example, if I have Snap1, I add data, take Snap2, then roll back to Snap1, add more data, and take Snap3. It should also be possible to query each snapshot individually.
Snap1 ---- Snap2
      ---- Snap3 ---- Snap4
How can I design such a structure? I was thinking of creating 1 million snapshot-ID/document-ID pairs every time a snapshot is created and then joining on them, but that doesn't seem like a good idea. More specifically, this should be implementable with Milvus (of course, snapshot metadata would probably have to be stored in an RDBMS/NoSQL store).
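One way to get git-like branching without materializing a million snapshot-to-document rows per snapshot is to store only a snapshot tree plus per-document membership intervals; a minimal relational sketch (all table and column names here are hypothetical, with the documents themselves living in Milvus):

```sql
-- Hypothetical sketch: snapshot tree plus per-document membership ranges.
CREATE TABLE snapshot (
    snapshot_id        BIGINT PRIMARY KEY,
    parent_snapshot_id BIGINT REFERENCES snapshot (snapshot_id),  -- NULL for the root snapshot
    created_at         TIMESTAMP NOT NULL
);

-- A document is visible in a snapshot if that snapshot lies on the branch
-- between added_in and (exclusively) removed_in.
CREATE TABLE document_membership (
    document_id BIGINT NOT NULL,                                  -- key of the document stored in Milvus
    added_in    BIGINT NOT NULL REFERENCES snapshot (snapshot_id),
    removed_in  BIGINT REFERENCES snapshot (snapshot_id)          -- NULL = still present
);
```

Resolving what a snapshot contains then means walking the ancestor chain of that snapshot rather than copying one row per document per snapshot.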
user212942 (61 rep)
Jun 2, 2025, 05:25 PM • Last activity: Jun 7, 2025, 07:00 AM
2 votes
2 answers
2040 views
Identify the version of a collation from ICU in Postgres
Postgres 10 and later incorporate the International Components for Unicode (ICU) library for text handling and other internationalization issues. Human languages change over time, and sorting rules evolve with them, so collation definitions are versioned in ICU. ➥ How can I identify the version number of a collation? I know Postgres will report the version number as part of a collation version mismatch error message. Is there a way to proactively query for the version rather than wait for an error message?
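For reference, a minimal sketch of querying this proactively (assuming Postgres 10 or later): the pg_collation catalog records the version seen when the collation was created, and pg_collation_actual_version() reports what the installed ICU library currently provides.

```sql
-- Recorded version of each ICU collation vs. the version the ICU library reports now.
SELECT collname,
       collversion                      AS recorded_version,
       pg_collation_actual_version(oid) AS current_version
FROM   pg_collation
WHERE  collprovider = 'i';   -- 'i' = ICU-provided collations
```

Comparing the two columns shows which collations would otherwise surface the version-mismatch warning.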
Basil Bourque (11188 rep)
Aug 4, 2019, 03:16 AM • Last activity: Mar 11, 2025, 06:36 AM
2 votes
3 answers
937 views
How best to implement record versioning or tracking in a relational database?
The scenario involves maintaining versions of a 'Product' object (e.g., a Product table) and creating a new record when the product changes to a newer version (e.g. v2 from v1). It is crucial not to update existing product records, as other objects might reference them. The goal is to update referring objects (those needing v2) while preserving the old versions (v1) for objects still using them. At the same time, we need to keep track of how the product has evolved, i.e. v1 -> v2 -> v3.

I've considered two potential approaches:
1. Versioning: assigning version labels (e.g., V1, V2) to the object to track and reference old versions if necessary.
2. LinkedList (or doubly linked list): each new object includes an 'old_object_ID' pointing to the previous object's ID, enabling tracking of all past objects.

Of course, there are pros and cons to both approaches, such as data redundancy, complex queries, and ease of navigation. At the same time, we need to consider query performance, concurrency, referential integrity, auditing, and history.

**Are there alternative or more effective strategies/patterns for achieving this versioning and referencing mechanism in relational databases?**
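As a sketch of one common pattern (all names hypothetical): keep immutable product_version rows grouped under a stable product key, and let referring objects point at the specific version they were created against.

```sql
-- Hypothetical sketch: immutable versions grouped by a stable product key.
CREATE TABLE product (
    product_id BIGINT PRIMARY KEY            -- stable identity across versions
);

CREATE TABLE product_version (
    product_version_id BIGINT PRIMARY KEY,
    product_id         BIGINT NOT NULL REFERENCES product (product_id),
    version_no         INT    NOT NULL,
    name               TEXT   NOT NULL,
    price              NUMERIC(12,2) NOT NULL,
    created_at         TIMESTAMP NOT NULL,
    UNIQUE (product_id, version_no)
);

-- Referring objects pin the exact version they were built against.
CREATE TABLE order_line (
    order_line_id      BIGINT PRIMARY KEY,
    product_version_id BIGINT NOT NULL REFERENCES product_version (product_version_id),
    qty                INT NOT NULL
);
```

The lineage v1 -> v2 -> v3 then falls out of ordering version_no within a product_id, so no explicit linked-list pointer is strictly required.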
Enigma (121 rep)
Mar 4, 2024, 09:05 AM • Last activity: Jun 23, 2024, 07:25 AM
0 votes
1 answer
44 views
How to model edited linked records of a multi-schema entity while preserving the original linked records?
If you have a base class parent, a parent can have multiple child records, a one-to-many relationship.
parent schema:
+-----------+
| parent_id |
+-----------+

child schema:
+-----------+----------+
| parent_id | child_id |
+-----------+----------+
Then parent is used in another entity, table x, that can have many parents, again in a one-to-many relationship:
x_parent schema:
+------+-----------+
| x_id | parent_id |
+------+-----------+
Users can edit entities from table x to add and remove parents. While doing this, they can also edit the child records of the parent; however, these changes need to be saved separately (as edits of x), without modifying the original parent or its children. How should this be modeled to keep the data normalized, and also to know which records to use: the original child records, or the edited ones? This is the trickiest part to me, as the absence of edited records could mean the user removed all the original child records, or simply has not edited the child records of the parent. E.g. table x_child could look like this:
+------+-----------+--------------------+
| x_id | parent_id | child_edited_value |
+------+-----------+--------------------+
Is it a good idea to stick an edited column on the x_parent table, indicating that x_child rows should be queried instead of simply left joining x_parent with its child records?
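For illustration only, one way to make the three states distinguishable (not edited, edited, edited down to zero children) is an explicit override flag on x_parent; a sketch with hypothetical column names:

```sql
-- Hypothetical sketch: an explicit override flag distinguishes "not edited"
-- from "edited down to zero children".
CREATE TABLE x_parent (
    x_id            BIGINT  NOT NULL,
    parent_id       BIGINT  NOT NULL,
    children_edited BOOLEAN NOT NULL DEFAULT FALSE,  -- TRUE: use x_child rows, even if none exist
    PRIMARY KEY (x_id, parent_id)
);

CREATE TABLE x_child (
    x_id               BIGINT NOT NULL,
    parent_id          BIGINT NOT NULL,
    child_edited_value TEXT   NOT NULL,
    FOREIGN KEY (x_id, parent_id) REFERENCES x_parent (x_id, parent_id)
);
```

The flag removes the ambiguity of an empty x_child set, at the cost of keeping it in sync whenever edits are saved.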
Henry (213 rep)
Jan 9, 2024, 03:49 PM • Last activity: Jan 15, 2024, 02:21 AM
11 votes
6 answers
3207 views
Is it bad practice to have a "most_recent" boolean column in addition to a "create_at" timestamp column to track the latest version of a record?
The table looks like this, it's SCD type 2:
+-----------+------------------+------------------------+
| id (text) | version (serial) | created_at (timestamp) |
+-----------+------------------+------------------------+
For 99% of queries we will be searching the entire table and filtering by additional columns and joined tables. For these queries we're only interested in the most recent version of a record per unique ID. We will also be sorting by created_at and other columns. To make it easy to find the most current records I was considering adding a most_recent (boolean) column as described in the answer here: https://stackoverflow.com/questions/34495479/add-constraint-to-make-column-unique-per-group-of-rows/34495621#34495621 However, I realized we already have the created_at column which tells us this information: we could use a DISTINCT clause in our search queries and order by created_at as described by @Svet's answer here: https://stackoverflow.com/questions/17327043/how-can-i-select-rows-with-most-recent-timestamp-for-each-key-value However, we'd then have to re-order the results by the column that we actually want to use to show the data. It seems simpler in the long run to add the extra 'current' field, and it seems like it would be more performant, but is it also bad practice?
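For context, a sketch of both options in PostgreSQL syntax (my_table and its columns stand in for the real table): DISTINCT ON derives the latest row per id at query time, while a partial unique index keeps a most_recent flag honest if one is added.

```sql
-- Option A: derive the latest version per id at query time (PostgreSQL).
SELECT DISTINCT ON (id) *
FROM   my_table
ORDER  BY id, created_at DESC;

-- Option B: maintain an explicit flag, enforced so at most one row per id can carry it.
ALTER TABLE my_table ADD COLUMN most_recent boolean NOT NULL DEFAULT false;
CREATE UNIQUE INDEX one_current_per_id ON my_table (id) WHERE most_recent;
```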
Henry (213 rep)
Jan 11, 2024, 07:19 PM • Last activity: Jan 14, 2024, 12:43 AM
0 votes
1 answer
49 views
Timestamp consistency of dependent system-version tables
I want to system-version some items that consist of data in the form of blobs as well as metadata. I plan to use two tables, say Metadata and Data, to store them. Data has a metadata_id to refer to the metadata. (It could be the other way round actually; I am not sure which way around is more efficient in practice.) I want to make sure that when I query the history of the DB, I get a consistent DB. However, that is not the case, because the timestamps present in Data and Metadata are not consistent, even when they are jointly filled within a transaction. (Sure, the time delta might be small, but I am looking for guaranteed perfect equality here.) How is this situation handled in practice? Should system versioning be used only on tables that do not need to be filled in a transaction along with other tables? Should I put my entries in a single row despite having large blobs? Should I implement my own versioning system?
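If the fallback is rolling your own versioning in PostgreSQL, one way to get guaranteed-equal stamps is to use transaction_timestamp(), which is frozen for the whole transaction; a sketch with hypothetical tables and columns:

```sql
-- PostgreSQL sketch (hypothetical tables): transaction_timestamp() is fixed at
-- transaction start, so both rows receive exactly the same value.
BEGIN;
INSERT INTO metadata (metadata_id, name, valid_from)
VALUES (1, 'item-1', transaction_timestamp());

INSERT INTO data (data_id, metadata_id, payload, valid_from)
VALUES (1, 1, decode('00ff', 'hex'), transaction_timestamp());
COMMIT;
```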
user209974 (268 rep)
Aug 9, 2023, 02:59 PM • Last activity: Aug 9, 2023, 04:47 PM
1 vote
0 answers
32 views
SCD type 2 and FK between dimension tables
This kind of question has been discussed many times, but I can't find a precise answer. I have two dimension tables, say Customer and Address, that I want to model after the SCD type 2 pattern, with valid_from and valid_to fields and Address having a FK to Customer. The question is: when Customer changes, creating a new record and a new synthetic primary key, do I need to create a new record for the linked addresses, or should I model that data differently? In my case I don't think I can put everything in the same table, because the real case is a little more complex. Another solution is to create an intermediary table, CustomerAddress, that would contain only primary key mappings; for example, if a customer changes, I would also create a new record in CustomerAddress, copying the old address PK and using the new customer PK. I don't really like this solution, though, because it is probably hard to maintain (and, in my case, to expose to the SQLAlchemy ORM). Thanks.
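For reference, one widely used variant (a sketch, names hypothetical) is to have Address carry the durable customer business key rather than the version-specific surrogate key, so a new Customer version does not force new Address rows:

```sql
-- Hypothetical sketch: Address points at the durable customer_code, not the surrogate key.
CREATE TABLE customer (
    customer_sk   BIGINT PRIMARY KEY,   -- surrogate key, new row per version
    customer_code TEXT   NOT NULL,      -- durable business key, stable across versions
    valid_from    DATE   NOT NULL,
    valid_to      DATE   NOT NULL
);

CREATE TABLE address (
    address_sk    BIGINT PRIMARY KEY,
    customer_code TEXT   NOT NULL,      -- joins to whichever Customer version is valid for the date of interest
    valid_from    DATE   NOT NULL,
    valid_to      DATE   NOT NULL
);
```

An enforced FOREIGN KEY is not possible on the non-unique business key, so that integrity is typically guaranteed by the load process instead.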
DRC (111 rep)
Mar 5, 2023, 08:53 PM
1 vote
1 answer
584 views
How to check if a table is system versioned for MariaDB?
How do I check whether a table is system-versioned in MariaDB? This post refers to the same question for SQL Server 2016: https://dba.stackexchange.com/questions/218517/how-to-check-if-the-system-versioning-for-a-table-is-on I've tried that syntax on MariaDB, but I got an error. I guess "TableTemporalType" is MS-specific (see https://learn.microsoft.com/en-us/sql/t-sql/functions/objectproperty-transact-sql?view=sql-server-ver15 ). Does anyone know how this can be done in MariaDB? Thanks for your help.
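For reference (worth verifying against the MariaDB documentation for your version), system-versioned tables are reported with a distinct TABLE_TYPE in information_schema.TABLES, so a check could look like this (schema and table names are placeholders):

```sql
-- MariaDB: system-versioned tables report TABLE_TYPE = 'SYSTEM VERSIONED'.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'mydb'        -- hypothetical schema name
  AND  TABLE_NAME   = 'mytable'     -- hypothetical table name
  AND  TABLE_TYPE   = 'SYSTEM VERSIONED';
```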
John D. (13 rep)
Jan 16, 2023, 11:24 AM • Last activity: Jan 16, 2023, 11:43 AM
3 votes
1 answer
2972 views
Updating table of versioned rows with historical records in PostgreSQL
I have a master table of versioned rows: CREATE TABLE master ( id SERIAL PRIMARY KEY, rec_id integer, val text, valid_on date[], valid_during daterange ); INSERT INTO master (rec_id, val, valid_on, valid_during) VALUES (1, 'a', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'), (2, 'b', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'), (3, 'c', '{2015-01-01,2015-01-05}', '[2015-01-01,infinity)'); SELECT * FROM master ORDER BY rec_id, id; /* id | rec_id | val | valid_on | valid_during ----+--------+-----+-------------------------+----------------------- 1 | 1 | a | {2015-01-01,2015-01-05} | [2015-01-01,infinity) 2 | 2 | b | {2015-01-01,2015-01-05} | [2015-01-01,infinity) 3 | 3 | c | {2015-01-01,2015-01-05} | [2015-01-01,infinity) */ The rec_id is a the record's natural key, the valid_on is an array of dates on which the record was valid, and the valid_during is a date range describing the interval during which the record is valid. (The upper bound on the valid_during is 'infinity' if there is no record with the same rec_id with a more recent valid_on value.) Given a second table of updated records, along with new dates on which each record was valid: CREATE TABLE updates (id SERIAL PRIMARY KEY, rec_id integer, val text, valid_on date); INSERT INTO updates (rec_id, val, valid_on) VALUES (1, 'a', '2015-01-03'), -- (1) same "val" for id 1, just add valid_on date (2, 'd', '2015-01-06'), -- (2) different val for id 2, (3, 'e', '2015-01-03'); -- (3) different val for id 3 with new date -- intersecting old date range SELECT * FROM updates; /* id | rec_id | val | valid_on ----+--------+-----+------------ 1 | 1 | a | 2015-01-03 2 | 2 | d | 2015-01-06 3 | 3 | e | 2015-01-03 */ I would like to insert/update the master table to wind up with something like this: -- The goal SELECT rec_id, val, valid_on, valid_during FROM master ORDER BY rec_id, id; /* rec_id | val | valid_on | valid_during --------+-----+------------------------------------+----------------------- 1 | a | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity) 2 | b | {2015-01-01,2015-01-05} | [2015-01-01,2015-01-06) 2 | d | {2015-01-06} | [2015-01-06,infinity) 3 | c | {2015-01-01} | [2015-01-01,2015-01-03) 3 | e | {2015-01-03} | [2015-01-03,2015-01-05) 3 | c | {2015-01-05} | [2015-01-05,infinity) */ Specifically: - If a new record's rec_id exists in the master table with the same val, but the new valid_on date is not in the valid_on array in the master, simply add the new date to the master table's valid_on field (see rec_id 1) - If a new record's rec_id exists with a different val, insert the new record into the master table. The old record in the master table should have its valid_during value end on the date of the new record's valid_on (see rec_id 2) - If the new record's valid_on date intersects the old record's valid_during range, the old record should appear on both "sides" of the updated record (see rec_id 3) I can get *most* of the way there. 
The first case is straightforward: we just need to update the valid_on field in the master table (we'll worry about the valid_during field momentarily in a separate step): UPDATE master m SET valid_on = m.valid_on || u.valid_on FROM updates u WHERE m.rec_id = u.rec_id AND m.val = u.val AND NOT m.valid_on @> ARRAY[u.valid_on]; SELECT * FROM master ORDER BY rec_id, id; /* id | rec_id | val | valid_on | valid_during ----+--------+-----+------------------------------------+----------------------- 1 | 1 | a | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity) 2 | 2 | b | {2015-01-01,2015-01-05} | [2015-01-01,infinity) 3 | 3 | c | {2015-01-01,2015-01-05} | [2015-01-01,infinity) */ For case #2, we can do a simple insert: INSERT INTO master (rec_id, val, valid_on) SELECT u.rec_id, u.val, ARRAY[u.valid_on] FROM updates u LEFT JOIN master m ON u.rec_id = m.rec_id AND u.val = m.val WHERE m.id IS NULL; SELECT * FROM master ORDER BY rec_id, id; /* id | rec_id | val | valid_on | valid_during ----+--------+-----+------------------------------------+----------------------- 1 | 1 | a | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity) 2 | 2 | b | {2015-01-01,2015-01-05} | [2015-01-01,infinity) 4 | 2 | d | {2015-01-06} | 3 | 3 | c | {2015-01-01,2015-01-05} | [2015-01-01,infinity) 5 | 3 | e | {2015-01-03} | */ Now, we can correct the valid_during range in one pass by joining on a subquery which uses a window function that checks for the next valid date for a record with the same rec_id: -- Helper function... CREATE OR REPLACE FUNCTION arraymin(anyarray) RETURNS anyelement AS $$ SELECT min($1[i]) FROM generate_series(array_lower($1,1), array_upper($1,1)) g(i); $$ language sql immutable strict; UPDATE master m SET valid_during = daterange(arraymin(valid_on), new_valid_until) FROM ( SELECT id, lead(arraymin(valid_on), 1, 'infinity'::date) OVER (partition by rec_id ORDER BY arraymin(valid_on)) AS new_valid_until FROM master ) t WHERE m.id = t.id; SELECT * FROM master ORDER BY rec_id, id; /* id | rec_id | val | valid_on | valid_during ----+--------+-----+------------------------------------+------------------------- 1 | 1 | a | {2015-01-01,2015-01-05,2015-01-03} | [2015-01-01,infinity) 2 | 2 | b | {2015-01-01,2015-01-05} | [2015-01-01,2015-01-06) 4 | 2 | d | {2015-01-06} | [2015-01-06,infinity) 3 | 3 | c | {2015-01-01,2015-01-05} | [2015-01-01,2015-01-03) 5 | 3 | e | {2015-01-03} | [2015-01-03,infinity) */ And here's where I'm stuck: rec_id 1 and 2 are exactly what I want, but rec_id 3 needs to be inserted again to appear valid on '2015-01-05'. I can't seem to wrap my head around the array operation to perform that insert. Any thoughts on approaches that don't involve unnesting the master table? Or is that the only/best approach here? I'm using PostgreSQL 9.3 (but would happily upgrade to 9.4 if there's a graceful way to do this in the newer version).
danpelota (133 rep)
Mar 20, 2015, 09:19 PM • Last activity: Mar 5, 2022, 01:11 AM
0 votes
0 answers
100 views
Horizontal scaling with database schema
I am designing a database schema for our on-premises customers, which will be moved to Azure PaaS later. Regarding "scale out" strategies, I lean towards creating one database and creating a schema for each customer to hold that customer's database objects, such as tables, procedures, and views. However, I find little to no information on how to keep track of schema changes in each individual schema and put a version number on each. Tools such as Liquibase or FluentMigrator seem to have limited support for this, as they appear to be designed around a single database. What are the best practices for versioning and managing multiple schemas within one database?
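As one simple, tool-agnostic sketch (schema, table, and column names hypothetical): keep a small changelog table inside every customer schema and have each deployment script stamp it, so each schema carries its own version history regardless of the migration tool:

```sql
-- Hypothetical sketch (T-SQL): one changelog table per customer schema.
CREATE TABLE customer_a.schema_changelog (
    version     INT           NOT NULL PRIMARY KEY,
    script_name NVARCHAR(400) NOT NULL,
    applied_at  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Each migration records itself after running against that schema.
INSERT INTO customer_a.schema_changelog (version, script_name)
VALUES (42, 'V42__add_invoice_table.sql');
```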
Avi (265 rep)
Jan 17, 2022, 06:15 PM
4 votes
0 answers
220 views
LIFO with stock splits
## Background
I am managing a database containing my users' trading of financial products: stocks, bonds, ETFs etc. The system offers simple bookkeeping functionality (create portfolio, create instrument, book a trade, book a cash flow ...) for my users, and some bulk features for the data admin user. The system keeps track of trade / position performance ("pnl", profit-and-loss) and values sell trades using the FIFO method, i.e. outgoing stock is valued at 'oldest' prices.
## Problem statement
Through so-called corporate actions, a company manipulates its quantity of outstanding stock. An N-for-1 stock split, for example, results in N times the number of stock outstanding (at a price of 1/N of the pre-split price), and the same holds for the number of stock in a given portfolio of a user, e.g. their quantities double at a 2-for-1 split. Thus, the meaning of quantity may change over time, complicating the FIFO method in my database. I want to reconcile the ability to value a sale trade using the **FIFO method** with the concept of stock splits.
## Constraints
- Hard constraint: Historical entries must be recoverable 'as is': users want to see their historical trades' units as of that date.
- Soft constraint: Data changes should be as local as possible, and be reversible: there should be no additional effect of adjusting for the 'same' stock split twice.

This is a very small project with <10 users and no interaction between the users' trades, so concurrency is not an issue, I guess.
## Status Quo
For simplicity, I have dropped some entities such as portfolio, instrument etc. below. In order to be able to calculate the FIFO value for the position in a financial asset (think: stocks), my system contains a table called db.trade that holds the history of each trade, with trade quantity qty, unit price prc, and a field qty_allotable that tells me how many units of each *buy* trade have not yet been sold off (i.e. not yet **allotted** to a sale). The relevant fields for my problem are:
CREATE TABLE db.trade (
    valuedate        DATE                    NOT NULL
   ,qty              DECIMAL(16,6)           NOT NULL
   ,prc              DECIMAL(12,6) UNSIGNED  NOT NULL
   ,qty_allotable    DECIMAL(16,6) UNSIGNED  NOT NULL
);
For example, let's buy 6 units of some stock on one day and another 4 units the next day (via some stored procedure not depicted here):

valuedate    qty   prc    qty_allotable
2021-08-02   6     10.00  6
2021-08-03   4     11.00  4

On day 3, we sell 7 units. Via FIFO, we value the outgoing units at 6 x 10 + 1 x 11 = 71.00, where the FIFO algorithm is implemented in some function, and we **deplete** the allotable quantity after the sale:

valuedate    qty   prc    qty_allotable
2021-08-02   5     10.00  0
2021-08-03   5     11.00  3
2021-08-04   -7    12.00  0   -- a sale is not allotable itself

where the table update algorithm is performed by some stored procedure. At each point in time, the total invested volume is a simple SELECT(prc * qty_allotable) FROM ....

NB.: The setup requires trades to be entered (truly) in sequential order. If a user 'forgets' to enter a historical trade, I rebuild the whole trade table for the affected combination of user and instrument...

## Increased complexity due to stock splits
My question now is how to properly treat events that change the significance of a stock's quantity, e.g. a stock split. To make it tangible, say there's a 2-for-1 stock split on 2021-08-05, i.e. my remaining three units become six units (and the trading price halves), i.e. my **position** now is

qty   unit_prc
6     5.50   -- doubling the total qty, halving the prices

## Perceived solutions
How should I incorporate this effect in the database? I think that I have identified a couple of options to follow.

#### 1. Fire and forget WITH SYSTEM VERSIONING
Introduce versioning to the trade table and simply update the **historical** prc and qty_allotable fields by the split factor. Although I can retrieve original trades with this setup, I cannot 'roll back' to an earlier state without risking loss of trade data entered in the mean time.

#### 2. Introduce a table corporate_action and adjusted business logic
By introducing a table with an appropriate uniqueness index
CREATE TABLE db.corporate_action(
    valuedate        DATE                    NOT NULL
   ,split_factor     DECIMAL(12,6) UNSIGNED  NOT NULL
);
CREATE UNIQUE INDEX idx_ca_unq ON db.corporate_action(valuedate);
In our example, the table could look like this:

valuedate    split_factor
1970-01-01   1.000000   -- auto entry per instrument
2021-08-05   2.000000
2099-12-31   1.000000   -- auto entry per instrument

This way, I make sure that a 'double entry' cannot take place (through the uniqueness index). In order to apply the FIFO method, I must adjust the historical quantities and prices through some view on the corporate_action table, e.g.
SELECT 
     c1.valuedate
    ,EXP(-SUM(LOG(c2.split_factor))) as compound_factor
FROM       corporate_action c1 
INNER JOIN corporate_action c2 
        ON c1.valuedate <= c2.valuedate 
GROUP BY   c1.valuedate;
resulting in

valuedate    compound_factor
1970-01-01   0.5
2021-08-05   0.5
2099-12-31   1

Then, when calculating the FIFO price, I'd calculate prc*compound_factor and qty/compound_factor in order to properly calculate the outgoing value in a sell trade.
## Question
Do you think that either of the two perceived solutions is sensible, or can you propose another solution?
Kermittfrog (141 rep)
Aug 28, 2021, 12:12 PM • Last activity: Aug 28, 2021, 06:11 PM
0 votes
1 answer
1665 views
How to eliminate duplicates made by recursive CTE
I have following schema in PostgreSQL 12 (objects and their relations are versioned by date ranges): CREATE TABLE tmp_deps ( id bigint, code text, name text, start_date date, end_date date ); CREATE TABLE tmp_dep_parents ( department_id bigint, parent_id bigint, start_date date, end_date date ); INSERT INTO tmp_deps (id, code, name, start_date, end_date) VALUES (1, '11111111', 'Root Dep', '1970-01-01', '9999-12-31'), (2, '11111112', 'Sub Root Dep', '1970-01-01', '9999-12-31'), (3, '11111113', 'Dep 1', '1970-01-01', '2021-02-23'), (3, '11111113', 'Dep 1', '2021-02-24', '9999-12-31'), (4, '11111114', 'Dep 2', '1970-01-01', '2020-01-01'), (4, '11111114', 'Dep 2', '2020-01-02', '9999-12-31'), (5, '11111115', 'Sub Dep 1', '1970-01-01', '9999-12-31'), (6, '11111116', 'Sub Dep 2', '1970-01-01', '9999-12-31'); INSERT INTO tmp_dep_parents (department_id, parent_id, start_date, end_date) VALUES (2, 1, '1970-01-01', '9999-12-31'), (3, 2, '1970-01-01', '2005-12-31'), (3, 1, '2006-01-01', '2006-12-31'), (3, 2, '2007-01-01', '9999-12-31'), (4, 2, '1970-01-01', '2010-05-01'), (4, 3, '2010-05-02', '2010-05-03'), (4, 2, '2010-05-04', '9999-12-31'), (5, 3, '1970-01-01', '9999-12-31'), (6, 4, '1970-01-01', '9999-12-31'); > objects and their relations are versioned by date ranges It means that object with id=1 can be named as "Root Department" from 1 Jan 2000 to 31 Dec 2021, and can be named as "Super Department" from 1 Jan 2022 to 31 Dec 9999. Object with id=3 can belong to object with id=2 from 1 Jan 1970 to 15 May 2003 and can belong to object with id=1 from 16 May 2003 to 31 Dec 9999 --- The hierarchy of objects can be changed at any time. I am using recursive CTE to create MATERIALIZED VIEW and I noticed duplicates when I filter objects and their relations by date. And also I need an ability to view historical data too that is the reason why I can't just filter objects and their relations by date inside CTE. My CTE is: WITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id ), all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id ) SELECT * FROM deps The problem with duplicates occurs when I add the "depth" output in CTE. And as I understand it correctly, the duplicates are hidden when I remove the "depth" output from CTE, but I can still get wrong data. My CTE contains all versions of objects and their relations. And I don't understand how to get the CTE to follow the correct hierarchy path. --- I want to see the hierarchy of departments and the departments themselves as of 15 May 2020 or 9 June 2021 or any other date, that is why I put all data states (versions) to CTE. 
Materialized view is created from RCTE: CREATE MATERIALIZED VIEW departments_mat_view AS WITH RECURSIVE deps AS ( SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, 1::int AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep WHERE dep.code = '11111111' UNION SELECT dep.id, dep.start_date as dep_start_date, dep.end_date as dep_end_date, dep.name, dep.code, s.depth + 1 AS "depth", dep.parent_id, dep.dep_par_start_date, dep.dep_par_end_date FROM all_deps dep INNER JOIN deps s ON dep.parent_id = s.id ), all_deps AS ( SELECT dep.*, dp.parent_id, dp.start_date AS dep_par_start_date, dp.end_date AS dep_par_end_date FROM tmp_deps dep LEFT JOIN tmp_dep_parents dp ON dp.department_id = dep.id ) SELECT * FROM deps And here is select from MATERIALIZED VIEW: SELECT * FROM departments_mat_view WHERE dep_start_date = '2021-06-09' AND (dep_par_start_date IS NULL OR dep_par_start_date = '2021-06-09') Here is output from CTE: "id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date" 1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL 2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31" 3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31" 4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 3 3 "1970-01-01" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 5 4 "1970-01-01" "9999-12-31" Desired output is: "id" "dep_start_date" "dep_end_date" "name" "code" "depth" "parent_id" "dep_par_start_date" "dep_par_end_date" 1 "1970-01-01" "9999-12-31" "Root Dep" 11111111 1 NULL NULL NULL 2 "1970-01-01" "9999-12-31" "Sub Root Dep" 11111112 2 1 "1970-01-01" "9999-12-31" 3 "2021-02-24" "9999-12-31" "Dep 1" 11111113 3 2 "2007-01-01" "9999-12-31" 4 "2020-01-02" "9999-12-31" "Dep 2" 11111114 3 2 "2010-05-04" "9999-12-31" 5 "1970-01-01" "9999-12-31" "Sub Dep 1" 11111115 4 3 "1970-01-01" "9999-12-31" 6 "1970-01-01" "9999-12-31" "Sub Dep 2" 11111116 4 4 "1970-01-01" "9999-12-31"
Dmitry K. (143 rep)
Jun 9, 2021, 08:16 PM • Last activity: Jun 11, 2021, 11:14 AM
0 votes
0 answers
52 views
DB relationships with historical tables review
The idea is simple: User -> Projects -> Sprints -> Tasks, but I need to keep historical versions of them. Will this fit my requirements, or are there some changes to be applied, or would a different approach be better? (Diagram: database_proposal - https://i.sstatic.net/uJahb.png)
p__d (113 rep)
Apr 11, 2021, 06:08 PM
6 votes
3 answers
2582 views
Building a branched versioning model for relational databases
I am a database designer, and in my current project I'm implementing the versioning capabilities required to concurrently edit rows of data in an RDBMS. The project requirements say that data editing sessions can go on for several hours or days until a commit is performed. Also, conflicts arising from simultaneous modification of the same data by different users should be handled, with the possibility of manual and semi-automatic resolution. In other words, the desired editing workflow is similar to the one used in document-oriented version control systems such as SVN or Git. Therefore, traditional OLTP approaches and conflict resolution strategies (MVCC, optimistic/pessimistic locks) don't satisfy my constraints. I have surveyed existing tools that offer branched version history and a multiversion workflow:
- ArcSDE - ESRI's ArcGIS supports versioning for geodatabases through the ArcSDE data layer;
- Oracle Workspace Manager - a feature of Oracle Database providing a high degree of version isolation and data history management;
- SQL:2011 temporal features, including valid-time and transaction-time support.

SQL:2011 doesn't solve my problem, as it offers support for a "linear" history of edits, not the branched history I'm looking for. The solutions from ESRI and Oracle are good candidates, but I'm disappointed that both have vendor-specific interfaces for manipulating versions. It seems that at this moment nobody offers an industry-standard solution for branched versioning of relational data (as SQL:2011 does for temporal tables and linear version history). As a newcoming database researcher, I want to understand:
- Is the relational database community interested in developing standard models of branched data versioning, and would any contribution or research in this area be valuable? (For example, standardization as was done for temporal features in SQL:2011, in the form of language improvements.)
- Do developers and database designers lack a database-independent, open-source middleware (similar to ArcSDE) that offers support for branched version management of relational data, or would it be better to introduce such features in the RDBMS itself?

I think I can try to dig deeper and propose some standard model or sublanguage to deal with Git-like versioning, but I don't know where to start.
Nipheris (161 rep)
Aug 17, 2014, 11:14 PM • Last activity: Sep 23, 2020, 07:30 AM
3 votes
0 answers
410 views
Is there a reliable way to determine the order in which Postgres transactions were committed without logical decoding?
Is there a reliable way to determine the serial order in which Postgres transactions were committed without logical decoding? From what I understand, pg_xact_commit_timestamp can't necessarily be relied upon for this (and I'm not even sure I'll be able to turn it on in this environment). I don't need to know the exact times transactions were committed - I just care about the order. I should probably state the problem, as perhaps there's a better way to go about this. I have a series of tables which I am using to generate quotes. For any one of those quotes I need to be able to go back and prove that the correct quote was generated. Therefore, for any tables that informed this calculation I also need to be able to reproduce their state at the time the quote was generated. As this is derived data it makes the problem a bit harder. My initial thought was that I could generate these quotes using a REPEATABLE READ, READ ONLY transaction and record txid_current_snapshot() so I know which transactions were still in flight and therefore which snapshots to rule out from my history (I need to do this because with the current audit logs the timestamps are from before the transaction was committed and therefore cannot be used reliably to determine order). If there's no way to reliably determine transaction ordering, this solution falls apart. This is what I'm using for my audit history: https://wiki.postgresql.org/wiki/Audit_trigger_91plus Streaming the WAL via logical decoding to some upstream consumer is the eventual goal, but it's not something I have the time to do at the moment.
Jordan (151 rep)
Jul 30, 2020, 03:41 AM
1 vote
1 answer
40 views
When placing an order in a shop, should a snapshot also be taken of the products in the order?
...to prevent messing up orders with later data updates, such as changes to the prices or titles of the products placed in an order. Namely, a customer buys 3 items: for $5, $10 and $33, and pays for them. All is well. Then I, as the owner of the shop, step in and, before I've delivered those products to the customer, decide to decrease the price of, say, product #2. Its new price will be $8 instead of $10. And I'll also rename product #3 a little bit. The customer then goes to the status tracking page and sees the updated data. Yes, they've paid and the order is being delivered, but the data has already changed in the database. They'll be confused. And so will I, in a year, after I've changed the prices again, renamed something, etc., and decide to view the history of the orders for the year.

**Question 1**: Is there a practice of making a snapshot of the current prices, names, total order price and other characteristics of the products that constitute an order, as well as other details (chosen shipping rate and shipping method, taxes, discounts, etc.), at the moment when it's being placed, rather than calculating those dynamically when an "order page" is opened? If yes, does it have a name? And are there recommendations on how to do it properly?

**Question 2**: Where and how should I store a snapshot?

Option #1: I'd have to create multiple tables then:
* frozen_products
* frozen_discounts
* frozen_shipping_method_and_rates
etc... that will have the **same structure** as their dynamic counterparts. Laborious. Is there a better way?

Option #2: along with an order, in the "orders" table. But how, again, given that an order is a single row? For instance, the products in an order are **a list**. How would I store a list with its characteristics (price, weight, colour, material, what have you) in a row in such a way that will be more or less easy to retrieve in the future? Not as a string.
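For what it's worth, this is commonly handled by denormalizing the order lines at checkout time; a minimal sketch (table and column names hypothetical):

```sql
-- Hypothetical sketch: each order line copies the product attributes as they were at purchase time.
CREATE TABLE order_item (
    order_id     BIGINT        NOT NULL,
    product_id   BIGINT        NOT NULL,   -- still linked to the live product for reference
    product_name TEXT          NOT NULL,   -- copied ("frozen") at the moment the order is placed
    unit_price   NUMERIC(12,2) NOT NULL,   -- copied at the moment the order is placed
    quantity     INT           NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
```

Shipping method, rates, taxes and discounts are usually copied onto the orders row (or similar frozen detail tables) in the same way, so the status page reads only values captured when the order was placed.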
Dajaku (41 rep)
May 18, 2020, 05:48 AM • Last activity: May 21, 2020, 06:27 PM
1 vote
1 answer
1924 views
Foreign key referencing table with composite primary key when I only need 1 column association
So lets say we have 2 tables on SQL Server database:
CREATE TABLE VersionedData (
    Id INT NOT NULL,
    Version INT NOT NULL

    CONSTRAINT [PK_VerData]
      PRIMARY KEY (Id, Version)
)
And:
CREATE TABLE PointingTable (
    Id INT PRIMARY KEY,
    RefId INT NOT NULL
    
    CONSTRAINT [FK_Pnt_Ver] FOREIGN KEY (RefId)
      REFERENCES VersionedData (Id)
)
My idea of this relationship is that when joining these 2 particular tables I only care about the latest version of the VersionedData table (meaning I will ALWAYS match the maximum Version for a specific Id value, giving me exactly 1 row, which is what I need). Of course the code above causes an error saying that I must reference both PK columns from PointingTable. I could add a Version column to PointingTable; however, I'd then be stuck on that specific version even when a newer one is inserted into VersionedData. I'm struggling to figure out how to redesign this relationship. I've thought of a computed column for PointingTable that would utilize a scalar function to get the max version, but I think that would be overkill; there must be a better way.
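Not an answer, just an illustration of the usual workaround: drop the version from the foreign key and resolve "latest version" at query time, for example with OUTER APPLY in SQL Server (a sketch using the two tables from the question):

```sql
-- Resolve the latest VersionedData row per RefId at query time (SQL Server).
SELECT p.Id, p.RefId, v.Version
FROM   PointingTable AS p
OUTER APPLY (
    SELECT TOP (1) vd.Version
    FROM   VersionedData AS vd
    WHERE  vd.Id = p.RefId
    ORDER BY vd.Version DESC
) AS v;
```

Referential integrity on Id alone can then be kept, if still needed, by pointing the FOREIGN KEY at a separate keys-only table that holds just the Id values.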
Julius Rupšys (41 rep)
Mar 21, 2020, 02:52 AM • Last activity: Mar 21, 2020, 11:52 PM
3 votes
2 answers
1047 views
Restore a SQL 7 database into SQL 2014 without in-between versions?
Our software is over 20 years old, and dates way back to databases such as Sybase and SQL 7. Now let's ignore the Sybase part. We have many old customers using very old versions of our software, varying between SQL 7 (2001-2002) and SQL 2014 (which we support today). Technically speaking, our software can still actually run on MSDE 2000. But that's beside the point. We require all updated clients to use *at least* SQL Server 2008 R2, but officially support up to 2014. And yes, we do have many old customers still using ancient versions of our software on Windows '95. Anyway, down to the question.

Currently, whenever we have one of these old customers return and wish to upgrade, we have to go through a big, elaborate upgrade process. Now the scripts are the easy part. I am actually just finalizing the latest major upgrade script, which supports upgrading databases from our 2001 (SQL 7) version to our latest version. The very same script accommodates any in-between version. However, before I can run these scripts, I first have to take the database through a series of manual and tedious steps. I have it sped up to this process:

1. Acquire database over internet from customer
2. Restore SQL 7 database directly into MSDE 2000 (on a dedicated XP machine)
3. Raise database compatibility to MSDE 2000
4. Backup database to a new file
5. Restore MSDE 2000 database directly into SQL Server 2008 R2
6. Raise database compatibility to SQL Server 2008
7. Backup database to a new file
8. Restore SQL 2008 R2 database directly into SQL Server 2014
9. Raise database compatibility to SQL Server 2014
10. Run all scripts on database for software changes over time
11. Backup database to a new file
12. After installing software for customer, restore database over internet

So as you can imagine, a lot of returning customers would be a full-time job. I suppose I could build some sort of automation on this XP machine, but this is still a manual process I'm looking to eliminate. We don't even want to need to acquire their database, and many of them don't want us to see their data. Is there a method which I can easily distribute onto our clients' machines, as part of the installer, which can automatically perform the upgrade from any given database version from SQL 7 to SQL Server 2014?

----------

**EDIT**

To be a bit more clear on the installation process: this is an installer we already have, capable of restoring databases directly from 2000 to 2008 R2. But we need to expand this installer to accommodate upgrading databases ranging from MSSQL 7 to 2014. As I now understand, I can jump straight from 7 > 2005 > 2014. But that's still a hop I would like to eliminate, if possible. I would hate for our installer to load both 2005 and 2014 on our clients' machines.
Jerry Dodge (353 rep)
Nov 2, 2016, 12:00 AM • Last activity: Feb 21, 2020, 08:17 PM
1 vote
1 answer
105 views
How would I maintain referential integrity across a database where I use versioned records while keeping data in sync?
I am building a database that consists of a few tables that are used as data sources for a form our application generates.
[form]     [question]       [responses]
id         id               id
name       form_id          question_id
desc       question         response
created    created          created
modified   modified         modified
This is a loose example, but a web form is generated using [form], and then questions are generated using the [question] table. The responses are captured in the [responses] table. The problem I see with this is that if a response is captured and then the question is modified, the historical data will be out of sync (the response was actually given to the original, pre-modification question, but it would now appear under the modified one). This would be an auditor's nightmare. What is the best way to approach this? I feel I need to keep some sort of version information or shadow records of some kind, but I am not sure of the best way to go about it.
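A sketch of one common fix (names hypothetical): make question wordings immutable versions and have responses reference the exact version they answered:

```sql
-- Hypothetical sketch: immutable question versions; responses pin the version they answered.
CREATE TABLE question (
    id      BIGINT PRIMARY KEY,
    form_id BIGINT NOT NULL
);

CREATE TABLE question_version (
    id          BIGINT PRIMARY KEY,
    question_id BIGINT NOT NULL REFERENCES question (id),
    question    TEXT   NOT NULL,
    created     TIMESTAMP NOT NULL
);

CREATE TABLE response (
    id                  BIGINT PRIMARY KEY,
    question_version_id BIGINT NOT NULL REFERENCES question_version (id),
    response            TEXT   NOT NULL,
    created             TIMESTAMP NOT NULL
);
```

Editing a question then means inserting a new question_version row; historical responses keep pointing at the wording the respondent actually saw.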
Barry Chapman (125 rep)
Nov 16, 2019, 02:15 PM • Last activity: Nov 18, 2019, 07:48 PM
13 votes
2 answers
12346 views
How to check if the SYSTEM_VERSIONING for a table is ON?
I know that SQL Server 2016 lets us use SYSTEM_VERSIONING like:

CREATE TABLE EmpSalary
(
    EmpID int NOT NULL PRIMARY KEY CLUSTERED
  , SalaryAmt decimal (10,2) NULL
  , SalaryBeginDT datetime2 GENERATED ALWAYS AS ROW START NOT NULL
  , SalaryEndDT datetime2 GENERATED ALWAYS AS ROW END NOT NULL
  , PERIOD FOR SYSTEM_TIME (SalaryBeginDT, SalaryEndDT)
)
WITH (SYSTEM_VERSIONING = ON);

Also, to deactivate this, it is only necessary to ALTER the table:

ALTER TABLE EmpSalary SET (SYSTEM_VERSIONING = OFF);

My question is how to check whether SYSTEM_VERSIONING for a table is ON, and then ALTER the table?
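For reference, a sketch of the check using the sys.tables catalog view (adjust the table name as needed):

```sql
-- SQL Server: temporal_type = 2 marks a system-versioned temporal table.
SELECT name, temporal_type, temporal_type_desc
FROM   sys.tables
WHERE  name = 'EmpSalary';

-- Turn versioning off only when it is currently on.
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'EmpSalary' AND temporal_type = 2)
    ALTER TABLE EmpSalary SET (SYSTEM_VERSIONING = OFF);
```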
eandbsoftware (131 rep)
Sep 25, 2018, 02:29 PM • Last activity: Nov 9, 2019, 12:03 AM