How to "merge" rows along with their foreign many-to-many relations without violating unique constraints?
3 votes · 1 answer · 276 views
Fiddle: https://dbfiddle.uk/-JLFuIrN
## Table
```sql
CREATE TABLE files (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text
);

CREATE TABLE folders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text
);

CREATE TABLE file_folders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  file_id bigint NOT NULL REFERENCES files,
  folder_id bigint NOT NULL REFERENCES folders,
  UNIQUE (file_id, folder_id)
);
```
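For experimenting with the query below, a few rows can be seeded first. The names and row counts here are arbitrary illustrations, not part of the original fiddle:

```sql
-- arbitrary sample data; the identity columns start at 1
INSERT INTO files (name) VALUES ('a.txt'), ('b.txt'), ('c.txt');
INSERT INTO folders (name) VALUES ('docs'), ('archive'), ('backup'), ('tmp');

-- file 1 lives in folders 1 and 2, so merging folder 2 into folder 1
-- would collide with the UNIQUE (file_id, folder_id) constraint
INSERT INTO file_folders (file_id, folder_id) VALUES
  (1, 1), (1, 2), (2, 2), (3, 3);
```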
## Query
```sql
/*
Merges
*/
WITH targets AS (
  SELECT
    ARRAY (
      SELECT id
      FROM folders TABLESAMPLE BERNOULLI (50)
      LIMIT 3
    ) AS folders
),
-- basically a setup to ensure unique target/folder pairs
-- and no targets in the merges
input_folders AS (
  SELECT
    folders.id AS folder_id,
    random_array_element(targets.folders) AS target_id
  FROM folders
  CROSS JOIN targets
  WHERE NOT (folders.id = ANY (targets.folders))
),
input_files AS (
  SELECT
    file_folders.id,
    file_folders.folder_id,
    file_folders.file_id,
    input_folders.target_id
  FROM input_folders
  INNER JOIN file_folders
    ON input_folders.folder_id = file_folders.folder_id
    OR input_folders.target_id = file_folders.folder_id
),
deleted_files AS (
  WITH deletions AS (
    SELECT inputs.id
    FROM input_files AS inputs
    INNER JOIN input_files AS targets
      ON NOT (inputs.folder_id = targets.target_id)
      AND inputs.file_id = targets.file_id
  )
  DELETE FROM file_folders
  WHERE id IN (SELECT id FROM deletions)
),
merged_files AS (
  WITH merges AS (
    SELECT
      inputs.id,
      inputs.folder_id,
      inputs.target_id
    FROM input_files AS inputs
    INNER JOIN input_files AS targets
      ON NOT (inputs.folder_id = targets.target_id)
      AND NOT (inputs.file_id = targets.file_id)
  )
  UPDATE file_folders
  SET folder_id = merges.target_id
  FROM merges
  WHERE merges.id = file_folders.id
),
deleted_folders AS (
  DELETE FROM folders
  WHERE id IN (SELECT DISTINCT folder_id FROM input_folders)
)
SELECT folders AS targets
FROM targets
;
```
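Note that `random_array_element` is not a PostgreSQL built-in; it comes from the fiddle. A definition along these lines would match how the query uses it (an assumption for readers who skip the fiddle, not necessarily its exact code):

```sql
-- picks one element of the array at random
-- (PostgreSQL arrays are 1-based, hence the + 1)
CREATE FUNCTION random_array_element(arr anyarray)
RETURNS anyelement
LANGUAGE sql
AS $$
  SELECT arr[floor(random() * cardinality(arr))::int + 1];
$$;
```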
## Inputs
The array-transforming setup above is my attempt to replicate the application's JSON input in pure SQL. The input looks like this:
```typescript
interface IQueryInput extends Array<IMergeInput> {};

interface IMergeInput {
  target: IEntityID;
  inputs: IEntityID[];
};

// postgresql bigints are treated as strings in the application
type IEntityID = string;
```
So the prepping query from above can be replaced with:
```sql
WITH inputs AS (
  SELECT input.*
  FROM
    -- the application interpolates JSON there
    json_to_recordset($inputs$$inputs$) AS input(
      target bigint,
      inputs bigint[]
    )
),
input_folders AS (
  SELECT
    inputs.target AS target_id,
    merge.folder_id
  FROM inputs
  CROSS JOIN UNNEST(inputs.inputs) AS merge(folder_id)
)
```
It must run as a batch operation, so the application provides these guarantees for the query input:

- all `target` values are unique.
- all `inputs` concatenated result in unique values.
- `target` values do not intersect with the concatenated `inputs`.

Therefore `input_folders` always ends up as unique `target_id`/`folder_id` pairs.
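For illustration, an interpolated input satisfying these guarantees could look like this (the ids are made up):

```sql
SELECT *
FROM json_to_recordset($inputs$[
  { "target": "1", "inputs": ["2", "3"] },
  { "target": "4", "inputs": ["5"] }
]$inputs$) AS input(
  target bigint,
  inputs bigint[]
);
```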
The query runs as a background task, so speed and memory are of secondary importance. The main requirement is that of a typical transaction: either it goes through completely on success or rolls back completely on any error.
## The problem
I want to "merge" several folders into a single folder. So given a `target_id` and an array of `folder_id`s, replace all foreign references to the `folder_id`s with the `target_id` and remove the non-target folders afterwards.

This however becomes an issue in relation tables with unique constraints, since after updating the references there are duplicates.

So I went this path:

1. Select all relation rows related to the query, so all `file_folders` with `target_id`s and `folder_id`s in them.
2. Separate them into two categories:
   - Deletes - the rows which will result in dupes when updated.
   - Merges - the rows which will not.
3. Delete the delete candidates.
4. Update the merge candidates.
5. Repeat the previous 4 steps for all relations.
6. Delete the rows in `folders` with the `folder_id`s.

However, I still stumble upon a unique key violation error.

"Merge" is in quotes because it doesn't look like what I am trying to do can be accomplished by [merge in docs](https://www.postgresql.org/docs/15/sql-merge.html) , and it requires a newer version of PostgreSQL anyway.
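For reference, steps 1-4 and 6 for a single merge can be sketched as standalone statements. The ids are hypothetical (folders 2 and 3 merged into folder 1), and the `DISTINCT ON` guard against two source folders carrying the same file is an addition of this sketch, not part of the query above:

```sql
BEGIN;

-- steps 1-3: drop every source row that would collide after the update,
-- either with an existing target row or with another source row
DELETE FROM file_folders
WHERE folder_id IN (2, 3)
  AND id NOT IN (
    -- keep at most one row per file among the sources,
    -- and none at all when the target already has that file
    SELECT DISTINCT ON (file_id) id
    FROM file_folders
    WHERE folder_id IN (2, 3)
      AND file_id NOT IN (
        SELECT file_id FROM file_folders WHERE folder_id = 1
      )
    ORDER BY file_id, id
  );

-- step 4: repoint the surviving rows at the target
UPDATE file_folders SET folder_id = 1 WHERE folder_id IN (2, 3);

-- step 6: remove the merged folders
DELETE FROM folders WHERE id IN (2, 3);

COMMIT;
```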
Asked by Biller Builder (288 rep) on Jan 13, 2023, 02:47 PM
Last activity: May 17, 2025, 05:08 PM