
How to "merge" rows along with their foreign many-to-many relations without violating unique constraints?

3 votes
1 answer
276 views
Fiddle: https://dbfiddle.uk/-JLFuIrN

## Table
CREATE TABLE files (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text
);

CREATE TABLE folders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text
);

CREATE TABLE file_folders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  file_id bigint NOT NULL REFERENCES files,
  folder_id bigint NOT NULL REFERENCES folders,
  UNIQUE (file_id, folder_id)
);
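The `UNIQUE (file_id, folder_id)` constraint on the join table is what the merge has to respect. As a sketch, the invariant can be mirrored on the application side like this (hypothetical type and helper, in TypeScript since the question's application code is TypeScript; not part of the actual codebase):

```typescript
// Hypothetical row type mirroring the file_folders table above.
interface FileFolderRow { id: number; file_id: number; folder_id: number }

// UNIQUE (file_id, folder_id) means this composite key must never repeat:
const uniqueKey = (row: FileFolderRow): string => `${row.file_id}:${row.folder_id}`;

// A batch of relation rows violates the constraint when two rows share a key.
function violatesUnique(rows: FileFolderRow[]): boolean {
  return new Set(rows.map(uniqueKey)).size !== rows.length;
}
```

This is exactly the condition that fails with a unique key violation when a naive `UPDATE` re-points two rows for the same file at the same target folder.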
## Query
/*
  Merges
*/

WITH targets AS (
  SELECT 
    ARRAY (
      SELECT
        id
      FROM
        folders TABLESAMPLE BERNOULLI (50)
      LIMIT 3
    ) AS folders
),
-- basically a setup to ensure unique target/folder pairs
-- and no targets in the merges
input_folders AS (
  SELECT
    folders.id AS folder_id,
    random_array_element(targets.folders) AS target_id
  FROM
    folders
    CROSS JOIN
    targets
  WHERE
    NOT ( 
      folders.id = ANY (targets.folders)
    ) 
),
input_files AS (
  SELECT
    file_folders.id,
    file_folders.folder_id,
    file_folders.file_id,
    input_folders.target_id
  FROM
    input_folders
    INNER JOIN
    file_folders
    ON
      input_folders.folder_id = file_folders.folder_id
      OR
      input_folders.target_id = file_folders.folder_id
),
deleted_files AS (
  WITH deletions AS (
    SELECT
      inputs.id
    FROM
      input_files AS inputs
      INNER JOIN
      input_files AS targets
      ON
        NOT (inputs.folder_id = targets.target_id)
        AND
        inputs.file_id = targets.file_id
  )
  DELETE
  FROM
    file_folders
  WHERE
    id IN (
      SELECT
        id
      FROM
        deletions
    )
),
merged_files AS (
  WITH merges AS (
    SELECT
      inputs.id,
      inputs.folder_id,
      inputs.target_id
    FROM
      input_files AS inputs
      INNER JOIN
      input_files AS targets
      ON
        NOT (inputs.folder_id = targets.target_id)
        AND
        NOT (inputs.file_id = targets.file_id)
  )
  UPDATE file_folders
  SET
    folder_id = merges.target_id
  FROM
    merges
  WHERE
    merges.id = file_folders.id
),
deleted_folders AS (
  DELETE
  FROM
    folders
  WHERE
    id IN (
      SELECT DISTINCT
        folder_id
      FROM
        input_folders
    )
)
SELECT
  folders AS targets
FROM
  targets
;
## Inputs

The array-transforming setup is me trying to replicate the JSON input of the application in pure SQL. The input looks like this:
interface IQueryInput extends Array<IMergeInput> {};

interface IMergeInput {
  target: IEntityID;
  inputs: IEntityID[];
};

// postgresql bigints are treated as strings in the application
type IEntityID = string;
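For illustration, a hypothetical batch conforming to these interfaces might look like this (the ids are made up, and the types are restated so the snippet is self-contained; note it satisfies the guarantees listed below: unique targets, unique concatenated inputs, no overlap):

```typescript
// The types are restated here so the snippet is self-contained.
type IEntityID = string; // postgresql bigints are treated as strings

interface IMergeInput {
  target: IEntityID;
  inputs: IEntityID[];
}

type IQueryInput = IMergeInput[];

// Hypothetical batch: merge folders 4 and 5 into 1, and folder 6 into 2.
const sampleInput: IQueryInput = [
  { target: "1", inputs: ["4", "5"] },
  { target: "2", inputs: ["6"] },
];
```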
So the prepping query from above can be replaced with:
WITH inputs AS (
  SELECT
    input.*
  FROM
    -- the application interpolates JSON there
    json_to_recordset($inputs$$inputs$) AS input(
      target bigint,
      inputs bigint[]
    )
),
input_folders AS (
  SELECT
    inputs.target AS target_id,
    merge.folder_id
  FROM
    inputs
    CROSS JOIN
    UNNEST(inputs.inputs) AS merge(
      folder_id
    )
)
It must run as a batch operation, so the application provides these guarantees for the query input:

- all `target` values are unique;
- all `inputs`, concatenated, result in unique values;
- `target` values do not intersect with the concatenated `inputs`.

Therefore `input_folders` always ends up as unique `target_id`/`folder_id` pairs.

The query is run as a background task, so speed and memory are of secondary importance. The main requirement is that of a typical transaction: either it goes through completely on success or rolls back completely on any error.

## The problem

I want to "merge" several folders into a single folder. So, given a `target_id` and an array of `folder_id`s, replace all foreign references to the `folder_id`s with `target_id` and remove the non-target folders afterwards.
This, however, becomes an issue in relation tables with unique constraints, since after updating the references there are duplicate rows.
So I went down this path:

1. Select all relation rows related to the query, i.e. all `file_folders` rows containing `target_id`s and `folder_id`s.
2. Separate them into two categories:
   - Deletes - the rows which would result in duplicates when updated.
   - Merges - the rows which would not.
3. Delete the delete candidates.
4. Update the merge candidates.
5. Repeat the previous 4 steps for all relation tables.
6. Delete the rows in `folders` with `folder_id`s.

However, I still stumble upon a unique key violation error.
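To make the delete/merge split concrete, here is a plain in-memory sketch of the intended outcome for a single target (hypothetical TypeScript standing in for the SQL; `mergeFolders` and its row shape are illustrative, not part of the actual query):

```typescript
interface FileFolder { id: number; file_id: number; folder_id: number }

// Merge `sources` into `target`: rows whose update would collide with an
// existing (file_id, target) pair are deleted, the rest are re-pointed.
function mergeFolders(rows: FileFolder[], target: number, sources: number[]): FileFolder[] {
  const src = new Set(sources);
  // files that already reference the target folder
  const filesInTarget = new Set(
    rows.filter((r) => r.folder_id === target).map((r) => r.file_id)
  );
  const result: FileFolder[] = [];
  for (const r of rows) {
    if (!src.has(r.folder_id)) { result.push(r); continue; }
    if (filesInTarget.has(r.file_id)) continue;  // delete: would duplicate
    filesInTarget.add(r.file_id);                // first move wins
    result.push({ ...r, folder_id: target });    // merge: re-point
  }
  return result;
}
```

The first relation row that moves a given file into the target "wins"; every later row for the same file is a delete candidate, which is the categorization step 2 describes.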
"Merge" is in quotes because it doesn't look like what I am trying to do can be accomplished by [merge in docs](https://www.postgresql.org/docs/15/sql-merge.html) and it requires a newer version of postgresql anyway.
Asked by Biller Builder (288 rep)
Jan 13, 2023, 02:47 PM
Last activity: May 17, 2025, 05:08 PM