Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
143
views
Accelerating an SQL quadruple self-join with a complex alternation in the WHERE clause
The following Sqlite query has a triple self-join on table t: SELECT "update_variable", lhs_acc.name_suffix || ":" || rhs_var.name_suffix, op.span, op.path FROM t op JOIN t lhs_acc ON (lhs_acc.path GLOB op.path || "?*") JOIN t rhs_acc ON (rhs_acc.path GLOB op.path || "?*") JOIN t rhs_var ON (rhs_var...
The following SQLite query has a triple self-join on table t:
SELECT "update_variable",
lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
op.span,
op.path
FROM t op
JOIN t lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN t rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN t rhs_var ON (rhs_var.path GLOB op.path || "?*")
WHERE (op.name_prefix = "assignment"
AND lhs_acc.name_prefix = "assignment_lhs_identifier"
AND rhs_acc.name_prefix = "assignment_rhs_atom"
AND rhs_var.name_prefix = "assignment_rhs_atom"
AND rhs_acc.name_suffix != rhs_var.name_suffix
AND lhs_acc.name_suffix = rhs_acc.name_suffix)
OR (op.name_prefix = "augmented_assignment"
AND lhs_acc.name_prefix = "assignment_lhs_identifier"
AND rhs_acc.name_prefix = "assignment_rhs_atom"
AND rhs_var.name_prefix = "assignment_rhs_atom")
OR (op.name_prefix = "method_call"
AND rhs_acc.name_prefix = "method_call_name"
AND lhs_acc.name_prefix = "method_call_object"
AND rhs_var.name_prefix = "call_argument"
AND rhs_acc.name_suffix != rhs_var.name_suffix
AND rhs_acc.name_suffix REGEXP "(append|extend|insert|add|update)$")
GROUP BY op.span,
lhs_acc.name,
rhs_var.name
It is far too slow for my application. I can accelerate it enough by restricting each instance of t to the rows I am interested in, which somewhat duplicates the WHERE conditions. So, the FROM clause becomes:
FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
"augmented_assignment",
"method_call")) op
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_lhs_identifier",
"method_call_object") ) lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
"method_call_name") ) rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
"call_argument") ) rhs_var ON (rhs_var.path GLOB op.path || "?*")
Is there some technique which would lead to a less verbose / ugly / redundant query?
---
Edit: The test duplication (but not the ugliness) can be avoided as follows:
SELECT "update_variable",
lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
op.span,
op.path
FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
"augmented_assignment",
"method_call")) op
JOIN t lhs_acc ON (lhs_acc.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "method_call_object"
ELSE "assignment_lhs_identifier"
END)
AND (lhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_acc ON (rhs_acc.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "method_call_name"
ELSE "assignment_rhs_atom"
END)
AND (rhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_var ON (rhs_var.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "call_argument"
ELSE "assignment_rhs_atom"
END)
AND (rhs_var.path GLOB op.path || "?*"))
WHERE op.name_prefix = "augmented_assignment"
OR (op.name_prefix = "assignment"
AND lhs_acc.name_suffix = rhs_acc.name_suffix
AND rhs_acc.name_suffix != rhs_var.name_suffix)
OR (op.name_prefix = "method_call"
AND rhs_acc.name_suffix REGEXP "(append|extend|insert|add|update)$"
AND rhs_acc.name_suffix != rhs_var.name_suffix)
GROUP BY op.span,
lhs_acc.name,
rhs_var.name
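For reference, a third variant keeps the prefiltering in a single place: a CTE that restricts t once and is then self-joined. This is only a sketch along the lines of the versions above; the combined prefix list is an assumption, and the original three-branch alternation stays unchanged:
WITH tt AS
  (SELECT *
   FROM t
   WHERE t.name_prefix IN ("assignment", "augmented_assignment", "method_call",
                           "assignment_lhs_identifier", "method_call_object",
                           "assignment_rhs_atom", "method_call_name", "call_argument"))
SELECT "update_variable",
       lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
       op.span,
       op.path
FROM tt op
JOIN tt lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN tt rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN tt rhs_var ON (rhs_var.path GLOB op.path || "?*")
WHERE ... -- the original WHERE alternation, unchanged
GROUP BY op.span,
         lhs_acc.name,
         rhs_var.name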
Aristide
(121 rep)
Feb 1, 2020, 12:59 PM
• Last activity: Jul 26, 2025, 12:02 PM
0
votes
1
answers
416
views
MariaDB: Why is this still using temporary and filesort - is it because I'm using a self-join? How to fix?
```EXPLAIN SELECT e2.personId, COUNT(*) FROM entries e1 JOIN entries e2 ON e2.categoryId = e1.categoryId AND e2.personId != e1.personId WHERE e1.personId = 1 GROUP BY e2.personId ``` gives: ``` id select_type table type key key_length ref rows extra 1 SIMPLE e1 ref personId_categoryId 4 const 59 Usi...
SELECT e2.personId, COUNT(*)
FROM entries e1
JOIN entries e2 ON e2.categoryId = e1.categoryId
AND e2.personId != e1.personId
WHERE e1.personId = 1
GROUP BY e2.personId
gives:
id select_type table type key key_length ref rows extra
1 SIMPLE e1 ref personId_categoryId 4 const 59 Using index; Using temporary; Using filesort
1 SIMPLE e2 ref categoryId_personId 4 project.e1.categoryId 8 Using where; Using index
If I remove the GROUP BY I get "Using index".
What's the problem here? Is it something to do with joining a table onto itself?
(The indexes present on the table are the two shown in the explain output, containing the columns that the names suggest.)
DDL:
CREATE TABLE `entries` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `personId` int(10) unsigned NOT NULL,
  `categoryId` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `personId_categoryId` (`personId`,`categoryId`),
  KEY `categoryId_personId` (`categoryId`,`personId`),
  CONSTRAINT `entries_ibfk_1` FOREIGN KEY (`personId`) REFERENCES `people` (`id`) ON UPDATE CASCADE,
  CONSTRAINT `entries_ibfk_2` FOREIGN KEY (`categoryId`) REFERENCES `categories` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1465605 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
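Two hedged notes from the editor, not from the question itself: the temporary table is expected here, because the grouping column comes from the second table in the join order, so rows cannot be read in grouped order from any index; and on MariaDB the filesort part can usually be dropped by declaring that the implicit ordering of GROUP BY is not needed. A sketch:
SELECT e2.personId, COUNT(*)
FROM entries e1
JOIN entries e2 ON e2.categoryId = e1.categoryId
               AND e2.personId != e1.personId
WHERE e1.personId = 1
GROUP BY e2.personId
ORDER BY NULL; -- MariaDB: suppress the implicit GROUP BY sort, removing "Using filesort"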
Codemonkey
(265 rep)
Jul 10, 2020, 07:48 AM
• Last activity: May 4, 2025, 05:07 AM
1
votes
1
answers
1073
views
MySQL Indexes on all column combinations
I have a table that basically represents the log of records ```mysql CREATE TABLE records ( id varchar(36), -- uuid text blob, user_id bigint, cluster_id bigint, status_id tinyint, session_id varchar, -- uuid, type tinyint, start_time timestamp, duration int, PRIMARY KEY (user_id, start_time, id), K...
I have a table that basically represents the log of records
CREATE TABLE records
(
id varchar(36), -- uuid
text blob,
user_id bigint,
cluster_id bigint,
status_id tinyint,
session_id varchar(36), -- uuid
type tinyint,
start_time timestamp,
duration int,
PRIMARY KEY (user_id, start_time, id),
KEY general_index (start_time, cluster_id, user_id, id),
KEY endpoint_index (cluster_id, start_time, id)
);
The table is somewhat large:
100 GB
10 GB index size
40M records
Now the table is tuned to support filtering by user_id, cluster_id and start_time.
But I need to filter (and sort) efficiently by almost every column with different combinations.
Adding more and more indexes representing search patterns (e.g. KEY general_index (start_time, duration, id)) doesn't seem like a good long-term solution.
Is there a best practice or something available for this for MySQL?
(unfortunately, using Elasticsearch or any other database is not an option)
The first thought I had was to have one PK index:
PRIMARY KEY (start_time, id)
and then many indexes that represent some search patterns, like:
KEY cluster_index (cluster_id, id),
KEY user_index (user_id, id),
KEY duration_index (duration, id),
KEY status_index (status_id, id)
When you need to query the data, just filter using the appropriate index(es):
SELECT *
FROM records t
INNER JOIN (
  SELECT id FROM records WHERE duration > 10000 -- supporting index duration_index
) t2 ON t.id = t2.id
INNER JOIN (
  SELECT id FROM records WHERE user_id = 123 -- supporting index user_index
) t3 ON t.id = t3.id
WHERE t.start_time >= {x}
ORDER BY [...]
LIMIT 100
But that turns out to work pretty slowly. Even if the solution above uses the correct index for each subquery, it still has to do a HUGE join with t.
Alexrrr
(11 rep)
May 3, 2022, 12:06 PM
• Last activity: Apr 23, 2025, 03:06 PM
2
votes
1
answers
677
views
Best practices for large JOINs - Warehouse or External Compute (e.g. Spark)
I am working on a problem that requires a very large join. The JOIN itself is pretty straightforward but the amount of data I am processing is very large. I am wondering for very large JOINs, is there a preferred type of technology. For example, is it more effective to a Data Warehouse (like Snowfla...
I am working on a problem that requires a very large join. The JOIN itself is pretty straightforward, but the amount of data I am processing is very large. I am wondering, for very large JOINs, whether there is a preferred type of technology. For example, is it more effective to use a Data Warehouse (like Snowflake) or some other MPP system like Spark?
To make the problem more concrete I created a hypothetical problem similar to my actual problem. Assume I have a table that looks like this:
I am working on some logic that requires account pairs that have the same name. To find pairs of accounts with the same name I can easily do something like this:

SELECT
account1.name,
account2.name
FROM accounts as account1
JOIN accounts as account2 ON account1.name = account2.name AND account1.account_id != account2.account_id
The problem I am facing is due to the amount of data I am processing. There are roughly ~2 trillion records I am trying to self JOIN on. Obviously, this will take some time and some pretty serious compute. I have run a similar query in Snowflake using XL and 3XL warehouses but after several hours of running, I canceled them. I am hoping there is a more cost-effective or time-efficient way.
Has anyone had success with massive JOINs? Are there any other tricks I could deploy? What tool did you find the most effective?
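For what it's worth, one small trick with self-joins of this shape (a sketch, not from the question): emitting each pair only once with an inequality halves the output and removes self-pairs in the same step, which can matter at trillion-row scale:
SELECT account1.name,
       account1.account_id,
       account2.account_id
FROM accounts AS account1
JOIN accounts AS account2
  ON account1.name = account2.name
 AND account1.account_id < account2.account_id -- each unordered pair once, never (x, x)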
Arthur Putnam
(553 rep)
Feb 24, 2022, 09:06 PM
• Last activity: Dec 18, 2024, 12:01 PM
1
votes
1
answers
198
views
Issues with Self-Referencing Foreign Key in SQLite
I'm having trouble with a self-referencing foreign key in SQLite. I've created a table to store employee details, where each employee can have a manager who is also an employee. Here's the table definition: ```sql CREATE TABLE Employees ( Employee_ID INTEGER PRIMARY KEY, Name TEXT, Manager_ID INTEGE...
I'm having trouble with a self-referencing foreign key in SQLite. I've created a table to store employee details, where each employee can have a manager who is also an employee. Here's the table definition:
CREATE TABLE Employees (
Employee_ID INTEGER PRIMARY KEY,
Name TEXT,
Manager_ID INTEGER,
FOREIGN KEY (Manager_ID) REFERENCES Employees(Employee_ID)
);
I've enabled foreign key constraints with PRAGMA foreign_keys = ON. However, when I insert data with a non-existent Manager_ID, SQLite does not raise an error. Here are the steps I've taken:
1. Enabled foreign key constraints:
PRAGMA foreign_keys = ON;
2. Created the table:
CREATE TABLE Employees (
Employee_ID INTEGER PRIMARY KEY,
Name TEXT,
Manager_ID INTEGER,
FOREIGN KEY (Manager_ID) REFERENCES Employees(Employee_ID)
);
3. Inserted data:
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (1, 'Fitch', NULL);
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (2, 'Alley', 1);
-- This should fail because 9 doesn't exist, but doesn't:
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (6, 'Blake', 9);
Despite these steps, SQLite does not enforce the foreign key constraint for Manager_ID.
SELECT * FROM Employees
Gives:
| Employee_ID | Name | Manager_ID |
|-------------|-------|------------|
| 1 | Fitch | NULL |
| 2 | Alley | 1 |
| 6 | Blake | 9 |
**Are Self-Referencing Foreign Keys supported in SQLite? If so, how do I get it to work?**
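An editorial note, hedged but standard SQLite behaviour: self-referencing foreign keys are supported, and the usual culprit for silent non-enforcement is the pragma itself. PRAGMA foreign_keys is off by default, applies only to the current connection, and is a no-op inside an open transaction, so a tool that opens a fresh connection per statement quietly loses the setting. A quick check on the connection doing the INSERTs:
PRAGMA foreign_keys;      -- returns 1 only if enforcement is active for THIS connection
PRAGMA foreign_keys = ON; -- must be re-issued on every new connection, outside any transaction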
reubenjohn
(111 rep)
Nov 18, 2024, 04:28 AM
• Last activity: Nov 19, 2024, 11:59 AM
1
votes
1
answers
34
views
Multiple parents for AspNetUsers table for Asp.Net Core 3.1 app
I have the following table in MySQL.: CREATE TABLE `aspnetusers` ( `Id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL, `UserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL, `NormalizedUserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8m...
I have the following table in MySQL:
CREATE TABLE `aspnetusers` (
  `Id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `UserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `NormalizedUserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `Email` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `NormalizedEmail` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EmailConfirmed` tinyint(1) NOT NULL,
  `PasswordHash` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `SecurityStamp` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `ConcurrencyStamp` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `PhoneNumber` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `PhoneNumberConfirmed` tinyint(1) NOT NULL,
  `TwoFactorEnabled` tinyint(1) NOT NULL,
  `LockoutEnd` datetime(6) DEFAULT NULL,
  `LockoutEnabled` tinyint(1) NOT NULL,
  `AccessFailedCount` int NOT NULL,
  `InsertUser` varchar(100) DEFAULT NULL,
  `InsertDateTime` datetime DEFAULT NULL,
  `UpdateUser` varchar(100) DEFAULT NULL,
  `UpdateDateTime` datetime DEFAULT NULL,
  `ParentId` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `UserNameIndex` (`NormalizedUserName`),
  KEY `EmailIndex` (`NormalizedEmail`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Now my Asp.Net Core 3.1 app uses the above table, but I want to change it so that a user can have zero, one, or multiple parents.
**What is best (and minimal) way to make this modification?**
Should I create a new table that connects the UserID with ParentID?
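A junction table is the usual shape for this. A sketch with illustrative names (the table and constraint names are assumptions, not from the question):
CREATE TABLE `aspnetuserparents` (
  `UserId` varchar(255) NOT NULL,
  `ParentId` varchar(255) NOT NULL,
  PRIMARY KEY (`UserId`, `ParentId`),
  CONSTRAINT `FK_userparents_user` FOREIGN KEY (`UserId`) REFERENCES `aspnetusers` (`Id`),
  CONSTRAINT `FK_userparents_parent` FOREIGN KEY (`ParentId`) REFERENCES `aspnetusers` (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
A user with no parents simply has no rows here, and the existing ParentId column can be dropped once its data is migrated.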
Hello World
(11 rep)
Jun 20, 2024, 07:48 PM
• Last activity: Jun 21, 2024, 04:00 PM
0
votes
1
answers
582
views
Write an SQL query that takes its own output as input
In a table that displays how trades are related to each other, starting from one trade, I want to find all other trades that relate to the first trade: 713613 PostPTETradeIDs 718478 713613 PrePTETradeId 711314 718478 PostPTETradeIDs 739522 718478 PrePTETradeId 713613 In this example, trade `718478`...
In a table that displays how trades are related to each other, starting from one trade, I want to find all other trades that relate to the first trade:
713613 PostPTETradeIDs 718478
713613 PrePTETradeId 711314
718478 PostPTETradeIDs 739522
718478 PrePTETradeId 713613
In this example, trade 718478 was created by splitting 713613 (row 1); and 739522 was created from 718478 (row 3), etc. (a PTE splits a trade into two).
select trade_id, keyword_name, keyword_value
from trade_keyword
where (keyword_name like 'LinkedTo' or keyword_name like 'PrePTETradeId' or keyword_name like 'PostPTETradeIDs' )
and trade_id in('718478', '739522','713613')
order by trade_id;
My goal is to identify trades that belong together and come from one another.
Currently, using the above query, I must fill all the TRADE_IDs manually, i.e. start with 713613, run the query, see that 718478 came from it, then run the query again searching for 718478 too, to find that 739522 came from it, etc.
I would like to make this recursive, such that it will, in one query, given some starting TRADE_ID(s), find all TRADE_IDs that are related to these via any path of any of these relations.
In other words, if column 1 is X and column 3 is f(X), then I want at once a list of all X, f(X), f(f(X)), f(f(f(X))), etc.
Preferably, there would be another column that lists the distance in steps upwards or downwards, where the PostPTETradeIDs relation is upward and the PrePTETradeId is downward (the LinkedTo relationship does not change the counter): 713613 is distance 0, 718478 is distance 1, 739522 is distance 2.
I am using Oracle SQL. Is this issue better addressed via a subquery, and if so, how can it be made recursive; or via some other functionality?
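For reference, a rough sketch of the recursive-CTE shape this usually takes in Oracle (untested; the distance bookkeeping is only indicative, and the CYCLE clause guards against the Pre/Post relations looping back on each other):
WITH related (trade_id, keyword_name, keyword_value, distance) AS (
    SELECT trade_id, keyword_name, keyword_value, 0
    FROM trade_keyword
    WHERE trade_id = '713613'
      AND keyword_name IN ('LinkedTo', 'PrePTETradeId', 'PostPTETradeIDs')
    UNION ALL
    SELECT tk.trade_id, tk.keyword_name, tk.keyword_value,
           r.distance + CASE r.keyword_name
                          WHEN 'PostPTETradeIDs' THEN 1
                          WHEN 'PrePTETradeId' THEN -1
                          ELSE 0
                        END
    FROM trade_keyword tk
    JOIN related r ON tk.trade_id = r.keyword_value
    WHERE tk.keyword_name IN ('LinkedTo', 'PrePTETradeId', 'PostPTETradeIDs')
)
CYCLE trade_id SET is_cycle TO 'Y' DEFAULT 'N'
SELECT DISTINCT trade_id, distance FROM related;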
Marie. P.
(101 rep)
Aug 29, 2019, 02:25 PM
• Last activity: Sep 17, 2023, 09:36 AM
1
votes
1
answers
126
views
Bidirectional self join table
Consider: ``` CREATE TABLE object ( id integer NOT NULL DEFAULT nextval('object_seq'), ..., CONSTRAINT object_pk PRIMARY KEY (id) ); CREATE TABLE pairing ( object1 integer NOT NULL, object2 integer NOT NULL, CONSTRAINT pairing_pk PRIMARY KEY (object1, object2) ); ALTER TABLE pairing ADD CONSTRAINT o...
Consider:
CREATE TABLE object (
id integer NOT NULL DEFAULT nextval('object_seq'),
...,
CONSTRAINT object_pk PRIMARY KEY (id)
);
CREATE TABLE pairing (
object1 integer NOT NULL,
object2 integer NOT NULL,
CONSTRAINT pairing_pk PRIMARY KEY (object1, object2)
);
ALTER TABLE pairing ADD CONSTRAINT object1_fk FOREIGN KEY (object1) REFERENCES object (id);
ALTER TABLE pairing ADD CONSTRAINT object2_fk FOREIGN KEY (object2) REFERENCES object (id);
You have a table *object* with one column *id* and some other unimportant columns, and another table *pairing* which is joining the table *object* with itself, meaning two columns *object1* and *object2*, both foreign keys of object.id.
These two columns are semantically identical. Nothing like an adjacency list, you know, with one column being *parent* and the other being *child*. It means the columns are reversible, and because of that, you have the same data twice because the order of the pairing inside the table *pairing* is irrelevant.
Example:
object1-object2
1-2
1-3
1-4
2-1
3-1
4-1
1-2 is identical to 2-1, 1-3 to 3-1, etc.
You can query in any direction you want because what you want to know is with which object, an object is connected to. You can query either
* SELECT object2 FROM pairing WHERE object1 IN (SELECT id FROM object WHERE );
* SELECT object1 FROM pairing WHERE object2 IN (SELECT id FROM object WHERE );
Given that, is there another way to do that without having half the table being "duplicates", or just another more efficient way?
I could cut the table in half, but it would force me to query in both directions at the same time, and I fear 1/ it will degrade performance, given speed is of the essence; 2/ I will have to make multiple self-joins against that very table, in both directions; it might be doable, but at best I feel it will be awful.
Here is an example of such query where I want to know if one object is paired with other objects, at the same time:
SELECT object1 FROM pairing p1
JOIN pairing p2 ON p1.object1 = p2.object1
JOIN pairing p3 ON p1.object1 = p3.object1
...
WHERE p1.object2 IN (SELECT id FROM object WHERE )
AND p2.object2 IN (SELECT id FROM object WHERE )
AND p3.object2 IN (SELECT id FROM object WHERE )
Same with a recursive CTE or just joining a table after the other like JOIN ON p1.object2 = p2.object1.
I don't feel like you can split a query in two and union the results later because when you use this table twice, it "spins it into a direction" and you can't use a separate query to see if it has the rest of the results in the other direction. Not sure I am clear there.
Another solution could be to collate in the FROM clause the table with itself in the other direction, recreating the table with "duplicates" but given this virtual table will be used all the time, it will be cached so wouldn't it be better to just have it this way from the beginning and not care about the double size?
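For what it's worth, a sketch of that last idea in concrete form (PostgreSQL syntax, matching the DDL above): store each pair once in canonical order, and expose both directions through a view so queries keep their current shape. Whether the view beats the doubled base table is exactly the trade-off the question raises:
CREATE TABLE pairing (
    object1 integer NOT NULL REFERENCES object (id),
    object2 integer NOT NULL REFERENCES object (id),
    CONSTRAINT pairing_pk PRIMARY KEY (object1, object2),
    CONSTRAINT pairing_ordered CHECK (object1 < object2) -- one row per unordered pair
);
CREATE VIEW pairing_both AS
    SELECT object1, object2 FROM pairing
    UNION ALL
    SELECT object2 AS object1, object1 AS object2 FROM pairing;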
Some_user
(61 rep)
May 8, 2023, 10:54 AM
• Last activity: May 11, 2023, 05:34 PM
3
votes
1
answers
378
views
Return recursive self joins in PostgreSQL with aggregated data
I have this simple table schema: [SQL Fiddle][1] **PostgreSQL 9.6 Schema Setup**: CREATE TABLE IF NOT EXISTS users( id INT NOT NULL PRIMARY KEY, email TEXT NOT NULL UNIQUE, password TEXT NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, is_active BOOLEAN DEFAULT FALSE, is_staff BOOLEAN DE...
I have this simple table schema:
SQL Fiddle
**PostgreSQL 9.6 Schema Setup**:
CREATE TABLE IF NOT EXISTS users(
id INT NOT NULL PRIMARY KEY, email TEXT NOT NULL UNIQUE,
password TEXT NOT NULL, first_name TEXT NOT NULL,
last_name TEXT NOT NULL, is_active BOOLEAN DEFAULT FALSE,
is_staff BOOLEAN DEFAULT FALSE, is_superuser BOOLEAN DEFAULT FALSE
);
INSERT INTO users (
id, email, password, first_name, last_name,
is_active, is_staff, is_superuser
)
VALUES
(
1, 'sir@a.com', '23456ses', 'John',
'Idogun', true, true, true
),
(
2, 'ma@a.com', '23456ses', 'Nelson',
'Idogun', true, true, true
);
-- articles table
CREATE TABLE IF NOT EXISTS articles(
id INT NOT NULL PRIMARY KEY,
fore_image TEXT NULL,
title TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE,
content TEXT NOT NULL,
is_published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
user_id INT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
INSERT INTO articles (
id, title, slug, content, is_published,
user_id
)
VALUES
(
1, 'Upgrading from v0.4 to v0.5 of rust rocket',
'Upgrading from v0.4 to v0.5 of rust rocket',
'Upgrading from v0.4 to v0.5 of rust rocket',
true, 1
);
CREATE TABLE IF NOT EXISTS comment(
id INT NOT NULL PRIMARY KEY,
article_id INT NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
responded_to_comment_id INT NULL REFERENCES comment(id) ON DELETE CASCADE,
content TEXT NOT NULL
);
INSERT INTO comment (id, article_id, user_id, responded_to_comment_id, content)
VALUES
(1, 1, 1, NULL, 'First comment'),
(
2, 1, 2, 1, 'First reply'
),
(3, 1, 1, 2, 'Response to the first reply.');
**Query 1**:
SELECT
c.id AS c_id,
c.content AS c_content,
json_agg(r) as replies,
u.id AS u_id
FROM comment AS c
JOIN users AS u ON c.user_id = u.id
LEFT JOIN comment r ON r.id = c.responded_to_comment_id
WHERE c.article_id = 1
GROUP BY c.id, u.id
**Results**:
| c_id | c_content | replies | u_id |
|------|--------------------------|------------------------------------------------------------------------------------------------|------|
| 1 | First comment | [null] | 1 |
| 2 | First reply | [{"id":1,"article_id":1,"user_id":1,"responded_to_comment_id":null,"content":"First comment"}] | 2 |
| 3 | Response to first reply. | [{"id":2,"article_id":1,"user_id":2,"responded_to_comment_id":1,"content":"First reply"}] | 1 |
I am trying to write a query that selects comments under an article and recursively nests all comments having responded_to_comment_id inside their parent comment, so I will have only one row returned:
| c_id | c_content | replies | u_id |
|------|---------------|-----------|------|
| 1 | First comment | [...] | 1 |
and the results for the replies column will be something like this:
[
{
"id": 2,
"user_id": 1,
"responded_to_comment_id": 1,
"content": "First reply",
"replies": [
{
"id": 3,
"article_id": 1,
"user_id": 2,
"responded_to_comment_id": 2,
"content": "Response to the first reply."
}
]
}
]
Only top-level comments (comments whose responded_to_comment_id is NULL) should be at the top.
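Nesting to arbitrary depth generally needs recursion for the walk plus a second pass (often a recursive SQL function) to fold children into their parents' replies arrays; json_agg alone only reaches one level. A sketch of just the recursive walk, which flattens the thread with a root id and depth:
WITH RECURSIVE thread AS (
    SELECT c.id, c.user_id, c.responded_to_comment_id, c.content,
           c.id AS root_id, 0 AS depth
    FROM comment c
    WHERE c.article_id = 1 AND c.responded_to_comment_id IS NULL
    UNION ALL
    SELECT c.id, c.user_id, c.responded_to_comment_id, c.content,
           t.root_id, t.depth + 1
    FROM comment c
    JOIN thread t ON c.responded_to_comment_id = t.id
)
SELECT * FROM thread ORDER BY root_id, depth, id;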
Sirjon
(165 rep)
Mar 15, 2023, 10:18 PM
• Last activity: Mar 19, 2023, 02:14 PM
7
votes
1
answers
1449
views
Recursively get a tree via a self-joined table
Using other questions here and Postgresql documentation I've managed to build a many-to-many self joined table. However adding a `WHERE` clause is giving me trouble. **Problem:** A `Category` can have many child categories, and many parent categories. Given a `category.Id`, I want to retrieve the ca...
Using other questions here and the PostgreSQL documentation I've managed to build a many-to-many self-joined table.
However, adding a WHERE clause is giving me trouble.
**Problem:**
A Category
can have many child categories, and many parent categories. Given a category.Id
, I want to retrieve the category, the category children, children's children and so on.
**Example**: given this structure:
child_1
    child_11
        child_111
        child_112
            child_1121
    child_21
child_2
Given: a clause of id = child_11
Expected results: child_11, child_111, child_112, child_1121
Actual results: child_11, child_111, child_112
Here is my attempt: http://sqlfiddle.com/#!17/3640f/2
In case Sqlfiddle is down: https://www.db-fiddle.com/#&togetherjs=LhDjxfPHo6
*Note: I don't care about duplicating the where clause, my application can handle that*
**Table structure:**
CREATE TABLE Category(id SERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE Categories(parent_id INTEGER, child_id INTEGER, PRIMARY KEY(parent_id, child_id));
ALTER TABLE Categories ADD FOREIGN KEY (parent_id) REFERENCES category (id);
ALTER TABLE Categories ADD FOREIGN KEY (child_id) REFERENCES category (id);
**Table data:**
INSERT INTO Category(id, name) VALUES (1, 'parent_1');
INSERT INTO Category(id, name) VALUES (2, 'child_1');
INSERT INTO Category(id, name) VALUES (3, 'child_2');
INSERT INTO Category(id, name) VALUES (4, 'child_3');
INSERT INTO Category(id, name) VALUES (5, 'child_1_1');
INSERT INTO Category(id, name) VALUES (6, 'child_1_2');
INSERT INTO Category(id, name) VALUES (7, 'child_1_1_1');
INSERT INTO Category(id, name) VALUES (10, 'child_of_many');
INSERT INTO Category(id, name) VALUES (11, 'parent_1');
INSERT INTO Category(id, name) VALUES (12, 'parent_2');
INSERT INTO Categories(parent_id, child_id) VALUES (1, 2);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 3);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 4);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 5);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 6);
INSERT INTO Categories(parent_id, child_id) VALUES (5, 7);
Here is my query, which gives me the children but not the children's children, etc. If I remove the WHERE clauses I can get all rows:
WITH RECURSIVE categories_category AS (
SELECT id, 'Category' AS COLUMN_TYPE, c1.name
FROM Category c1
WHERE c1.id=2
UNION
SELECT c2.id, 'Category' AS COLUMN_TYPE, c2.name
FROM Category c1
INNER JOIN categories cs1 ON c1.id = cs1.parent_id
INNER JOIN Category c2 ON c2.id = cs1.child_id
WHERE cs1.parent_id = 2
) SELECT * FROM categories_category
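For reference, the usual fix for a query of this shape is to make the recursive member join on the CTE itself instead of re-filtering on the fixed parent id; a sketch against the tables above (dropping the COLUMN_TYPE column for brevity):
WITH RECURSIVE categories_category AS (
    SELECT c1.id, c1.name
    FROM Category c1
    WHERE c1.id = 2
  UNION
    SELECT c2.id, c2.name
    FROM categories_category cc
    JOIN Categories cs1 ON cs1.parent_id = cc.id
    JOIN Category c2 ON c2.id = cs1.child_id
)
SELECT * FROM categories_category;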
**Edit: More detailed example:**
Given the following **category rows**, I'd like to be able to run a query with a WHERE clause that matches the id of warmStoredAlcohol and get a result of:
+---+------------------+
|id |name |
+---+------------------+
|2 |warmStoredAlcohol |
|3 |vodka |
|4 |beer |
|5 |frozenBeer |
+---+------------------+
coldStoredAlcohol would give a result of:
+---+------------------+
|id |name |
+---+------------------+
|6 |coldStoredAlcohol |
|5 |frozenBeer |
|7 |cooler |
+---+------------------+
The database structure will not change often. In this example 'frozenBeer' has two parents, and should be returned when querying both warmStoredAlcohol and coldStoredAlcohol.
**I am open to changing the table structure, adding new tables and even upgrading postgres version etc.** The database will hold ~2,000 rows, therefore I value an easy-to-understand table structure over a super complicated optimal one. (But any solution is better than my broken one)

dustytrash
(171 rep)
Feb 21, 2020, 02:58 PM
• Last activity: Nov 12, 2022, 06:58 PM
0
votes
0
answers
753
views
T-SQL, Fast self join on the next available date where dates skip weekends and multiple rows can have the same date
TL;DR, I'll post back later with more info including table, view, and index definitions; before/after results; and query plans to help to answer this question. I'm trying to use a self-join to a view to get values from the next available day. The idea is to be able to calculate the change in values...
TL;DR,
I'll post back later with more info including table, view, and index definitions; before/after results; and query plans to help to answer this question.
I'm trying to use a self-join to a view to get values from the next available day. The idea is to be able to calculate the change in values for a given day when values are only reported once at the beginning of the day. So the current day's beginning values = the prior day's ending values. In this case, the application is stock option open interest, but this type of JOIN can apply to other scenarios.
The tricky part of the JOIN is that there are multiple rows for each day, and there are gaps between days for weekends and holidays. The query I'm writing currently SUMs values, then does a non-equi self-join to optionally filter out rows while still returning correct SUMs overall. (That self-join works and is not important to this question, I'm just explaining its existence in the query.) The query's results are then PIVOTed (not shown). Using CTEs with PIVOTs is a whole different beast that I'm trying to avoid. In any case, I think a self-join is the best and easiest solution, if only I can get the logic right.
The view's columns contain raw data, aggregates, and analytic functions, and would be harder to edit to show aggregates of the next available day's data than editing the query. The underlying table has a clustered index on the columns AsOf, Expiration, and Strike, which are 3 of the 4 columns I'm joining on.
I don't believe joining using ROW_NUMBER() would work because there are multiple rows with the same date, and multiple other columns being joined on, but I could be wrong. I've tried the LEAD() function in the query's SELECT, but Microsoft's LEAD() function doesn't support ignoring NULLs. Also, having to group by the LEAD()'s column ungroups everything else and throws off the SUMs. So my ideal solution is back to using a self-join.
I've tried joining using EXISTS and IN in a CASE statement (current AsOf date +1, +2, etc. until a match) but those were extremely slow, and they joined not just the next available day's rows but all of the consecutive dates until they hit a non-consecutive date going from Fri to Mon. Maybe that can work, but I couldn't get it to.
I've now got a JOIN (shown below) that I believe is correct and less slow than testing individual EXISTS or INs in a CASE statement, but it's still too slow. At the bottom, I've got another JOIN using a COALESCE that's much faster, but flawed. It correctly joins once to the next day, but it doesn't work with non-consecutive dates. Any help is much appreciated. My brain is turning to mush :)
I've simplified the query below. The self-join to get rows from the next available day (LEFT JOIN OCCALLS AS ND) returns the correct results, but it takes minutes to run for just a few days, which won't work over longer periods. For reference, the query returns results for all dates (1M+ rows in source table) in 0.00 seconds without the troublesome self-join (which is the same as the working self join above it, alias B, except for the logic on the AsOf date column).
SELECT
A.Ticker
,'Call' AS 'Option'
,A.AsOf --Data AsOf, contains one row per experation per strike price for each trading day
,ND.AsOf AS 'NextDay'--Data AsOf Next [available] Day
,B.CallDelta
,SUM(A.[CVol@Del]) AS 'CallVol'--Total Vol. If B.CallDelta range is specified, 1 row returned for total B.CallDelta in range, and 1 row returned for total outside of range
,SUM(A.OIB) AS 'OIB' --OI at BOD
,SUM(ND.OIB) AS 'OIE' --OI at EOD = Next day BOD
,A.[V:DTot] --Total daily call vol
FROM OCCALLS AS A
LEFT JOIN OCCALLS AS B
ON A.Ticker = B.Ticker
AND A.AsOf = B.AsOf
AND A.Expiration = B.Expiration
AND A.Strike = B.Strike
-- AND B.CallDelta BETWEEN 0.95 AND 1.00
LEFT JOIN OCCALLS AS ND --Next [available] Day
ON A.Ticker = ND.Ticker
AND (SELECT MIN(AsOf) FROM OCCALLS WHERE AsOf > A.AsOf)
= ND.AsOf
AND A.Expiration = ND.Expiration
AND A.Strike = ND.Strike
WHERE 1=1
AND A.Ticker = 'SPY'
-- AND A.AsOf >= '2020-01-01'
AND A.AsOf BETWEEN '2022-07-07' AND '2022-07-11'
GROUP BY
A.Ticker
,A.AsOf
,ND.AsOf
,B.CallDelta
,A.[V:DTot]
,A.[Open]
,A.[High]
,A.[Low]
,A.[Close]
ORDER BY A.AsOf , B.CallDelta
Another option is using COALESCE (shown below) or CASE in the JOIN's ON clause. With CASE, I found that it doesn't stop evaluating expressions after the first successful one and so joins all consecutive dates until it hits a gap between Fri and Mon. I got closer with COALESCE in that it only joins once to the next day, but it also only joins when the next date is consecutive. So COALESCE is the next closest I've come to a correct and fast join. I know my logic for it is incorrect, but I'm not sure how to fix it with valid syntax. I'm guessing it'll have to be rewritten back to a more complex CASE. I want to believe COALESCE or CASE can work and could be the fastest solutions, but maybe ROW_NUMBER() is an option too? In my example below, I'd limit it to checking 7 days ahead for a date for which rows exist, which should be more than enough.
LEFT JOIN OCCALLS AS ND --Next [available] Day
ON A.Ticker = ND.Ticker
AND A.AsOf = COALESCE( DATEADD(d,-1,ND.AsOf)
,DATEADD(d,-2,ND.AsOf)
,DATEADD(d,-3,ND.AsOf)
,DATEADD(d,-4,ND.AsOf)
,DATEADD(d,-5,ND.AsOf)
,DATEADD(d,-6,ND.AsOf)
,DATEADD(d,-7,ND.AsOf))
AND A.Expiration = ND.Expiration
AND A.Strike = ND.Strike
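For the record, a pattern often reached for in "next available date" lookups is OUTER APPLY with TOP (1): the correlation stays explicit, and it stops at exactly one row per (Ticker, Expiration, Strike) regardless of weekend gaps. A sketch against the columns above (untested):
FROM OCCALLS AS A
OUTER APPLY (
    SELECT TOP (1) ND.AsOf, ND.OIB
    FROM OCCALLS AS ND
    WHERE ND.Ticker     = A.Ticker
      AND ND.Expiration = A.Expiration
      AND ND.Strike     = A.Strike
      AND ND.AsOf       > A.AsOf
    ORDER BY ND.AsOf
) AS ND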
mastaofdisastax
(13 rep)
Jul 20, 2022, 10:19 AM
• Last activity: Jul 20, 2022, 05:11 PM
0
votes
1
answers
685
views
How to optimize a self-join view of a 50-million-row table
I've the following simplified table with ~50M rows. table sample ( id uuid not null primary key, measured_date timestamp with time zone not null, segment_id uuid not null, activity_id uuid not null, value integer not null ); Indexes: "sample_pkey" PRIMARY KEY, btree (id) "sample_idx" btree (segment_...
I've the following simplified table with ~50M rows.
table sample (
id uuid not null primary key,
measured_date timestamp with time zone not null,
segment_id uuid not null,
activity_id uuid not null,
value integer not null
);
Indexes:
"sample_pkey" PRIMARY KEY, btree (id)
"sample_idx" btree (segment_id, measured_date)
"sample_uniq" UNIQUE CONSTRAINT, btree (segment_id, activity_id, measured_date)
"sample_activity_idx" btree (activity_id)
I would like to get for each measured gap (between two different dates) my calculated value.
My query is as follows:
SELECT ROW_NUMBER () OVER () AS id,
t1.segment_id AS segment_id,
t1.activity_id AS activity_id,
t1.measured_date AS from_date,
t2.measured_date AS to_date,
t2.value AS cumulative_progress,
(t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date
WHERE t1.segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'
AND t1.activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09';
For a single segment/activity pair the plan looks fine:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|WindowAgg (cost=1.13..17.22 rows=1 width=154) (actual time=0.063..0.914 rows=171 loops=1)                                                                                 |
|  -> Nested Loop (cost=1.13..17.18 rows=1 width=124) (actual time=0.056..0.698 rows=171 loops=1)                                                                          |
|     Join Filter: (t1.measured_date < t2.measured_date)                                                                                                                   |
|     -> Index Scan using sample_uniq on sample t1 (cost=0.56..8.58 rows=1 width=70) (actual time=0.021..0.043 rows=19 loops=1)                                            |
|        Index Cond: ((segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'::uuid) AND (activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09'::uuid))                        |
|     -> Index Scan using sample_uniq on sample t2 (cost=0.56..8.58 rows=1 width=86) (actual time=0.005..0.030 rows=19 loops=19)                                           |
|        Index Cond: ((segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'::uuid) AND (activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09'::uuid))                        |
|Planning Time: 0.321 ms                                                                                                                                                   |
|Execution Time: 1.097 ms                                                                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
However, when I create a view to use this query often the results are poor.
Creating the view:
CREATE OR REPLACE VIEW sample_marginal AS
SELECT ROW_NUMBER () OVER () AS id,
t1.segment_id AS segment_id,
t1.activity_id AS activity_id,
t1.measured_date AS from_date,
t2.measured_date AS to_date,
t2.value AS cumulative_progress,
(t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date;
Querying the view for the same pair produces a very different plan:
+--------------------------------------------------------------------------------------------------+
|WindowAgg (cost=14106816.44..22564147.32 rows=51297073 width=154)                                   |
|  -> Gather (cost=14106816.44..20768749.77 rows=51297073 width=124)                                 |
|     Workers Planned: 2                                                                             |
|     -> Merge Join (cost=14105816.44..15638042.47 rows=21373780 width=124)                          |
|        Merge Cond: ((t2.activity_id = t1.activity_id) AND (t2.segment_id = t1.segment_id))         |
|        Join Filter: (t1.measured_date < t2.measured_date)                                          |
|        -> Sort (cost=4751689.83..4797948.93 rows=18503642 width=86)                                |
|           Sort Key: t2.activity_id, t2.segment_id                                                  |
|           -> Parallel Seq Scan on sample t2 (cost=0.00..1632749.42 rows=18503642 width=86)         |
|        -> Materialize (cost=9354126.62..9576170.32 rows=44408740 width=70)                         |
|           -> Sort (cost=9354126.62..9465148.47 rows=44408740 width=70)                             |
|              Sort Key: t1.activity_id, t1.segment_id                                               |
|              -> Seq Scan on sample t1 (cost=0.00..1891800.40 rows=44408740 width=70)               |
+--------------------------------------------------------------------------------------------------+
I actually never waited for this query to finish, since it took an extremely long time, and I thought I'd fix it before using it.
Running count(*) on the view, however, returned 1.8B rows (execution was slow, of course).
Any idea on how to improve my view?
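A hedged observation, not from the thread: PostgreSQL will not push an outer WHERE down through a window function, so the ROW_NUMBER () OVER () in the view forces the whole 50M-row self-join to be computed before any filtering happens. A sketch of the view without the synthetic id (assumption: callers can number rows themselves):
CREATE OR REPLACE VIEW sample_marginal AS
SELECT t1.segment_id AS segment_id,
       t1.activity_id AS activity_id,
       t1.measured_date AS from_date,
       t2.measured_date AS to_date,
       t2.value AS cumulative_progress,
       (t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date;
-- With no window function in the way, a query such as
--   SELECT * FROM sample_marginal WHERE segment_id = '...' AND activity_id = '...'
-- can be inlined and hit sample_uniq, as in the fast plan above.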
Cowabunga
(145 rep)
Apr 27, 2022, 09:52 PM
• Last activity: Apr 28, 2022, 03:05 AM
0
votes
2
answers
144
views
Improving query (Doing a nasty self join on a range with ctes)
-- The CTE1 is just to create columns that dictate the bound of what is considered the same entry -- Also I do a dense rank by ACT_TIME, and a PARITION BY on ID1, ID2 -- so all ID1/ID2 combos are ranked by when they ran WITH cte1 AS (SELECT *, (X-2) AS X_START, (X+2) AS X_END, (Y-2) AS Y_START, (Y+2...
-- The CTE1 is just to create columns that dictate the bound of what is considered the same entry
-- Also I do a dense rank by ACT_TIME, and a PARITION BY on ID1, ID2
-- so all ID1/ID2 combos are ranked by when they ran
WITH
cte1 AS (SELECT *, (X-2) AS X_START, (X+2) AS X_END, (Y-2) AS Y_START, (Y+2) AS Y_END, (Z*1.2) AS Z_MAX,
DENSE_RANK() OVER (PARTITION BY ID1, ID2 ORDER BY ACT_TIME) AS DENSE_RANK
FROM data
ORDER BY data.ACT_TIME)
,cte2 AS (
-- Create new set of column as comparisons
SELECT
ID AS ID_COMP,
ID1 AS ID1_COMP,
ID2 AS ID2_COMP,
X_START AS X_START_COMP,
X_END AS X_END_COMP,
Y_START AS Y_START_COMP,
Y_END AS Y_END_COMP,
Z AS Z_MAX_COMP,
DENSE_RANK AS DENSE_RANK_COMP
FROM cte1)
, cte3 AS (
-- join cte1 on cte2 only when X and Y value from cte1 is between the limits of cte2 AND
-- Z max value from cte 2 is larger than Z value from cte1 AND ID1/ID2 match
-- The result will have an ID of a row that should be removed since their x and y was in between the limits
-- Then remove any rows where rank from cte2 is higher than cte1
-- Remove any rows that were joined onto it self
SELECT
cte1.* ,
cte2.*
FROM cte1
JOIN cte2 ON (( cte2.X_END_COMP >= cte1.X AND cte1.X >= cte2.X_START_COMP) AND (cte2.Y_END_COMP >= cte1.Y AND cte1.Y>= cte2.Y_START_COMP)
AND (cte1.Z < cte2.Z_MAX_COMP)
AND cte1.ID1 = cte2.ID1_COMP AND cte1.ID2 = cte2.ID2_COMP)
WHERE cte1.ID <> cte2.ID_COMP AND cte2.DENSE_RANK_COMP <= cte1.DENSE_RANK)
-- Any IDs that shows up in cte3 remove from the final result
SELECT data.*
FROM data
WHERE ID NOT IN (SELECT DISTINCT ID FROM cte3)
ORDER BY data.ACT_TIME
Here's my create table:
CREATE TABLE data (
    ID INT(11) NOT NULL AUTO_INCREMENT,
    ID1 VARCHAR(10) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    ID2 INT(11) NULL DEFAULT NULL,
    ACT_TIME TIMESTAMP NULL DEFAULT NULL,
    X FLOAT(12) NULL DEFAULT NULL,
    Y FLOAT(12) NULL DEFAULT NULL,
    Z FLOAT(12) NULL DEFAULT NULL,
    PRIMARY KEY (ID) USING BTREE,
    INDEX ID1 (ID1) USING BTREE,
    INDEX ACT_TIME (ACT_TIME) USING BTREE
);
Here's how the query should work:
I want to remove any rows (that have the same ID1 and ID2) that occur later on, where X and Y are within +-2 and Z is less than 1.2*Z of an earlier row.
Here's an example input
Example output
This query takes about 5 min with 2.5M rows.
I am on MariaDB 10.5.5.
Any and all help is appreciated!
EDIT for Rick James: here's your EXPLAIN {query} result:
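One hedged suggestion, not from the thread: the equality part of the join is on ID1 and ID2, but the table only indexes ID1 and ACT_TIME separately, so every probe scans all rows sharing an ID1. A composite index narrows each probe before the X/Y/Z range checks:
ALTER TABLE data ADD INDEX ID1_ID2_TIME (ID1, ID2, ACT_TIME);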




mike_gundy123
(103 rep)
Oct 6, 2021, 03:50 PM
• Last activity: Oct 7, 2021, 04:55 PM
0
votes
1
answers
490
views
Self Join a Table to Get a Pair of Value But Could Not Receive the Expected Results
I am trying to self join the relation casts(movie_id,cast_id,cast_name). I want to query one pair of actors who have appeared in at least 1 movies together. The self pair must be excluded, and keep the mirror pair. This is the relation table: |movie_id|cast_id|cast_name| |:-------|:------|:--------|...
I am trying to self-join the relation casts(movie_id, cast_id, cast_name). I want to query pairs of actors who have appeared in at least 1 movie together. The self pair must be excluded, and the mirror pair kept.
This is the relation table:
|movie_id|cast_id|cast_name|
|:-------|:------|:--------|
|12345|10|Trung|
|12340| 10| Trung|
|12341| 10| Trung|
|12342| 10| Trung|
|12343| 11| Thi|
|12340| 12| Nhi|
|12341| 12| Nhi|
|12342| 12| Nhi|
|12343| 12| Nhi|
|12340| 13| David|
|12345| 12| Nhi|
This is what I've tried:
SELECT c.cast_id, c2.cast_id, c.movie_id
FROM casts c, casts c2
WHERE c.cast_id != c2.cast_id AND c.movie_id = c2.movie_id;
Instead of getting pairs of different cast_ids, I got pairs of the same cast_id.
|cast_id|cast_id|movie_id|
|:-------|:------|:--------|
|10| 10| 12345|
|10| 10| 12340|
|10| 10| 12340|
|10| 10| 12341|
|10 |10 |12342|
|11 |11 |12343|
|12| 12| 12340|
|12 |12| 12340|
|12| 12| 12341|
|12| 12| 12342|
|12| 12| 12343|
|13 |13 |12340|
|13| 13| 12340|
|12| 12| 12345|
However, I got different pairs of actors if I replace one of the attributes with cast_name, like:
SELECT c.cast_name, c2.cast_id, c.movie_id
FROM casts c, casts c2
WHERE c.cast_id != c2.cast_id AND c.movie_id = c2.movie_id;
|cast_name|cast_id|movie_id|
|:-------|:------|:--------|
|Nhi| 10| 12345|
|David| 10| 12340|
|Nhi| 10| 12340|
|Nhi| 10| 12341|
|Nhi |10 |12342|
|Nhi |11 |12343|
|David| 12| 12340|
|Trung |12| 12340|
|Trung| 12| 12341|
|Trung| 12| 12342|
|Thi| 12| 12343|
|Nhi |13 |12340|
|Trung| 13| 12340|
|Trung| 12| 12345|
Could you please explain the reason for this behavior and how to fix it to receive the expected results?
Thank you!
Trung Nguyen
(3 rep)
Oct 1, 2021, 06:10 AM
• Last activity: Oct 1, 2021, 05:35 PM
2
votes
0
answers
920
views
SQL Server - Self join on table is very slow
Working with Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64) We have a single table, which is planned to contain about 1.5 - 2, maybe max 3 million records at a time. Records will continuously be moved into historical tables. Here is the table definition: ``` CREATE TABLE [dbo].[...
Working with Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64)
We have a single table, which is planned to contain about 1.5 - 2, maybe max 3 million records at a time. Records will continuously be moved into historical tables. Here is the table definition:
CREATE TABLE [dbo].[MyTable](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MainKey] [nvarchar](20) NOT NULL,
[SecondaryKey] [nvarchar](100) NOT NULL,
[Value] [nvarchar](max) NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UC_MyTable] UNIQUE NONCLUSTERED
(
[MainKey] ASC,
[SecondaryKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The table will basically store key-value pairs belonging to multiple entities. MainKey is the key of the entity, SecondaryKey is the key in the key-value pair, and Value is obviously the value.
Now, the problem: we need to select all key-value pairs for all entities where there is at least one key-value pair in which both the key and the value match certain filters. Unfortunately, for the key filter, we need to use LIKE because there is a number in the middle of the string, and it can be any number. This is the query we have now:
select *
from MyTable
where
MainKey in (
select MainKey
from MyTable
where
SecondaryKey LIKE 'begin.%.end'
and Value in ('special_value1','special_value7','special_value10')
)
order by MainKey, SecondaryKey;
This query runs in 17-21 seconds for a total of 3 003 216 records at the moment, and returns 914 040 matching records (using random-generated dummy data). This query will run many times, so I'm trying to make it run faster. I don't have a very specific goal for execution time, it should be "as fast as possible", but let's say around 5 seconds would be an achievement already.
My initial idea was that the LIKE and the IN filtering in the inner query was the problem, however when I'm running only that part, it returns 15 160 records in 1-2 seconds. So it seems it is getting slower with the self-join. This is the actual execution plan:

Unfortunately this doesn't say too much to me. I can also export the plan and upload it as XML if it helps.
Finally, I'm adding the script I used to generate the random data:
declare @count integer = 1;
declare @count2 integer;
declare @count3 integer;
declare @random integer;
declare @value_match_random integer = 30;
-- total of 50 000 separate entity keys
while @count <= 50000
begin
-- between 30 and 60 key-value pairs for each entity
select @count2 = abs(checksum(newid()) % 60) + 30;
set @count3 = 1;
-- insert random records which will never match the criteria
while @count3 <= @count2
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, concat('SecondaryKey_', @count3), concat('Value_', @count3));
set @count3 = @count3 + 1;
end;
select @random = abs(checksum(newid()) % 100) + 1;
-- give @value_match_random % chance to insert a matching key-value pair also
if @random <= @value_match_random
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, 'begin.1.end', 'special_value1');
end;
set @count = @count + 1;
end;
Added execution plan here: https://www.brentozar.com/pastetheplan/?id=r12VQEqbt
Another update: today (after a week on leave), I ran the same query again on the same table with the same contents, and SSMS suggested that I create the following index:
CREATE NONCLUSTERED INDEX idx_MyTable_SecondaryKey
ON [dbo].[MyTable] ([SecondaryKey])
INCLUDE ([MainKey],[Value])
GO
I created it, and collected the above execution plan after the index was in place. A week earlier, when I posted this question, I don't remember seeing this index suggestion.
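An alternative formulation sometimes tried for this pattern (a sketch, semantically equivalent to the IN subquery above) is EXISTS, which may let the optimizer probe the filtered rows per MainKey instead of materializing the whole inner result first:
SELECT t.*
FROM MyTable AS t
WHERE EXISTS (
    SELECT 1
    FROM MyTable AS f
    WHERE f.MainKey = t.MainKey
      AND f.SecondaryKey LIKE 'begin.%.end'
      AND f.Value IN ('special_value1', 'special_value7', 'special_value10')
)
ORDER BY t.MainKey, t.SecondaryKey;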
Gábor Major
(163 rep)
Aug 19, 2021, 05:06 PM
• Last activity: Aug 30, 2021, 10:34 AM
0
votes
0
answers
1363
views
Flatten multiple rows from a one-to-many relationship into a single row
This questions is similar to what I need but the answer seems rather specific to the question rather than a general answer. https://dba.stackexchange.com/questions/270402/one-to-many-relationship-query-result-in-a-single-row-and-dynamic-multiple-colum I have a table of games (we will call this table...
This question is similar to what I need, but the answer seems specific to that question rather than a general answer.
https://dba.stackexchange.com/questions/270402/one-to-many-relationship-query-result-in-a-single-row-and-dynamic-multiple-colum
I have a table of games (we will call this table matches)
gameID Home Team Away Team DateTime
----- ----------- ---------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00
2 Valencia Sevilla 12-08-21 10:00:00
Each game has potentially several attributes: half-time score, full-time score, extra-time score, penalty score, attendance, etc.
This is stored in a one-to-many table (we will call this match attributes) which looks like:
id game id attribute id attribute value
----- -------- ----------- -----------------
1 1 1 3
2 1 2 3
3 1 5 32123
4 2 1 4
5 2 2 0
Where attribute id is a foreign key into this table (we will call this attributes):
attribute id attribute name
---------- ---------------
1 Home Team Extra Time Score
2 Away Team Extra Time Score
5 Game Attendance
8 Home Team Penalty Score
9 Away Team Penalty Score
At the moment, when I want to show a match to a user, I query the matches table and separately query the match attributes table. Ideally, however, I could make one join that merges the several rows from the match attributes table into columns of the matches table. Here are two examples.
Querying 1 game (Where gameId = 1)
gameID Home Team Away Team DateTime Home Team Extra Time Score Away Team Extra Time Score Attendance
----- ----------- ---------- ---------- ---------------------------- ---------------------------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00 3 3 32123
Querying multiple games (game id = 1 or 2)
gameID Home Team Away Team DateTime Home Team Extra Time Score Away Team Extra Time Score Attendance
----- ----------- ---------- ---------- ---------------------------- ---------------------------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00 3 3 32123
2 Valencia Sevilla 12-08-21 10:00:00 4 0 null
How would this be possible? I don't even know what functionality I could use to do this, as I don't think plain joins would work with multiple rows like this.
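The standard tool for this is conditional aggregation (or PIVOT where the database supports it). A sketch with names normalized to snake_case, since the exact table and column names aren't given; the attribute ids follow the attributes table above:
SELECT m.gameID,
       m.home_team,
       m.away_team,
       m.date_time,
       MAX(CASE WHEN ma.attribute_id = 1 THEN ma.attribute_value END) AS home_extra_time_score,
       MAX(CASE WHEN ma.attribute_id = 2 THEN ma.attribute_value END) AS away_extra_time_score,
       MAX(CASE WHEN ma.attribute_id = 5 THEN ma.attribute_value END) AS attendance
FROM matches m
LEFT JOIN match_attributes ma ON ma.game_id = m.gameID
WHERE m.gameID IN (1, 2)
GROUP BY m.gameID, m.home_team, m.away_team, m.date_time;
Games that lack a row for some attribute simply get NULL in that column, as in the second example.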
Marc-9
(23 rep)
Jun 29, 2021, 04:02 AM
• Last activity: Jun 30, 2021, 07:37 AM
0
votes
0
answers
24
views
How do I join rows from the same table with slightly different info?
We have a new sector of our company that is being put into our employee record feed. However this sector is listed as contractors as there are still parts of the sector that need access to systems that require domain access... so... I have: TABLE EMP EmpID, email, INFO1, INFO2, INF03... and for Tom...
We have a new sector of our company that is being put into our employee record feed. However this sector is listed as contractors as there are still parts of the sector that need access to systems that require domain access... so...
I have:
TABLE EMP
EmpID,
email,
INFO1,
INFO2,
INFO3...
and for Tom Smith
5763,
tom.smith@co1.com,
blah blah,
blah blah,
blah blah...
and also for Tom Smith
ZZ98401,
tom.smith@co2.com,
null,
null,
null...
Not everyone on the feed is like this, only about 10%. Most are in case number 2.
What I am looking for is the easiest way to match up the two rows associated with Tom and fill in as much info on Tom as I can, because some fields come from one account and other fields come from the other. The only thing that can accurately link the two rows is the beginning of the email address - this is standardized. So there should be a few thousand people (rows of duplicates) on my feed that I can combine if it can be done right.
I would also like to capture both EmpIDs somehow in the same row and both email addresses.
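A sketch of the matching step (assuming SQL Server, which the question doesn't state; the aliases a and b stand for the two accounts of the same person):
SELECT a.EmpID AS EmpID1, b.EmpID AS EmpID2,
       a.email AS email1, b.email AS email2,
       COALESCE(a.INFO1, b.INFO1) AS INFO1,
       COALESCE(a.INFO2, b.INFO2) AS INFO2,
       COALESCE(a.INFO3, b.INFO3) AS INFO3
FROM EMP a
JOIN EMP b
  ON LEFT(a.email, CHARINDEX('@', a.email) - 1)
   = LEFT(b.email, CHARINDEX('@', b.email) - 1)
 AND a.email < b.email; -- one row per matched pair, never a row joined to itself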
LOSTinDB
(551 rep)
Jun 24, 2021, 09:24 PM
0
votes
2
answers
47
views
T SQL, Get new customer added or deleted from historical table
I have a Table containing historical data. Every day the same staff list will be added with the new dates (between 1200-12500 staff each day however only a few staff might be newly added todaysDate or they may have left the company yesterday and not showing on todays list). I want to know each day w...
I have a table containing historical data. Every day the same staff list is added with the new date (between 1200-12500 staff each day; however, only a few staff might be newly added on today's date, or they may have left the company yesterday and not show on today's list).
I want to know each day which staff weren't on yesterday's list (new staff) and which staff were showing yesterday but not today (because they left the company). I have tried the following but no joy; please help.
My historical table is like below, and I need a script to get the rows for ID4 and ID5 only.
| StaffID | StaffName | CensusDate | Note |
| --- | --- | --- | --- |
| ID1 | Staff1 | 02/03/2021 |
| ID1 | Staff1 | 03/03/2021 |
| ID2 | Staff2 | 02/03/2021 |
| ID2 | Staff2 | 03/03/2021 |
| ID3 | Staff3 | 02/03/2021 |
| ID3 | Staff3 | 03/03/2021 |
| ID4 | Staff4 | 02/03/2021 | not showing today
| ID5 | Staff5 | 03/03/2021 | is newly added today
| ID6 | Staff6 | 02/03/2021 |
| ID6 | Staff6 | 03/03/2021 |
| ID7 | Staff7 | 02/03/2021 |
| ID7 | Staff7 | 03/03/2021 |
SELECT A.StaffID, A.StaffName, A.CensusDate
FROM table1 AS A
LEFT JOIN table1 AS B
    ON A.StaffID = B.StaffID
    AND B.CensusDate = CAST(DATEADD(day, -1, CONVERT(date, GETDATE())) AS date)
WHERE A.CensusDate = CAST(GETDATE() AS date)
    AND B.StaffID IS NULL
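For comparison, a set-based sketch (SQL Server syntax) that produces both directions with EXCEPT; the date variables are only for readability:
DECLARE @today date = CAST(GETDATE() AS date);
DECLARE @yesterday date = DATEADD(day, -1, @today);

-- newly added today (like ID5)
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @today
EXCEPT
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @yesterday;

-- gone today (like ID4)
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @yesterday
EXCEPT
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @today;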
Nemo
(3 rep)
Mar 8, 2021, 11:36 PM
• Last activity: Mar 9, 2021, 09:07 AM
0
votes
2
answers
131
views
Explaining what the OR operator does in the following code
I ran into the following SQL code which (successfully) outputs a list of parents and the age of their youngest child: SELECT parents.name AS name, MIN(children.age) AS age FROM people children INNER JOIN people parents ON (parents.id = children.fatherId OR parents.id = children.motherId) GROUP BY pa...
I ran into the following SQL code which (successfully) outputs a list of parents and the age of their youngest child:
SELECT parents.name AS name, MIN(children.age) AS age FROM people children
INNER JOIN people parents ON (parents.id = children.fatherId OR parents.id = children.motherId)
GROUP BY parents.id
The code self-joins a table named "people" on itself. I just wanted to ask how the OR operator works here. I know OR as a logical operator, but here it seems to do something else: it takes two conditions and joins on both of them. What does it have to do with logical OR?
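The OR here is still the ordinary logical operator: the ON clause is a boolean predicate evaluated for every (children, parents) row pair, and the pair is kept whenever either equality holds. A sketch of an equivalent query without OR makes that reading explicit (assuming no id appears as both a fatherId and a motherId):
SELECT parents.name AS name, MIN(children.age) AS age
FROM people children
INNER JOIN people parents ON parents.id = children.fatherId
GROUP BY parents.id, parents.name
UNION ALL
SELECT parents.name AS name, MIN(children.age) AS age
FROM people children
INNER JOIN people parents ON parents.id = children.motherId
GROUP BY parents.id, parents.name;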
Friedman
(103 rep)
Feb 23, 2021, 04:12 PM
• Last activity: Feb 23, 2021, 07:09 PM
-2
votes
2
answers
632
views
Nontrivially joining a relation with itself
I am a philosopher and am doing my first steps in relational algebra. So here is a puzzle: Let say that we have a relation R with attributes 'R.a' and 'R.b'. Is it possible to make a renaming and natural join operations on R and receive a triple (a, b, a) as a result? If I make the following steps:...
I am a philosopher taking my first steps in relational algebra. So here is a puzzle:
Let's say that we have a relation R with attributes 'R.a' and 'R.b'. Is it possible to apply renaming and natural join operations to R and receive a triple (a, b, a) as a result?
If I make the following steps:
1. rename R to S.
2. rename all attributes in S to 'S.a', 'S.b'.
3. rename 'S.b' on 'R.b' and
4. make a natural join of R and S.
Will it work? How do I write such a request properly?
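For reference, a sketch of the construction in standard notation (one conventional way to write it, not the only one):
S := \rho_{c \leftarrow a}(R)  -- a renamed copy of R with attributes (c, b)
R \bowtie S = \{ (a, b, c) : (a, b) \in R \wedge (c, b) \in R \}  -- natural join on the shared b
Every tuple (a, b) of R joins with its own renamed copy, so all triples of the form (a, b, a) appear in the result; a final selection \sigma_{a = c}(R \bowtie S) keeps exactly those.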
Olga
(1 rep)
Jul 31, 2018, 06:27 PM
• Last activity: Dec 27, 2020, 05:31 AM
Showing page 1 of 20 total questions