Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
143
views
Accelerating an SQL quadruple self-join with a complex alternation in the WHERE clause
The following Sqlite query has a triple self-join on table t: SELECT "update_variable", lhs_acc.name_suffix || ":" || rhs_var.name_suffix, op.span, op.path FROM t op JOIN t lhs_acc ON (lhs_acc.path GLOB op.path || "?*") JOIN t rhs_acc ON (rhs_acc.path GLOB op.path || "?*") JOIN t rhs_var ON (rhs_var...
The following SQLite query has a triple self-join on table t:
SELECT "update_variable",
lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
op.span,
op.path
FROM t op
JOIN t lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN t rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN t rhs_var ON (rhs_var.path GLOB op.path || "?*")
WHERE (op.name_prefix = "assignment"
AND lhs_acc.name_prefix = "assignment_lhs_identifier"
AND rhs_acc.name_prefix = "assignment_rhs_atom"
AND rhs_var.name_prefix = "assignment_rhs_atom"
AND rhs_acc.name_suffix != rhs_var.name_suffix
AND lhs_acc.name_suffix = rhs_acc.name_suffix)
OR (op.name_prefix = "augmented_assignment"
AND lhs_acc.name_prefix = "assignment_lhs_identifier"
AND rhs_acc.name_prefix = "assignment_rhs_atom"
AND rhs_var.name_prefix = "assignment_rhs_atom")
OR (op.name_prefix = "method_call"
AND rhs_acc.name_prefix = "method_call_name"
AND lhs_acc.name_prefix = "method_call_object"
AND rhs_var.name_prefix = "call_argument"
AND rhs_acc.name_suffix != rhs_var.name_suffix
AND rhs_acc.name_suffix REGEXP "(append|extend|insert|add|update)$")
GROUP BY op.span,
lhs_acc.name,
rhs_var.name
It is far too slow for my application. I can accelerate it enough by restricting each instance of t to the rows I am interested in, which somewhat duplicates the WHERE conditions. So, the FROM clause becomes:
FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
"augmented_assignment",
"method_call")) op
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_lhs_identifier",
"method_call_object") ) lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
"method_call_name") ) rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment_rhs_atom",
"call_argument") ) rhs_var ON (rhs_var.path GLOB op.path || "?*")
Is there some technique which would lead to a less verbose / ugly / redundant query?
---
Edit: The test duplication (but not the ugliness) can be avoided as follows:
SELECT "update_variable",
lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
op.span,
op.path
FROM
(SELECT *
FROM t
WHERE t.name_prefix IN ("assignment",
"augmented_assignment",
"method_call")) op
JOIN t lhs_acc ON (lhs_acc.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "method_call_object"
ELSE "assignment_lhs_identifier"
END)
AND (lhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_acc ON (rhs_acc.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "method_call_name"
ELSE "assignment_rhs_atom"
END)
AND (rhs_acc.path GLOB op.path || "?*"))
JOIN t rhs_var ON (rhs_var.name_prefix = (CASE op.name_prefix
WHEN "method_call" THEN "call_argument"
ELSE "assignment_rhs_atom"
END)
AND (rhs_var.path GLOB op.path || "?*"))
WHERE op.name_prefix = "augmented_assignment"
OR (op.name_prefix = "assignment"
AND lhs_acc.name_suffix = rhs_acc.name_suffix
AND rhs_acc.name_suffix != rhs_var.name_suffix)
OR (op.name_prefix = "method_call"
AND rhs_acc.name_suffix REGEXP "(append|extend|insert|add|update)$"
AND rhs_acc.name_suffix != rhs_var.name_suffix)
GROUP BY op.span,
lhs_acc.name,
rhs_var.name
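For reference, a third variant keeps the prefiltering in a single place: a CTE that restricts t once and is then self-joined. This is only a sketch along the lines of the versions above; the combined prefix list is an assumption, and the original three-branch alternation stays unchanged:
WITH tt AS
  (SELECT *
   FROM t
   WHERE t.name_prefix IN ("assignment", "augmented_assignment", "method_call",
                           "assignment_lhs_identifier", "method_call_object",
                           "assignment_rhs_atom", "method_call_name", "call_argument"))
SELECT "update_variable",
       lhs_acc.name_suffix || ":" || rhs_var.name_suffix,
       op.span,
       op.path
FROM tt op
JOIN tt lhs_acc ON (lhs_acc.path GLOB op.path || "?*")
JOIN tt rhs_acc ON (rhs_acc.path GLOB op.path || "?*")
JOIN tt rhs_var ON (rhs_var.path GLOB op.path || "?*")
WHERE ... -- the original WHERE alternation, unchanged
GROUP BY op.span,
         lhs_acc.name,
         rhs_var.name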
Aristide
(121 rep)
Feb 1, 2020, 12:59 PM
• Last activity: Jul 26, 2025, 12:02 PM
0
votes
1
answers
416
views
MariaDB: Why is this still using temporary and filesort - is it because I'm using a self-join? How to fix?
```EXPLAIN SELECT e2.personId, COUNT(*) FROM entries e1 JOIN entries e2 ON e2.categoryId = e1.categoryId AND e2.personId != e1.personId WHERE e1.personId = 1 GROUP BY e2.personId ``` gives: ``` id select_type table type key key_length ref rows extra 1 SIMPLE e1 ref personId_categoryId 4 const 59 Usi...
SELECT e2.personId, COUNT(*)
FROM entries e1
JOIN entries e2 ON e2.categoryId = e1.categoryId
AND e2.personId != e1.personId
WHERE e1.personId = 1
GROUP BY e2.personId
gives:
id select_type table type key key_length ref rows extra
1 SIMPLE e1 ref personId_categoryId 4 const 59 Using index; Using temporary; Using filesort
1 SIMPLE e2 ref categoryId_personId 4 project.e1.categoryId 8 Using where; Using index
If I remove the GROUP BY I get "Using index".
What's the problem here? Is it something to do with joining a table onto itself?
(The indexes present on the table are the two shown in the explain output, containing the columns that the names suggest.)
DDL:
CREATE TABLE `entries` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `personId` int(10) unsigned NOT NULL,
  `categoryId` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `personId_categoryId` (`personId`,`categoryId`),
  KEY `categoryId_personId` (`categoryId`,`personId`),
  CONSTRAINT `entries_ibfk_1` FOREIGN KEY (`personId`) REFERENCES `people` (`id`) ON UPDATE CASCADE,
  CONSTRAINT `entries_ibfk_2` FOREIGN KEY (`categoryId`) REFERENCES `categories` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1465605 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
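Two hedged notes from the editor, not from the question itself: the temporary table is expected here, because the grouping column comes from the second table in the join order, so rows cannot be read in grouped order from any index; and on MariaDB the filesort part can usually be dropped by declaring that the implicit ordering of GROUP BY is not needed. A sketch:
SELECT e2.personId, COUNT(*)
FROM entries e1
JOIN entries e2 ON e2.categoryId = e1.categoryId
               AND e2.personId != e1.personId
WHERE e1.personId = 1
GROUP BY e2.personId
ORDER BY NULL; -- MariaDB: suppress the implicit GROUP BY sort, removing "Using filesort"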
Codemonkey
(265 rep)
Jul 10, 2020, 07:48 AM
• Last activity: May 4, 2025, 05:07 AM
1
votes
1
answers
1073
views
MySQL Indexes on all column combinations
I have a table that basically represents the log of records ```mysql CREATE TABLE records ( id varchar(36), -- uuid text blob, user_id bigint, cluster_id bigint, status_id tinyint, session_id varchar, -- uuid, type tinyint, start_time timestamp, duration int, PRIMARY KEY (user_id, start_time, id), K...
I have a table that basically represents the log of records
CREATE TABLE records
(
id varchar(36), -- uuid
text blob,
user_id bigint,
cluster_id bigint,
status_id tinyint,
session_id varchar(36), -- uuid
type tinyint,
start_time timestamp,
duration int,
PRIMARY KEY (user_id, start_time, id),
KEY general_index (start_time, cluster_id, user_id, id),
KEY endpoint_index (cluster_id, start_time, id)
);
The table is somewhat large:
100 GB
10 GB index size
40M records
Now the table is tuned to support filtering by user_id, cluster_id and start_time.
But I need to filter (and sort) efficiently by almost every column with different combinations.
Adding more and more indexes representing search patterns (e.g. KEY general_index (start_time, duration, id)) doesn't seem like a good long-term solution.
Is there a best practice or something available for this for MySQL?
(unfortunately, using Elasticsearch or any other database is not an option)
The first thought I had was to have one PK index:
PRIMARY KEY (start_time, id)
and then many indexes that represent some search patterns, like:
KEY cluster_index (cluster_id, id),
KEY user_index (user_id, id),
KEY duration_index (duration, id),
KEY status_index (status_id, id)
When you need to query the data, just filter using the appropriate index(es):
SELECT *
FROM records t
INNER JOIN (
  SELECT id FROM records WHERE duration > 10000 -- supporting index duration_index
) t2 ON t.id = t2.id
INNER JOIN (
  SELECT id FROM records WHERE user_id = 123 -- supporting index user_index
) t3 ON t.id = t3.id
WHERE t.start_time >= {x}
ORDER BY [...]
LIMIT 100
But that turns out to work pretty slowly. Even if the solution above uses the correct index for each subquery, it still has to do a HUGE join with t.
Alexrrr
(11 rep)
May 3, 2022, 12:06 PM
• Last activity: Apr 23, 2025, 03:06 PM
2
votes
1
answers
677
views
Best practices for large JOINs - Warehouse or External Compute (e.g. Spark)
I am working on a problem that requires a very large join. The JOIN itself is pretty straightforward but the amount of data I am processing is very large. I am wondering for very large JOINs, is there a preferred type of technology. For example, is it more effective to a Data Warehouse (like Snowfla...
I am working on a problem that requires a very large join. The JOIN itself is pretty straightforward, but the amount of data I am processing is very large. I am wondering, for very large JOINs, whether there is a preferred type of technology. For example, is it more effective to use a Data Warehouse (like Snowflake) or some other MPP system like Spark?
To make the problem more concrete I created a hypothetical problem similar to my actual problem. Assume I have a table that looks like this:
I am working on some logic that requires account pairs that have the same name. To find pairs of accounts with the same name I can easily do something like this:

SELECT
account1.name,
account2.name
FROM accounts as account1
JOIN accounts as account2 ON account1.name = account2.name AND account1.account_id != account2.account_id
The problem I am facing is due to the amount of data I am processing. There are roughly ~2 trillion records I am trying to self JOIN on. Obviously, this will take some time and some pretty serious compute. I have run a similar query in Snowflake using XL and 3XL warehouses but after several hours of running, I canceled them. I am hoping there is a more cost-effective or time-efficient way.
Has anyone had success with massive JOINs? Are there any other tricks I could deploy? What tool did you find the most effective?
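For what it's worth, one small trick with self-joins of this shape (a sketch, not from the question): emitting each pair only once with an inequality halves the output and removes self-pairs in the same step, which can matter at trillion-row scale:
SELECT account1.name,
       account1.account_id,
       account2.account_id
FROM accounts AS account1
JOIN accounts AS account2
  ON account1.name = account2.name
 AND account1.account_id < account2.account_id -- each unordered pair once, never (x, x)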
Arthur Putnam
(553 rep)
Feb 24, 2022, 09:06 PM
• Last activity: Dec 18, 2024, 12:01 PM
1
votes
1
answers
198
views
Issues with Self-Referencing Foreign Key in SQLite
I'm having trouble with a self-referencing foreign key in SQLite. I've created a table to store employee details, where each employee can have a manager who is also an employee. Here's the table definition: ```sql CREATE TABLE Employees ( Employee_ID INTEGER PRIMARY KEY, Name TEXT, Manager_ID INTEGE...
I'm having trouble with a self-referencing foreign key in SQLite. I've created a table to store employee details, where each employee can have a manager who is also an employee. Here's the table definition:
CREATE TABLE Employees (
Employee_ID INTEGER PRIMARY KEY,
Name TEXT,
Manager_ID INTEGER,
FOREIGN KEY (Manager_ID) REFERENCES Employees(Employee_ID)
);
I've enabled foreign key constraints with PRAGMA foreign_keys = ON. However, when I insert data with a non-existent Manager_ID, SQLite does not raise an error. Here are the steps I've taken:
1. Enabled foreign key constraints:
PRAGMA foreign_keys = ON;
2. Created the table:
CREATE TABLE Employees (
Employee_ID INTEGER PRIMARY KEY,
Name TEXT,
Manager_ID INTEGER,
FOREIGN KEY (Manager_ID) REFERENCES Employees(Employee_ID)
);
3. Inserted data:
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (1, 'Fitch', NULL);
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (2, 'Alley', 1);
-- This should fail because 9 doesn't exist, but doesn't:
INSERT INTO Employees (Employee_ID, Name, Manager_ID) VALUES (6, 'Blake', 9);
Despite these steps, SQLite does not enforce the foreign key constraint for Manager_ID.
SELECT * FROM Employees
Gives:
| Employee_ID | Name | Manager_ID |
|-------------|-------|------------|
| 1 | Fitch | NULL |
| 2 | Alley | 1 |
| 6 | Blake | 9 |
**Are Self-Referencing Foreign Keys supported in SQLite? If so, how do I get it to work?**
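An editorial note, hedged but standard SQLite behaviour: self-referencing foreign keys are supported, and the usual culprit for silent non-enforcement is the pragma itself. PRAGMA foreign_keys is off by default, applies only to the current connection, and is a no-op inside an open transaction, so a tool that opens a fresh connection per statement quietly loses the setting. A quick check on the connection doing the INSERTs:
PRAGMA foreign_keys;      -- returns 1 only if enforcement is active for THIS connection
PRAGMA foreign_keys = ON; -- must be re-issued on every new connection, outside any transaction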
reubenjohn
(111 rep)
Nov 18, 2024, 04:28 AM
• Last activity: Nov 19, 2024, 11:59 AM
1
votes
1
answers
34
views
Multiple parents for AspNetUsers table for Asp.Net Core 3.1 app
I have the following table in MySQL.: CREATE TABLE `aspnetusers` ( `Id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL, `UserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL, `NormalizedUserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8m...
I have the following table in MySQL:
CREATE TABLE `aspnetusers` (
  `Id` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `UserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `NormalizedUserName` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `Email` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `NormalizedEmail` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `EmailConfirmed` tinyint(1) NOT NULL,
  `PasswordHash` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `SecurityStamp` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `ConcurrencyStamp` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `PhoneNumber` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  `PhoneNumberConfirmed` tinyint(1) NOT NULL,
  `TwoFactorEnabled` tinyint(1) NOT NULL,
  `LockoutEnd` datetime(6) DEFAULT NULL,
  `LockoutEnabled` tinyint(1) NOT NULL,
  `AccessFailedCount` int NOT NULL,
  `InsertUser` varchar(100) DEFAULT NULL,
  `InsertDateTime` datetime DEFAULT NULL,
  `UpdateUser` varchar(100) DEFAULT NULL,
  `UpdateDateTime` datetime DEFAULT NULL,
  `ParentId` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `UserNameIndex` (`NormalizedUserName`),
  KEY `EmailIndex` (`NormalizedEmail`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Now my Asp.Net Core 3.1 app uses the above table, but I want to change it so that a user can have zero, one, or multiple parents.
**What is best (and minimal) way to make this modification?**
Should I create a new table that connects the UserID with ParentID?
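A junction table is the usual shape for this. A sketch with illustrative names (the table and constraint names are assumptions, not from the question):
CREATE TABLE `aspnetuserparents` (
  `UserId` varchar(255) NOT NULL,
  `ParentId` varchar(255) NOT NULL,
  PRIMARY KEY (`UserId`, `ParentId`),
  CONSTRAINT `FK_userparents_user` FOREIGN KEY (`UserId`) REFERENCES `aspnetusers` (`Id`),
  CONSTRAINT `FK_userparents_parent` FOREIGN KEY (`ParentId`) REFERENCES `aspnetusers` (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
A user with no parents simply has no rows here, and the existing ParentId column can be dropped once its data is migrated.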
Hello World
(11 rep)
Jun 20, 2024, 07:48 PM
• Last activity: Jun 21, 2024, 04:00 PM
0
votes
1
answers
582
views
Write an SQL query that takes its own output as input
In a table that displays how trades are related to each other, starting from one trade, I want to find all other trades that relate to the first trade: 713613 PostPTETradeIDs 718478 713613 PrePTETradeId 711314 718478 PostPTETradeIDs 739522 718478 PrePTETradeId 713613 In this example, trade `718478`...
In a table that displays how trades are related to each other, starting from one trade, I want to find all other trades that relate to the first trade:
713613 PostPTETradeIDs 718478
713613 PrePTETradeId 711314
718478 PostPTETradeIDs 739522
718478 PrePTETradeId 713613
In this example, trade 718478 was created by splitting 713613 (row 1); and 739522 was created from 718478 (row 3), etc. (a PTE splits a trade into two).
select trade_id, keyword_name, keyword_value
from trade_keyword
where (keyword_name like 'LinkedTo' or keyword_name like 'PrePTETradeId' or keyword_name like 'PostPTETradeIDs' )
and trade_id in('718478', '739522','713613')
order by trade_id;
My goal is to identify trades that belong together and come from one another.
Currently, using the above query, I must fill all the TRADE_IDs manually, i.e. start with 713613, run the query, see that 718478 came from it, then run the query again searching for 718478 too, to find that 739522 came from it, etc.
I would like to make this recursive, such that it will, in one query, given some starting TRADE_ID(s), find all TRADE_IDs that are related to these via any path of any of these relations.
In other words, if column 1 is X and column 3 is f(X), then I want at once a list of all X, f(X), f(f(X)), f(f(f(X))), etc.
Preferably, there would be another column that lists the distance in steps upwards or downwards, where the PostPTETradeIDs relation is upward and the PrePTETradeId is downward (the LinkedTo relationship does not change the counter): 713613 is distance 0, 718478 is distance 1, 739522 is distance 2.
I am using Oracle SQL. Is this issue better addressed via a subquery, and if so, how can it be made recursive; or via some other functionality?
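For reference, a rough sketch of the recursive-CTE shape this usually takes in Oracle (untested; the distance bookkeeping is only indicative, and the CYCLE clause guards against the Pre/Post relations looping back on each other):
WITH related (trade_id, keyword_name, keyword_value, distance) AS (
    SELECT trade_id, keyword_name, keyword_value, 0
    FROM trade_keyword
    WHERE trade_id = '713613'
      AND keyword_name IN ('LinkedTo', 'PrePTETradeId', 'PostPTETradeIDs')
    UNION ALL
    SELECT tk.trade_id, tk.keyword_name, tk.keyword_value,
           r.distance + CASE r.keyword_name
                          WHEN 'PostPTETradeIDs' THEN 1
                          WHEN 'PrePTETradeId' THEN -1
                          ELSE 0
                        END
    FROM trade_keyword tk
    JOIN related r ON tk.trade_id = r.keyword_value
    WHERE tk.keyword_name IN ('LinkedTo', 'PrePTETradeId', 'PostPTETradeIDs')
)
CYCLE trade_id SET is_cycle TO 'Y' DEFAULT 'N'
SELECT DISTINCT trade_id, distance FROM related;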
Marie. P.
(101 rep)
Aug 29, 2019, 02:25 PM
• Last activity: Sep 17, 2023, 09:36 AM
1
votes
1
answers
126
views
Bidirectional self join table
Consider: ``` CREATE TABLE object ( id integer NOT NULL DEFAULT nextval('object_seq'), ..., CONSTRAINT object_pk PRIMARY KEY (id) ); CREATE TABLE pairing ( object1 integer NOT NULL, object2 integer NOT NULL, CONSTRAINT pairing_pk PRIMARY KEY (object1, object2) ); ALTER TABLE pairing ADD CONSTRAINT o...
Consider:
CREATE TABLE object (
id integer NOT NULL DEFAULT nextval('object_seq'),
...,
CONSTRAINT object_pk PRIMARY KEY (id)
);
CREATE TABLE pairing (
object1 integer NOT NULL,
object2 integer NOT NULL,
CONSTRAINT pairing_pk PRIMARY KEY (object1, object2)
);
ALTER TABLE pairing ADD CONSTRAINT object1_fk FOREIGN KEY (object1) REFERENCES object (id);
ALTER TABLE pairing ADD CONSTRAINT object2_fk FOREIGN KEY (object2) REFERENCES object (id);
You have a table *object* with one column *id* and some other unimportant columns, and another table *pairing* which is joining the table *object* with itself, meaning two columns *object1* and *object2*, both foreign keys of object.id.
These two columns are semantically identical. Nothing like an adjacency list, you know, with one column being *parent* and the other being *child*. It means the columns are reversible, and because of that, you have the same data twice because the order of the pairing inside the table *pairing* is irrelevant.
Example:
object1-object2
1-2
1-3
1-4
2-1
3-1
4-1
1-2 is identical to 2-1, 1-3 to 3-1, etc.
You can query in any direction you want because what you want to know is with which object, an object is connected to. You can query either
* SELECT object2 FROM pairing WHERE object1 IN (SELECT id FROM object WHERE );
* SELECT object1 FROM pairing WHERE object2 IN (SELECT id FROM object WHERE );
Given that, is there another way to do that without having half the table being "duplicates", or just another more efficient way?
I could cut the table in half, but it would force me to query in both directions at the same time, and I fear 1/ it will degrade performance, given speed is of the essence; 2/ I will have to make multiple self-joins against that very table, in both directions; it might be doable, but at best I feel it will be awful.
Here is an example of such query where I want to know if one object is paired with other objects, at the same time:
SELECT object1 FROM pairing p1
JOIN pairing p2 ON p1.object1 = p2.object1
JOIN pairing p3 ON p1.object1 = p3.object1
...
WHERE p1.object2 IN (SELECT id FROM object WHERE )
AND p2.object2 IN (SELECT id FROM object WHERE )
AND p3.object2 IN (SELECT id FROM object WHERE )
Same with a recursive CTE or just joining a table after the other like JOIN ON p1.object2 = p2.object1.
I don't feel like you can split a query in two and union the results later because when you use this table twice, it "spins it into a direction" and you can't use a separate query to see if it has the rest of the results in the other direction. Not sure I am clear there.
Another solution could be to collate in the FROM clause the table with itself in the other direction, recreating the table with "duplicates" but given this virtual table will be used all the time, it will be cached so wouldn't it be better to just have it this way from the beginning and not care about the double size?
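For what it's worth, a sketch of that last idea in concrete form (PostgreSQL syntax, matching the DDL above): store each pair once in canonical order, and expose both directions through a view so queries keep their current shape. Whether the view beats the doubled base table is exactly the trade-off the question raises:
CREATE TABLE pairing (
    object1 integer NOT NULL REFERENCES object (id),
    object2 integer NOT NULL REFERENCES object (id),
    CONSTRAINT pairing_pk PRIMARY KEY (object1, object2),
    CONSTRAINT pairing_ordered CHECK (object1 < object2) -- one row per unordered pair
);
CREATE VIEW pairing_both AS
    SELECT object1, object2 FROM pairing
    UNION ALL
    SELECT object2 AS object1, object1 AS object2 FROM pairing;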
Some_user
(61 rep)
May 8, 2023, 10:54 AM
• Last activity: May 11, 2023, 05:34 PM
3
votes
1
answers
378
views
Return recursive self joins in PostgreSQL with aggregated data
I have this simple table schema: [SQL Fiddle][1] **PostgreSQL 9.6 Schema Setup**: CREATE TABLE IF NOT EXISTS users( id INT NOT NULL PRIMARY KEY, email TEXT NOT NULL UNIQUE, password TEXT NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, is_active BOOLEAN DEFAULT FALSE, is_staff BOOLEAN DE...
I have this simple table schema:
SQL Fiddle
**PostgreSQL 9.6 Schema Setup**:
CREATE TABLE IF NOT EXISTS users(
id INT NOT NULL PRIMARY KEY, email TEXT NOT NULL UNIQUE,
password TEXT NOT NULL, first_name TEXT NOT NULL,
last_name TEXT NOT NULL, is_active BOOLEAN DEFAULT FALSE,
is_staff BOOLEAN DEFAULT FALSE, is_superuser BOOLEAN DEFAULT FALSE
);
INSERT INTO users (
id, email, password, first_name, last_name,
is_active, is_staff, is_superuser
)
VALUES
(
1, 'sir@a.com', '23456ses', 'John',
'Idogun', true, true, true
),
(
2, 'ma@a.com', '23456ses', 'Nelson',
'Idogun', true, true, true
);
-- articles table
CREATE TABLE IF NOT EXISTS articles(
id INT NOT NULL PRIMARY KEY,
fore_image TEXT NULL,
title TEXT NOT NULL,
slug TEXT NOT NULL UNIQUE,
content TEXT NOT NULL,
is_published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
user_id INT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
INSERT INTO articles (
id, title, slug, content, is_published,
user_id
)
VALUES
(
1, 'Upgrading from v0.4 to v0.5 of rust rocket',
'Upgrading from v0.4 to v0.5 of rust rocket',
'Upgrading from v0.4 to v0.5 of rust rocket',
true, 1
);
CREATE TABLE IF NOT EXISTS comment(
id INT NOT NULL PRIMARY KEY,
article_id INT NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
responded_to_comment_id INT NULL REFERENCES comment(id) ON DELETE CASCADE,
content TEXT NOT NULL
);
INSERT INTO comment (id, article_id, user_id, responded_to_comment_id, content)
VALUES
(1, 1, 1, NULL, 'First comment'),
(
2, 1, 2, 1, 'First reply'
),
(3, 1, 1, 2, 'Response to the first reply.');
**Query 1**:
SELECT
c.id AS c_id,
c.content AS c_content,
json_agg(r) as replies,
u.id AS u_id
FROM comment AS c
JOIN users AS u ON c.user_id = u.id
LEFT JOIN comment r ON r.id = c.responded_to_comment_id
WHERE c.article_id = 1
GROUP BY c.id, u.id
**Results**:
| c_id | c_content | replies | u_id |
|------|--------------------------|------------------------------------------------------------------------------------------------|------|
| 1 | First comment | [null] | 1 |
| 2 | First reply | [{"id":1,"article_id":1,"user_id":1,"responded_to_comment_id":null,"content":"First comment"}] | 2 |
| 3 | Response to first reply. | [{"id":2,"article_id":1,"user_id":2,"responded_to_comment_id":1,"content":"First reply"}] | 1 |
I am trying to write a query that selects comments under an article and recursively nests all comments having responded_to_comment_id inside their parent comment, so I will have only one row returned:
| c_id | c_content | replies | u_id |
|------|---------------|-----------|------|
| 1 | First comment | [...] | 1 |
and the results for the replies column will be something like this:
[
{
"id": 2,
"user_id": 1,
"responded_to_comment_id": 1,
"content": "First reply",
"replies": [
{
"id": 3,
"article_id": 1,
"user_id": 2,
"responded_to_comment_id": 2,
"content": "Response to the first reply."
}
]
}
]
Only top-level comments (comments whose responded_to_comment_id is NULL) should be at the top.
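Nesting to arbitrary depth generally needs recursion for the walk plus a second pass (often a recursive SQL function) to fold children into their parents' replies arrays; json_agg alone only reaches one level. A sketch of just the recursive walk, which flattens the thread with a root id and depth:
WITH RECURSIVE thread AS (
    SELECT c.id, c.user_id, c.responded_to_comment_id, c.content,
           c.id AS root_id, 0 AS depth
    FROM comment c
    WHERE c.article_id = 1 AND c.responded_to_comment_id IS NULL
    UNION ALL
    SELECT c.id, c.user_id, c.responded_to_comment_id, c.content,
           t.root_id, t.depth + 1
    FROM comment c
    JOIN thread t ON c.responded_to_comment_id = t.id
)
SELECT * FROM thread ORDER BY root_id, depth, id;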
Sirjon
(165 rep)
Mar 15, 2023, 10:18 PM
• Last activity: Mar 19, 2023, 02:14 PM
7
votes
1
answers
1449
views
Recursively get a tree via a self-joined table
Using other questions here and Postgresql documentation I've managed to build a many-to-many self joined table. However adding a `WHERE` clause is giving me trouble. **Problem:** A `Category` can have many child categories, and many parent categories. Given a `category.Id`, I want to retrieve the ca...
Using other questions here and the PostgreSQL documentation I've managed to build a many-to-many self-joined table.
However, adding a WHERE clause is giving me trouble.
**Problem:**
A Category
can have many child categories, and many parent categories. Given a category.Id
, I want to retrieve the category, the category children, children's children and so on.
**Example**: given this structure:
child_1
    child_11
        child_111
        child_112
            child_1121
    child_21
child_2
Given: a clause of id = child_11
Expected results: child_11, child_111, child_112, child_1121
Actual results: child_11, child_111, child_112
Here is my attempt: http://sqlfiddle.com/#!17/3640f/2
In case Sqlfiddle is down: https://www.db-fiddle.com/#&togetherjs=LhDjxfPHo6
*Note: I don't care about duplicating the where clause, my application can handle that*
**Table structure:**
CREATE TABLE Category(id SERIAL PRIMARY KEY, name VARCHAR(255));
CREATE TABLE Categories(parent_id INTEGER, child_id INTEGER, PRIMARY KEY(parent_id, child_id));
ALTER TABLE Categories ADD FOREIGN KEY (parent_id) REFERENCES category (id);
ALTER TABLE Categories ADD FOREIGN KEY (child_id) REFERENCES category (id);
**Table data:**
INSERT INTO Category(id, name) VALUES (1, 'parent_1');
INSERT INTO Category(id, name) VALUES (2, 'child_1');
INSERT INTO Category(id, name) VALUES (3, 'child_2');
INSERT INTO Category(id, name) VALUES (4, 'child_3');
INSERT INTO Category(id, name) VALUES (5, 'child_1_1');
INSERT INTO Category(id, name) VALUES (6, 'child_1_2');
INSERT INTO Category(id, name) VALUES (7, 'child_1_1_1');
INSERT INTO Category(id, name) VALUES (10, 'child_of_many');
INSERT INTO Category(id, name) VALUES (11, 'parent_1');
INSERT INTO Category(id, name) VALUES (12, 'parent_2');
INSERT INTO Categories(parent_id, child_id) VALUES (1, 2);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 3);
INSERT INTO Categories(parent_id, child_id) VALUES (1, 4);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 5);
INSERT INTO Categories(parent_id, child_id) VALUES (2, 6);
INSERT INTO Categories(parent_id, child_id) VALUES (5, 7);
Here is my query, which gives me the children but not the children's children, etc. If I remove the WHERE clauses I can get all rows:
WITH RECURSIVE categories_category AS (
SELECT id, 'Category' AS COLUMN_TYPE, c1.name
FROM Category c1
WHERE c1.id=2
UNION
SELECT c2.id, 'Category' AS COLUMN_TYPE, c2.name
FROM Category c1
INNER JOIN categories cs1 ON c1.id = cs1.parent_id
INNER JOIN Category c2 ON c2.id = cs1.child_id
WHERE cs1.parent_id = 2
) SELECT * FROM categories_category
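For reference, the usual fix for a query of this shape is to make the recursive member join on the CTE itself instead of re-filtering on the fixed parent id; a sketch against the tables above (dropping the COLUMN_TYPE column for brevity):
WITH RECURSIVE categories_category AS (
    SELECT c1.id, c1.name
    FROM Category c1
    WHERE c1.id = 2
  UNION
    SELECT c2.id, c2.name
    FROM categories_category cc
    JOIN Categories cs1 ON cs1.parent_id = cc.id
    JOIN Category c2 ON c2.id = cs1.child_id
)
SELECT * FROM categories_category;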
**Edit: More detailed example:**
Given the following **category rows**, I'd like to be able to run a query with a WHERE clause that matches the id of warmStoredAlcohol and get a result of:
+---+------------------+
|id |name |
+---+------------------+
|2 |warmStoredAlcohol |
|3 |vodka |
|4 |beer |
|5 |frozenBeer |
+---+------------------+
coldStoredAlcohol would give a result of:
+---+------------------+
|id |name |
+---+------------------+
|6 |coldStoredAlcohol |
|5 |frozenBeer |
|7 |cooler |
+---+------------------+
The database structure will not change often. In this example 'frozenBeer' has two parents, and should be returned when querying both warmStoredAlcohol and coldStoredAlcohol.
**I am open to changing the table structure, adding new tables and even upgrading postgres version etc.** The database will hold ~2,000 rows, therefore I value an easy-to-understand table structure over a super complicated optimal one. (But any solution is better than my broken one)

dustytrash
(171 rep)
Feb 21, 2020, 02:58 PM
• Last activity: Nov 12, 2022, 06:58 PM
0
votes
0
answers
753
views
T-SQL, Fast self join on the next available date where dates skip weekends and multiple rows can have the same date
TL;DR, I'll post back later with more info including table, view, and index definitions; before/after results; and query plans to help to answer this question. I'm trying to use a self-join to a view to get values from the next available day. The idea is to be able to calculate the change in values...
TL;DR,
I'll post back later with more info including table, view, and index definitions; before/after results; and query plans to help to answer this question.
I'm trying to use a self-join to a view to get values from the next available day. The idea is to be able to calculate the change in values for a given day when values are only reported once at the beginning of the day. So the current day's beginning values = the prior day's ending values. In this case, the application is stock option open interest, but this type of JOIN can apply to other scenarios.
The tricky part of the JOIN is that there are multiple rows for each day, and there are gaps between days for weekends and holidays. The query I'm writing currently SUMs values, then does a non-equi self-join to optionally filter out rows while still returning correct SUMs overall. (That self-join works and is not important to this question, I'm just explaining its existence in the query.) The query's results are then PIVOTed (not shown). Using CTEs with PIVOTs is a whole different beast that I'm trying to avoid. In any case, I think a self-join is the best and easiest solution, if only I can get the logic right.
The view's columns contain raw data, aggregates, and analytic functions, and would be harder to edit to show aggregates of the next available day's data than editing the query. The underlying table has a clustered index on the columns AsOf, Expiration, and Strike, which are 3 of the 4 columns I'm joining on.
I don't believe joining using ROW_NUMBER() would work because there are multiple rows with the same date, and multiple other columns being joined on, but I could be wrong. I've tried the LEAD() function in the query's SELECT, but Microsoft's LEAD() function doesn't support ignoring NULLs. Also, having to group by the LEAD()'s column ungroups everything else and throws off the SUMs. So my ideal solution is back to using a self-join.
I've tried joining using EXISTS and IN in a CASE statement (current AsOf date +1, +2, etc. until a match) but those were extremely slow, and they joined not just the next available day's rows but all of the consecutive dates until they hit a non-consecutive date going from Fri to Mon. Maybe that can work, but I couldn't get it to.
I've now got a JOIN (shown below) that I believe is correct and less slow than testing individual EXISTS or INs in a CASE statement, but it's still too slow. At the bottom, I've got another JOIN using a COALESCE that's much faster, but flawed. It correctly joins once to the next day, but it doesn't work with non-consecutive dates. Any help is much appreciated. My brain is turning to mush :)
I've simplified the query below. The self-join to get rows from the next available day (LEFT JOIN OCCALLS AS ND) returns the correct results, but it takes minutes to run for just a few days, which won't work over longer periods. For reference, the query returns results for all dates (1M+ rows in source table) in 0.00 seconds without the troublesome self-join (which is the same as the working self join above it, alias B, except for the logic on the AsOf date column).
SELECT
A.Ticker
,'Call' AS 'Option'
,A.AsOf --Data AsOf, contains one row per experation per strike price for each trading day
,ND.AsOf AS 'NextDay'--Data AsOf Next [available] Day
,B.CallDelta
,SUM(A.[CVol@Del]) AS 'CallVol'--Total Vol. If B.CallDelta range is specified, 1 row returned for total B.CallDelta in range, and 1 row returned for total outside of range
,SUM(A.OIB) AS 'OIB' --OI at BOD
,SUM(ND.OIB) AS 'OIE' --OI at EOD = Next day BOD
,A.[V:DTot] --Total daily call vol
FROM OCCALLS AS A
LEFT JOIN OCCALLS AS B
ON A.Ticker = B.Ticker
AND A.AsOf = B.AsOf
AND A.Expiration = B.Expiration
AND A.Strike = B.Strike
-- AND B.CallDelta BETWEEN 0.95 AND 1.00
LEFT JOIN OCCALLS AS ND --Next [available] Day
ON A.Ticker = ND.Ticker
AND (SELECT MIN(AsOf) FROM OCCALLS WHERE AsOf > A.AsOf)
= ND.AsOf
AND A.Expiration = ND.Expiration
AND A.Strike = ND.Strike
WHERE 1=1
AND A.Ticker = 'SPY'
-- AND A.AsOf >= '2020-01-01'
AND A.AsOf BETWEEN '2022-07-07' AND '2022-07-11'
GROUP BY
A.Ticker
,A.AsOf
,ND.AsOf
,B.CallDelta
,A.[V:DTot]
,A.[Open]
,A.[High]
,A.[Low]
,A.[Close]
ORDER BY A.AsOf , B.CallDelta
Another option is using COALESCE (shown below) or CASE in the JOIN's ON clause. With CASE, I found that it doesn't stop evaluating expressions after the first successful one and so joins all consecutive dates until it hits a gap between Fri and Mon. I got closer with COALESCE in that it only joins once to the next day, but it also only joins when the next date is consecutive. So COALESCE is the next closest I've come to a correct and fast join. I know my logic for it is incorrect, but I'm not sure how to fix it with valid syntax. I'm guessing it'll have to be rewritten back to a more complex CASE. I want to believe COALESCE or CASE can work and could be the fastest solutions, but maybe ROW_NUMBER() is an option too? In my example below, I'd limit it to checking 7 days ahead for a date for which rows exist, which should be more than enough.
LEFT JOIN OCCALLS AS ND --Next [available] Day
ON A.Ticker = ND.Ticker
AND A.AsOf = COALESCE( DATEADD(d,-1,ND.AsOf)
,DATEADD(d,-2,ND.AsOf)
,DATEADD(d,-3,ND.AsOf)
,DATEADD(d,-4,ND.AsOf)
,DATEADD(d,-5,ND.AsOf)
,DATEADD(d,-6,ND.AsOf)
,DATEADD(d,-7,ND.AsOf))
AND A.Expiration = ND.Expiration
AND A.Strike = ND.Strike
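For the record, a pattern often reached for in "next available date" lookups is OUTER APPLY with TOP (1): the correlation stays explicit, and it stops at exactly one row per (Ticker, Expiration, Strike) regardless of weekend gaps. A sketch against the columns above (untested):
FROM OCCALLS AS A
OUTER APPLY (
    SELECT TOP (1) ND.AsOf, ND.OIB
    FROM OCCALLS AS ND
    WHERE ND.Ticker     = A.Ticker
      AND ND.Expiration = A.Expiration
      AND ND.Strike     = A.Strike
      AND ND.AsOf       > A.AsOf
    ORDER BY ND.AsOf
) AS ND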
mastaofdisastax
(13 rep)
Jul 20, 2022, 10:19 AM
• Last activity: Jul 20, 2022, 05:11 PM
0
votes
1
answers
685
views
How to optimize a self-join view of a 50-million-row table
I've the following simplified table with ~50M rows. table sample ( id uuid not null primary key, measured_date timestamp with time zone not null, segment_id uuid not null, activity_id uuid not null, value integer not null ); Indexes: "sample_pkey" PRIMARY KEY, btree (id) "sample_idx" btree (segment_...
I've the following simplified table with ~50M rows.
table sample (
id uuid not null primary key,
measured_date timestamp with time zone not null,
segment_id uuid not null,
activity_id uuid not null,
value integer not null
);
Indexes:
"sample_pkey" PRIMARY KEY, btree (id)
"sample_idx" btree (segment_id, measured_date)
"sample_uniq" UNIQUE CONSTRAINT, btree (segment_id, activity_id, measured_date)
"sample_activity_idx" btree (activity_id)
I would like to get for each measured gap (between two different dates) my calculated value.
My query is as follows:
SELECT ROW_NUMBER () OVER () AS id,
t1.segment_id AS segment_id,
t1.activity_id AS activity_id,
t1.measured_date AS from_date,
t2.measured_date AS to_date,
t2.value AS cumulative_progress,
(t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date
WHERE t1.segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'
AND t1.activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09';
For a single segment/activity pair the plan looks fine:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|WindowAgg (cost=1.13..17.22 rows=1 width=154) (actual time=0.063..0.914 rows=171 loops=1)                                                                                 |
|  -> Nested Loop (cost=1.13..17.18 rows=1 width=124) (actual time=0.056..0.698 rows=171 loops=1)                                                                          |
|     Join Filter: (t1.measured_date < t2.measured_date)                                                                                                                   |
|     -> Index Scan using sample_uniq on sample t1 (cost=0.56..8.58 rows=1 width=70) (actual time=0.021..0.043 rows=19 loops=1)                                            |
|        Index Cond: ((segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'::uuid) AND (activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09'::uuid))                        |
|     -> Index Scan using sample_uniq on sample t2 (cost=0.56..8.58 rows=1 width=86) (actual time=0.005..0.030 rows=19 loops=19)                                           |
|        Index Cond: ((segment_id = '00021c8d-7162-467d-8e6a-4cb62926bf53'::uuid) AND (activity_id = '34a4b908-4613-422a-b6d0-4bb112737f09'::uuid))                        |
|Planning Time: 0.321 ms                                                                                                                                                   |
|Execution Time: 1.097 ms                                                                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
However, when I create a view to use this query often the results are poor.
Creating the view:
CREATE OR REPLACE VIEW sample_marginal AS
SELECT ROW_NUMBER () OVER () AS id,
t1.segment_id AS segment_id,
t1.activity_id AS activity_id,
t1.measured_date AS from_date,
t2.measured_date AS to_date,
t2.value AS cumulative_progress,
(t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date;
Querying the view for the same pair produces a very different plan:
+--------------------------------------------------------------------------------------------------+
|WindowAgg (cost=14106816.44..22564147.32 rows=51297073 width=154)                                   |
|  -> Gather (cost=14106816.44..20768749.77 rows=51297073 width=124)                                 |
|     Workers Planned: 2                                                                             |
|     -> Merge Join (cost=14105816.44..15638042.47 rows=21373780 width=124)                          |
|        Merge Cond: ((t2.activity_id = t1.activity_id) AND (t2.segment_id = t1.segment_id))         |
|        Join Filter: (t1.measured_date < t2.measured_date)                                          |
|        -> Sort (cost=4751689.83..4797948.93 rows=18503642 width=86)                                |
|           Sort Key: t2.activity_id, t2.segment_id                                                  |
|           -> Parallel Seq Scan on sample t2 (cost=0.00..1632749.42 rows=18503642 width=86)         |
|        -> Materialize (cost=9354126.62..9576170.32 rows=44408740 width=70)                         |
|           -> Sort (cost=9354126.62..9465148.47 rows=44408740 width=70)                             |
|              Sort Key: t1.activity_id, t1.segment_id                                               |
|              -> Seq Scan on sample t1 (cost=0.00..1891800.40 rows=44408740 width=70)               |
+--------------------------------------------------------------------------------------------------+
I actually never waited for this query to finish, since it took an extremely long time, and I thought I'd fix it before using it.
Running count(*) on the view, however, returned 1.8B rows (execution was slow, of course).
Any idea on how to improve my view?
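A hedged observation, not from the thread: PostgreSQL will not push an outer WHERE down through a window function, so the ROW_NUMBER () OVER () in the view forces the whole 50M-row self-join to be computed before any filtering happens. A sketch of the view without the synthetic id (assumption: callers can number rows themselves):
CREATE OR REPLACE VIEW sample_marginal AS
SELECT t1.segment_id AS segment_id,
       t1.activity_id AS activity_id,
       t1.measured_date AS from_date,
       t2.measured_date AS to_date,
       t2.value AS cumulative_progress,
       (t2.value - t1.value) AS marginal_progress
FROM sample AS t1 JOIN sample AS t2
ON t1.activity_id = t2.activity_id AND t1.segment_id = t2.segment_id AND t1.measured_date < t2.measured_date;
-- With no window function in the way, a query such as
--   SELECT * FROM sample_marginal WHERE segment_id = '...' AND activity_id = '...'
-- can be inlined and hit sample_uniq, as in the fast plan above.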
Cowabunga
(145 rep)
Apr 27, 2022, 09:52 PM
• Last activity: Apr 28, 2022, 03:05 AM
0
votes
2
answers
144
views
Improving query (Doing a nasty self join on a range with ctes)
-- The CTE1 is just to create columns that dictate the bound of what is considered the same entry -- Also I do a dense rank by ACT_TIME, and a PARITION BY on ID1, ID2 -- so all ID1/ID2 combos are ranked by when they ran WITH cte1 AS (SELECT *, (X-2) AS X_START, (X+2) AS X_END, (Y-2) AS Y_START, (Y+2...
-- The CTE1 is just to create columns that dictate the bound of what is considered the same entry
-- Also I do a dense rank by ACT_TIME, and a PARITION BY on ID1, ID2
-- so all ID1/ID2 combos are ranked by when they ran
WITH
cte1 AS (SELECT *, (X-2) AS X_START, (X+2) AS X_END, (Y-2) AS Y_START, (Y+2) AS Y_END, (Z*1.2) AS Z_MAX,
DENSE_RANK() OVER (PARTITION BY ID1, ID2 ORDER BY ACT_TIME) AS DENSE_RANK
FROM data
ORDER BY data.ACT_TIME)
,cte2 AS (
-- Create new set of column as comparisons
SELECT
ID AS ID_COMP,
ID1 AS ID1_COMP,
ID2 AS ID2_COMP,
X_START AS X_START_COMP,
X_END AS X_END_COMP,
Y_START AS Y_START_COMP,
Y_END AS Y_END_COMP,
Z AS Z_MAX_COMP,
DENSE_RANK AS DENSE_RANK_COMP
FROM cte1)
, cte3 AS (
-- join cte1 on cte2 only when X and Y value from cte1 is between the limits of cte2 AND
-- Z max value from cte 2 is larger than Z value from cte1 AND ID1/ID2 match
-- The result will have an ID of a row that should be removed since their x and y was in between the limits
-- Then remove any rows where rank from cte2 is higher than cte1
-- Remove any rows that were joined onto it self
SELECT
cte1.* ,
cte2.*
FROM cte1
JOIN cte2 ON (( cte2.X_END_COMP >= cte1.X AND cte1.X >= cte2.X_START_COMP) AND (cte2.Y_END_COMP >= cte1.Y AND cte1.Y>= cte2.Y_START_COMP)
AND (cte1.Z < cte2.Z_MAX_COMP)
AND cte1.ID1 = cte2.ID1_COMP AND cte1.ID2 = cte2.ID2_COMP)
WHERE cte1.ID <> cte2.ID_COMP AND cte2.DENSE_RANK_COMP <= cte1.DENSE_RANK)
-- Any IDs that shows up in cte3 remove from the final result
SELECT data.*
FROM data
WHERE ID NOT IN (SELECT DISTINCT ID FROM cte3)
ORDER BY data.ACT_TIME
Here's my create table:
CREATE TABLE data (
    ID INT(11) NOT NULL AUTO_INCREMENT,
    ID1 VARCHAR(10) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
    ID2 INT(11) NULL DEFAULT NULL,
    ACT_TIME TIMESTAMP NULL DEFAULT NULL,
    X FLOAT(12) NULL DEFAULT NULL,
    Y FLOAT(12) NULL DEFAULT NULL,
    Z FLOAT(12) NULL DEFAULT NULL,
    PRIMARY KEY (ID) USING BTREE,
    INDEX ID1 (ID1) USING BTREE,
    INDEX ACT_TIME (ACT_TIME) USING BTREE
);
Here's how the query should work:
I want to remove any rows (that have the same ID1 and ID2) that occur later on, where X and Y are within +-2 and Z is less than 1.2*Z of an earlier row.
Here's an example input
Example output
This query takes about 5 min with 2.5M rows.
I am on MariaDB 10.5.5.
Any and all help is appreciated!
EDIT for Rick James: here's your EXPLAIN {query} result:
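One hedged suggestion, not from the thread: the equality part of the join is on ID1 and ID2, but the table only indexes ID1 and ACT_TIME separately, so every probe scans all rows sharing an ID1. A composite index narrows each probe before the X/Y/Z range checks:
ALTER TABLE data ADD INDEX ID1_ID2_TIME (ID1, ID2, ACT_TIME);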




mike_gundy123
(103 rep)
Oct 6, 2021, 03:50 PM
• Last activity: Oct 7, 2021, 04:55 PM
0
votes
1
answers
490
views
Self Join a Table to Get a Pair of Value But Could Not Receive the Expected Results
I am trying to self join the relation casts(movie_id,cast_id,cast_name). I want to query one pair of actors who have appeared in at least 1 movies together. The self pair must be excluded, and keep the mirror pair. This is the relation table: |movie_id|cast_id|cast_name| |:-------|:------|:--------|...
I am trying to self-join the relation casts(movie_id, cast_id, cast_name). I want to query pairs of actors who have appeared in at least 1 movie together. The self pair must be excluded, and the mirror pair kept.
This is the relation table:
|movie_id|cast_id|cast_name|
|:-------|:------|:--------|
|12345|10|Trung|
|12340| 10| Trung|
|12341| 10| Trung|
|12342| 10| Trung|
|12343| 11| Thi|
|12340| 12| Nhi|
|12341| 12| Nhi|
|12342| 12| Nhi|
|12343| 12| Nhi|
|12340| 13| David|
|12345| 12| Nhi|
This is what I've tried:
SELECT c.cast_id, c2.cast_id, c.movie_id
FROM casts c, casts c2
WHERE c.cast_id != c2.cast_id AND c.movie_id = c2.movie_id;
Instead of getting pairs of different cast_ids, I got pairs of the same cast_id.
|cast_id|cast_id|movie_id|
|:-------|:------|:--------|
|10| 10| 12345|
|10| 10| 12340|
|10| 10| 12340|
|10| 10| 12341|
|10 |10 |12342|
|11 |11 |12343|
|12| 12| 12340|
|12 |12| 12340|
|12| 12| 12341|
|12| 12| 12342|
|12| 12| 12343|
|13 |13 |12340|
|13| 13| 12340|
|12| 12| 12345|
However, I got different pairs of actors if I replace one of the attributes with cast_name, like:
SELECT c.cast_name, c2.cast_id, c.movie_id
FROM casts c, casts c2
WHERE c.cast_id != c2.cast_id AND c.movie_id = c2.movie_id;
|cast_name|cast_id|movie_id|
|:-------|:------|:--------|
|Nhi| 10| 12345|
|David| 10| 12340|
|Nhi| 10| 12340|
|Nhi| 10| 12341|
|Nhi |10 |12342|
|Nhi |11 |12343|
|David| 12| 12340|
|Trung |12| 12340|
|Trung| 12| 12341|
|Trung| 12| 12342|
|Thi| 12| 12343|
|Nhi |13 |12340|
|Trung| 13| 12340|
|Trung| 12| 12345|
Could you please explain the reason for this behavior and how to fix it to receive the expected results?
Thank you!
Trung Nguyen
(3 rep)
Oct 1, 2021, 06:10 AM
• Last activity: Oct 1, 2021, 05:35 PM
2
votes
0
answers
920
views
SQL Server - Self join on table is very slow
Working with Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64) We have a single table, which is planned to contain about 1.5 - 2, maybe max 3 million records at a time. Records will continuously be moved into historical tables. Here is the table definition: ``` CREATE TABLE [dbo].[...
Working with Microsoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64)
We have a single table, which is planned to contain about 1.5 - 2, maybe max 3 million records at a time. Records will continuously be moved into historical tables. Here is the table definition:
CREATE TABLE [dbo].[MyTable](
[Id] [int] IDENTITY(1,1) NOT NULL,
[MainKey] [nvarchar](20) NOT NULL,
[SecondaryKey] [nvarchar](100) NOT NULL,
[Value] [nvarchar](max) NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UC_MyTable] UNIQUE NONCLUSTERED
(
[MainKey] ASC,
[SecondaryKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The table will basically store key-value pairs belonging to multiple entities. MainKey is the key of the entity, SecondaryKey is the key in the key-value pair, and Value is obviously the value.
Now, the problem: we need to select all key-value pairs for all entities where there is at least one key-value pair in which both the key and the value match certain filters. Unfortunately, for the key filter, we need to use LIKE because there is a number in the middle of the string, and it can be any number. This is the query we have now:
select *
from MyTable
where
MainKey in (
select MainKey
from MyTable
where
SecondaryKey LIKE 'begin.%.end'
and Value in ('special_value1','special_value7','special_value10')
)
order by MainKey, SecondaryKey;
This query runs in 17-21 seconds for a total of 3 003 216 records at the moment, and returns 914 040 matching records (using random-generated dummy data). This query will run many times, so I'm trying to make it run faster. I don't have a very specific goal for execution time, it should be "as fast as possible", but let's say around 5 seconds would be an achievement already.
My initial idea was that the LIKE and the IN filtering in the inner query was the problem, however when I'm running only that part, it returns 15 160 records in 1-2 seconds. So it seems it is getting slower with the self-join. This is the actual execution plan:

Unfortunately this doesn't say too much to me. I can also export the plan and upload it as XML if it helps.
Finally, I'm adding the script I used to generate the random data:
declare @count integer = 1;
declare @count2 integer;
declare @count3 integer;
declare @random integer;
declare @value_match_random integer = 30;
-- total of 50 000 separate entity keys
while @count <= 50000
begin
-- between 30 and 60 key-value pairs for each entity
select @count2 = abs(checksum(newid()) % 60) + 30;
set @count3 = 1;
-- insert random records which will never match the criteria
while @count3 <= @count2
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, concat('SecondaryKey_', @count3), concat('Value_', @count3));
set @count3 = @count3 + 1;
end;
select @random = abs(checksum(newid()) % 100) + 1;
-- give @value_match_random % chance to insert a matching key-value pair also
if @random <= @value_match_random
begin
insert into MyTable (MainKey, SecondaryKey, Value) values (@count, 'begin.1.end', 'special_value1');
end;
set @count = @count + 1;
end;
Added execution plan here: https://www.brentozar.com/pastetheplan/?id=r12VQEqbt
Another update: today (after a week on leave), I ran the same query again on the same table with the same contents, and SSMS suggested that I create the following index:
CREATE NONCLUSTERED INDEX idx_MyTable_SecondaryKey
ON [dbo].[MyTable] ([SecondaryKey])
INCLUDE ([MainKey],[Value])
GO
I created it, and collected the above execution plan after the index was in place. A week earlier, when I posted this question, I don't remember seeing this index suggestion.
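An alternative formulation sometimes tried for this pattern (a sketch, semantically equivalent to the IN subquery above) is EXISTS, which may let the optimizer probe the filtered rows per MainKey instead of materializing the whole inner result first:
SELECT t.*
FROM MyTable AS t
WHERE EXISTS (
    SELECT 1
    FROM MyTable AS f
    WHERE f.MainKey = t.MainKey
      AND f.SecondaryKey LIKE 'begin.%.end'
      AND f.Value IN ('special_value1', 'special_value7', 'special_value10')
)
ORDER BY t.MainKey, t.SecondaryKey;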
Gábor Major
(163 rep)
Aug 19, 2021, 05:06 PM
• Last activity: Aug 30, 2021, 10:34 AM
0
votes
0
answers
1363
views
Flatten multiple rows from a one-to-many relationship into a single row
This questions is similar to what I need but the answer seems rather specific to the question rather than a general answer. https://dba.stackexchange.com/questions/270402/one-to-many-relationship-query-result-in-a-single-row-and-dynamic-multiple-colum I have a table of games (we will call this table...
This question is similar to what I need, but the answer seems specific to that question rather than a general answer.
https://dba.stackexchange.com/questions/270402/one-to-many-relationship-query-result-in-a-single-row-and-dynamic-multiple-colum
I have a table of games (we will call this table matches)
gameID Home Team Away Team DateTime
----- ----------- ---------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00
2 Valencia Sevilla 12-08-21 10:00:00
Each game has potentially several attributes: half-time score, full-time score, extra-time score, penalty score, attendance, etc.
This is stored in a one-to-many table (we will call this match attributes) which looks like:
id game id attribute id attribute value
----- -------- ----------- -----------------
1 1 1 3
2 1 2 3
3 1 5 32123
4 2 1 4
5 2 2 0
Where attribute id is a foreign key into this table (we will call this attributes):
attribute id attribute name
---------- ---------------
1 Home Team Extra Time Score
2 Away Team Extra Time Score
5 Game Attendance
8 Home Team Penalty Score
9 Away Team Penalty Score
At the moment, when I want to show a match to a user, I query the matches table and separately query the match attributes table. Ideally, however, I could make one join that merges the several rows from the match attributes table into columns of the matches table. Here are two examples.
Querying 1 game (Where gameId = 1)
gameID Home Team Away Team DateTime Home Team Extra Time Score Away Team Extra Time Score Attendance
----- ----------- ---------- ---------- ---------------------------- ---------------------------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00 3 3 32123
Querying multiple games (game id = 1 or 2)
gameID Home Team Away Team DateTime Home Team Extra Time Score Away Team Extra Time Score Attendance
----- ----------- ---------- ---------- ---------------------------- ---------------------------- ----------
1 Barcelona Real Madrid 12-08-21 10:00:00 3 3 32123
2 Valencia Sevilla 12-08-21 10:00:00 4 0 null
How would this be possible? I don't even know what functionality I could use to do this, as I don't think plain joins would work with multiple rows like this.
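The standard tool for this is conditional aggregation (or PIVOT where the database supports it). A sketch with names normalized to snake_case, since the exact table and column names aren't given; the attribute ids follow the attributes table above:
SELECT m.gameID,
       m.home_team,
       m.away_team,
       m.date_time,
       MAX(CASE WHEN ma.attribute_id = 1 THEN ma.attribute_value END) AS home_extra_time_score,
       MAX(CASE WHEN ma.attribute_id = 2 THEN ma.attribute_value END) AS away_extra_time_score,
       MAX(CASE WHEN ma.attribute_id = 5 THEN ma.attribute_value END) AS attendance
FROM matches m
LEFT JOIN match_attributes ma ON ma.game_id = m.gameID
WHERE m.gameID IN (1, 2)
GROUP BY m.gameID, m.home_team, m.away_team, m.date_time;
Games that lack a row for some attribute simply get NULL in that column, as in the second example.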
Marc-9
(23 rep)
Jun 29, 2021, 04:02 AM
• Last activity: Jun 30, 2021, 07:37 AM
0
votes
0
answers
24
views
How do I join rows from the same table with slightly different info?
We have a new sector of our company that is being put into our employee record feed. However this sector is listed as contractors as there are still parts of the sector that need access to systems that require domain access... so... I have: TABLE EMP EmpID, email, INFO1, INFO2, INF03... and for Tom...
We have a new sector of our company that is being put into our employee record feed. However this sector is listed as contractors as there are still parts of the sector that need access to systems that require domain access... so...
I have:
TABLE EMP
EmpID,
email,
INFO1,
INFO2,
INFO3...
and for Tom Smith
5763,
tom.smith@co1.com,
blah blah,
blah blah,
blah blah...
and also for Tom Smith
ZZ98401,
tom.smith@co2.com,
null,
null,
null...
Not everyone on the feed is like this, only about 10%. Most are in case number 2.
What I am looking for is the easiest way to match up the two rows associated with Tom and fill in as much info on Tom as I can, because some fields come from one account and other fields come from the other. The only thing that can accurately link the two rows is the beginning of the email address - this is standardized. So there should be a few thousand people (rows of duplicates) on my feed that I can combine if it can be done right.
I would also like to capture both EmpIDs somehow in the same row and both email addresses.
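A sketch of the matching step (assuming SQL Server, which the question doesn't state; the aliases a and b stand for the two accounts of the same person):
SELECT a.EmpID AS EmpID1, b.EmpID AS EmpID2,
       a.email AS email1, b.email AS email2,
       COALESCE(a.INFO1, b.INFO1) AS INFO1,
       COALESCE(a.INFO2, b.INFO2) AS INFO2,
       COALESCE(a.INFO3, b.INFO3) AS INFO3
FROM EMP a
JOIN EMP b
  ON LEFT(a.email, CHARINDEX('@', a.email) - 1)
   = LEFT(b.email, CHARINDEX('@', b.email) - 1)
 AND a.email < b.email; -- one row per matched pair, never a row joined to itself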
LOSTinDB
(551 rep)
Jun 24, 2021, 09:24 PM
0
votes
2
answers
47
views
T SQL, Get new customer added or deleted from historical table
I have a Table containing historical data. Every day the same staff list will be added with the new dates (between 1200-12500 staff each day however only a few staff might be newly added todaysDate or they may have left the company yesterday and not showing on todays list). I want to know each day w...
I have a table containing historical data. Every day the same staff list is added with the new date (between 1200-12500 staff each day; however, only a few staff might be newly added on today's date, or they may have left the company yesterday and not show on today's list).
I want to know each day which staff weren't on yesterday's list (new staff) and which staff were showing yesterday but not today (because they left the company). I have tried the following but no joy; please help.
My historical table is like below, and I need a script to get the rows for ID4 and ID5 only.
| StaffID | StaffName | CensusDate | Note |
| --- | --- | --- | --- |
| ID1 | Staff1 | 02/03/2021 |
| ID1 | Staff1 | 03/03/2021 |
| ID2 | Staff2 | 02/03/2021 |
| ID2 | Staff2 | 03/03/2021 |
| ID3 | Staff3 | 02/03/2021 |
| ID3 | Staff3 | 03/03/2021 |
| ID4 | Staff4 | 02/03/2021 | not showing today
| ID5 | Staff5 | 03/03/2021 | is newly added today
| ID6 | Staff6 | 02/03/2021 |
| ID6 | Staff6 | 03/03/2021 |
| ID7 | Staff7 | 02/03/2021 |
| ID7 | Staff7 | 03/03/2021 |
SELECT A.StaffID, A.StaffName, A.CensusDate
FROM table1 AS A
LEFT JOIN table1 AS B
    ON A.StaffID = B.StaffID
    AND B.CensusDate = CAST(DATEADD(day, -1, CONVERT(date, GETDATE())) AS date)
WHERE A.CensusDate = CAST(GETDATE() AS date)
    AND B.StaffID IS NULL
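For comparison, a set-based sketch (SQL Server syntax) that produces both directions with EXCEPT; the date variables are only for readability:
DECLARE @today date = CAST(GETDATE() AS date);
DECLARE @yesterday date = DATEADD(day, -1, @today);

-- newly added today (like ID5)
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @today
EXCEPT
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @yesterday;

-- gone today (like ID4)
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @yesterday
EXCEPT
SELECT StaffID, StaffName FROM table1 WHERE CensusDate = @today;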
Nemo
(3 rep)
Mar 8, 2021, 11:36 PM
• Last activity: Mar 9, 2021, 09:07 AM
0
votes
2
answers
131
views
Explaining what the OR operator does in the following code
I ran into the following SQL code which (successfully) outputs a list of parents and the age of their youngest child: SELECT parents.name AS name, MIN(children.age) AS age FROM people children INNER JOIN people parents ON (parents.id = children.fatherId OR parents.id = children.motherId) GROUP BY pa...
I ran into the following SQL code which (successfully) outputs a list of parents and the age of their youngest child:
SELECT parents.name AS name, MIN(children.age) AS age FROM people children
INNER JOIN people parents ON (parents.id = children.fatherId OR parents.id = children.motherId)
GROUP BY parents.id
The code self-joins a table named "people" on itself. I just wanted to ask how the OR operator works here. I know OR as a logical operator, but here it seems to do something else: it takes two conditions and joins on both of them. What does it have to do with logical OR?
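The OR here is still the ordinary logical operator: the ON clause is a boolean predicate evaluated for every (children, parents) row pair, and the pair is kept whenever either equality holds. A sketch of an equivalent query without OR makes that reading explicit (assuming no id appears as both a fatherId and a motherId):
SELECT parents.name AS name, MIN(children.age) AS age
FROM people children
INNER JOIN people parents ON parents.id = children.fatherId
GROUP BY parents.id, parents.name
UNION ALL
SELECT parents.name AS name, MIN(children.age) AS age
FROM people children
INNER JOIN people parents ON parents.id = children.motherId
GROUP BY parents.id, parents.name;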
Friedman
(103 rep)
Feb 23, 2021, 04:12 PM
• Last activity: Feb 23, 2021, 07:09 PM
-2
votes
2
answers
632
views
Nontrivially joining a relation with itself
I am a philosopher and am doing my first steps in relational algebra. So here is a puzzle: Let say that we have a relation R with attributes 'R.a' and 'R.b'. Is it possible to make a renaming and natural join operations on R and receive a triple (a, b, a) as a result? If I make the following steps:...
I am a philosopher taking my first steps in relational algebra. So here is a puzzle:
Let's say that we have a relation R with attributes 'R.a' and 'R.b'. Is it possible to apply renaming and natural join operations to R and receive a triple (a, b, a) as a result?
If I make the following steps:
1. rename R to S.
2. rename all attributes in S to 'S.a', 'S.b'.
3. rename 'S.b' on 'R.b' and
4. make a natural join of R and S.
Will it work? How do I write such a request properly?
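For reference, a sketch of the construction in standard notation (one conventional way to write it, not the only one):
S := \rho_{c \leftarrow a}(R)  -- a renamed copy of R with attributes (c, b)
R \bowtie S = \{ (a, b, c) : (a, b) \in R \wedge (c, b) \in R \}  -- natural join on the shared b
Every tuple (a, b) of R joins with its own renamed copy, so all triples of the form (a, b, a) appear in the result; a final selection \sigma_{a = c}(R \bowtie S) keeps exactly those.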
Olga
(1 rep)
Jul 31, 2018, 06:27 PM
• Last activity: Dec 27, 2020, 05:31 AM
Showing page 1 of 20 total questions