
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
3 answers
3751 views
Is it a good idea to join data from two different databases?
I am building an application that requires a lot of tables in one database, and while the joining and aggregation of data is really nice and seamless, I am starting to wonder if I am creating too many tables in one database rather than organizing better by creating multiple databases. 1. Is this standard practice in enterprise-level companies? 2. How do you usually join data from two different databases, if it's normal to do so? 3. Does this cause latency issues? Any help or guidance would be appreciated.
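Question 2 depends heavily on the engine, so the following is only an illustrative sketch with made-up database and table names (sales, inventory, orders, products): on SQL Server, databases on the same instance can be joined directly with three-part names, whereas PostgreSQL treats databases as isolated, so cross-database joins there go through postgres_fdw or dblink (schemas within one database are the lighter-weight way to organize).

```sql
-- SQL Server sketch: join across two databases on the same instance using
-- three-part names. All object names below are hypothetical.
SELECT o.order_id,
       o.product_id,
       p.product_name
FROM   sales.dbo.orders       AS o
JOIN   inventory.dbo.products AS p
       ON p.product_id = o.product_id;
```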
jaffer_syed (13 rep)
Sep 21, 2023, 11:22 PM • Last activity: Mar 25, 2025, 05:30 PM
4 votes
2 answers
439 views
If LATERAL is optional for table-valued functions, then why does this query error without it?
# Setup
CREATE TABLE persons
(
  person_id int not null,
  name TEXT 
);

INSERT INTO persons VALUES
(1, 'Adam'),
(2, 'Paul'),
(3, 'Tye'),
(4, 'Sarah');

CREATE TABLE json_to_parse
(
  person_id int not null,
  block json
);

INSERT INTO json_to_parse VALUES
  (1, '{"size": "small", "love": "x"}'),
  (2, '{"size": "medium", "love": "xx"}'),
  (3, '{"size": "big", "love": "xxx"}');
# The Error

This runs without issue
SELECT
  *
FROM
  json_to_parse
CROSS JOIN LATERAL
  json_to_record(json_to_parse.block) AS my_json(size TEXT, love TEXT)
INNER JOIN
  persons
ON
  persons.person_id = json_to_parse.person_id;
but this does not
SELECT
  *
FROM
  json_to_parse,
  json_to_record(json_to_parse.block) AS my_json(size TEXT, love TEXT)
INNER JOIN
  persons
ON
  persons.person_id = json_to_parse.person_id;
and I get the error `invalid reference to FROM-clause entry for table "json_to_parse"`.

Why does this second query error? [The docs](https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL) make it quite clear that LATERAL is optional for table-valued functions:

> Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.

[dbfiddle](https://dbfiddle.uk/An86bWru)
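One way to read the failure (an editorial note, not a verified answer): JOIN binds more tightly than the comma, so in the second query the INNER JOIN's operands are only the function result and persons, and its ON clause cannot see json_to_parse, which is a separate comma-separated FROM entry. In the first query json_to_parse sits inside the join's left input, so the ON clause can see it. A sketch of an equivalent query that sidesteps the problem by moving the persons condition into WHERE:

```sql
-- The implicitly LATERAL function still references the preceding comma item,
-- but no ON clause has to reach over to json_to_parse.
SELECT
  *
FROM
  json_to_parse,
  json_to_record(json_to_parse.block) AS my_json(size TEXT, love TEXT),
  persons
WHERE
  persons.person_id = json_to_parse.person_id;
```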
J. Mini (1237 rep)
Dec 1, 2024, 03:30 PM • Last activity: Dec 2, 2024, 12:00 AM
0 votes
0 answers
223 views
PostgreSQL query optimization over multiple tables with moderately large data
We have a data structure in PostgreSQL 16 which is moderately complex; I have a simplified diagram showing the join columns (schema diagram not reproduced here). The 3 SURVEY tables on the top left are exceptions because they are not joined with a foreign key; instead we have a SURVEY_ID column which can point to the ID of any of these 3 SURVEY tables (this is due to historic reasons; another column in the SURVEY_INPUT table decides which of the 3 to use). The SURVEY_INPUT table also has the same SURVEY_ID column, which is not shown here.

Here are the row counts:

- SURVEY_INPUT: 160K
- SURVEY_INPUT_RELATED_ID: 110K
- SURVEY_INPUT_RELATED_DATE: 68K
- SURVEY_INPUT_RELATED_CHARACTERISTICS: 355K
- SURVEY_INPUT_RELATED_PARTY: 400K
- SURVEY_RESPONSE: 29K
- SURVEY_RESPONSE_DATA: 76K

When these tables were designed there were no such requirements, but now we are looking at some specific reports which are ungodly slow. The report must be filtered by the highlighted columns. At the moment we have maybe 1 week's data in these tables and the report can take 10 minutes to load, and we need to support 3 months' data. What I'm basically looking for, as step 1, is whether to focus on the application-side queries or on database performance.

In our original implementation some entity relationships were eager in JPA and some were lazy, which resulted in lots and lots of simple, quick queries being launched, which ultimately resulted in the long execution time. Now the application developer has done some magic, set everything to be eager, and JPA created a monster query that gets all the data with all the filtering in a single step, as a huge cross join. This is faster, but it can still take anywhere from 3 to 6 minutes, depending on the selected time range.

I tried to work on this large query as well, modified it a bit and added some indexes on the highlighted columns as well as all join columns, but it had minimal impact. Here is a link to my current progress on the query; this version with these parameters took 00:02:59.990 to run and resulted in 209450 rows: https://explain.depesz.com/s/ROVz It says the sorting in step 6 is using disk space due to not enough work_mem, but even if I remove the ORDER BY completely the query runs at the same speed, or even a few seconds slower.
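One cheap experiment suggested by that plan (a sketch only, and possibly a dead end given that dropping the ORDER BY barely changed the runtime): raise work_mem for the reporting session so the step-6 sort stays in memory, then re-check the plan.

```sql
-- Session-level experiment; the value is a guess and should be sized to the
-- on-disk sort reported by the plan, not adopted server-wide.
SHOW work_mem;
SET work_mem = '256MB';
-- ... re-run the reporting query here under EXPLAIN (ANALYZE, BUFFERS) ...
RESET work_mem;
```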
Gábor Major (163 rep)
Jun 6, 2024, 02:05 PM
0 votes
1 answers
97 views
Is it possible to sort rows by values from generate_series()?
I multiply result rows of a SELECT with CROSS JOIN generate_series(1, max). My query is like:

~~~pgsql
select id, name from person CROSS JOIN generate_series(1, 4) where ;
~~~

I have this result:

| id | name |
|----|-------|
| 1 | name1 |
| 1 | name1 |
| 1 | name1 |
| 1 | name1 |
| 2 | name2 |
| 2 | name2 |
| 2 | name2 |
| 2 | name2 |
| 3 | name3 |
| 3 | name3 |
| 3 | name3 |
| 3 | name3 |

I want the result like this:

| id | name |
|----|-------|
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |
| 1 | name1 |
| 2 | name2 |
| 3 | name3 |

Is that possible?
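A sketch of one way to get that ordering in PostgreSQL: keep the generated series value visible (here under the alias g, introduced only for the sketch) and sort by it before id, so the ids cycle 1, 2, 3 within each pass of the series.

```sql
-- Order by the copy number first, then by id.
SELECT p.id, p.name
FROM person AS p
CROSS JOIN generate_series(1, 4) AS g
ORDER BY g, p.id;
```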
nanocellule (3 rep)
May 1, 2023, 09:13 PM • Last activity: May 2, 2023, 05:24 PM
0 votes
0 answers
74 views
Joining Two Tables via JSONB column (date)
Given the following data:
create table datas (id int, data jsonb);

insert into categories (id, budget) 
values 
  (1, '{"2022-07-01": {"budget_amount": value1, "some": "thing1"},"2022-08-01": {"budget_amount": value2, "some": "thing2"}}'), 
  (2, '{"2022-08-01": {"budget_amount": value3, "some": "thing3"},"2022-08-01": {"budget_amount": value4, "some": "thing4"}}');

insert into transactions (id, transaction) 
values 
  (1, '{"vendor","equalto.categories.id2",amount, date}'), 
  (2, '{"vendor","equalto.categories.id1",amount, date}'),
  (3, '{"vendor","equalto.categories.id3",amount, date}'), 
  (4, '{"vendor","equalto.categories.id2",amount, date}');
| id | budget |
|----|--------|
| 1 | {"2022-07-01": {"budget_amount": value1, "some": "thing1"},"2022-08-01": {"budget_amount": value2, "some": "thing2"}} |
| 2 | {"2022-08-01": {"budget_amount": value3, "some": "thing3"},"2022-08-01": {"budget_amount": value4, "some": "thing4"}} |

| id | vendor | category_id | amount | date |
|----|--------|-------------|--------|------|
| 1 | vendor | equalto.categories.id1 | amount | date |
| 2 | vendor | equalto.categories.id2 | amount | date |
| 3 | vendor | equalto.categories.id3 | amount | date |
| 4 | vendor | equalto.categories.id2 | amount | date |

For a particular category I'm looking to do something like the following:
SELECT categories_id, DATE_TRUNC('month',transaction.date) as tmonth, SUM(transactions.amount),
       jsonb_path_query(budget.data, '$.*.budget_amount') #>> '{}' as budget_amount
from categories, transactions
JOIN tmonth on jsonb_path_query(budget.data, '$.keyvalue()') ->> 'key'
| Category ID | Month | budget_amount | spend |
|:-----------:|:-----:|:-------------:|:-----:|
| categories.id | DATE_TRUNC('month',transaction.date) | budget_amount | sum(transactions.amount) |

Essentially I'm looking to join the top-level key in datas (first of the month in YYYY-MM-DD format) to the column date (timestamp) in the transaction table. The goal is to get, per row: the budget.id, the month as YYYY-MM-01, the budget, and the sum of monthly transactions for that budget.id. The issue is that I can't figure out how to join the two tables on a jsonb column. Could someone please recommend a way to combine this data? Any help would be appreciated.
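A sketch of one possible shape for that join (assuming PostgreSQL and tidied-up tables categories(id, budget jsonb) and transactions(id, category_id, amount numeric, date timestamp), which differ slightly from the snippets above): expand the jsonb object into one row per month with jsonb_each, then join on the truncated transaction date.

```sql
-- jsonb_each() turns each top-level key/value pair of "budget" into a row, so the
-- YYYY-MM-01 key can be cast to a date and compared with date_trunc('month', t.date).
SELECT c.id                          AS category_id,
       date_trunc('month', t.date)   AS tmonth,
       b.value ->> 'budget_amount'   AS budget_amount,
       sum(t.amount)                 AS spend
FROM   categories AS c
CROSS JOIN LATERAL jsonb_each(c.budget) AS b(key, value)
JOIN   transactions AS t
       ON t.category_id = c.id
      AND date_trunc('month', t.date) = b.key::date
GROUP  BY c.id, tmonth, budget_amount;
```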
RMcLellan (43 rep)
Feb 9, 2023, 06:38 PM
2 votes
1 answers
280 views
SQL Server JOIN vs sparse table
I need to query a relation between an entity called article (a product sold in a shop) and its characteristics (a set of attributes). The result will always be presented in a paged manner (so only a few article records are read each time). The goal of the query is to fetch a page of articles that have a given set of characteristics (a few dozen per query).

My first idea is to use a join between those two entities; I already have a table representing articles, a table representing characteristics, and a cross-join table that maps the relations between the previous two. The problem is that there may be many articles (up to 1 million), while characteristics may be quite numerous (a few hundred). That does not seem like a big deal, but I am worried because the distribution is not uniform; for example, some characteristics may be common to almost all articles while others may be barely used. Another aspect is that both the article and characteristics tables have a few json fields that may be quite big (a few KB of data each).

I am still in the design phase, but I am asking myself whether a join is the best approach in this scenario, or whether it is cheaper to build an auxiliary "index table" (with only searchable fields, so without the json fields) to use for the query. I mean programmatically building/rebuilding an auxiliary table, doing one query over that table to fetch the article IDs of the items on the "current page", and then doing another query over the article table (limited to that pool of IDs for that page).

To achieve that I thought about a sparse table: a sparse table can have a few hundred columns, so maybe I can create/recreate (periodically) a search table with one row for each article and one column for each characteristic (each characteristic column is just a bit, and an application constraint guarantees there are fewer than one thousand characteristics). Given a specific convention (e.g. the characteristic with ID 345 becomes column 'C345' in the index table) I can build a query over such a sparse table dynamically. That way the "index table" is guaranteed to have only 1 million rows in the worst case, while the join may have hundreds of millions of rows if I am particularly unlucky. That said, I have never used sparse tables like that before and I am not sure they perform well, so I would ask whether this is a good idea according to your experience.
Here is the schema of the tables I already have (articles, characteristics, and the cross-join table):

/****** Object:  Table [dbo].[tbl_ana_Articles]    Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_ana_Articles](
  [ID] [int] IDENTITY(1,1) NOT NULL,
  [CODE1] [nvarchar](10) NOT NULL,   --optional searchable codes
  [CODE2] [nvarchar](10) NOT NULL,   --optional searchable codes
  [CODE3] [nvarchar](10) NOT NULL,   --optional searchable codes
  [CODE4] [nvarchar](10) NOT NULL,   --optional searchable codes
  [_JSON1] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each), not searchable
  [_JSON2] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each), not searchable
  [_JSON3] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each), not searchable
 CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED ( [ID] ASC )
   WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object:  Table [dbo].[tbl_ana_Characteristics]    Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_ana_Characteristics](
  [ID] [int] IDENTITY(1,1) NOT NULL,
  [ID_FILTER] [int] NOT NULL,        --characteristics are grouped by "kind", e.g. the kind "size" may contain the set of characteristics {"big","medium","small"}; this ID keeps track of the one-to-many association (not relevant in this scenario)
  [CODE1] [nvarchar](10) NOT NULL,   --optional searchable codes
  [CODE2] [nvarchar](10) NOT NULL,   --optional searchable codes
  [_JSON1] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each), not searchable
  [_JSON2] [nvarchar](max) NOT NULL, --json fields, quite big (few KB each), not searchable
 CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED ( [ID] ASC )
   WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object:  Table [dbo].[tbl_src_ArticlesCharacteristics]    Script Date: 1/30/2023 6:37:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics](
  [ID_CHARACTERISTIC] [int] NOT NULL,
  [ID_ARTICLE] [int] NOT NULL,
  [ID_FILTER] [int] NOT NULL,
 CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED ( [ID_CHARACTERISTIC] ASC, [ID_ARTICLE] ASC )
   WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO

Here is how I build the sparse table for the lookup (I take up to 10K columns just for testing purposes):

/****** Object:  Table [dbo].[ArticleCarat]    Script Date: 1/30/2023 6:53:11 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ArticleCarat](
  [ArtID] [int] NOT NULL,
  [CARAT_1] [bit] SPARSE NULL,
  [CARAT_2] [bit] SPARSE NULL,
  [CARAT_3] [bit] SPARSE NULL,
  [CARAT_4] [bit] SPARSE NULL,
  -- ... (columns CARAT_5 through CARAT_10131 elided) ...
  [CARAT_10132] [bit] SPARSE NULL,
  [CARAT_10133] [bit] SPARSE NULL,
  [CARAT_10134] [bit] SPARSE NULL,
  [SpecialPurposeColumns] [xml] COLUMN_SET FOR ALL_SPARSE_COLUMNS NULL,
 PRIMARY KEY CLUSTERED ( [ArtID] ASC )
   WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

**---Adding more context:**

As requested, I have provided more context here. The article table will hold data about items that can be purchased in a shop, so entries may be shoes, clothes, or other things. Each item has several codes (whose meaning is configured during the installation phase) used during search, and has a few non-searchable metadata fields attached (the json fields; this data includes descriptions in many languages and pricing information). The codes should be unique, but I can't be sure (they are provided by the customer and sometimes the data is messy; two articles with the same code is not an issue). The json fields are not important here; I mention them because it seems to me they will affect search performance even when they are not in the select statement. For that reason I rarely search over the article table directly, but use a special table for searches over codes.

Then there are the characteristics, which are searchable attributes of an article. For example, shoes may have the characteristics "blue - 42EU - running". Characteristics work much like tags here on StackOverflow. Here is an example dataset (keep in mind that the data and DB schema reflect my real case but are fictional; due to company policy I can't disclose real data): https://dbfiddle.uk/aBnv_GK2 Here is the code that generates the example; I have also included a filter table, where a filter is just a set of characteristics related to the same topic.
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Articles]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [CODE3] [nvarchar](10) NOT NULL, [CODE4] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, [_JSON3] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_articles] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Characteristics]( [ID] [int] IDENTITY(1,1) NOT NULL, [ID_FILTER] [int] NOT NULL, [CODE1] [nvarchar](10) NOT NULL, [CODE2] [nvarchar](10) NOT NULL, [_JSON1] [nvarchar](max) NOT NULL, [_JSON2] [nvarchar](max) NOT NULL, CONSTRAINT [PK_tbl_ana_characteristics] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_ana_Filter]( [ID] [int] IDENTITY(1,1) NOT NULL, [CODE] [nvarchar](20) NOT NULL, CONSTRAINT [PK_tbl_ana_filter] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[tbl_src_ArticlesCharacteristics]( [ID_CHARACTERISTIC] [int] NOT NULL, [ID_ARTICLE] [int] NOT NULL, [ID_FILTER] [int] NOT NULL, CONSTRAINT [PK_tbl_src_ArticlesCharacteristics] PRIMARY KEY CLUSTERED ( [ID_CHARACTERISTIC] ASC, [ID_ARTICLE] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] ON GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (2, N'C0001', N'EAN13_1', N'BATCH_1', N'HU_1', N'{ "description":"shoes for running"}', N'{"price" : "40€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (3, N'C0002', N'EAN13_2', N'BATCH_2', N'HU_2', N'{ "description":"t-shirt"}', N'{"price" : "10€"}', N'{"other" : "other"}') GO INSERT [dbo].[tbl_ana_Articles] ([ID], [CODE1], [CODE2], [CODE3], [CODE4], [_JSON1], [_JSON2], [_JSON3]) VALUES (4, N'C0003', N'EAN13_3', N'BATCH_3', N'HU_3', N'{ "description":"sun glasses"}', N'{"price" : "15€"}', N'{"other" : "other"}') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Articles] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] ON GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (1, 1, N'Size 38 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (2, 1, N'Size 39 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (3, 1, N'Size 40 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], 
[_JSON1], [_JSON2]) VALUES (4, 1, N'Size 41 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (5, 1, N'Size 42 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (6, 1, N'Size 43 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (7, 1, N'Size 44 EU', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (9, 2, N'Classic', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (10, 2, N'Running', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (11, 2, N'Trekking', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (12, 3, N'Red', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (13, 3, N'Blue', N'-nd-', N'-nd-', N'-nd-') GO INSERT [dbo].[tbl_ana_Characteristics] ([ID], [ID_FILTER], [CODE1], [CODE2], [_JSON1], [_JSON2]) VALUES (14, 3, N'Green', N'-nd-', N'-nd-', N'-nd-') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Characteristics] OFF GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] ON GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (1, N'Size') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (2, N'Pourpose') GO INSERT [dbo].[tbl_ana_Filter] ([ID], [CODE]) VALUES (3, N'Color') GO SET IDENTITY_INSERT [dbo].[tbl_ana_Filter] OFF GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (3, 2, 1) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (9, 3, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (10, 2, 2) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (12, 4, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 2, 3) GO INSERT [dbo].[tbl_src_ArticlesCharacteristics] ([ID_CHARACTERISTIC], [ID_ARTICLE], [ID_FILTER]) VALUES (13, 3, 3) GO
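For comparison with the sparse-table idea, a sketch of how the paged lookup is often written directly against the existing cross-join table (table and column names are the ones from the schema above; the characteristic IDs and page size are arbitrary placeholders): group on the article and demand that all requested characteristics match, a pattern sometimes called relational division.

```sql
-- Find articles that have ALL of the requested characteristics (3, 9 and 13 here,
-- purely as an example), one page at a time.
DECLARE @PageNumber int = 1, @PageSize int = 50;

SELECT ac.ID_ARTICLE
FROM dbo.tbl_src_ArticlesCharacteristics AS ac
WHERE ac.ID_CHARACTERISTIC IN (3, 9, 13)          -- the requested characteristics
GROUP BY ac.ID_ARTICLE
HAVING COUNT(DISTINCT ac.ID_CHARACTERISTIC) = 3   -- must match all of them
ORDER BY ac.ID_ARTICLE
OFFSET (@PageNumber - 1) * @PageSize ROWS FETCH NEXT @PageSize ROWS ONLY;
```

The page of article IDs can then be joined back to tbl_ana_Articles to pull the wide json columns only for the rows actually displayed, which is essentially the two-step plan described in the question without maintaining a separate sparse table.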
Skary (368 rep)
Jan 30, 2023, 05:52 PM • Last activity: Feb 1, 2023, 08:58 AM
0 votes
0 answers
92 views
How to show all the columns data in full outer join
I have 2 tables and a combined result (screenshots of Table 1, Table 2, and the output are not reproduced here). The issue is that EnteredBy=23 is not displayed, because I am using CT1.EnteredBy. How can I display the CT11.EnteredBy value instead of NULL? The actual query I am using is given below:

With CT1 As(
  Select C1.EnteredBy,
         CodingTimeInMin = Sum(ROUND(CAST(C1.CODINGTIME AS FLOAT)/(CAST(60 AS FLOAT)),4))
  from [dbo].[CodedDatas] C1
  INNER JOIN (Select Document_Id, Max(LastModified) As LastModified from [DBO].[CODEDDATAS] Group By Document_Id) C2
          on C1.Document_ID=C2.Document_ID And C1.Coded=1 And C1.Revision=0 --And C1.LastModified=C2.LastModified
  group by C1.EnteredBy
),
CT11 As(
  Select C1.EnteredBy,
         ReviewedTimeInMin = Sum(ROUND(CAST(C1.QATime AS FLOAT)/(CAST(60 AS FLOAT)),4))
  from [dbo].[CodedDatas] C1
  INNER JOIN (Select Document_Id, Max(LastModified) As LastModified from [DBO].[CODEDDATAS] Group By Document_Id) C2
          on C1.Document_ID=C2.Document_ID And C1.Coded=1 And C1.Revision=1 And C1.LastModified=C2.LastModified
  group by C1.EnteredBy
),
CT2 As(
  Select CodingTimeInHR   = ROUND(CT1.CodingTimeInMin/CAST(60 AS FLOAT), 4),
         ReviewedTimeInHR = ROUND(CT11.ReviewedTimeInMin/CAST(60 AS FLOAT), 4),
         EnteredBy        = CT1.EnteredBy
  from CT1
  full outer join CT11 on CT1.EnteredBy=CT11.EnteredBy
)
select * from CT2;
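A common fix for this pattern (a sketch based on the query above, not a tested rewrite): after a FULL OUTER JOIN, take the key from whichever side is non-NULL with COALESCE (or ISNULL) instead of reading it from only one side.

```sql
-- In the CT2 step, pick EnteredBy from either side of the full outer join.
Select CodingTimeInHR   = ROUND(CT1.CodingTimeInMin / CAST(60 AS FLOAT), 4),
       ReviewedTimeInHR = ROUND(CT11.ReviewedTimeInMin / CAST(60 AS FLOAT), 4),
       EnteredBy        = COALESCE(CT1.EnteredBy, CT11.EnteredBy)
from CT1
full outer join CT11 on CT1.EnteredBy = CT11.EnteredBy
```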
Diya Rawat (23 rep)
Dec 9, 2022, 04:27 AM • Last activity: Dec 9, 2022, 10:17 AM
0 votes
1 answers
45 views
Using the correct JOIN
I am trying to create an app for the salespeople who work for my company to keep track of their sales/commission. When a salesperson brings on an account, their name is noted on the account, and the table might look like:

Table B

| Account | Salesperson |
|:---------:|:-----------:|
| Account A | A |
| Account B | B |
| Account C | B |
| Account D | A |

Then the jobs list might look like:

Table A

| Account | Job details | Price |
|:---------:|:-----------:|:-----:|
| Account A | xyz | £x |
| Account B | xyz | £x |
| Account C | xyz | £x |
| Account D | xyz | £x |
| Account A | xyz | £x |
| Account C | xyz | £x |

I am looking for a way to filter the results of Table A based on the salesperson noted next to the account in Table B. I have looked at all of the JOIN options but am unsure I am using the correct one. I have opted, currently, for:

SELECT J.*, A.Company, A.SalesPerson
FROM TableA A
INNER JOIN TableB B ON A.Account = B.Account
Where A.Salesperson = "B"

This seems to work for the time being; however, I am unsure whether this is the best join to use and whether it will stand the test of time.
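A sketch of what that filter usually looks like (the alias usage in the posted query looks slightly off, since J is never defined; the names below follow the question's tables, with the salesperson column assumed to live in Table B):

```sql
-- Return every job from TableA whose account belongs to salesperson 'B'.
SELECT A.Account,
       A.[Job details],
       A.Price,
       B.Salesperson
FROM   TableA AS A
INNER JOIN TableB AS B
       ON B.Account = A.Account
WHERE  B.Salesperson = 'B';
```

An INNER JOIN is the right choice when every job's account is guaranteed to exist in Table B; a LEFT JOIN would keep jobs whose account has no salesperson recorded.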
PaulMcF87 (177 rep)
Nov 29, 2022, 03:50 PM • Last activity: Nov 29, 2022, 11:45 PM
0 votes
1 answers
313 views
Doesn't T-SQL support correlated CROSS JOINs?
This query works:

SELECT QMC.HAUSKEY, t1.STRNAME, t2.HAUS_NR
FROM SWOPS.MIGR.QMCAddresses QMC
CROSS JOIN (SELECT STRNAME FROM SWOPS.MIGR.EB_DICT_STREET_QMC WHERE FK=8055909) t1
CROSS JOIN (SELECT HAUS_NR FROM SWOPS.MIGR.EB_DICT_HAUSNR_QMC WHERE FK=8055909) t2
WHERE 1=1
AND QMC.HAUSKEY=8055909
ORDER BY HAUSKEY, t1.STRNAME, t2.HAUS_NR

(result screenshot not reproduced here)

However, when I replace the constant 8055909 in the CROSS JOIN subqueries with the column reference QMC.HAUSKEY, the query reports an error: `Msg 4104, Level 16, State 1, Line 56 The multi-part identifier "QMC.HAUSKEY" could not be bound.` Why is that?

BACKGROUND: QMCAddresses contains the buildings and their addresses. EB_DICT_STREET_QMC contains the possible variations of the street names of the addresses in QMCAddresses. EB_DICT_HAUSNR_QMC contains the possible variations of the house numbers of the addresses in QMCAddresses. What I am trying to do is to create, for each building address, a cartesian product of all possible variations of its street name and all possible variations of its house number. (expected-result screenshot not reproduced here)
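In T-SQL the lateral (correlated) construct is APPLY rather than a correlated CROSS JOIN; a sketch of the correlated version using the same tables as the question (untested against the real data):

```sql
-- CROSS APPLY lets the derived tables reference QMC.HAUSKEY row by row,
-- which a derived table joined with CROSS JOIN cannot do.
SELECT QMC.HAUSKEY, t1.STRNAME, t2.HAUS_NR
FROM SWOPS.MIGR.QMCAddresses AS QMC
CROSS APPLY (SELECT STRNAME FROM SWOPS.MIGR.EB_DICT_STREET_QMC WHERE FK = QMC.HAUSKEY) AS t1
CROSS APPLY (SELECT HAUS_NR FROM SWOPS.MIGR.EB_DICT_HAUSNR_QMC WHERE FK = QMC.HAUSKEY) AS t2
ORDER BY QMC.HAUSKEY, t1.STRNAME, t2.HAUS_NR;
```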
Mediterrano (1 rep)
Sep 7, 2022, 02:11 PM • Last activity: Sep 7, 2022, 09:49 PM
0 votes
0 answers
54 views
Pivot table but with cross product
I'm currently in the following situation: I have a measurements table that looks something like this:
+----+-----------+-------+---------+
| id | germplasm | trait | m_value |
+----+-----------+-------+---------+
|  1 | g1        | t1    |       1 |
|  2 | g1        | t1    |       2 |
|  3 | g1        | t2    |       3 |
+----+-----------+-------+---------+
And I need to get something of this structure to be able to easily generate a [SPLOM plot](https://plotly.com/javascript/splom/) (multi-dimensional scatter plot):
+-----------+----+----+
| germplasm | t1 | t2 |
+-----------+----+----+
Now I know I can use a pivot query like this:
select germplasm,
       MAX(case when trait = 't1' then m_value end) as 't1',
       MAX(case when trait = 't2' then m_value end) as 't2'
from measurements
group by germplasm
Which gives me:
+-----------+----+----+
| germplasm | t1 | t2 |
+-----------+----+----+
| g1        |  2 |  3 |
+-----------+----+----+
However, the row with id 1 gets lost in this because of the MAX. What I'd like to get is this result:
+-----------+----+----+
| germplasm | t1 | t2 |
+-----------+----+----+
| g1        |  1 |  3 |
| g1        |  2 |  3 |
+-----------+----+----+
So basically a cross product between the trait values for the germplasm (but still with the trait names as columns). Note that not all traits will have numeric values; some are categorical. If I add the id to the group by, I get every value in its own row, which also isn't what I want. I tried searching for "mysql pivot cross product" or "mysql pivot duplicate values", but couldn't find anything. Does anyone have an idea how to achieve this? Obviously my actual scenario is more complex, with additional columns in the measurements table that are used in the group by (treatment, replicate, etc.), but I should be able to deal with those at a later date.
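One way to get that cross product (a sketch, assuming the MySQL measurements table from the question): build one derived table per trait and join them on germplasm, so every t1 row pairs with every t2 row for the same germplasm.

```sql
-- Each derived table keeps the per-trait rows; the join on germplasm produces the
-- cross product of t1 and t2 values. For g1 this yields (1,3) and (2,3).
SELECT t1.germplasm,
       t1.m_value AS t1,
       t2.m_value AS t2
FROM (SELECT germplasm, m_value FROM measurements WHERE trait = 't1') AS t1
JOIN (SELECT germplasm, m_value FROM measurements WHERE trait = 't2') AS t2
      ON t2.germplasm = t1.germplasm
ORDER BY t1.germplasm, t1.m_value, t2.m_value;
```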
Baz (165 rep)
Jul 22, 2022, 08:54 AM
0 votes
0 answers
140 views
Is there a cross join in these CTEs?
I have a query that is constructed from several CTEs (in order to organize the query). I built it in my SQL workbench and was happy with the test results on a smaller dataset, so I converted it to SQLAlchemy code for execution on the production database. Now SQLAlchemy warns me of a cross join:

SELECT statement has a cartesian product between FROM element(s) "vm_per_bus" and FROM element "voltages". Apply join condition(s) between each element to resolve.

In my query (see below), this happens in the last CTE. My question is now twofold: 1. Is there really an accidental cartesian join? 2. How do I join the two CTEs? This is the query:
WITH voltages AS
  (SELECT muscle_actions.id AS id,
          regexp_replace(CAST(experiment_runs.document_json['schedule_config']['phase_0']['environments']['environment']['params']['reward']['name'] AS TEXT), '^[^:]+:([^"]+)"', '\1') AS reward_function,
          agents.uid AS agent,
          jsonb_array_elements(jsonb_path_query_array(CAST(muscle_actions.sensor_readings AS JSONB), '$.*.value')) AS vm
   FROM muscle_actions
   JOIN agents ON agents.id = muscle_actions.agent_id
   JOIN experiment_run_phases ON experiment_run_phases.id = agents.experiment_run_phase_id
   JOIN experiment_run_instances ON experiment_run_instances.id = experiment_run_phases.experiment_run_instance_id
   JOIN experiment_runs ON experiment_runs.id = experiment_run_instances.experiment_run_id
   WHERE experiment_runs.id IN (8,
                                11,
                                19,
                                25,
                                29,
                                33,
                                37,
                                41,
                                45,
                                49,
                                53,
                                57,
                                61,
                                65,
                                69,
                                73,
                                77,
                                81,
                                85,
                                89,
                                93,
                                97,
                                120,
                                122,
                                137,
                                127,
                                132,
                                142,
                                147,
                                152,
                                157,
                                162,
                                199,
                                204,
                                209,
                                214,
                                219,
                                224,
                                229)
     AND experiment_run_phases.uid = 'phase_1'),
     vm_per_bus AS
  (SELECT voltages.id AS id,
          voltages.reward_function AS reward_function,
          voltages.agent AS agent,
          voltages.vm AS vm,
          abs(1.0 - CAST(voltages.vm AS REAL)) AS vm_deviation,
          abs(1.0 - CAST(voltages.vm AS REAL)) >= 0.15 AS vm_violation
   FROM voltages),
     vm AS
  (SELECT vm_per_bus.id AS id,
          max(voltages.reward_function) AS reward_function,
          max(voltages.agent) AS agent,
          array_agg(vm_per_bus.vm) AS array_agg_1,
          max(vm_per_bus.vm_deviation) AS max_vm_deviation,
          sum(CAST(vm_per_bus.vm_violation AS INTEGER)) AS num_vm_violations
   FROM vm_per_bus,
        voltages
   GROUP BY vm_per_bus.id)
SELECT vm.reward_function,
       vm.agent,
       max(vm.max_vm_deviation) AS max_abs_violation,
       sum(vm.num_vm_violations) AS num_violations
FROM vm
GROUP BY vm.reward_function,
         vm.agent
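Reading the last CTE, the warning looks justified: vm selects FROM vm_per_bus, voltages with no condition linking the two, so every vm_per_bus row is paired with every voltages row before the GROUP BY. Since vm_per_bus already carries the reward_function and agent columns it copied from voltages, one plausible fix (a sketch, not tested against the real schema) is to drop voltages from that CTE entirely; the surrounding WITH clause and final SELECT stay unchanged.

```sql
-- Replacement for the last CTE only: read the columns from vm_per_bus,
-- which already exposes them, instead of cross-joining voltages again.
vm AS
  (SELECT vm_per_bus.id AS id,
          max(vm_per_bus.reward_function) AS reward_function,
          max(vm_per_bus.agent) AS agent,
          array_agg(vm_per_bus.vm) AS array_agg_1,
          max(vm_per_bus.vm_deviation) AS max_vm_deviation,
          sum(CAST(vm_per_bus.vm_violation AS INTEGER)) AS num_vm_violations
   FROM vm_per_bus
   GROUP BY vm_per_bus.id)
```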
Technaton (171 rep)
May 27, 2022, 08:06 PM
3 votes
1 answers
2657 views
How do I create a SQL loop that generates future dates based on different frequencies and intervals for each item until a specified end date?
I have a data set that appears like this:

| next_generation_date | procedure | interval | frequency |
| :-- | :-- | :-- | :-- |
| 2021-01-17 00:00:00.000 | Clean Restroom | 1 | day |
| 2021-01-17 00:00:00.000 | Vacuum | 2 | week |
| 2021-02-01 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
| 2021-10-01 00:00:00.000 | Test Generator | 4 | year |

My goal is to generate multiple date rows for each **procedure** by going off of the **next_generation_date**, **interval** and **frequency** columns until a specified end date. This specified end date would be the same date throughout the entire table. In this example, let's make the specified end date 2025-12-31. My end goal is for the table to appear similar to the below. Please note since this is an example, I didn't include every row so as to avoid having hundreds of rows listed below.

| next_generation_date | procedure | interval | frequency |
| :-- | :-- | :-- | :-- |
| 2021-01-17 00:00:00.000 | Clean Restroom | 1 | day |
| 2021-01-18 00:00:00.000 | Clean Restroom | 1 | day |
| 2021-01-19 00:00:00.000 | Clean Restroom | 1 | day |
| 2021-01-20 00:00:00.000 | Clean Restroom | 1 | day |
| 2021-01-17 00:00:00.000 | Vacuum | 2 | week |
| 2021-01-31 00:00:00.000 | Vacuum | 2 | week |
| 2021-02-14 00:00:00.000 | Vacuum | 2 | week |
| 2021-02-28 00:00:00.000 | Vacuum | 2 | week |
| 2021-02-01 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
| 2021-05-01 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
| 2021-08-01 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
| 2021-11-01 00:00:00.000 | Inspect Fire Extinguisher | 3 | month |
| 2021-10-01 00:00:00.000 | Test Generator | 4 | year |
| 2025-10-01 00:00:00.000 | Test Generator | 4 | year |

To summarize the above table, future "Clean Restroom" dates are each day, "Vacuum" dates are every two weeks, "Inspect Fire Extinguisher" dates are every three months, and "Test Generator" dates are every four years. Below are two of my most recent attempts in Microsoft SQL Server Management Studio. The first attempt creates the loop, but my procedures only increase by an interval of 1, not the unique interval found within the **interval** column. The second attempt generates the next **next_generation_date** with the correct interval but doesn't loop.
CREATE TABLE ##many_integers (idx INT);
WITH units(units) AS (
SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION 
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
)
,tens             AS(SELECT units *        10 AS tens             FROM units       )
,hundreds         AS(SELECT tens  *        10 AS hundreds         FROM tens        )
,thousands        AS(SELECT hundreds *     10 AS thousands        FROM hundreds    )
,tenthousands     AS(SELECT thousands *    10 AS tenthousands     FROM thousands   )
,hundredthousands AS(SELECT tenthousands * 10 AS hundredthousands FROM tenthousands)                                                                                                                                        
INSERT                                                                                                                                                                                                                      
INTO ##many_integers
SELECT  hundredthousands +tenthousands +thousands +hundreds +tens +units 
FROM       units 
CROSS JOIN tens 
CROSS JOIN hundreds 
CROSS JOIN thousands 
CROSS JOIN tenthousands 
CROSS JOIN hundredthousands ;

SELECT [procedure], [frequency], [interval], 
  CASE [frequency]
    WHEN 'day'   THEN  DATEADD( day   , idx , [next_generation_date])
    WHEN 'week'  THEN  DATEADD( week  , idx , [next_generation_date])
    WHEN 'month' THEN  DATEADD( month , idx , [next_generation_date])
    WHEN 'year'  THEN  DATEADD( year  , idx , [next_generation_date])
    ELSE NULL
  END AS [next_generation_date]
FROM [data].[fact] CROSS JOIN ##many_integers
WHERE idx < 5000
AND
  CASE [frequency]
    WHEN 'day'   THEN  DATEADD( day   , idx , [next_generation_date])
    WHEN 'week'  THEN  DATEADD( week  , idx , [next_generation_date])
    WHEN 'month' THEN  DATEADD( month , idx , [next_generation_date])
    WHEN 'year'  THEN  DATEADD( year  , idx , [next_generation_date])
    ELSE NULL
  END <=  '2023-12-31 00:00:00'
ORDER BY 1
;
SELECT [procedure], [frequency], [interval], [next_generation_date] FROM [data].[fact]
UNION ALL
SELECT [procedure], [frequency], [interval],
CASE
       WHEN [frequency] = 'day' THEN DATEADD(day, [interval], [next_generation_date])
       WHEN [frequency] = 'week' THEN DATEADD(week, [interval], [next_generation_date])
       WHEN [frequency] = 'month' THEN DATEADD(month, [interval], [next_generation_date])
       WHEN [frequency] = 'year' THEN DATEADD(year, [interval], [next_generation_date])
       ELSE NULL
END AS [next_generation_date]
FROM [data].[fact]
ORDER BY 1;
Any and all suggestions are greatly appreciated as I'm new to SQL. Thank you.
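A sketch of a set-based alternative that steps each row by its own interval (assuming the same [data].[fact] table and the 2025-12-31 end date from the question): a recursive CTE whose recursive member reuses the CASE expression from the second attempt, with MAXRECURSION 0 lifting the default 100-iteration limit.

```sql
-- Each recursion step adds one new row per procedure, advanced by that row's own
-- interval/frequency, until the common end date would be passed.
DECLARE @EndDate datetime = '2025-12-31';

WITH schedule AS (
    SELECT [procedure], [frequency], [interval], [next_generation_date]
    FROM [data].[fact]

    UNION ALL

    SELECT [procedure], [frequency], [interval],
           CASE [frequency]
               WHEN 'day'   THEN DATEADD(day,   [interval], [next_generation_date])
               WHEN 'week'  THEN DATEADD(week,  [interval], [next_generation_date])
               WHEN 'month' THEN DATEADD(month, [interval], [next_generation_date])
               WHEN 'year'  THEN DATEADD(year,  [interval], [next_generation_date])
           END
    FROM schedule
    WHERE CASE [frequency]
              WHEN 'day'   THEN DATEADD(day,   [interval], [next_generation_date])
              WHEN 'week'  THEN DATEADD(week,  [interval], [next_generation_date])
              WHEN 'month' THEN DATEADD(month, [interval], [next_generation_date])
              WHEN 'year'  THEN DATEADD(year,  [interval], [next_generation_date])
          END <= @EndDate
)
SELECT [next_generation_date], [procedure], [interval], [frequency]
FROM schedule
ORDER BY [procedure], [next_generation_date]
OPTION (MAXRECURSION 0);
```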
Gulfhawk (31 rep)
May 24, 2022, 07:45 PM • Last activity: May 26, 2022, 10:13 AM
5 votes
2 answers
9650 views
Is there such a thing as a LEFT CROSS JOIN?
Is there a way to do what I want to call a "LEFT CROSS JOIN"? I.e., is there a way to specify a CROSS JOIN ON, but return at least one row in the result for every row in the left, using nulls for the right-side columns when the right-side table does not contain a match?
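What the question describes is ordinary outer-join behavior; a sketch with hypothetical tables a and b (made up purely for illustration): a LEFT JOIN with the desired ON condition keeps every left row and pads the right-side columns with NULLs when nothing matches.

```sql
-- Every row of "a" appears at least once; unmatched rows get NULLs for b's columns.
SELECT a.id, a.payload, b.id AS b_id, b.payload AS b_payload
FROM   a
LEFT JOIN b
       ON b.some_key = a.some_key;
```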
Joshua Goldberg (227 rep)
Dec 16, 2021, 04:45 PM • Last activity: Dec 22, 2021, 08:44 PM
1 votes
1 answers
101 views
Pairing Query Store Read/Write Status with Enabled Status
I am able to find whether Query Store is enabled:

SELECT name, is_query_store_on from sys.databases

I am able to query a specific database to determine the Query Store read/write status:

SELECT actual_state_desc from sys.database_query_store_options

How do I join the two results to have the information in one table? They do not have a common field, and I assume I could write a CROSS JOIN to match the two tables up, but am unsure how without that common field.
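A CROSS JOIN does work here because sys.database_query_store_options is database-scoped and returns only the current database's row, so filtering sys.databases to DB_NAME() leaves exactly one pair; a sketch (run in the database of interest):

```sql
-- One row: the current database's name, its is_query_store_on flag,
-- and the actual Query Store state.
SELECT d.name,
       d.is_query_store_on,
       qso.actual_state_desc
FROM sys.databases AS d
CROSS JOIN sys.database_query_store_options AS qso
WHERE d.name = DB_NAME();
```

Covering every database in one result set would mean running this per database (for example via sp_MSforeachdb or a cursor), since the options view cannot see other databases.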
CrushingIT (33 rep)
Dec 16, 2021, 07:00 PM • Last activity: Dec 16, 2021, 08:02 PM
0 votes
1 answers
281 views
Add a list of names to a generated series
I generated a date series this way:
SELECT generate_series(date '2020-01-01', date '2021-01-21', '1 day');
Additionally I have a list of names (Max Meyer, Anna Smith, Peter Gardner). How could I add the name list to the date series, such that each name appears for every date in the series?
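A sketch of one way to do that in PostgreSQL: turn the name list into rows with unnest() and cross join it with the series, so every (date, name) combination comes back.

```sql
-- 387 dates x 3 names; the ORDER BY keeps the three names together per day.
SELECT d::date AS day, n AS name
FROM   generate_series(date '2020-01-01', date '2021-01-21', '1 day') AS d
CROSS JOIN unnest(ARRAY['Max Meyer', 'Anna Smith', 'Peter Gardner']) AS n
ORDER  BY day, name;
```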
wladi (1 rep)
Jul 2, 2021, 03:49 PM • Last activity: Jul 23, 2021, 08:46 AM