Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
203
views
Update row data with dynamic lag
I have a table of items and restaurants. We assume that if the `restaurant` entry is missing, it has to be completed with the last non-null `restaurant` value in the previous rows (as defined by ascending `id` values).
```
CREATE TABLE food_items (
    id SERIAL PRIMARY KEY,
    item VARCHAR(255) NOT NULL,
    restaurant VARCHAR(255)
);

INSERT INTO food_items(item, restaurant)
VALUES ('Salad','McDonalds'), ('Burger',''), ('Fries',''), ('Salad','Taco Bell'), ('Quesadilla','');
```
The tricky part is that the number of rows for each restaurant is variable, so it cannot be captured with a fixed lag.
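A common way to handle a variable-length carry-forward like this is the gaps-and-islands trick: count the non-empty `restaurant` values in `id` order to form a group number, then take that group's value. A minimal sketch (note the sample data uses empty strings rather than NULLs, hence the `NULLIF`):

```sql
SELECT id,
       item,
       -- the only non-empty restaurant in each group is the row that started it
       MAX(NULLIF(restaurant, '')) OVER (PARTITION BY grp) AS restaurant
FROM (
    SELECT f.*,
           -- running count of non-empty values: increments at every "real" restaurant row
           COUNT(NULLIF(restaurant, '')) OVER (ORDER BY id) AS grp
    FROM food_items f
) s
ORDER BY id;
```

To persist the result, the same subquery can feed an `UPDATE ... FROM`.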
eddie
(101 rep)
Jan 2, 2024, 08:17 AM
• Last activity: Jun 23, 2025, 09:03 AM
0
votes
1
answers
192
views
Determine Available Blocks of Contiguous Ids
We have a SQL Server table that has an int autoincrement primary key column on it.
The table's primary key space is fragmented. For example, ids 1 through 10 are in use but ids 11 through 100,000 are not. Ids 1,000,000 through 1,100,000 are in use but ids 1,100,001 through 50,000,000 are not.
I am trying to find all such available ranges of ids. Once the ranges are determined, we may reseed the primary key column to start at the beginning of the widest range.
Is there any tool, utility, or SQL script out there for determining such available ranges?
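Since the used ids are already ordered, the unused ranges are simply the spaces between one id and the next. A sketch using `LEAD` (SQL Server 2012+); the table and column names `dbo.MyTable` / `id` are placeholders:

```sql
SELECT id + 1          AS gap_start,
       next_id - 1     AS gap_end,
       next_id - id - 1 AS gap_width              -- number of unused ids in the gap
FROM (
    SELECT id,
           LEAD(id) OVER (ORDER BY id) AS next_id
    FROM dbo.MyTable
) t
WHERE next_id - id > 1                            -- keep only real gaps
ORDER BY gap_width DESC;                          -- widest candidate range first
```

This ignores any free range below the minimum or above the maximum id; both are easy to add with `MIN`/`MAX` if they matter.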
Manny Siddiqui
(11 rep)
Sep 23, 2021, 06:13 PM
• Last activity: Jun 22, 2025, 08:09 PM
0
votes
1
answers
209
views
Gaps and Islands Across Fields With Start and End Timestamps
I am working with GPS data which I want to group by locations (to six decimal places of the coordinate value) and generate `from` and `until` times for.
For simplicity in this example I'm making the coordinates pre-formatted string values.
I am using PostgreSQL.
```
CREATE TABLE locations
(
    location_id serial PRIMARY KEY,
    latitude VARCHAR (10),
    longitude VARCHAR (10),
    generated_at TIMESTAMP
);

INSERT INTO locations (latitude, longitude, generated_at)
VALUES
('51.123456', '-1.123456', '2020-04-27 17:00:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:05:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:10:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:15:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:20:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:25:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:30:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:35:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:40:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:45:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:50:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:55:00.000');
```
What I would like to get back from the query would be:
```
'51.123456' | '-1.123456' | '2020-04-27 17:00:00.000' | '2020-04-27 17:10:00.000'
'51.654321' | '-1.654321' | '2020-04-27 17:10:00.000' | '2020-04-27 17:25:00.000'
'51.444555' | '-1.444555' | '2020-04-27 17:25:00.000' | '2020-04-27 17:30:00.000'
'51.123456' | '-1.123456' | '2020-04-27 17:30:00.000' | '2020-04-27 17:45:00.000'
'51.444555' | '-1.444555' | '2020-04-27 17:45:00.000' | '2020-04-27 17:55:00.000'
'51.654321' | '-1.654321' | '2020-04-27 17:55:00.000' |
```
The idea is to group by 'locations' and a new location is defined as being when the coordinates change from the previously reported location. If the coordinates are the same as an earlier location then that is still a new location.
The third column is the timestamp when the updated location was first reported and the fourth is the timestamp when the next location was first reported.
I can get part-way there with this:
```
SELECT
    latitude,
    longitude,
    generated_at
FROM (
    SELECT l.*,
           LAG (latitude) OVER w AS lagLatitude,
           LAG (longitude) OVER w AS lagLongitude
    FROM locations l
    WINDOW w AS (ORDER BY generated_at)
) x
WHERE latitude IS DISTINCT FROM lagLatitude
   OR longitude IS DISTINCT FROM lagLongitude;
```
That will give me the first three columns I need but not the last one where the timestamp is from the next record.
Any help at this point would be wonderful because I am now just going round and round in circles and think it can be done but have no idea how.
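Since the filtered rows are exactly the first report of each new location, the missing `until` column is just the next row's `generated_at` within that filtered set. A sketch that builds on the query above (the shape is standard, though not validated here beyond the sample data):

```sql
SELECT latitude,
       longitude,
       generated_at AS from_time,
       -- first report of the *next* location; NULL for the last, still-open location
       LEAD(generated_at) OVER (ORDER BY generated_at) AS until_time
FROM (
    SELECT l.*,
           LAG(latitude)  OVER w AS lag_latitude,
           LAG(longitude) OVER w AS lag_longitude
    FROM locations l
    WINDOW w AS (ORDER BY generated_at)
) x
WHERE latitude  IS DISTINCT FROM lag_latitude
   OR longitude IS DISTINCT FROM lag_longitude
ORDER BY from_time;
```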
sgaw
(101 rep)
Apr 27, 2020, 04:51 PM
• Last activity: Jun 12, 2025, 07:06 AM
0
votes
1
answers
381
views
Possible to GROUP BY across different columns in mariadb/mysql
I have a table full of events similar to this:
```
date       | start_time | end_time | length
--------------------------------------------
2022-01-10 | 07:00:00   | 16:00:00 | 9
2022-01-11 | 07:00:00   | 16:00:00 | 9
2022-01-11 | 16:00:00   | 20:00:00 | 5
2022-01-12 | 07:00:00   | 12:00:00 | 5
2022-01-12 | 16:00:00   | 20:00:00 | 5
```
I want to group by contiguous events on each date. See how on 2022-01-11 there are two entries, one ending at the same time the next one starts? (but not on 12th)
I'd like to get something like this:
```
select `date`
     , SUM(length) from events
group by `date`
     , ????
```
returning:
```
date       | length
--------------------
2022-01-10 | 9
2022-01-11 | 14
2022-01-12 | 5
2022-01-12 | 5
```
Is this even possible?
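With window functions (MariaDB 10.2+ / MySQL 8.0+) the usual trick is to flag rows that do not start exactly when the previous row on the same date ends, take a running sum of those flags as a block id, and group on it. A sketch, assuming the table is called `events`:

```sql
WITH flagged AS (
    SELECT `date`, start_time, end_time, length,
           -- 1 = this event does not touch the previous one, so it starts a new block
           CASE WHEN start_time = LAG(end_time) OVER (PARTITION BY `date` ORDER BY start_time)
                THEN 0 ELSE 1 END AS new_block
    FROM events
),
blocks AS (
    SELECT f.*,
           SUM(new_block) OVER (PARTITION BY `date` ORDER BY start_time) AS block_no
    FROM flagged f
)
SELECT `date`, SUM(length) AS length
FROM blocks
GROUP BY `date`, block_no
ORDER BY `date`;
```

For the sample data this merges the two touching events on 2022-01-11 (length 14) while keeping the two separate blocks on 2022-01-12.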
MrWainscotting
(1 rep)
Jan 7, 2022, 12:36 AM
• Last activity: Jan 12, 2025, 07:07 AM
2
votes
2
answers
487
views
How to select blocks with sequential data and aggregate the ids
I have the following table:
```
id(int)  startDate(timestamp)          endDate(timestamp)            plan_id(int)  planned(bool)  machine(int)
--------------------------------------------------------------------------------------------------------------
2005     '2019-01-16 08:29:24.872736'  '2019-01-16 08:30:23.529706'  34            true           6
2004     '2019-01-16 08:19:28.011148'  '2019-01-16 08:29:22.680828'  34            true           6
2003     '2019-01-16 08:18:27.074312'  '2019-01-16 08:19:25.753475'  34            true           6
2002     '2019-01-16 08:08:30.206288'  '2019-01-16 08:18:24.856308'  34            true           6
2001     '2019-01-16 08:07:29.163124'  '2019-01-16 08:08:27.949013'  34            true           6
2000     '2019-01-16 07:59:03.221309'  '2019-01-16 08:00:14.654391'  null          false          7
1999     '2019-01-16 08:00:00.986367'  '2019-01-16 08:00:03.221309'  null          false          6
1998     '2019-01-16 07:57:30.711044'  '2019-01-16 07:59:58.778444'  null          false          6
1997     '2019-01-16 07:56:32.466508'  '2019-01-16 07:57:28.489287'  null          false          6
1996     '2019-01-16 07:50:06.887349'  '2019-01-16 07:56:30.237725'  null          false          6
1995     '2019-01-16 07:46:34.327582'  '2019-01-16 07:50:04.619592'  33            true           6
1994     '2019-01-16 07:45:33.813483'  '2019-01-16 07:46:32.014849'  33            true           6
1993     '2019-01-16 07:24:39.267365'  '2019-01-16 07:39:23.786911'  null          false          6
1992     '2019-01-16 07:23:39.646218'  '2019-01-16 07:24:37.093414'  null          false          6
1991     '2019-01-16 07:13:41.166337'  '2019-01-16 07:23:37.403375'  null          false          6
1990     '2019-01-16 07:12:39.961234'  '2019-01-16 07:13:38.907838'  null          false          6
1989     '2019-01-16 07:10:46.984236'  '2019-01-16 07:12:37.647108'  null          false          6
1988     '2019-01-15 17:05:59.832834'  '2019-01-15 17:08:21.603931'  31            true           6
1987     '2019-01-15 17:04:59.567046'  '2019-01-15 17:05:57.565188'  31            true           6
1986     '2019-01-15 17:00:01.411266'  '2019-01-15 17:10:57.255158'  31            true           7
```
I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying to use window functions but, unfortunately, I couldn't work out the logic of it.
The problem here is that since we have different machines, we cannot rely on sequential ids, just that the `endDate` of a sequence is very close to the next `startDate` (it is OK to set a tolerance constant, e.g. 3 seconds).
I would like to have a query where the result would be: the min `startDate`, the max `endDate` and the IDs of the block. For this sample with `machine = 6`, it would be:
```
blockStartDate                blockEndDate                  ids
-------------------------------------------------------------------------------
"2019-01-16 07:50:06.887349"  "2019-01-16 08:00:03.221309"  [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236"  "2019-01-16 07:39:23.786911"  [1989,1990,1991,1992,1993]
```
Note that the answer, in this case, has sequential IDs but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the ids become useless.
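A sketch of one approach, assuming a Postgres-style dialect and a hypothetical table name `machine_log`: a new block starts at every planned row or at any gap larger than the tolerance, a running sum of those flags labels the blocks, and the unplanned rows are then aggregated per block:

```sql
WITH ordered AS (
    SELECT *,
           LAG(endDate) OVER (PARTITION BY machine ORDER BY startDate) AS prev_end
    FROM machine_log
    WHERE machine = 6
),
flagged AS (
    SELECT *,
           SUM(CASE WHEN planned
                      OR prev_end IS NULL
                      OR startDate - prev_end > INTERVAL '3 seconds'   -- tolerance
                    THEN 1 ELSE 0 END) OVER (ORDER BY startDate) AS block_no
    FROM ordered
)
SELECT MIN(startDate)                   AS blockStartDate,
       MAX(endDate)                     AS blockEndDate,
       ARRAY_AGG(id ORDER BY startDate) AS ids
FROM flagged
WHERE NOT planned                        -- keep only the unplanned blocks
GROUP BY block_no
ORDER BY blockStartDate;
```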
Attyla Fellipe
(23 rep)
Jan 17, 2019, 02:00 PM
• Last activity: Nov 14, 2024, 01:44 AM
0
votes
0
answers
242
views
Calculate the sum of minutes between statuses Clickhouse
There is a table in ClickHouse that is constantly updated, format:
```
date_time           | shop_id | item_id | status | balance
---------------------------------------------------------------
2022-09-09 13:00:01 | abc     | 1234    | 0      | 0
2022-09-09 13:00:00 | abc     | 1234    | 1      | 3
2022-09-09 12:50:00 | abc     | 1234    | 1      | 10
```
The table stores statuses and balances for each `item_id`; when the balance changes, a new record with status, time and balance is added. If the balance is `0`, the `status` changes to 0.
I need to calculate how much time (how many minutes) each `item_id` in the shop was available during the day. The status may change several times a day.
Please help me calculate this.
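A sketch of one way to do it with ClickHouse window functions (21.x or later); the table name `stock_log` is a placeholder, and an interval that is still open at the end of the day is not counted here (it would need an explicit end-of-day cutoff). The idea is to pair every row with the next timestamp for the same item and day, then sum the gaps that start from a `status = 1` row:

```sql
SELECT
    shop_id,
    item_id,
    toDate(date_time) AS day,
    -- minutes spent in status = 1 before the next status/balance change
    sum(if(status = 1 AND next_time > date_time,
           dateDiff('minute', date_time, next_time), 0)) AS minutes_available
FROM
(
    SELECT
        shop_id,
        item_id,
        date_time,
        status,
        leadInFrame(date_time) OVER (
            PARTITION BY shop_id, item_id, toDate(date_time)
            ORDER BY date_time
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS next_time
    FROM stock_log
)
GROUP BY shop_id, item_id, day
ORDER BY shop_id, item_id, day;
```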
Kirill_K
(1 rep)
Sep 12, 2022, 09:59 AM
• Last activity: Sep 3, 2024, 11:17 AM
-1
votes
1
answers
141
views
How to sum values within blocks with Postgres?
Given this table called `prices`:
| my_date | my_value | my_separator |
|:---- |:------:| -----:|
| 2012-02-10 | 100 | 1 |
| 2012-02-11 | 120 | 1 |
| 2012-02-12 | 120 | 1 |
| 2012-02-13 | 110 | 1 |
| 2012-02-14 | 100 | 0 |
| 2012-02-15 | 115 | 0 |
| 2012-02-16 | 130 | 1 |
| 2012-02-17 | 190 | 1 |
| 2012-02-18 | 180 | 1 |
| 2012-02-19 | 150 | 0 |
| 2012-02-20 | 130 | 1 |
I need to sum the values in the `my_value` column based on the condition that `my_separator` is `1` within that group (so excluding the values where `my_separator = 0`). That should return the following values:
* for the date range 2012-02-10 to 2012-02-13 => 100 + 120 + 120 + 110 = 450
* for the date range 2012-02-16 to 2012-02-18 => 130 + 190 + 180 = 500
* for the date 2012-02-20 => 130
I tried using window functions but I could not come up with anything valid. Any help?
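One standard approach is the "difference of row numbers" trick: a global row number minus a per-separator row number is constant within each consecutive block, so it can serve as a group key. A minimal sketch:

```sql
SELECT MIN(my_date)  AS range_start,
       MAX(my_date)  AS range_end,
       SUM(my_value) AS total
FROM (
    SELECT p.*,
           -- constant within each consecutive run of the same my_separator value
           ROW_NUMBER() OVER (ORDER BY my_date)
         - ROW_NUMBER() OVER (PARTITION BY my_separator ORDER BY my_date) AS grp
    FROM prices p
) s
WHERE my_separator = 1
GROUP BY grp
ORDER BY range_start;
```

For the sample data this yields 450, 500 and 130 for the three ranges.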
Randomize
(1203 rep)
Aug 2, 2024, 01:39 PM
• Last activity: Aug 2, 2024, 06:21 PM
9
votes
2
answers
18867
views
Generate a series of dates for each group in a table
I have a `balances` table in PostgreSQL 9.3 that looks like this:
~~~pgsql
CREATE TABLE balances (
user_id INT
, balance INT
, as_of_date DATE
);
INSERT INTO balances (user_id, balance, as_of_date) VALUES
(1, 100, '2016-01-03')
, (1, 50, '2016-01-02')
, (1, 10, '2016-01-01')
, (2, 200, '2016-01-01')
, (3, 30, '2016-01-03')
;
~~~
It only contains balances for dates that a user has made a transaction. I need it to contain a row for each user with their balance on each date in a given date range.
* If the user does not have a row for a given date in the range, I need to use their balance from the day before.
* If the user created their account after a given date in the range, I need to avoid creating a row for that user/date combination.
I can reference an `accounts` table to get users' `create_date`:
~~~pgsql
CREATE TABLE accounts (
user_id INT
, create_date DATE
);
INSERT INTO accounts (user_id, create_date) VALUES
(1, '2015-12-01')
, (2, '2015-12-31')
, (3, '2016-01-03')
;
~~~
My desired result looks like this:
+---------+---------+--------------------------+
| user_id | balance | as_of_date |
+---------+---------+--------------------------+
| 1 | 100 | 2016-01-03T00:00:00.000Z |
| 1 | 50 | 2016-01-02T00:00:00.000Z |
| 1 | 10 | 2016-01-01T00:00:00.000Z |
| 2 | 200 | 2016-01-03T00:00:00.000Z |
| 2 | 200 | 2016-01-02T00:00:00.000Z |
| 2 | 200 | 2016-01-01T00:00:00.000Z |
| 3 | 30 | 2016-01-03T00:00:00.000Z |
+---------+---------+--------------------------+
Note that rows have been added for user 2 for `2016-01-02` and `2016-01-03`, carrying over the previous balance from `2016-01-01`; and that no rows have been added for user 3, who was created on `2016-01-03`.
To generate a series of dates in a date range, I know I can use:
~~~pgsql
SELECT d.date FROM GENERATE_SERIES('2016-01-01', '2016-01-03', '1 day'::INTERVAL) d
~~~
...but I am struggling with `LEFT JOIN`ing that series with each set of rows grouped by `user_id`.
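A sketch of one way to do it that works on 9.3: cross join the accounts with the generated dates, keep only dates on or after each account's `create_date`, and pull the latest balance at or before each date with a correlated subquery (a `LATERAL` join would do the same job):

```sql
SELECT a.user_id,
       (SELECT b.balance
        FROM balances b
        WHERE b.user_id = a.user_id
          AND b.as_of_date <= d.day
        ORDER BY b.as_of_date DESC
        LIMIT 1)  AS balance,          -- most recent balance on or before d.day
       d.day      AS as_of_date
FROM accounts a
CROSS JOIN GENERATE_SERIES('2016-01-01'::date, '2016-01-03'::date, '1 day'::INTERVAL) AS d(day)
WHERE a.create_date <= d.day           -- no rows before the account existed
ORDER BY a.user_id, d.day DESC;
```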
Shaun Scovil
(341 rep)
Mar 25, 2016, 08:48 PM
• Last activity: May 31, 2024, 03:29 AM
0
votes
1
answers
123
views
Total days on consecutive date records up to gap
I have a classic gaps-and-islands issue whereby I need to total the time across the most recent records, as far back as the first gap in dates.
Looking at , this provides a very neat solution to my question except that it does not work for SQL Server; there are syntax errors.
Does anyone have an equivalent solution that works for SQL Server?
The data is something like this
| REF |TNCY-START| TNCY-END |
|:----: |:--------:|:--------:|
| 1 |2022-08-22| 2024-04-22|
| 1 |2019-04-29| 2022-08-21|
| 1 |2017-06-26| 2019-04-28|
| 1 |2009-06-29| 2010-01-31|
| 2 |2020-07-13| NULL |
| 2 |2020-05-18| 2020-07-12|
| 2 |2020-01-13| 2020-05-17|
| 2 |2010-12-06| 2016-08-28|
| 2 |2003-09-29| 2009-06-01|
| 3 |2019-03-25| NULL |
| 4 |2022-08-22| NULL |
| 4 |2019-04-29| 2022-08-21|
| 4 |2017-06-26| 2019-04-28|
| 4 |2009-06-29| 2010-01-31|
```
create table #test ([PERSON-REF] INT,
                    [TNCY-START] DATE,
                    [TNCY-END] DATE);

INSERT INTO #test VALUES
('1', '2022-08-22 00:00:00.000', '2024-04-22 00:00:00.000'),
('1', '2019-04-29 00:00:00.000', '2022-08-21 00:00:00.000'),
('1', '2017-06-26 00:00:00.000', '2019-04-28 00:00:00.000'),
('1', '2009-06-29 00:00:00.000', '2010-01-31 00:00:00.000'),
('2', '2020-07-13 00:00:00.000', null),
('2', '2020-05-18 00:00:00.000', '2020-07-12 00:00:00.000'),
('2', '2020-01-13 00:00:00.000', '2020-05-17 00:00:00.000'),
('2', '2010-12-06 00:00:00.000', '2016-08-28 00:00:00.000'),
('2', '2003-09-29 00:00:00.000', '2009-06-01 00:00:00.000'),
('3', '2019-03-25 00:00:00.000', NULL),
('4', '2022-08-22 00:00:00.000', NULL),
('4', '2019-04-29 00:00:00.000', '2022-08-21 00:00:00.000'),
('4', '2017-06-26 00:00:00.000', '2019-04-28 00:00:00.000'),
('4', '2009-06-29 00:00:00.000', '2010-01-31 00:00:00.000');

SELECT * FROM #test;
```
and I want to end up with a single row per person ref containing person-ref, count of records after gap, start date after gap, latest end date (or null), days between.
e.g. assuming UK date format and taking NULL as today:
| REF | COUNT OF TENANCIES | FIRST TNCY-START | LAST TNCY-END | TOTAL TENANCY DAYS |
|:----:|:----------------:|:--------------:|:------------:|:----------------:|
| 1 | 3 | 2017-06-26 | 2024-04-22 | 2492 |
| 2 | 3 | 2020-01-13 | NULL | 1562 |
| 3 | 1 | 2019-03-25 | NULL | 1866 |
| 4 | 3 | 2017-06-26 | NULL | 2493 |
Thanks for any help anyone can offer
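A sketch of a T-SQL equivalent (SQL Server 2012+): flag rows whose start is not the day after (or within) the previous tenancy's end, turn the flags into island numbers with a running sum, and keep only each person's latest island:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN DATEADD(day, 1,
                     LAG([TNCY-END]) OVER (PARTITION BY [PERSON-REF] ORDER BY [TNCY-START])
                     ) >= [TNCY-START]
                THEN 0 ELSE 1 END AS is_gap
    FROM #test
), islands AS (
    SELECT *,
           SUM(is_gap) OVER (PARTITION BY [PERSON-REF] ORDER BY [TNCY-START]
                             ROWS UNBOUNDED PRECEDING) AS island_no
    FROM flagged
), latest AS (
    SELECT *,
           MAX(island_no) OVER (PARTITION BY [PERSON-REF]) AS max_island
    FROM islands
)
SELECT [PERSON-REF],
       COUNT(*)          AS [COUNT OF TENANCIES],
       MIN([TNCY-START]) AS [FIRST TNCY-START],
       CASE WHEN COUNT(*) <> COUNT([TNCY-END])
            THEN NULL ELSE MAX([TNCY-END]) END AS [LAST TNCY-END],     -- NULL if still open
       DATEDIFF(day,
                MIN([TNCY-START]),
                CASE WHEN COUNT(*) <> COUNT([TNCY-END])
                     THEN CAST(GETDATE() AS date)                      -- open tenancy: count up to today
                     ELSE MAX([TNCY-END]) END) AS [TOTAL TENANCY DAYS]
FROM latest
WHERE island_no = max_island
GROUP BY [PERSON-REF];
```

The day counts for the open tenancies depend on the current date, so they will not match the sample numbers exactly.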
Judy D
(3 rep)
Apr 23, 2024, 02:33 PM
• Last activity: Apr 24, 2024, 10:35 AM
0
votes
1
answers
1546
views
Find Consecutive Date in SQL using 2 date columns
I am trying to group all consecutive patient admissions in the table. If the patient is re-admitted to the hospital the next day (or within 24 hours), then I need to keep only one admission stay per patient. I appreciate your help.
|Patient_bed_id |Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 1/31/2022|
|111 | Lukas | 2/1/2022| 2/28/2022|
|111| Lukas| 3/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 8/20/2022|
|111| Lukas| 8/21/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022 | 7/11/2022|
|222| Jason| 7/12/2022 | 7/26/2022|
|222| Jason| 9/13/2022| 9/15/2022|
|222| Jason| 9/16/2022 |9/27/2022|
Final table
|Patient_bed_id| Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022| 7/26/2022|
|222| Jason | 9/13/2022| 9/27/2022|
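A sketch of the usual approach, written in T-SQL-style syntax (the dialect is not stated in the question) and assuming a hypothetical table name `admissions`: start a new stay whenever the admission date is more than one day after the previous discharge, number the stays with a running sum, then aggregate:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN DATEDIFF(day,
                     LAG(Hospital_out_Date) OVER (PARTITION BY Patient_bed_id
                                                  ORDER BY Hospital_In_date),
                     Hospital_In_date) <= 1
                THEN 0 ELSE 1 END AS new_stay
    FROM admissions
), stays AS (
    SELECT *,
           SUM(new_stay) OVER (PARTITION BY Patient_bed_id ORDER BY Hospital_In_date
                               ROWS UNBOUNDED PRECEDING) AS stay_no
    FROM flagged
)
SELECT Patient_bed_id,
       Patientname,
       MIN(Hospital_In_date)  AS Hospital_In_date,
       MAX(Hospital_out_Date) AS Hospital_out_Date
FROM stays
GROUP BY Patient_bed_id, Patientname, stay_no
ORDER BY Patient_bed_id, MIN(Hospital_In_date);
```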
Jane
(1 rep)
Jun 29, 2022, 03:28 AM
• Last activity: Apr 23, 2024, 07:24 AM
0
votes
2
answers
2411
views
select row number increment on value change compared to previous return result
I'm experimenting with Postgresql queries a bit and am trying to figure out if it's possible to write a query that would increment a row number counter when returned value changes.
Example SOURCE (where SOURCE can be a sorted/unsorted sub-query/table):
```
X|Y
---
0|0
0|1
0|0
1|0
1|0
1|1
2|0
```
Example increment on X:
```
select X, Y, wishful_row_number(X) as Rn from SOURCE;

0,0,1
0,1,1
0,0,1
1,0,2
1,0,2
1,1,2
2,0,3
```
(row number changes every time X changes)
Example increment on Y:
```
select X, Y, wishful_row_number(Y) as Rn from SOURCE;

0,0,1
0,1,2
0,0,3
1,0,3
1,0,3
1,1,4
2,0,5
```
(row number changes every time Y changes - goes to a bigger number/different string even if that value was already seen before)
So the Rn increment is not dependent on any "order by" or source sorting but just on the previously returned row.
Can such a query be written (without store procedures and if possible without temporary tables)?
EDIT: And yes, I know tables have no inherent ordering which is exactly the point of why I am trying to skip the "force me to order by" step and bind the Rn row generation on returned values (see my reply comment below).
And yes, I understand that something like that might not be possible in SQL based on the "why would somebody in their right mind want to do that???" but as far as I'm concerned that's the SQL limitation and the answer is "no, it can't be done" even if some people have the fetish of down voting my question.
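For what it's worth, once some expression fixes the order in which rows are consumed, the counter itself is straightforward: compare each row to the previous one with `LAG` and take a running sum of the changes. A sketch for the X variant; `ROW_NUMBER() OVER ()` is used here only as a stand-in ordering anchor and, as discussed above, PostgreSQL gives no guarantee that it reflects any particular row order:

```sql
SELECT x, y,
       -- increments whenever x differs from the previous row's x
       SUM(CASE WHEN x IS DISTINCT FROM prev_x THEN 1 ELSE 0 END)
           OVER (ORDER BY seq) AS rn
FROM (
    SELECT x, y, seq,
           LAG(x) OVER (ORDER BY seq) AS prev_x
    FROM (
        SELECT x, y, ROW_NUMBER() OVER () AS seq   -- hypothetical ordering anchor
        FROM source
    ) s
) t
ORDER BY seq;
```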
Domen Vrankar
(111 rep)
Mar 10, 2024, 11:54 PM
• Last activity: Mar 11, 2024, 10:46 AM
1
votes
1
answers
60
views
Identify if the sequence of visits to an area id was interrupted at any point
I have a PostgreSQL table `df` with the following information:
|area_id|trip_nr |timestamp|
|-------|-----|----------------------|
|a |3 |29/06/2022 17:18:03|
|a |2 |29/06/2022 17:18:02|
|a |1 |29/06/2022 17:18:01|
|b |1 |28/06/2022 19:21:03|
|c |3 |28/06/2022 19:21:02|
|e |3 |28/06/2022 19:21:01|
|e |2 |27/06/2022 17:36:03|
|c |2 |27/06/2022 17:36:02|
|c |1 |27/06/2022 17:36:01|
|e |1 |27/06/2022 16:55:03|
I would like to add a column `continuous` that checks whether the `trip_nr` to a particular `area_id` follows chronologically from the first visit, based on the order of `timestamp`.
In other words, `continuous = True` if an `area_id` is visited for the nth time and the previous visits were to the same `area_id` all the way from n, n-1, ..., 1. And `continuous = False` if an `area_id` is visited for the nth time, but a different `area_id` was visited somewhere in between visits n, n-1, ..., 1 (i.e. the sequence of `trip_nr`s is broken at some point).
An example of the desired output:
|area_id|trip_nr|timestamp|continuous|
|-------|-------|---------|----------|
|a |3 |29/06/2022 17:18:03|True |
|a |2 |29/06/2022 17:18:02|True |
|a |1 |29/06/2022 17:18:01|True |
|b |1 |28/06/2022 19:21:03|True |
|c |3 |28/06/2022 19:21:02|False|
|e |3 |28/06/2022 19:21:01|False|
|e |2 |27/06/2022 17:36:03|False|
|c |2 |27/06/2022 17:36:02|True |
|c |1 |27/06/2022 17:36:01|True |
|e |1 |27/06/2022 16:55:03|True |
How can I achieve this optimally? Clickhouse/SQL answers also welcome.
Note that the timestamp is not necessarily stored in order.
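One way to express the rule: a visit is continuous exactly when its position inside the current unbroken run of the same `area_id` (ordered by `timestamp`) is at least its `trip_nr`. A Postgres sketch along those lines:

```sql
WITH marked AS (
    SELECT *,
           -- 1 whenever the area changes compared to the previous row in time
           CASE WHEN area_id = LAG(area_id) OVER (ORDER BY "timestamp")
                THEN 0 ELSE 1 END AS new_run
    FROM df
), runs AS (
    SELECT *,
           SUM(new_run) OVER (ORDER BY "timestamp") AS run_id
    FROM marked
)
SELECT area_id,
       trip_nr,
       "timestamp",
       ROW_NUMBER() OVER (PARTITION BY run_id ORDER BY "timestamp") >= trip_nr AS continuous
FROM runs
ORDER BY "timestamp" DESC;
```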
Ruan
(35 rep)
Mar 3, 2024, 01:36 PM
• Last activity: Mar 3, 2024, 11:52 PM
7
votes
2
answers
4524
views
Carry over long sequence of missing values with Postgres
I have a table like this:
```
create table foo (foo_label text, foo_price int, foo_date date);

insert into foo (
values
('aaa', 100, '2017-01-01'),
('aaa', NULL, '2017-02-01'),
('aaa', NULL, '2017-03-01'),
('aaa', NULL, '2017-04-01'),
('aaa', 140, '2017-05-01'),
('aaa', NULL, '2017-06-01'),
('aaa', 180, '2017-07-01')
);
```
As you can see, a few values in the `foo_price` column are missing.
What I need is for the missing values to be filled with the "previous" available value, in this way:
```
 foo_label | fixed_foo_price | foo_date
-----------+-----------------+------------
 aaa       |             100 | 2017-01-01
 aaa       |             100 | 2017-02-01
 aaa       |             100 | 2017-03-01
 aaa       |             100 | 2017-04-01
 aaa       |             140 | 2017-05-01
 aaa       |             140 | 2017-06-01
 aaa       |             180 | 2017-07-01
```
**My attempt:**
```
select
    foo_label,
    (case when foo_price is null then previous_foo_price else foo_price end) as fixed_foo_price,
    foo_date
from (
    select
        foo_label,
        lag(foo_price) OVER (PARTITION BY foo_label order by foo_date::date) as previous_foo_price,
        foo_price,
        foo_date
    from foo
) T;
```
As you can see here: https://www.db-fiddle.com/#&togetherjs=s6giIonUxT it doesn't completely fill the '100' series.
Any idea how I can get the wanted result?
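`lag()` only looks one row back, which is why the long run of NULLs is not filled. A common fix is to build a group id from a running count of non-NULL prices and take each group's first value; a minimal sketch:

```sql
SELECT foo_label,
       first_value(foo_price) OVER (PARTITION BY foo_label, grp ORDER BY foo_date) AS fixed_foo_price,
       foo_date
FROM (
    SELECT f.*,
           -- increments at every non-NULL price, so each group starts with the value to carry forward
           count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
    FROM foo f
) t
ORDER BY foo_date;
```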
Randomize
(1203 rep)
Sep 18, 2017, 02:24 PM
• Last activity: Jan 3, 2024, 10:24 AM
1
votes
1
answers
353
views
SELECT / DELETE rows where consecutive null value count is greater than N
I have a table of the following structure, where all columns except the last are non-null:
| ID | receiver | send_time | recv_time |
| -- | -------- | --------- | --------- |
| 1 | A | 00:00:00 | 00:00:01 |
| 2 | A | 00:00:01 | NULL |
| 3 | A | 00:00:02 | NULL |
| 4 | A | 00:00:03 | NULL |
| 5 | A | 00:00:04 | 00:00:05 |
| 6 | B | 00:00:00 | 00:00:01 |
| 7 | B | 00:00:01 | NULL |
| 8 | B | 00:00:02 | 00:00:03 |
| 9 | A | 00:00:05 | NULL |
| 10 | B | 00:00:03 | NULL |
| 11 | A | 00:00:06 | 00:00:07 |
I want to select (and ultimately delete, but I think selecting would be the first step) all rows where the `recv_time` column is `NULL` for *n or more* consecutive rows, partitioned by the `receiver` column and ordered by the `send_time` column.
So from the above example, if *n=2*, I want to select/delete the rows with ID 2, 3 and 4, but not rows 7, 9, or 10.
I think a count with partition is needed, but I can't figure out how to "reset" the count of null values as soon as a non-null row is encountered. This is what I've got so far:
```
SELECT
    id, receiver, send_time, recv_time,
    COUNT(CASE WHEN recv_time IS NULL THEN 1 END) OVER (PARTITION BY receiver ORDER BY send_time)
FROM test
ORDER BY id
```
which gives me:
| ID | receiver | send_time | recv_time | count |
| -- | -------- | --------- | --------- | ----- |
| 1 | A | 00:00:00 | 00:00:01 | 0 |
| 2 | A | 00:00:01 | NULL | 1 |
| 3 | A | 00:00:02 | NULL | 2 |
| 4 | A | 00:00:03 | NULL | 3 |
| 5 | A | 00:00:04 | 00:00:05 | 3 |
| 6 | B | 00:00:00 | 00:00:01 | 0 |
| 7 | B | 00:00:01 | NULL | 1 |
| 8 | B | 00:00:02 | 00:00:03 | 1 |
| 9 | A | 00:00:05 | NULL | 4 |
| 10 | B | 00:00:03 | NULL | 2 |
| 11 | A | 00:00:06 | 00:00:07 | 4 |
But I want the count to reset at rows 5, 8, 11. I'm also not sure how I would proceed after that, because I want to capture *all* rows in the "group" of consecutive NULL rows, but obviously the count for row 2 in this example is smaller than *n*.
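A sketch (Postgres-style syntax, matching the query above): use a running count of non-NULL `recv_time` values as a run id, so each run consists of one delivered row plus the NULL rows that follow it, then count the NULLs per run and keep the NULL rows of runs that are long enough:

```sql
WITH runs AS (
    SELECT *,
           -- increases at every non-NULL recv_time, so NULL rows inherit the id of the run they belong to
           COUNT(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS run_id
    FROM test
), sized AS (
    SELECT *,
           SUM(CASE WHEN recv_time IS NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY receiver, run_id) AS null_run_len
    FROM runs
)
SELECT id, receiver, send_time, recv_time
FROM sized
WHERE recv_time IS NULL
  AND null_run_len >= 2          -- n = 2
ORDER BY id;
```

Deleting is then a matter of `DELETE FROM test WHERE id IN (...)` wrapped around the same query.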
Edgxxar
(13 rep)
Sep 16, 2023, 01:54 PM
• Last activity: Sep 20, 2023, 12:23 AM
3
votes
1
answers
2517
views
Calculate Difference in Overlapping Time in PostgreSQL / SSRS
I'm using PostgreSQL 9.2 and SSRS 3.0. I'm trying to calculate the difference in non-overlapping time entries in a timesheet.
My data looks like this: *(screenshot of the timesheet entries)*
I would like this to return: *(screenshot of the desired result)*
I can get the overlapping minutes, 635, but am having trouble with the non-overlapping 605 minutes.
Rob Slone
(31 rep)
Aug 21, 2015, 05:51 PM
• Last activity: Sep 8, 2023, 10:25 PM
1
votes
1
answers
38
views
Merging wifi session data if time between them is less than 15 minutes
I am trying to process network logs and join sessions together if the time between them is less than 15 minutes. The relevant fields are start time, end time, mac address, and wifi access point.
I am working in Greenplum 6.22/Postgresql 9.4.26:
```
pdap=# SELECT version();
```
| version |
| :------ |
| PostgreSQL 9.4.26 (Greenplum Database 6.22.2) |
[db fiddle](https://dbfiddle.uk/vOP-wLu8)
Logically, what I want to do is "If the start time from the next row is less than 15 minutes after the end time from this row, merge the two rows into one row with the earlier start time and the later end time."
Here is an example table with some data:
```
CREATE TABLE network_test
(   start_ts     TIMESTAMPTZ,
    end_ts       TIMESTAMPTZ,
    mac_addr     MACADDR,
    access_point VARCHAR
);

INSERT INTO network_test
VALUES
('2023-08-14 13:21:10.289'::timestamptz, '2023-08-14 13:31:20.855'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:10.638'::timestamptz, '2023-08-14 13:58:22.668'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:22.727'::timestamptz, '2023-08-14 13:58:38.966'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:28:28.190'::timestamptz, '2023-08-14 13:28:28.190'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:28:44.167'::timestamptz, '2023-08-14 13:28:44.288'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:45:40.281'::timestamptz, '2023-08-14 13:46:02.726'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:02.964'::timestamptz, '2023-08-14 13:46:10.783'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:11.026'::timestamptz, '2023-08-14 13:46:18.803'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:19.037'::timestamptz, '2023-08-14 13:46:26.798'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:27.036'::timestamptz, '2023-08-14 13:46:34.815'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:35.057'::timestamptz, '2023-08-14 13:46:46.980'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:47.213'::timestamptz, '2023-08-14 13:46:54.946'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:55.189'::timestamptz, '2023-08-14 13:47:17.040'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:47:17.297'::timestamptz, '2023-08-14 13:47:25.106'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:55:25.381'::timestamptz, '2023-08-14 13:58:33.059'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03');

SELECT *
FROM network_test
ORDER BY mac_addr, access_point, start_ts;
```
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:22.668+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:22.727+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:28.19+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:28:44.167+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:46:02.726+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:02.964+00 | 2023-08-14 13:46:10.783+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:11.026+00 | 2023-08-14 13:46:18.803+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:19.037+00 | 2023-08-14 13:46:26.798+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:27.036+00 | 2023-08-14 13:46:34.815+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:35.057+00 | 2023-08-14 13:46:46.98+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:47.213+00 | 2023-08-14 13:46:54.946+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:55.189+00 | 2023-08-14 13:47:17.04+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:47:17.297+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
Here is what I would like the result to be:
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
The first session stays as it is. The 2nd and 3rd sessions are merged into one because they have the same mac address and access point, and there is less than 15 minutes between them. The same happens for the 4th and 5th sessions, as well as the 6th through the 15th.
I can come close using window functions:
```
SELECT DISTINCT
    MIN(start_ts) OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM start_ts)/900)) AS start_ts,
    MAX(end_ts)   OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM end_ts)/900))   AS end_ts,
    mac_addr,
    access_point
FROM network_test
ORDER BY mac_addr, access_point, start_ts
```
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
But note that the last two data points end up in separate 15-minute buckets even though they're only 8 minutes apart.
Does anyone know if there is a way to do this in SQL, or am I going to have to write a PL/pgSQL function to go through the data row by row and do the comparison?
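A sketch of a pure-SQL alternative that avoids fixed buckets, using only standard window functions (which Greenplum 6 / Postgres 9.4 support): flag a row as starting a new session when it begins more than 15 minutes after the previous row's end for the same device and access point, turn the flags into session numbers with a running sum, and aggregate:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN start_ts - LAG(end_ts) OVER (PARTITION BY mac_addr, access_point
                                                  ORDER BY start_ts) <= INTERVAL '15 minutes'
                THEN 0 ELSE 1 END AS new_session
    FROM network_test
), sessions AS (
    SELECT *,
           SUM(new_session) OVER (PARTITION BY mac_addr, access_point
                                  ORDER BY start_ts) AS session_no
    FROM flagged
)
SELECT MIN(start_ts) AS start_ts,
       MAX(end_ts)   AS end_ts,
       mac_addr,
       access_point
FROM sessions
GROUP BY mac_addr, access_point, session_no
ORDER BY mac_addr, access_point, start_ts;
```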
lpscott
(33 rep)
Sep 8, 2023, 02:15 PM
• Last activity: Sep 8, 2023, 03:28 PM
2
votes
1
answers
634
views
T-SQL : Slicing time periods by related time periods
Consider the following table that gives the validity dates for which an `Item` is associated to something called a `Demand_Unit`, and a certain `Market`:
| Item_ID | Start_Date | End_Date | Demand_Unit | Market |
|---------|------------|------------|-------------|----------|
| X | 2020-06-01 | 2020-09-30 | A | GREECE |
| X | 2020-11-01 | 2021-01-01 | A | BELGIUM |
| X | 2021-01-01 | 2023-12-31 | A | USA |
| Y | 2021-01-01 | 2021-02-01 | B | PORTUGAL |
| Y | 2021-02-01 | 2021-06-07 | B | FRANCE |
| Y | 2021-06-07 | 2022-01-03 | B | ITALY |
| Y | 2022-01-03 | 2023-12-31 | B | SPAIN |
| Y | 2024-01-01 | 2025-12-31 | B | UKRAINE |
| Z | 2021-05-01 | 2021-06-30 | C | USA |
| W | 2021-02-01 | 2021-05-30 | D | SWEDEN |
| W | 2023-12-31 | 2025-05-30 | D | DANEMARK |
| U | 2021-01-04 | 2022-02-07 | E | TUNISIA |
| U | 2022-02-07 | 2025-01-05 | (null) | MOROCCO |
> Important note:
> You can never have overlapping dates because basically when one `Market` closes another opens. But it is possible to have non-contiguous periods (like for `Item` X, for example).
Now consider the following table that gives the `Classification`s of a `Demand_Unit` throughout time:
| Demand_Unit | Start_Date | End_Date | Classification |
|-------------|------------|------------|----------------|
| A | 2021-01-01 | 2021-04-05 | N+ |
| A | 2021-04-05 | 2023-12-31 | K- |
| B | 2021-03-08 | 2021-07-26 | N+ |
| B | 2021-07-26 | 2022-11-30 | L- |
| C | 2021-01-01 | 2023-12-31 | N- |
| D | 2021-06-01 | 2023-12-31 | K+ |
| E | 2021-08-02 | 2022-01-03 | N+ |
| E | 2022-01-03 | 2025-01-06 | C- |
> Important note:
> You can only have contiguous periods here. This table is extracted from a Slowly changing dimension.
The goal is to 'slice' the first table to associate to each `Item` its `Classification`, while taking into account the time periods. The result should look like this:
| Item_ID | Demand_Unit | Market | Item_Classification_Dt_Begin | Item_Classification_Dt_End | Item_Classification |
|---------|-------------|----------|------------------------------|----------------------------|---------------------|
| U | E | TUNISIA | 2021-01-04 | 2021-08-02 | (null) |
| U | E | TUNISIA | 2021-08-02 | 2022-01-03 | N+ |
| U | E | TUNISIA | 2022-01-03 | 2022-02-07 | C- |
| U | (null) | MOROCCO | 2022-02-07 | 2025-01-05 | (null) |
| W | D | SWEDEN | 2021-02-01 | 2021-05-30 | (null) |
| W | D | DANEMARK | 2023-12-31 | 2023-12-31 | K+ |
| W | D | DANEMARK | 2023-12-31 | 2025-05-30 | (null) |
| X | A | GREECE | 2020-06-01 | 2020-09-30 | (null) |
| X | A | BELGIUM | 2020-11-01 | 2021-01-01 | (null) |
| X | A | USA | 2021-01-01 | 2021-04-05 | N+ |
| X | A | USA | 2021-04-05 | 2023-12-31 | K- |
| Y | B | PORTUGAL | 2021-01-01 | 2021-02-01 | (null) |
| Y | B | FRANCE | 2021-02-01 | 2021-03-08 | (null) |
| Y | B | FRANCE | 2021-03-08 | 2021-06-07 | N+ |
| Y | B | ITALY | 2021-06-07 | 2021-07-26 | N+ |
| Y | B | ITALY | 2021-07-26 | 2022-01-03 | L- |
| Y | B | SPAIN | 2022-01-03 | 2022-11-30 | L- |
| Y | B | SPAIN | 2022-11-30 | 2023-12-31 | (null) |
| Y | B | UKRAINE | 2024-01-01 | 2025-12-31 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | N- |
These are some examples with 'tricky' cases (Items X and Y) and a 'simple' case (Item Z) which has its dates completely included in the classifications period, + other cases to try and test possibilities I could think of...
I'm working on a solution that uses `LAG()`, `LEAD()` and a three-part `UNION`, but I'm afraid I'm forgetting something, and the original table has hundreds of thousands of rows.
I'm not trying to have someone do my work here; I would just like your opinions on how this should properly be done.
Here's my solution so far. It looks like it works, but I don't find it elegant and I'm sure the second and third `UNION` blocks can be simplified, but I just can't see straight anymore.
;WITH CLASSIFICATIONS_LAG_LEAD AS
(
SELECT
Demand_Unit
,Start_Date
,End_Date
,Classification
,LAG(End_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date) AS LAGD
,LEAD(Start_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date ) AS LEADD
FROM Classifications
)
-- This first block handles Item periods that are not covered by classifications periods
-- and/or the NULL classif portion of Item periods that are partially covered by classifications periods
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T2.End_Date > T1.Start_Date
AND T2.End_Date T1.Start_Date
AND T2.Start_Date T2.End_Date
)
)
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_BEGIN to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T1.Start_Date T2.Start_Date
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_END to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,T1.Start_Date AS Item_Classification_Dt_Begin
,CASE
WHEN T1.End_Date > T2.End_Date THEN T2.End_Date
ELSE T1.End_Date
END AS Item_Classification_Dt_End
,T2.Classification AS Item_Classification
FROM Items T1
INNER JOIN Classifications T2
ON T1.Demand_Unit = T2.Demand_Unit
AND T1.Start_Date = T2.Start_Date
ORDER BY Item_ID, 4
Here is a SQLFiddle
Many thanks in advance
Edit : last minute fixes in my solution
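For comparison, a sketch of the "boundary splitting" pattern that is often used instead of a multi-branch `UNION`: collect every relevant date per `Item_ID`, turn consecutive boundary pairs into slices with `LEAD()`, and join each slice back to both tables. It has not been validated against every edge case in the expected output (for instance the zero-length DANEMARK slice would need extra handling):

```sql
WITH boundaries AS (
    SELECT i.Item_ID, d.Boundary_Date
    FROM Items i
    CROSS APPLY (VALUES (i.Start_Date), (i.End_Date)) AS d(Boundary_Date)
    UNION
    SELECT i.Item_ID, c.Start_Date
    FROM Items i
    JOIN Classifications c
      ON c.Demand_Unit = i.Demand_Unit
     AND c.Start_Date > i.Start_Date AND c.Start_Date < i.End_Date
    UNION
    SELECT i.Item_ID, c.End_Date
    FROM Items i
    JOIN Classifications c
      ON c.Demand_Unit = i.Demand_Unit
     AND c.End_Date > i.Start_Date AND c.End_Date < i.End_Date
), slices AS (
    SELECT Item_ID,
           Boundary_Date AS Dt_Begin,
           LEAD(Boundary_Date) OVER (PARTITION BY Item_ID ORDER BY Boundary_Date) AS Dt_End
    FROM boundaries
)
SELECT i.Item_ID,
       i.Demand_Unit,
       i.Market,
       s.Dt_Begin AS Item_Classification_Dt_Begin,
       s.Dt_End   AS Item_Classification_Dt_End,
       c.Classification AS Item_Classification
FROM slices s
JOIN Items i                              -- keep only slices that fall inside an Item period
  ON i.Item_ID = s.Item_ID
 AND s.Dt_Begin >= i.Start_Date AND s.Dt_End <= i.End_Date
LEFT JOIN Classifications c               -- attach the classification covering the slice, if any
  ON c.Demand_Unit = i.Demand_Unit
 AND s.Dt_Begin >= c.Start_Date AND s.Dt_End <= c.End_Date
WHERE s.Dt_End IS NOT NULL
ORDER BY i.Item_ID, s.Dt_Begin;
```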
Yass T
(33 rep)
Oct 27, 2021, 04:06 PM
• Last activity: Jul 23, 2023, 12:02 AM
6
votes
2
answers
8382
views
Check if contiguous date interval exists
I have a table in PostgreSQL which describes some events that have a start date and an end date:
```
CREATE TABLE my_table
(
    event_id   serial NOT NULL,
    start_date timestamp without time zone NOT NULL,
    end_date   timestamp without time zone NOT NULL
)
```
A single event may overlap the previous and the next one. In the following table the first 3 of 4 events form a contiguous time interval:
```
1  '2015-04-02 22:09:03'  '2015-04-02 22:19:05'
2  '2015-04-02 22:17:38'  '2015-04-02 22:27:38'
3  '2015-04-02 22:25:21'  '2015-04-02 22:36:23'
4  '2015-04-02 22:45:23'  '2015-04-02 22:55:23'
```
Is it possible to write a query that checks if a contiguous date interval between two given dates exists?
I would like to have something like:
```
select ...
from my_table
where start_date > '2015-04-02' and end_date < '2015-04-06'
```
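A sketch of one way to answer it: merge the overlapping events in the window into islands (a new island starts whenever an event begins after the running maximum of all earlier end dates), then inspect the islands, e.g. to see whether a single island spans the range you care about:

```sql
WITH ordered AS (
    SELECT *,
           MAX(end_date) OVER (ORDER BY start_date, end_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max_end
    FROM my_table
    WHERE start_date > '2015-04-02' AND end_date < '2015-04-06'
), islands AS (
    SELECT *,
           SUM(CASE WHEN prev_max_end >= start_date THEN 0 ELSE 1 END)
               OVER (ORDER BY start_date, end_date) AS island_id
    FROM ordered
)
SELECT island_id,
       MIN(start_date) AS island_start,
       MAX(end_date)   AS island_end
FROM islands
GROUP BY island_id
ORDER BY island_start;
```

For the sample data this returns two islands: one covering events 1-3 (22:09:03 to 22:36:23) and one for event 4.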
hank
(919 rep)
Apr 3, 2015, 12:06 PM
• Last activity: Jul 14, 2023, 03:44 PM
0
votes
1
answers
2469
views
SQL (Postgres) Filling up missing rows with values from previous row and calculating on result set
I have got this dataset:
```
 id | block_number | value | name
----+--------------+-------+------
  2 |           47 |   100 | one
  3 |           52 |   200 | one
  4 |           58 |   120 | one
  5 |           60 |   180 | one
```
created from
```
create table test (id serial, block_number int, value int, name varchar);
```
and
```
insert into test (block_number, value, name) values (47, 100, 'one'), (52, 200, 'one'), (58, 120, 'one'), (60, 180, 'one');
```
I would like to fill up the rows (not actually add them to the table but just use them for further processing), so that the column 'block_number' shows up consecutively using the value of the previous row for 'value' like so:
```
block_number | value | name
-------------+-------+------
          47 |   100 | one
          48 |   100 | one
          49 |   100 | one
          50 |   100 | one
          51 |   100 | one
          52 |   200 | one
          53 |   200 | one
          54 |   200 | one
          55 |   200 | one
          56 |   200 | one
          57 |   200 | one
          58 |   120 | one
          59 |   120 | one
          60 |   180 | one
```
Then I would like to multiply each of these rows by a given number and sum up the total; for example, if the given number were 2, the result of the above would be 4,240.
I haven't gotten very far with this; the only thing I could come up with is a statement that generates the missing rows. However, I am failing to find out how I could set the value of each row to the one of the previous row:
```
SELECT x.block_number, x.value, t.value
FROM (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM my_table
) x
LEFT JOIN my_table t USING (block_number)
ORDER BY x.block_number;
```
Any help greatly appreciated.
### UPDATE: ###
I tried to reconcile this with @mustaccio's hint here: https://dba.stackexchange.com/posts/327530/edit#:~:text=Carry%20over%20long%20sequence%20of%20missing%20values%20with%20Postgres and I came up with this:
```
SELECT block_number
     , first_value(value) OVER (PARTITION BY block_number, grp ORDER BY block_number) AS value
FROM (
    SELECT
        count(value) OVER (PARTITION BY block_number ORDER BY block_number) AS grp
        , value
        , block_number
    FROM (
        SELECT x.block_number, t.value
        FROM (
            SELECT generate_series(min(block_number), max(block_number)) AS block_number
            FROM test
        ) x
        LEFT JOIN test t USING (block_number)
        ORDER BY x.block_number
    ) test
) sub;
```
but it still doesn't get me the missing values:
```
 block_number | value
--------------+-------
           47 |   100
           48 |
           49 |
           50 |
           51 |
           52 |   200
           53 |
           54 |
           55 |
           56 |
           57 |
           58 |   120
           59 |
           60 |   180
```
Any suggestions?
### UPDATE 2 ###
Thanks to @mustaccio's answer the first part is solved. I tried to amend the statement so that it does the calculation as mentioned above: Multiplying the 'value' in each row with a given number (here: 2) and then summing up all those values. What I tried:
```
SELECT SUM(a.value * 2)
FROM (
    SELECT block_number, value, first_value(value) OVER (PARTITION BY grp)
    FROM (
        SELECT
            sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp
            , value
            , block_number
        FROM (
            SELECT x.block_number, t.value
            FROM (
                SELECT generate_series(min(block_number), max(block_number)) AS block_number
                FROM test
            ) x
            LEFT JOIN test t USING (block_number)
        ) y
    ) z
) a;
```
However, that gives me a result of 1,200 instead of the expected 4,240.
Any ideas?
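The outer query above still multiplies the original `value` column, which is NULL on the generated rows, rather than the filled-in `first_value`, which is why only the four original rows contribute (600 x 2 = 1,200). A sketch that carries the filled value through before aggregating (it returns 4,240 for the sample data with a multiplier of 2):

```sql
WITH series AS (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM test
), joined AS (
    SELECT s.block_number,
           t.value,
           -- increments at every known value, grouping each value with the gap rows that follow it
           count(t.value) OVER (ORDER BY s.block_number) AS grp
    FROM series s
    LEFT JOIN test t USING (block_number)
), filled AS (
    SELECT block_number,
           first_value(value) OVER (PARTITION BY grp ORDER BY block_number) AS value
    FROM joined
)
SELECT SUM(value * 2) AS total
FROM filled;
```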
Westcoaster
(3 rep)
May 25, 2023, 03:50 PM
• Last activity: May 26, 2023, 02:34 PM
0
votes
1
answers
427
views
Count the number of days an order has been at a specific status
I can't figure out how to write a query that returns the results I need.
Below is a table that lists, in chronological order, the status of an order on specific days and times.
An order can change from one status to another but can also change back to a status it was at previously.
How do I write a query that will give me the following for each time an order changes status (minimum days is 1):
```
select doc_no, doc_type, line_no, status, DATEDIFF(day, '04/18/2023', '05/10/2023') AS [number of days at this status before changing]
```
Data:
```
doc_no  |doc_type | line_no |status | updated date            | time
--------|---------|---------|-------|-------------------------|-------
2353021 | SA      | 21000   | 527   | 2023-02-17 00:00:00.000 | 100640
2353021 | SA      | 21000   | 527   | 2023-03-16 00:00:00.000 | 114555
2353021 | SA      | 21000   | 527   | 2023-03-23 00:00:00.000 | 101018
2353021 | SA      | 21000   | 527   | 2023-03-23 00:00:00.000 | 145322
2353021 | SA      | 21000   | 545   | 2023-03-23 00:00:00.000 | 145323
2353021 | SA      | 21000   | 555   | 2023-03-23 00:00:00.000 | 145411
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 113735
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 115644
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 115826
2353021 | SA      | 21000   | 527   | 2023-04-14 00:00:00.000 | 122233
2353021 | SA      | 21000   | 527   | 2023-04-14 00:00:00.000 | 190943
2353021 | SA      | 21000   | 527   | 2023-04-18 00:00:00.000 | 32939
2353021 | SA      | 21000   | 527   | 2023-04-18 00:00:00.000 | 161039
2353021 | SA      | 21000   | 545   | 2023-04-18 00:00:00.000 | 161046
2353021 | SA      | 21000   | 555   | 2023-04-18 00:00:00.000 | 161102
2353021 | SA      | 21000   | 555   | 2023-04-18 00:00:00.000 | 163512
2353021 | SA      | 21000   | 555   | 2023-05-09 00:00:00.000 | 134853
2353021 | SA      | 21000   | 555   | 2023-05-10 00:00:00.000 | 140516
```
The results I am looking for:
```
doc_no | doc_type | line_no | status | number_of_days_at_status
-------|----------|---------|--------|-------------------------
2353021| SA       | 2100    | 527    | 27
2353021| SA       | 2100    | 545    | 1
2353021| SA       | 2100    | 555    | 22
2353021| SA       | 2100    | 527    | 4
2353021| SA       | 2100    | 545    | 1
2353021| SA       | 2100    | 555    | 22
```
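A sketch of the usual gaps-and-islands treatment in T-SQL; the table name `status_log` is a placeholder, the still-open last run is measured up to today, and the day counts follow the `DATEDIFF` formula shown above (so the very first run may come out differently from the sample table):

```sql
WITH ordered AS (
    SELECT *,
           CASE WHEN status = LAG(status) OVER (PARTITION BY doc_no, doc_type, line_no
                                                ORDER BY [updated date], [time])
                THEN 0 ELSE 1 END AS status_changed
    FROM status_log
), runs AS (
    SELECT *,
           SUM(status_changed) OVER (PARTITION BY doc_no, doc_type, line_no
                                     ORDER BY [updated date], [time]
                                     ROWS UNBOUNDED PRECEDING) AS run_no
    FROM ordered
), blocks AS (
    SELECT doc_no, doc_type, line_no, status, run_no,
           MIN([updated date]) AS run_start
    FROM runs
    GROUP BY doc_no, doc_type, line_no, status, run_no
), spans AS (
    SELECT *,
           LEAD(run_start) OVER (PARTITION BY doc_no, doc_type, line_no
                                 ORDER BY run_no) AS next_start
    FROM blocks
)
SELECT doc_no, doc_type, line_no, status,
       CASE WHEN DATEDIFF(day, run_start, COALESCE(next_start, GETDATE())) < 1
            THEN 1                                  -- minimum of one day at a status
            ELSE DATEDIFF(day, run_start, COALESCE(next_start, GETDATE()))
       END AS number_of_days_at_status
FROM spans
ORDER BY doc_no, run_no;
```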
Kram_Koorbse
(255 rep)
May 10, 2023, 07:53 PM
• Last activity: May 12, 2023, 02:45 PM
Showing page 1 of 20 total questions