Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
203
views
Update row data with dynamic lag
I have a table of items and restaurants. We assume that if the `restaurant` entry is missing, it has to be completed with the last non-null `restaurant` value in the previous rows (as defined by ascending `id` values).
```
CREATE TABLE food_items (
    id SERIAL PRIMARY KEY,
    item VARCHAR(255) NOT NULL,
    restaurant VARCHAR(255)
);

INSERT INTO food_items(item, restaurant)
VALUES ('Salad','McDonalds'), ('Burger',''), ('Fries',''), ('Salad','Taco Bell'), ('Quesadilla','');
```
The tricky part is that the number of rows for each restaurant is variable, so it cannot be captured with a fixed lag.
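A common way to handle a variable-length carry-forward like this is the gaps-and-islands trick: count the non-empty `restaurant` values in `id` order to form a group number, then take that group's value. A minimal sketch (note the sample data uses empty strings rather than NULLs, hence the `NULLIF`):

```sql
SELECT id,
       item,
       -- the only non-empty restaurant in each group is the row that started it
       MAX(NULLIF(restaurant, '')) OVER (PARTITION BY grp) AS restaurant
FROM (
    SELECT f.*,
           -- running count of non-empty values: increments at every "real" restaurant row
           COUNT(NULLIF(restaurant, '')) OVER (ORDER BY id) AS grp
    FROM food_items f
) s
ORDER BY id;
```

To persist the result, the same subquery can feed an `UPDATE ... FROM`.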
eddie
(101 rep)
Jan 2, 2024, 08:17 AM
• Last activity: Jun 23, 2025, 09:03 AM
0
votes
1
answers
192
views
Determine Available Blocks of Contiguous Ids
We have a SQL Server table that has an int autoincrement primary key column on it.
The table's primary key space is fragmented. For example, ids 1 through 10 are in use but ids 11 through 100,000 are not. Ids 1,000,000 through 1,100,000 are in use but ids 1,100,001 through 50,000,000 are not.
I am trying to find all such available ranges of ids. Once the ranges are determined, we may reseed the primary key column to start at the beginning of the widest range.
Is there any tool, utility, or SQL script out there for determining such available ranges?
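Since the used ids are already ordered, the unused ranges are simply the spaces between one id and the next. A sketch using `LEAD` (SQL Server 2012+); the table and column names `dbo.MyTable` / `id` are placeholders:

```sql
SELECT id + 1          AS gap_start,
       next_id - 1     AS gap_end,
       next_id - id - 1 AS gap_width              -- number of unused ids in the gap
FROM (
    SELECT id,
           LEAD(id) OVER (ORDER BY id) AS next_id
    FROM dbo.MyTable
) t
WHERE next_id - id > 1                            -- keep only real gaps
ORDER BY gap_width DESC;                          -- widest candidate range first
```

This ignores any free range below the minimum or above the maximum id; both are easy to add with `MIN`/`MAX` if they matter.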
Manny Siddiqui
(11 rep)
Sep 23, 2021, 06:13 PM
• Last activity: Jun 22, 2025, 08:09 PM
0
votes
1
answers
209
views
Gaps and Islands Across Fields With Start and End Timestamps
I am working with GPS data which I want to group by locations (to six decimal places of the coordinate value) and generate `from` and `until` times for.
For simplicity in this example I'm making the coordinates pre-formatted string values.
I am using PostgreSQL.
```
CREATE TABLE locations
(
    location_id serial PRIMARY KEY,
    latitude VARCHAR (10),
    longitude VARCHAR (10),
    generated_at TIMESTAMP
);

INSERT INTO locations (latitude, longitude, generated_at)
VALUES
('51.123456', '-1.123456', '2020-04-27 17:00:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:05:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:10:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:15:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:20:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:25:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:30:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:35:00.000'),
('51.123456', '-1.123456', '2020-04-27 17:40:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:45:00.000'),
('51.444555', '-1.444555', '2020-04-27 17:50:00.000'),
('51.654321', '-1.654321', '2020-04-27 17:55:00.000');
```
What I would like to get back from the query would be:
```
'51.123456' | '-1.123456' | '2020-04-27 17:00:00.000' | '2020-04-27 17:10:00.000'
'51.654321' | '-1.654321' | '2020-04-27 17:10:00.000' | '2020-04-27 17:25:00.000'
'51.444555' | '-1.444555' | '2020-04-27 17:25:00.000' | '2020-04-27 17:30:00.000'
'51.123456' | '-1.123456' | '2020-04-27 17:30:00.000' | '2020-04-27 17:45:00.000'
'51.444555' | '-1.444555' | '2020-04-27 17:45:00.000' | '2020-04-27 17:55:00.000'
'51.654321' | '-1.654321' | '2020-04-27 17:55:00.000' |
```
The idea is to group by 'locations' and a new location is defined as being when the coordinates change from the previously reported location. If the coordinates are the same as an earlier location then that is still a new location.
The third column is the timestamp when the updated location was first reported and the fourth is the timestamp when the next location was first reported.
I can get part-way there with this:
```
SELECT
    latitude,
    longitude,
    generated_at
FROM (
    SELECT l.*,
           LAG (latitude) OVER w AS lagLatitude,
           LAG (longitude) OVER w AS lagLongitude
    FROM locations l
    WINDOW w AS (ORDER BY generated_at)
) x
WHERE latitude IS DISTINCT FROM lagLatitude
   OR longitude IS DISTINCT FROM lagLongitude;
```
That will give me the first three columns I need but not the last one where the timestamp is from the next record.
Any help at this point would be wonderful because I am now just going round and round in circles and think it can be done but have no idea how.
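Since the filtered rows are exactly the first report of each new location, the missing `until` column is just the next row's `generated_at` within that filtered set. A sketch that builds on the query above (the shape is standard, though not validated here beyond the sample data):

```sql
SELECT latitude,
       longitude,
       generated_at AS from_time,
       -- first report of the *next* location; NULL for the last, still-open location
       LEAD(generated_at) OVER (ORDER BY generated_at) AS until_time
FROM (
    SELECT l.*,
           LAG(latitude)  OVER w AS lag_latitude,
           LAG(longitude) OVER w AS lag_longitude
    FROM locations l
    WINDOW w AS (ORDER BY generated_at)
) x
WHERE latitude  IS DISTINCT FROM lag_latitude
   OR longitude IS DISTINCT FROM lag_longitude
ORDER BY from_time;
```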
sgaw
(101 rep)
Apr 27, 2020, 04:51 PM
• Last activity: Jun 12, 2025, 07:06 AM
0
votes
1
answers
381
views
Possible to GROUP BY across different columns in mariadb/mysql
I have a table full of events similar to this:
```
date       | start_time | end_time | length
--------------------------------------------
2022-01-10 | 07:00:00   | 16:00:00 | 9
2022-01-11 | 07:00:00   | 16:00:00 | 9
2022-01-11 | 16:00:00   | 20:00:00 | 5
2022-01-12 | 07:00:00   | 12:00:00 | 5
2022-01-12 | 16:00:00   | 20:00:00 | 5
```
I want to group by contiguous events on each date. See how on 2022-01-11 there are two entries, one ending at the same time the next one starts? (but not on 12th)
I'd like to get something like this:
```
select `date`
     , SUM(length) from events
group by `date`
     , ????
```
returning:
```
date       | length
--------------------
2022-01-10 | 9
2022-01-11 | 14
2022-01-12 | 5
2022-01-12 | 5
```
Is this even possible?
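With window functions (MariaDB 10.2+ / MySQL 8.0+) the usual trick is to flag rows that do not start exactly when the previous row on the same date ends, take a running sum of those flags as a block id, and group on it. A sketch, assuming the table is called `events`:

```sql
WITH flagged AS (
    SELECT `date`, start_time, end_time, length,
           -- 1 = this event does not touch the previous one, so it starts a new block
           CASE WHEN start_time = LAG(end_time) OVER (PARTITION BY `date` ORDER BY start_time)
                THEN 0 ELSE 1 END AS new_block
    FROM events
),
blocks AS (
    SELECT f.*,
           SUM(new_block) OVER (PARTITION BY `date` ORDER BY start_time) AS block_no
    FROM flagged f
)
SELECT `date`, SUM(length) AS length
FROM blocks
GROUP BY `date`, block_no
ORDER BY `date`;
```

For the sample data this merges the two touching events on 2022-01-11 (length 14) while keeping the two separate blocks on 2022-01-12.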
MrWainscotting
(1 rep)
Jan 7, 2022, 12:36 AM
• Last activity: Jan 12, 2025, 07:07 AM
2
votes
2
answers
487
views
How to select blocks with sequential data and aggregate the ids
I have the following table:
```
id(int)  startDate(timestamp)          endDate(timestamp)            plan_id(int)  planned(bool)  machine(int)
--------------------------------------------------------------------------------------------------------------
2005     '2019-01-16 08:29:24.872736'  '2019-01-16 08:30:23.529706'  34            true           6
2004     '2019-01-16 08:19:28.011148'  '2019-01-16 08:29:22.680828'  34            true           6
2003     '2019-01-16 08:18:27.074312'  '2019-01-16 08:19:25.753475'  34            true           6
2002     '2019-01-16 08:08:30.206288'  '2019-01-16 08:18:24.856308'  34            true           6
2001     '2019-01-16 08:07:29.163124'  '2019-01-16 08:08:27.949013'  34            true           6
2000     '2019-01-16 07:59:03.221309'  '2019-01-16 08:00:14.654391'  null          false          7
1999     '2019-01-16 08:00:00.986367'  '2019-01-16 08:00:03.221309'  null          false          6
1998     '2019-01-16 07:57:30.711044'  '2019-01-16 07:59:58.778444'  null          false          6
1997     '2019-01-16 07:56:32.466508'  '2019-01-16 07:57:28.489287'  null          false          6
1996     '2019-01-16 07:50:06.887349'  '2019-01-16 07:56:30.237725'  null          false          6
1995     '2019-01-16 07:46:34.327582'  '2019-01-16 07:50:04.619592'  33            true           6
1994     '2019-01-16 07:45:33.813483'  '2019-01-16 07:46:32.014849'  33            true           6
1993     '2019-01-16 07:24:39.267365'  '2019-01-16 07:39:23.786911'  null          false          6
1992     '2019-01-16 07:23:39.646218'  '2019-01-16 07:24:37.093414'  null          false          6
1991     '2019-01-16 07:13:41.166337'  '2019-01-16 07:23:37.403375'  null          false          6
1990     '2019-01-16 07:12:39.961234'  '2019-01-16 07:13:38.907838'  null          false          6
1989     '2019-01-16 07:10:46.984236'  '2019-01-16 07:12:37.647108'  null          false          6
1988     '2019-01-15 17:05:59.832834'  '2019-01-15 17:08:21.603931'  31            true           6
1987     '2019-01-15 17:04:59.567046'  '2019-01-15 17:05:57.565188'  31            true           6
1986     '2019-01-15 17:00:01.411266'  '2019-01-15 17:10:57.255158'  31            true           7
```
I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying to use window functions but, unfortunately, I couldn't work out the logic of it.
The problem here is that since we have different machines, we cannot rely on sequential ids, just that the `endDate` of a sequence is very close to the next `startDate` (it is OK to set a tolerance constant, e.g. 3 seconds).
I would like to have a query where the result would be: the min `startDate`, the max `endDate` and the IDs of the block. For this sample with `machine = 6`, it would be:
```
blockStartDate                blockEndDate                  ids
-------------------------------------------------------------------------------
"2019-01-16 07:50:06.887349"  "2019-01-16 08:00:03.221309"  [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236"  "2019-01-16 07:39:23.786911"  [1989,1990,1991,1992,1993]
```
Note that the answer, in this case, has sequential IDs but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the ids become useless.
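A sketch of one approach, assuming a Postgres-style dialect and a hypothetical table name `machine_log`: a new block starts at every planned row or at any gap larger than the tolerance, a running sum of those flags labels the blocks, and the unplanned rows are then aggregated per block:

```sql
WITH ordered AS (
    SELECT *,
           LAG(endDate) OVER (PARTITION BY machine ORDER BY startDate) AS prev_end
    FROM machine_log
    WHERE machine = 6
),
flagged AS (
    SELECT *,
           SUM(CASE WHEN planned
                      OR prev_end IS NULL
                      OR startDate - prev_end > INTERVAL '3 seconds'   -- tolerance
                    THEN 1 ELSE 0 END) OVER (ORDER BY startDate) AS block_no
    FROM ordered
)
SELECT MIN(startDate)                   AS blockStartDate,
       MAX(endDate)                     AS blockEndDate,
       ARRAY_AGG(id ORDER BY startDate) AS ids
FROM flagged
WHERE NOT planned                        -- keep only the unplanned blocks
GROUP BY block_no
ORDER BY blockStartDate;
```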
Attyla Fellipe
(23 rep)
Jan 17, 2019, 02:00 PM
• Last activity: Nov 14, 2024, 01:44 AM
0
votes
0
answers
242
views
Calculate the sum of minutes between statuses Clickhouse
There is a table in ClickHouse that is constantly updated, format:
```
date_time           | shop_id | item_id | status | balance
---------------------------------------------------------------
2022-09-09 13:00:01 | abc     | 1234    | 0      | 0
2022-09-09 13:00:00 | abc     | 1234    | 1      | 3
2022-09-09 12:50:00 | abc     | 1234    | 1      | 10
```
The table stores statuses and balances for each `item_id`; when the balance changes, a new record with status, time and balance is added. If the balance is `0`, the `status` changes to 0.
I need to calculate how much time (how many minutes) each `item_id` in the shop was available during the day. The status may change several times a day.
Please help me calculate this.
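A sketch of one way to do it with ClickHouse window functions (21.x or later); the table name `stock_log` is a placeholder, and an interval that is still open at the end of the day is not counted here (it would need an explicit end-of-day cutoff). The idea is to pair every row with the next timestamp for the same item and day, then sum the gaps that start from a `status = 1` row:

```sql
SELECT
    shop_id,
    item_id,
    toDate(date_time) AS day,
    -- minutes spent in status = 1 before the next status/balance change
    sum(if(status = 1 AND next_time > date_time,
           dateDiff('minute', date_time, next_time), 0)) AS minutes_available
FROM
(
    SELECT
        shop_id,
        item_id,
        date_time,
        status,
        leadInFrame(date_time) OVER (
            PARTITION BY shop_id, item_id, toDate(date_time)
            ORDER BY date_time
            ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS next_time
    FROM stock_log
)
GROUP BY shop_id, item_id, day
ORDER BY shop_id, item_id, day;
```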
Kirill_K
(1 rep)
Sep 12, 2022, 09:59 AM
• Last activity: Sep 3, 2024, 11:17 AM
-1
votes
1
answers
141
views
How to sum values within blocks with Postgres?
Given this table called `prices`:
| my_date | my_value | my_separator |
|:---- |:------:| -----:|
| 2012-02-10 | 100 | 1 |
| 2012-02-11 | 120 | 1 |
| 2012-02-12 | 120 | 1 |
| 2012-02-13 | 110 | 1 |
| 2012-02-14 | 100 | 0 |
| 2012-02-15 | 115 | 0 |
| 2012-02-16 | 130 | 1 |
| 2012-02-17 | 190 | 1 |
| 2012-02-18 | 180 | 1 |
| 2012-02-19 | 150 | 0 |
| 2012-02-20 | 130 | 1 |
I need to sum the values in the `my_value` column based on the condition that `my_separator` is `1` within that group (so excluding the values where `my_separator = 0`). That should return the following values:
* for the date range 2012-02-10 to 2012-02-13 => 100 + 120 + 120 + 110 = 450
* for the date range 2012-02-16 to 2012-02-18 => 130 + 190 + 180 = 500
* for the date 2012-02-20 => 130
I tried using window functions but I could not come up with anything valid. Any help?
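One standard approach is the "difference of row numbers" trick: a global row number minus a per-separator row number is constant within each consecutive block, so it can serve as a group key. A minimal sketch:

```sql
SELECT MIN(my_date)  AS range_start,
       MAX(my_date)  AS range_end,
       SUM(my_value) AS total
FROM (
    SELECT p.*,
           -- constant within each consecutive run of the same my_separator value
           ROW_NUMBER() OVER (ORDER BY my_date)
         - ROW_NUMBER() OVER (PARTITION BY my_separator ORDER BY my_date) AS grp
    FROM prices p
) s
WHERE my_separator = 1
GROUP BY grp
ORDER BY range_start;
```

For the sample data this yields 450, 500 and 130 for the three ranges.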
Randomize
(1203 rep)
Aug 2, 2024, 01:39 PM
• Last activity: Aug 2, 2024, 06:21 PM
9
votes
2
answers
18867
views
Generate a series of dates for each group in a table
I have a `balances` table in PostgreSQL 9.3 that looks like this:
~~~pgsql
CREATE TABLE balances (
user_id INT
, balance INT
, as_of_date DATE
);
INSERT INTO balances (user_id, balance, as_of_date) VALUES
(1, 100, '2016-01-03')
, (1, 50, '2016-01-02')
, (1, 10, '2016-01-01')
, (2, 200, '2016-01-01')
, (3, 30, '2016-01-03')
;
~~~
It only contains balances for dates that a user has made a transaction. I need it to contain a row for each user with their balance on each date in a given date range.
* If the user does not have a row for a given date in the range, I need to use their balance from the day before.
* If the user created their account after a given date in the range, I need to avoid creating a row for that user/date combination.
I can reference an `accounts` table to get users' `create_date`:
~~~pgsql
CREATE TABLE accounts (
user_id INT
, create_date DATE
);
INSERT INTO accounts (user_id, create_date) VALUES
(1, '2015-12-01')
, (2, '2015-12-31')
, (3, '2016-01-03')
;
~~~
My desired result looks like this:
+---------+---------+--------------------------+
| user_id | balance | as_of_date |
+---------+---------+--------------------------+
| 1 | 100 | 2016-01-03T00:00:00.000Z |
| 1 | 50 | 2016-01-02T00:00:00.000Z |
| 1 | 10 | 2016-01-01T00:00:00.000Z |
| 2 | 200 | 2016-01-03T00:00:00.000Z |
| 2 | 200 | 2016-01-02T00:00:00.000Z |
| 2 | 200 | 2016-01-01T00:00:00.000Z |
| 3 | 30 | 2016-01-03T00:00:00.000Z |
+---------+---------+--------------------------+
Note that rows have been added for user 2 for `2016-01-02` and `2016-01-03`, carrying over the previous balance from `2016-01-01`; and that no rows have been added for user 3, who was created on `2016-01-03`.
To generate a series of dates in a date range, I know I can use:
~~~pgsql
SELECT d.date FROM GENERATE_SERIES('2016-01-01', '2016-01-03', '1 day'::INTERVAL) d
~~~
...but I am struggling with `LEFT JOIN`ing that series with each set of rows grouped by `user_id`.
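A sketch of one way to do it that works on 9.3: cross join the accounts with the generated dates, keep only dates on or after each account's `create_date`, and pull the latest balance at or before each date with a correlated subquery (a `LATERAL` join would do the same job):

```sql
SELECT a.user_id,
       (SELECT b.balance
        FROM balances b
        WHERE b.user_id = a.user_id
          AND b.as_of_date <= d.day
        ORDER BY b.as_of_date DESC
        LIMIT 1)  AS balance,          -- most recent balance on or before d.day
       d.day      AS as_of_date
FROM accounts a
CROSS JOIN GENERATE_SERIES('2016-01-01'::date, '2016-01-03'::date, '1 day'::INTERVAL) AS d(day)
WHERE a.create_date <= d.day           -- no rows before the account existed
ORDER BY a.user_id, d.day DESC;
```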
Shaun Scovil
(341 rep)
Mar 25, 2016, 08:48 PM
• Last activity: May 31, 2024, 03:29 AM
0
votes
1
answers
123
views
Total days on consecutive date records up to gap
I have a classic gaps-and-islands issue whereby I need to total the time across the most recent records, as far back as the first gap in dates.
Looking at , this provides a very neat solution to my question except that it does not work for SQL Server; there are syntax errors.
Does anyone have an equivalent solution that works for SQL Server?
The data is something like this
| REF |TNCY-START| TNCY-END |
|:----: |:--------:|:--------:|
| 1 |2022-08-22| 2024-04-22|
| 1 |2019-04-29| 2022-08-21|
| 1 |2017-06-26| 2019-04-28|
| 1 |2009-06-29| 2010-01-31|
| 2 |2020-07-13| NULL |
| 2 |2020-05-18| 2020-07-12|
| 2 |2020-01-13| 2020-05-17|
| 2 |2010-12-06| 2016-08-28|
| 2 |2003-09-29| 2009-06-01|
| 3 |2019-03-25| NULL |
| 4 |2022-08-22| NULL |
| 4 |2019-04-29| 2022-08-21|
| 4 |2017-06-26| 2019-04-28|
| 4 |2009-06-29| 2010-01-31|
```
create table #test ([PERSON-REF] INT,
                    [TNCY-START] DATE,
                    [TNCY-END] DATE);

INSERT INTO #test VALUES
('1', '2022-08-22 00:00:00.000', '2024-04-22 00:00:00.000'),
('1', '2019-04-29 00:00:00.000', '2022-08-21 00:00:00.000'),
('1', '2017-06-26 00:00:00.000', '2019-04-28 00:00:00.000'),
('1', '2009-06-29 00:00:00.000', '2010-01-31 00:00:00.000'),
('2', '2020-07-13 00:00:00.000', null),
('2', '2020-05-18 00:00:00.000', '2020-07-12 00:00:00.000'),
('2', '2020-01-13 00:00:00.000', '2020-05-17 00:00:00.000'),
('2', '2010-12-06 00:00:00.000', '2016-08-28 00:00:00.000'),
('2', '2003-09-29 00:00:00.000', '2009-06-01 00:00:00.000'),
('3', '2019-03-25 00:00:00.000', NULL),
('4', '2022-08-22 00:00:00.000', NULL),
('4', '2019-04-29 00:00:00.000', '2022-08-21 00:00:00.000'),
('4', '2017-06-26 00:00:00.000', '2019-04-28 00:00:00.000'),
('4', '2009-06-29 00:00:00.000', '2010-01-31 00:00:00.000');

SELECT * FROM #test;
```
and I want to end up with a single row per person ref containing person-ref, count of records after gap, start date after gap, latest end date (or null), days between.
e.g. assuming UK date format and taking NULL as today:
| REF | COUNT OF TENANCIES | FIRST TNCY-START | LAST TNCY-END | TOTAL TENANCY DAYS |
|:----:|:----------------:|:--------------:|:------------:|:----------------:|
| 1 | 3 | 2017-06-26 | 2024-04-22 | 2492 |
| 2 | 3 | 2020-01-13 | NULL | 1562 |
| 3 | 1 | 2019-03-25 | NULL | 1866 |
| 4 | 3 | 2017-06-26 | NULL | 2493 |
Thanks for any help anyone can offer
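A sketch of a T-SQL equivalent (SQL Server 2012+): flag rows whose start is not the day after (or within) the previous tenancy's end, turn the flags into island numbers with a running sum, and keep only each person's latest island:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN DATEADD(day, 1,
                     LAG([TNCY-END]) OVER (PARTITION BY [PERSON-REF] ORDER BY [TNCY-START])
                     ) >= [TNCY-START]
                THEN 0 ELSE 1 END AS is_gap
    FROM #test
), islands AS (
    SELECT *,
           SUM(is_gap) OVER (PARTITION BY [PERSON-REF] ORDER BY [TNCY-START]
                             ROWS UNBOUNDED PRECEDING) AS island_no
    FROM flagged
), latest AS (
    SELECT *,
           MAX(island_no) OVER (PARTITION BY [PERSON-REF]) AS max_island
    FROM islands
)
SELECT [PERSON-REF],
       COUNT(*)          AS [COUNT OF TENANCIES],
       MIN([TNCY-START]) AS [FIRST TNCY-START],
       CASE WHEN COUNT(*) <> COUNT([TNCY-END])
            THEN NULL ELSE MAX([TNCY-END]) END AS [LAST TNCY-END],     -- NULL if still open
       DATEDIFF(day,
                MIN([TNCY-START]),
                CASE WHEN COUNT(*) <> COUNT([TNCY-END])
                     THEN CAST(GETDATE() AS date)                      -- open tenancy: count up to today
                     ELSE MAX([TNCY-END]) END) AS [TOTAL TENANCY DAYS]
FROM latest
WHERE island_no = max_island
GROUP BY [PERSON-REF];
```

The day counts for the open tenancies depend on the current date, so they will not match the sample numbers exactly.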
Judy D
(3 rep)
Apr 23, 2024, 02:33 PM
• Last activity: Apr 24, 2024, 10:35 AM
0
votes
1
answers
1546
views
Find Consecutive Date in SQL using 2 date columns
I am trying to group all consecutive patient admissions in the table. If the patient is re-admitted to the hospital the next day (or within 24 hours), then I need to keep only one admission stay per patient. I appreciate your help.
|Patient_bed_id |Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 1/31/2022|
|111 | Lukas | 2/1/2022| 2/28/2022|
|111| Lukas| 3/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 8/20/2022|
|111| Lukas| 8/21/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022 | 7/11/2022|
|222| Jason| 7/12/2022 | 7/26/2022|
|222| Jason| 9/13/2022| 9/15/2022|
|222| Jason| 9/16/2022 |9/27/2022|
Final table
|Patient_bed_id| Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022| 7/26/2022|
|222| Jason | 9/13/2022| 9/27/2022|
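A sketch of the usual approach, written in T-SQL-style syntax (the dialect is not stated in the question) and assuming a hypothetical table name `admissions`: start a new stay whenever the admission date is more than one day after the previous discharge, number the stays with a running sum, then aggregate:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN DATEDIFF(day,
                     LAG(Hospital_out_Date) OVER (PARTITION BY Patient_bed_id
                                                  ORDER BY Hospital_In_date),
                     Hospital_In_date) <= 1
                THEN 0 ELSE 1 END AS new_stay
    FROM admissions
), stays AS (
    SELECT *,
           SUM(new_stay) OVER (PARTITION BY Patient_bed_id ORDER BY Hospital_In_date
                               ROWS UNBOUNDED PRECEDING) AS stay_no
    FROM flagged
)
SELECT Patient_bed_id,
       Patientname,
       MIN(Hospital_In_date)  AS Hospital_In_date,
       MAX(Hospital_out_Date) AS Hospital_out_Date
FROM stays
GROUP BY Patient_bed_id, Patientname, stay_no
ORDER BY Patient_bed_id, MIN(Hospital_In_date);
```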
Jane
(1 rep)
Jun 29, 2022, 03:28 AM
• Last activity: Apr 23, 2024, 07:24 AM
0
votes
2
answers
2411
views
select row number increment on value change compared to previous return result
I'm experimenting with Postgresql queries a bit and am trying to figure out if it's possible to write a query that would increment a row number counter when returned value changes.
Example SOURCE (where SOURCE can be a sorted/unsorted sub-query/table):
```
X|Y
---
0|0
0|1
0|0
1|0
1|0
1|1
2|0
```
Example increment on X:
```
select X, Y, wishful_row_number(X) as Rn from SOURCE;

0,0,1
0,1,1
0,0,1
1,0,2
1,0,2
1,1,2
2,0,3
```
(row number changes every time X changes)
Example increment on Y:
```
select X, Y, wishful_row_number(Y) as Rn from SOURCE;

0,0,1
0,1,2
0,0,3
1,0,3
1,0,3
1,1,4
2,0,5
```
(row number changes every time Y changes - goes to a bigger number/different string even if that value was already seen before)
So the Rn increment is not dependent on any "order by" or source sorting but just on the previously returned row.
Can such a query be written (without store procedures and if possible without temporary tables)?
EDIT: And yes, I know tables have no inherent ordering which is exactly the point of why I am trying to skip the "force me to order by" step and bind the Rn row generation on returned values (see my reply comment below).
And yes, I understand that something like that might not be possible in SQL based on the "why would somebody in their right mind want to do that???" but as far as I'm concerned that's the SQL limitation and the answer is "no, it can't be done" even if some people have the fetish of down voting my question.
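For what it's worth, once some expression fixes the order in which rows are consumed, the counter itself is straightforward: compare each row to the previous one with `LAG` and take a running sum of the changes. A sketch for the X variant; `ROW_NUMBER() OVER ()` is used here only as a stand-in ordering anchor and, as discussed above, PostgreSQL gives no guarantee that it reflects any particular row order:

```sql
SELECT x, y,
       -- increments whenever x differs from the previous row's x
       SUM(CASE WHEN x IS DISTINCT FROM prev_x THEN 1 ELSE 0 END)
           OVER (ORDER BY seq) AS rn
FROM (
    SELECT x, y, seq,
           LAG(x) OVER (ORDER BY seq) AS prev_x
    FROM (
        SELECT x, y, ROW_NUMBER() OVER () AS seq   -- hypothetical ordering anchor
        FROM source
    ) s
) t
ORDER BY seq;
```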
Domen Vrankar
(111 rep)
Mar 10, 2024, 11:54 PM
• Last activity: Mar 11, 2024, 10:46 AM
1
votes
1
answers
60
views
Identify if the sequence of visits to an area id was interrupted at any point
I have a PostgreSQL table `df` with the following information:
|area_id|trip_nr |timestamp|
|-------|-----|----------------------|
|a |3 |29/06/2022 17:18:03|
|a |2 |29/06/2022 17:18:02|
|a |1 |29/06/2022 17:18:01|
|b |1 |28/06/2022 19:21:03|
|c |3 |28/06/2022 19:21:02|
|e |3 |28/06/2022 19:21:01|
|e |2 |27/06/2022 17:36:03|
|c |2 |27/06/2022 17:36:02|
|c |1 |27/06/2022 17:36:01|
|e |1 |27/06/2022 16:55:03|
I would like to add a column `continuous` that checks whether the `trip_nr` to a particular `area_id` follows chronologically from the first visit, based on the order of `timestamp`.
In other words, `continuous = True` if an `area_id` is visited for the nth time and the previous visits were to the same `area_id` all the way from n, n-1, ..., 1. And `continuous = False` if an `area_id` is visited for the nth time, but a different `area_id` was visited somewhere in between visits n, n-1, ..., 1 (i.e. the sequence of `trip_nr`s is broken at some point).
An example of the desired output:
|area_id|trip_nr|timestamp|continuous|
|-------|-------|---------|----------|
|a |3 |29/06/2022 17:18:03|True |
|a |2 |29/06/2022 17:18:02|True |
|a |1 |29/06/2022 17:18:01|True |
|b |1 |28/06/2022 19:21:03|True |
|c |3 |28/06/2022 19:21:02|False|
|e |3 |28/06/2022 19:21:01|False|
|e |2 |27/06/2022 17:36:03|False|
|c |2 |27/06/2022 17:36:02|True |
|c |1 |27/06/2022 17:36:01|True |
|e |1 |27/06/2022 16:55:03|True |
How can I achieve this optimally? Clickhouse/SQL answers also welcome.
Note that the timestamp is not necessarily stored in order.
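One way to express the rule: a visit is continuous exactly when its position inside the current unbroken run of the same `area_id` (ordered by `timestamp`) is at least its `trip_nr`. A Postgres sketch along those lines:

```sql
WITH marked AS (
    SELECT *,
           -- 1 whenever the area changes compared to the previous row in time
           CASE WHEN area_id = LAG(area_id) OVER (ORDER BY "timestamp")
                THEN 0 ELSE 1 END AS new_run
    FROM df
), runs AS (
    SELECT *,
           SUM(new_run) OVER (ORDER BY "timestamp") AS run_id
    FROM marked
)
SELECT area_id,
       trip_nr,
       "timestamp",
       ROW_NUMBER() OVER (PARTITION BY run_id ORDER BY "timestamp") >= trip_nr AS continuous
FROM runs
ORDER BY "timestamp" DESC;
```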
Ruan
(35 rep)
Mar 3, 2024, 01:36 PM
• Last activity: Mar 3, 2024, 11:52 PM
7
votes
2
answers
4524
views
Carry over long sequence of missing values with Postgres
I have a table like this:
```
create table foo (foo_label text, foo_price int, foo_date date);

insert into foo (
values
('aaa', 100, '2017-01-01'),
('aaa', NULL, '2017-02-01'),
('aaa', NULL, '2017-03-01'),
('aaa', NULL, '2017-04-01'),
('aaa', 140, '2017-05-01'),
('aaa', NULL, '2017-06-01'),
('aaa', 180, '2017-07-01')
);
```
As you can see, a few values in the `foo_price` column are missing.
What I need is for the missing values to be filled with the "previous" available value, in this way:
```
 foo_label | fixed_foo_price | foo_date
-----------+-----------------+------------
 aaa       |             100 | 2017-01-01
 aaa       |             100 | 2017-02-01
 aaa       |             100 | 2017-03-01
 aaa       |             100 | 2017-04-01
 aaa       |             140 | 2017-05-01
 aaa       |             140 | 2017-06-01
 aaa       |             180 | 2017-07-01
```
**My attempt:**
```
select
    foo_label,
    (case when foo_price is null then previous_foo_price else foo_price end) as fixed_foo_price,
    foo_date
from (
    select
        foo_label,
        lag(foo_price) OVER (PARTITION BY foo_label order by foo_date::date) as previous_foo_price,
        foo_price,
        foo_date
    from foo
) T;
```
As you can see here: https://www.db-fiddle.com/#&togetherjs=s6giIonUxT it doesn't completely fill the '100' series.
Any idea how I can get the wanted result?
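`lag()` only looks one row back, which is why the long run of NULLs is not filled. A common fix is to build a group id from a running count of non-NULL prices and take each group's first value; a minimal sketch:

```sql
SELECT foo_label,
       first_value(foo_price) OVER (PARTITION BY foo_label, grp ORDER BY foo_date) AS fixed_foo_price,
       foo_date
FROM (
    SELECT f.*,
           -- increments at every non-NULL price, so each group starts with the value to carry forward
           count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
    FROM foo f
) t
ORDER BY foo_date;
```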
Randomize
(1203 rep)
Sep 18, 2017, 02:24 PM
• Last activity: Jan 3, 2024, 10:24 AM
1
votes
1
answers
353
views
SELECT / DELETE rows where consecutive null value count is greater than N
I have a table of the following structure, where all columns except the last are non-null:
| ID | receiver | send_time | recv_time |
| -- | -------- | --------- | --------- |
| 1 | A | 00:00:00 | 00:00:01 |
| 2 | A | 00:00:01 | NULL |
| 3 | A | 00:00:02 | NULL |
| 4 | A | 00:00:03 | NULL |
| 5 | A | 00:00:04 | 00:00:05 |
| 6 | B | 00:00:00 | 00:00:01 |
| 7 | B | 00:00:01 | NULL |
| 8 | B | 00:00:02 | 00:00:03 |
| 9 | A | 00:00:05 | NULL |
| 10 | B | 00:00:03 | NULL |
| 11 | A | 00:00:06 | 00:00:07 |
I want to select (and ultimately delete, but I think selecting would be the first step) all rows where the `recv_time` column is `NULL` for *n or more* consecutive rows, partitioned by the `receiver` column and ordered by the `send_time` column.
So from the above example, if *n=2*, I want to select/delete the rows with ID 2, 3 and 4, but not rows 7, 9, or 10.
I think a count with partition is needed, but I can't figure out how to "reset" the count of null values as soon as a non-null row is encountered. This is what I've got so far:
```
SELECT
    id, receiver, send_time, recv_time,
    COUNT(CASE WHEN recv_time IS NULL THEN 1 END) OVER (PARTITION BY receiver ORDER BY send_time)
FROM test
ORDER BY id
```
which gives me:
| ID | receiver | send_time | recv_time | count |
| -- | -------- | --------- | --------- | ----- |
| 1 | A | 00:00:00 | 00:00:01 | 0 |
| 2 | A | 00:00:01 | NULL | 1 |
| 3 | A | 00:00:02 | NULL | 2 |
| 4 | A | 00:00:03 | NULL | 3 |
| 5 | A | 00:00:04 | 00:00:05 | 3 |
| 6 | B | 00:00:00 | 00:00:01 | 0 |
| 7 | B | 00:00:01 | NULL | 1 |
| 8 | B | 00:00:02 | 00:00:03 | 1 |
| 9 | A | 00:00:05 | NULL | 4 |
| 10 | B | 00:00:03 | NULL | 2 |
| 11 | A | 00:00:06 | 00:00:07 | 4 |
But I want the count to reset at rows 5, 8, 11. I'm also not sure how I would proceed after that, because I want to capture *all* rows in the "group" of consecutive NULL rows, but obviously the count for row 2 in this example is smaller than *n*.
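A sketch (Postgres-style syntax, matching the query above): use a running count of non-NULL `recv_time` values as a run id, so each run consists of one delivered row plus the NULL rows that follow it, then count the NULLs per run and keep the NULL rows of runs that are long enough:

```sql
WITH runs AS (
    SELECT *,
           -- increases at every non-NULL recv_time, so NULL rows inherit the id of the run they belong to
           COUNT(recv_time) OVER (PARTITION BY receiver ORDER BY send_time) AS run_id
    FROM test
), sized AS (
    SELECT *,
           SUM(CASE WHEN recv_time IS NULL THEN 1 ELSE 0 END)
               OVER (PARTITION BY receiver, run_id) AS null_run_len
    FROM runs
)
SELECT id, receiver, send_time, recv_time
FROM sized
WHERE recv_time IS NULL
  AND null_run_len >= 2          -- n = 2
ORDER BY id;
```

Deleting is then a matter of `DELETE FROM test WHERE id IN (...)` wrapped around the same query.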
Edgxxar
(13 rep)
Sep 16, 2023, 01:54 PM
• Last activity: Sep 20, 2023, 12:23 AM
3
votes
1
answers
2517
views
Calculate Difference in Overlapping Time in PostgreSQL / SSRS
I'm using PostgreSQL 9.2 and SSRS 3.0. I'm trying to calculate the difference in non-overlapping time entries in a timesheet.
My data looks like this: *(screenshot of the timesheet entries)*
I would like this to return: *(screenshot of the desired result)*
I can get the overlapping minutes, 635, but am having trouble with the non-overlapping 605 minutes.
Rob Slone
(31 rep)
Aug 21, 2015, 05:51 PM
• Last activity: Sep 8, 2023, 10:25 PM
1
votes
1
answers
38
views
Merging wifi session data if time between them is less than 15 minutes
I am trying to process network logs and join sessions together if the time between them is less than 15 minutes. The relevant fields are start time, end time, mac address, and wifi access point.
I am working in Greenplum 6.22/Postgresql 9.4.26:
```
pdap=# SELECT version();
```
| version |
| :------ |
| PostgreSQL 9.4.26 (Greenplum Database 6.22.2) |
[db fiddle](https://dbfiddle.uk/vOP-wLu8)
Logically, what I want to do is "If the start time from the next row is less than 15 minutes after the end time from this row, merge the two rows into one row with the earlier start time and the later end time."
Here is an example table with some data:
```
CREATE TABLE network_test
(   start_ts     TIMESTAMPTZ,
    end_ts       TIMESTAMPTZ,
    mac_addr     MACADDR,
    access_point VARCHAR
);

INSERT INTO network_test
VALUES
('2023-08-14 13:21:10.289'::timestamptz, '2023-08-14 13:31:20.855'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:10.638'::timestamptz, '2023-08-14 13:58:22.668'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:58:22.727'::timestamptz, '2023-08-14 13:58:38.966'::timestamptz, '00:00:00:00:00:01'::macaddr, 'access_point_01'),
('2023-08-14 13:28:28.190'::timestamptz, '2023-08-14 13:28:28.190'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:28:44.167'::timestamptz, '2023-08-14 13:28:44.288'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_02'),
('2023-08-14 13:45:40.281'::timestamptz, '2023-08-14 13:46:02.726'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:02.964'::timestamptz, '2023-08-14 13:46:10.783'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:11.026'::timestamptz, '2023-08-14 13:46:18.803'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:19.037'::timestamptz, '2023-08-14 13:46:26.798'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:27.036'::timestamptz, '2023-08-14 13:46:34.815'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:35.057'::timestamptz, '2023-08-14 13:46:46.980'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:47.213'::timestamptz, '2023-08-14 13:46:54.946'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:46:55.189'::timestamptz, '2023-08-14 13:47:17.040'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:47:17.297'::timestamptz, '2023-08-14 13:47:25.106'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03'),
('2023-08-14 13:55:25.381'::timestamptz, '2023-08-14 13:58:33.059'::timestamptz, '00:00:00:00:00:02'::macaddr, 'access_point_03');

SELECT *
FROM network_test
ORDER BY mac_addr, access_point, start_ts;
```
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:22.668+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:22.727+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:28.19+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:28:44.167+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:46:02.726+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:02.964+00 | 2023-08-14 13:46:10.783+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:11.026+00 | 2023-08-14 13:46:18.803+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:19.037+00 | 2023-08-14 13:46:26.798+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:27.036+00 | 2023-08-14 13:46:34.815+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:35.057+00 | 2023-08-14 13:46:46.98+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:47.213+00 | 2023-08-14 13:46:54.946+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:46:55.189+00 | 2023-08-14 13:47:17.04+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:47:17.297+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
Here is what I would like the result to be:
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
The first session stays as it is. The 2nd and 3rd sessions are merged into one because they have the same mac address and access point, and there is less than 15 minutes between them. The same happens for the 4th and 5th sessions, as well as the 6th through the 15th.
I can come close using window functions:
```
SELECT DISTINCT
    MIN(start_ts) OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM start_ts)/900)) AS start_ts,
    MAX(end_ts)   OVER (PARTITION BY mac_addr, access_point, ROUND(EXTRACT(EPOCH FROM end_ts)/900))   AS end_ts,
    mac_addr,
    access_point
FROM network_test
ORDER BY mac_addr, access_point, start_ts
```
| start\_ts | end\_ts | mac\_addr | access\_point |
| :---------|:-------|:---------|:-------------|
| 2023-08-14 13:21:10.289+00 | 2023-08-14 13:31:20.855+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:58:10.638+00 | 2023-08-14 13:58:38.966+00 | 00:00:00:00:00:01 | access\_point\_01 |
| 2023-08-14 13:28:28.19+00 | 2023-08-14 13:28:44.288+00 | 00:00:00:00:00:02 | access\_point\_02 |
| 2023-08-14 13:45:40.281+00 | 2023-08-14 13:47:25.106+00 | 00:00:00:00:00:02 | access\_point\_03 |
| 2023-08-14 13:55:25.381+00 | 2023-08-14 13:58:33.059+00 | 00:00:00:00:00:02 | access\_point\_03 |
But note that the last two data points end up in separate 15-minute buckets even though they're only 8 minutes apart.
Does anyone know if there is a way to do this in SQL, or am I going to have to write a PL/pgSQL function to go through the data row by row and do the comparison?
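A sketch of a pure-SQL alternative that avoids fixed buckets, using only standard window functions (which Greenplum 6 / Postgres 9.4 support): flag a row as starting a new session when it begins more than 15 minutes after the previous row's end for the same device and access point, turn the flags into session numbers with a running sum, and aggregate:

```sql
WITH flagged AS (
    SELECT *,
           CASE WHEN start_ts - LAG(end_ts) OVER (PARTITION BY mac_addr, access_point
                                                  ORDER BY start_ts) <= INTERVAL '15 minutes'
                THEN 0 ELSE 1 END AS new_session
    FROM network_test
), sessions AS (
    SELECT *,
           SUM(new_session) OVER (PARTITION BY mac_addr, access_point
                                  ORDER BY start_ts) AS session_no
    FROM flagged
)
SELECT MIN(start_ts) AS start_ts,
       MAX(end_ts)   AS end_ts,
       mac_addr,
       access_point
FROM sessions
GROUP BY mac_addr, access_point, session_no
ORDER BY mac_addr, access_point, start_ts;
```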
lpscott
(33 rep)
Sep 8, 2023, 02:15 PM
• Last activity: Sep 8, 2023, 03:28 PM
2
votes
1
answers
634
views
T-SQL : Slicing time periods by related time periods
Consider the following table that gives the validity dates for which an `Item` is associated to something called a `Demand_Unit`, and a certain `Market`:
| Item_ID | Start_Date | End_Date | Demand_Unit | Market |
|---------|------------|------------|-------------|----------|
| X | 2020-06-01 | 2020-09-30 | A | GREECE |
| X | 2020-11-01 | 2021-01-01 | A | BELGIUM |
| X | 2021-01-01 | 2023-12-31 | A | USA |
| Y | 2021-01-01 | 2021-02-01 | B | PORTUGAL |
| Y | 2021-02-01 | 2021-06-07 | B | FRANCE |
| Y | 2021-06-07 | 2022-01-03 | B | ITALY |
| Y | 2022-01-03 | 2023-12-31 | B | SPAIN |
| Y | 2024-01-01 | 2025-12-31 | B | UKRAINE |
| Z | 2021-05-01 | 2021-06-30 | C | USA |
| W | 2021-02-01 | 2021-05-30 | D | SWEDEN |
| W | 2023-12-31 | 2025-05-30 | D | DANEMARK |
| U | 2021-01-04 | 2022-02-07 | E | TUNISIA |
| U | 2022-02-07 | 2025-01-05 | (null) | MOROCCO |
> Important note:
> You can never have overlapping dates because basically when one `Market` closes another opens. But it is possible to have non-contiguous periods (like for `Item` X, for example).
Now consider the following table that gives the `Classification`s of a `Demand_Unit` throughout time:
| Demand_Unit | Start_Date | End_Date | Classification |
|-------------|------------|------------|----------------|
| A | 2021-01-01 | 2021-04-05 | N+ |
| A | 2021-04-05 | 2023-12-31 | K- |
| B | 2021-03-08 | 2021-07-26 | N+ |
| B | 2021-07-26 | 2022-11-30 | L- |
| C | 2021-01-01 | 2023-12-31 | N- |
| D | 2021-06-01 | 2023-12-31 | K+ |
| E | 2021-08-02 | 2022-01-03 | N+ |
| E | 2022-01-03 | 2025-01-06 | C- |
> Important note:
> You can only have contiguous periods here. This table is extracted from a Slowly changing dimension.
The goal is to 'slice' the first table to associate to each `Item` its `Classification`, while taking into account the time periods. The result should look like this:
| Item_ID | Demand_Unit | Market | Item_Classification_Dt_Begin | Item_Classification_Dt_End | Item_Classification |
|---------|-------------|----------|------------------------------|----------------------------|---------------------|
| U | E | TUNISIA | 2021-01-04 | 2021-08-02 | (null) |
| U | E | TUNISIA | 2021-08-02 | 2022-01-03 | N+ |
| U | E | TUNISIA | 2022-01-03 | 2022-02-07 | C- |
| U | (null) | MOROCCO | 2022-02-07 | 2025-01-05 | (null) |
| W | D | SWEDEN | 2021-02-01 | 2021-05-30 | (null) |
| W | D | DANEMARK | 2023-12-31 | 2023-12-31 | K+ |
| W | D | DANEMARK | 2023-12-31 | 2025-05-30 | (null) |
| X | A | GREECE | 2020-06-01 | 2020-09-30 | (null) |
| X | A | BELGIUM | 2020-11-01 | 2021-01-01 | (null) |
| X | A | USA | 2021-01-01 | 2021-04-05 | N+ |
| X | A | USA | 2021-04-05 | 2023-12-31 | K- |
| Y | B | PORTUGAL | 2021-01-01 | 2021-02-01 | (null) |
| Y | B | FRANCE | 2021-02-01 | 2021-03-08 | (null) |
| Y | B | FRANCE | 2021-03-08 | 2021-06-07 | N+ |
| Y | B | ITALY | 2021-06-07 | 2021-07-26 | N+ |
| Y | B | ITALY | 2021-07-26 | 2022-01-03 | L- |
| Y | B | SPAIN | 2022-01-03 | 2022-11-30 | L- |
| Y | B | SPAIN | 2022-11-30 | 2023-12-31 | (null) |
| Y | B | UKRAINE | 2024-01-01 | 2025-12-31 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | (null) |
| Z | C | USA | 2021-05-01 | 2021-06-30 | N- |
These are some examples with 'tricky' cases (Items X and Y) and a 'simple' case (Item Z) which has its dates completely included in the classifications period, + other cases to try and test possibilities I could think of...
I'm working on a solution that uses `LAG()`, `LEAD()` and a three-part `UNION`, but I'm afraid I'm forgetting something, and the original table has hundreds of thousands of rows.
I'm not trying to have someone do my work here; I would just like your opinions on how this should properly be done.
Here's my solution so far. It looks like it works, but I don't find it elegant and I'm sure the second and third `UNION` blocks can be simplified, but I just can't see straight anymore.
;WITH CLASSIFICATIONS_LAG_LEAD AS
(
SELECT
Demand_Unit
,Start_Date
,End_Date
,Classification
,LAG(End_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date) AS LAGD
,LEAD(Start_Date) OVER (PARTITION BY Demand_Unit ORDER BY Start_Date ) AS LEADD
FROM Classifications
)
-- This first block handles Item periods that are not covered by classifications periods
-- and/or the NULL classif portion of Item periods that are partially covered by classifications periods
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T2.End_Date > T1.Start_Date
AND T2.End_Date T1.Start_Date
AND T2.Start_Date T2.End_Date
)
)
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_BEGIN to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,CASE
WHEN T1.Start_Date T2.Start_Date
UNION
-- The following block handles Items periods that are (at least partially)
-- contained within the Classification period, and checks which DT_END to use
SELECT
T1.Item_ID
,T1.Demand_Unit
,T1.Market
,T1.Start_Date AS Item_Classification_Dt_Begin
,CASE
WHEN T1.End_Date > T2.End_Date THEN T2.End_Date
ELSE T1.End_Date
END AS Item_Classification_Dt_End
,T2.Classification AS Item_Classification
FROM Items T1
INNER JOIN Classifications T2
ON T1.Demand_Unit = T2.Demand_Unit
AND T1.Start_Date = T2.Start_Date
ORDER BY Item_ID, 4
Here is a SQLFiddle
Many thanks in advance
Edit : last minute fixes in my solution
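For comparison, a sketch of the "boundary splitting" pattern that is often used instead of a multi-branch `UNION`: collect every relevant date per `Item_ID`, turn consecutive boundary pairs into slices with `LEAD()`, and join each slice back to both tables. It has not been validated against every edge case in the expected output (for instance the zero-length DANEMARK slice would need extra handling):

```sql
WITH boundaries AS (
    SELECT i.Item_ID, d.Boundary_Date
    FROM Items i
    CROSS APPLY (VALUES (i.Start_Date), (i.End_Date)) AS d(Boundary_Date)
    UNION
    SELECT i.Item_ID, c.Start_Date
    FROM Items i
    JOIN Classifications c
      ON c.Demand_Unit = i.Demand_Unit
     AND c.Start_Date > i.Start_Date AND c.Start_Date < i.End_Date
    UNION
    SELECT i.Item_ID, c.End_Date
    FROM Items i
    JOIN Classifications c
      ON c.Demand_Unit = i.Demand_Unit
     AND c.End_Date > i.Start_Date AND c.End_Date < i.End_Date
), slices AS (
    SELECT Item_ID,
           Boundary_Date AS Dt_Begin,
           LEAD(Boundary_Date) OVER (PARTITION BY Item_ID ORDER BY Boundary_Date) AS Dt_End
    FROM boundaries
)
SELECT i.Item_ID,
       i.Demand_Unit,
       i.Market,
       s.Dt_Begin AS Item_Classification_Dt_Begin,
       s.Dt_End   AS Item_Classification_Dt_End,
       c.Classification AS Item_Classification
FROM slices s
JOIN Items i                              -- keep only slices that fall inside an Item period
  ON i.Item_ID = s.Item_ID
 AND s.Dt_Begin >= i.Start_Date AND s.Dt_End <= i.End_Date
LEFT JOIN Classifications c               -- attach the classification covering the slice, if any
  ON c.Demand_Unit = i.Demand_Unit
 AND s.Dt_Begin >= c.Start_Date AND s.Dt_End <= c.End_Date
WHERE s.Dt_End IS NOT NULL
ORDER BY i.Item_ID, s.Dt_Begin;
```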
Yass T
(33 rep)
Oct 27, 2021, 04:06 PM
• Last activity: Jul 23, 2023, 12:02 AM
6
votes
2
answers
8382
views
Check if contiguous date interval exists
I have a table in PostgreSQL which describes some events that have a start date and an end date:
```
CREATE TABLE my_table
(
    event_id   serial NOT NULL,
    start_date timestamp without time zone NOT NULL,
    end_date   timestamp without time zone NOT NULL
)
```
A single event may overlap the previous and the next one. In the following table the first 3 of 4 events form a contiguous time interval:
```
1  '2015-04-02 22:09:03'  '2015-04-02 22:19:05'
2  '2015-04-02 22:17:38'  '2015-04-02 22:27:38'
3  '2015-04-02 22:25:21'  '2015-04-02 22:36:23'
4  '2015-04-02 22:45:23'  '2015-04-02 22:55:23'
```
Is it possible to write a query that checks if a contiguous date interval between two given dates exists?
I would like to have something like:
```
select ...
from my_table
where start_date > '2015-04-02' and end_date < '2015-04-06'
```
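A sketch of one way to answer it: merge the overlapping events in the window into islands (a new island starts whenever an event begins after the running maximum of all earlier end dates), then inspect the islands, e.g. to see whether a single island spans the range you care about:

```sql
WITH ordered AS (
    SELECT *,
           MAX(end_date) OVER (ORDER BY start_date, end_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max_end
    FROM my_table
    WHERE start_date > '2015-04-02' AND end_date < '2015-04-06'
), islands AS (
    SELECT *,
           SUM(CASE WHEN prev_max_end >= start_date THEN 0 ELSE 1 END)
               OVER (ORDER BY start_date, end_date) AS island_id
    FROM ordered
)
SELECT island_id,
       MIN(start_date) AS island_start,
       MAX(end_date)   AS island_end
FROM islands
GROUP BY island_id
ORDER BY island_start;
```

For the sample data this returns two islands: one covering events 1-3 (22:09:03 to 22:36:23) and one for event 4.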
hank
(919 rep)
Apr 3, 2015, 12:06 PM
• Last activity: Jul 14, 2023, 03:44 PM
0
votes
1
answers
2469
views
SQL (Postgres) Filling up missing rows with values from previous row and calculating on result set
I have got this dataset:
```
 id | block_number | value | name
----+--------------+-------+------
  2 |           47 |   100 | one
  3 |           52 |   200 | one
  4 |           58 |   120 | one
  5 |           60 |   180 | one
```
created from
```
create table test (id serial, block_number int, value int, name varchar);
```
and
```
insert into test (block_number, value, name) values (47, 100, 'one'), (52, 200, 'one'), (58, 120, 'one'), (60, 180, 'one');
```
I would like to fill up the rows (not actually add them to the table but just use them for further processing), so that the column 'block_number' shows up consecutively using the value of the previous row for 'value' like so:
```
block_number | value | name
-------------+-------+------
          47 |   100 | one
          48 |   100 | one
          49 |   100 | one
          50 |   100 | one
          51 |   100 | one
          52 |   200 | one
          53 |   200 | one
          54 |   200 | one
          55 |   200 | one
          56 |   200 | one
          57 |   200 | one
          58 |   120 | one
          59 |   120 | one
          60 |   180 | one
```
Then I would like to multiply each of these rows by a given number and sum up the total; for example, if the given number were 2, the result of the above would be 4,240.
I haven't gotten very far with this; the only thing I could come up with is a statement that generates the missing rows. However, I am failing to find out how I could set the value of each row to the one of the previous row:
```
SELECT x.block_number, x.value, t.value
FROM (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM my_table
) x
LEFT JOIN my_table t USING (block_number)
ORDER BY x.block_number;
```
Any help greatly appreciated.
### UPDATE: ###
I tried to reconcile this with @mustaccio's hint here: https://dba.stackexchange.com/posts/327530/edit#:~:text=Carry%20over%20long%20sequence%20of%20missing%20values%20with%20Postgres and I came up with this:
```
SELECT block_number
     , first_value(value) OVER (PARTITION BY block_number, grp ORDER BY block_number) AS value
FROM (
    SELECT
        count(value) OVER (PARTITION BY block_number ORDER BY block_number) AS grp
        , value
        , block_number
    FROM (
        SELECT x.block_number, t.value
        FROM (
            SELECT generate_series(min(block_number), max(block_number)) AS block_number
            FROM test
        ) x
        LEFT JOIN test t USING (block_number)
        ORDER BY x.block_number
    ) test
) sub;
```
but it still doesn't get me the missing values:
```
 block_number | value
--------------+-------
           47 |   100
           48 |
           49 |
           50 |
           51 |
           52 |   200
           53 |
           54 |
           55 |
           56 |
           57 |
           58 |   120
           59 |
           60 |   180
```
Any suggestions?
### UPDATE 2 ###
Thanks to @mustaccio's answer the first part is solved. I tried to amend the statement so that it does the calculation as mentioned above: Multiplying the 'value' in each row with a given number (here: 2) and then summing up all those values. What I tried:
```
SELECT SUM(a.value * 2)
FROM (
    SELECT block_number, value, first_value(value) OVER (PARTITION BY grp)
    FROM (
        SELECT
            sum(CASE WHEN value IS NOT NULL THEN 1 END) OVER (ORDER BY block_number) AS grp
            , value
            , block_number
        FROM (
            SELECT x.block_number, t.value
            FROM (
                SELECT generate_series(min(block_number), max(block_number)) AS block_number
                FROM test
            ) x
            LEFT JOIN test t USING (block_number)
        ) y
    ) z
) a;
```
However, that gives me a result of 1,200 instead of the expected 4,240.
Any ideas?
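The outer query above still multiplies the original `value` column, which is NULL on the generated rows, rather than the filled-in `first_value`, which is why only the four original rows contribute (600 x 2 = 1,200). A sketch that carries the filled value through before aggregating (it returns 4,240 for the sample data with a multiplier of 2):

```sql
WITH series AS (
    SELECT generate_series(min(block_number), max(block_number)) AS block_number
    FROM test
), joined AS (
    SELECT s.block_number,
           t.value,
           -- increments at every known value, grouping each value with the gap rows that follow it
           count(t.value) OVER (ORDER BY s.block_number) AS grp
    FROM series s
    LEFT JOIN test t USING (block_number)
), filled AS (
    SELECT block_number,
           first_value(value) OVER (PARTITION BY grp ORDER BY block_number) AS value
    FROM joined
)
SELECT SUM(value * 2) AS total
FROM filled;
```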
Westcoaster
(3 rep)
May 25, 2023, 03:50 PM
• Last activity: May 26, 2023, 02:34 PM
0
votes
1
answers
427
views
Count the number of days an order has been at a specific status
I can't figure out how to write a query that returns the results I need.
Below is a table that lists, in chronological order, the status of an order on specific days and times.
An order can change from one status to another but can also change back to a status it was at previously.
How do I write a query that will give me the following for each time an order changes status (minimum days is 1):
```
select doc_no, doc_type, line_no, status, DATEDIFF(day, '04/18/2023', '05/10/2023') AS [number of days at this status before changing]
```
Data:
```
doc_no  |doc_type | line_no |status | updated date            | time
--------|---------|---------|-------|-------------------------|-------
2353021 | SA      | 21000   | 527   | 2023-02-17 00:00:00.000 | 100640
2353021 | SA      | 21000   | 527   | 2023-03-16 00:00:00.000 | 114555
2353021 | SA      | 21000   | 527   | 2023-03-23 00:00:00.000 | 101018
2353021 | SA      | 21000   | 527   | 2023-03-23 00:00:00.000 | 145322
2353021 | SA      | 21000   | 545   | 2023-03-23 00:00:00.000 | 145323
2353021 | SA      | 21000   | 555   | 2023-03-23 00:00:00.000 | 145411
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 113735
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 115644
2353021 | SA      | 21000   | 555   | 2023-04-14 00:00:00.000 | 115826
2353021 | SA      | 21000   | 527   | 2023-04-14 00:00:00.000 | 122233
2353021 | SA      | 21000   | 527   | 2023-04-14 00:00:00.000 | 190943
2353021 | SA      | 21000   | 527   | 2023-04-18 00:00:00.000 | 32939
2353021 | SA      | 21000   | 527   | 2023-04-18 00:00:00.000 | 161039
2353021 | SA      | 21000   | 545   | 2023-04-18 00:00:00.000 | 161046
2353021 | SA      | 21000   | 555   | 2023-04-18 00:00:00.000 | 161102
2353021 | SA      | 21000   | 555   | 2023-04-18 00:00:00.000 | 163512
2353021 | SA      | 21000   | 555   | 2023-05-09 00:00:00.000 | 134853
2353021 | SA      | 21000   | 555   | 2023-05-10 00:00:00.000 | 140516
```
The results I am looking for:
```
doc_no | doc_type | line_no | status | number_of_days_at_status
-------|----------|---------|--------|-------------------------
2353021| SA       | 2100    | 527    | 27
2353021| SA       | 2100    | 545    | 1
2353021| SA       | 2100    | 555    | 22
2353021| SA       | 2100    | 527    | 4
2353021| SA       | 2100    | 545    | 1
2353021| SA       | 2100    | 555    | 22
```
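A sketch of the usual gaps-and-islands treatment in T-SQL; the table name `status_log` is a placeholder, the still-open last run is measured up to today, and the day counts follow the `DATEDIFF` formula shown above (so the very first run may come out differently from the sample table):

```sql
WITH ordered AS (
    SELECT *,
           CASE WHEN status = LAG(status) OVER (PARTITION BY doc_no, doc_type, line_no
                                                ORDER BY [updated date], [time])
                THEN 0 ELSE 1 END AS status_changed
    FROM status_log
), runs AS (
    SELECT *,
           SUM(status_changed) OVER (PARTITION BY doc_no, doc_type, line_no
                                     ORDER BY [updated date], [time]
                                     ROWS UNBOUNDED PRECEDING) AS run_no
    FROM ordered
), blocks AS (
    SELECT doc_no, doc_type, line_no, status, run_no,
           MIN([updated date]) AS run_start
    FROM runs
    GROUP BY doc_no, doc_type, line_no, status, run_no
), spans AS (
    SELECT *,
           LEAD(run_start) OVER (PARTITION BY doc_no, doc_type, line_no
                                 ORDER BY run_no) AS next_start
    FROM blocks
)
SELECT doc_no, doc_type, line_no, status,
       CASE WHEN DATEDIFF(day, run_start, COALESCE(next_start, GETDATE())) < 1
            THEN 1                                  -- minimum of one day at a status
            ELSE DATEDIFF(day, run_start, COALESCE(next_start, GETDATE()))
       END AS number_of_days_at_status
FROM spans
ORDER BY doc_no, run_no;
```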
Kram_Koorbse
(255 rep)
May 10, 2023, 07:53 PM
• Last activity: May 12, 2023, 02:45 PM
Showing page 1 of 20 total questions