
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
2 answers
451 views
postgresql cumulative counts in date range
I'm trying to get the cumulative count of rows (by group_id) between two dates that represent a time period where the row was active. I have a table like this:
group_id |   id   | type | start_date |  end_date
----------+--------+------+------------+------------
       33 | 119435 | AAA  | 2013-05-21 | 2014-05-19
       33 |  15144 | AAA  | 2013-05-21 | 2015-05-18
       33 |  29393 | AAA  | 2013-05-21 | 2016-05-23
       33 | 119437 | AAA  | 2013-05-21 | 2017-05-15
       33 |  62380 | AAA  | 2013-05-21 | 2099-12-31
       33 | 119436 | AAA  | 2013-05-21 | 2099-12-31
       33 |  27346 | AAA  | 2013-05-21 | 2099-12-31
       33 |  28529 | AAA  | 2014-05-20 | 2099-12-31
       33 | 221576 | AAA  | 2015-05-19 | 2099-12-31
       33 | 253893 | AAA  | 2016-05-24 | 2099-12-31
       33 | 251589 | AAA  | 2017-05-16 | 2099-12-31
       33 | 285245 | AAA  | 2019-01-24 | 2099-12-31
       34 | 253893 | AAA  | 2016-05-24 | 2099-12-31
       34 | 251589 | AAA  | 2017-05-16 | 2099-12-31
       34 | 285245 | AAA  | 2019-01-24 | 2099-12-31
       34 | 285246 | AAA  | 2019-05-31 | 2099-12-31
... and I need to get active counts for each of those date ranges like this:
group_id | start_date |  end_date  | active
----------+------------+------------+--------
       33 | 2013-05-21 | 2014-05-19 |      7
       33 | 2013-05-21 | 2015-05-18 |      8
       33 | 2013-05-21 | 2016-05-23 |      9
       33 | 2013-05-21 | 2017-05-15 |     10
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2014-05-20 | 2099-12-31 |     11
       33 | 2015-05-19 | 2099-12-31 |     10
       33 | 2016-05-24 | 2099-12-31 |      9
       33 | 2017-05-16 | 2099-12-31 |      8
       33 | 2019-01-24 | 2099-12-31 |      8
       34 | 2016-05-24 | 2099-12-31 |      1
       34 | 2017-05-16 | 2099-12-31 |      2
       34 | 2019-01-24 | 2099-12-31 |      3
       34 | 2019-05-31 | 2099-12-31 |      4
I've tried various combinations of LAG and LEAD, with and without CTEs, but cannot come up with a solution. Is there a way to do this in a single query? If not a single query, perhaps a combination of queries in a UDF?

**UPDATE:** Per @jjanes' comment below, I believe my source table is set up incorrectly. I think I should create the source table like this instead:
group_id |   id   | type | start_date |  end_date
----------+--------+------+------------+------------
  ... (skipped group 33) ...
       34 | 253893 | AAA  | 2016-05-24 | 2017-05-15
       34 | 253893 | AAA  | 2017-05-16 | 2019-01-23
       34 | 253893 | AAA  | 2019-01-24 | 2019-05-30
       34 | 253893 | AAA  | 2019-05-31 | 2099-12-31
       34 | 251589 | AAA  | 2017-05-16 | 2019-01-23
       34 | 251589 | AAA  | 2019-01-24 | 2019-05-30
       34 | 251589 | AAA  | 2019-05-31 | 2099-12-31
       34 | 285245 | AAA  | 2019-01-24 | 2019-05-30
       34 | 285245 | AAA  | 2019-05-31 | 2099-12-31
       34 | 285246 | AAA  | 2019-05-31 | 2099-12-31
With that change in the source data, the outcome of actives (showing only group 34 here) would be like this:
group_id | start_date |  end_date  | active
----------+------------+------------+--------
       34 | 2016-05-24 | 2017-05-15 |      1
       34 | 2017-05-16 | 2019-01-23 |      2
       34 | 2019-01-24 | 2019-05-30 |      3
       34 | 2019-05-31 | 2099-12-31 |      4
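With the corrected (non-overlapping) layout, the active count is just the number of rows sharing a period, so a plain GROUP BY suffices. A runnable sketch of that idea over the group-34 rows above, using SQLite via Python's sqlite3 (not from the original thread; column names as in the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE periods (group_id INT, id INT, type TEXT, start_date TEXT, end_date TEXT);
INSERT INTO periods VALUES
 (34, 253893, 'AAA', '2016-05-24', '2017-05-15'),
 (34, 253893, 'AAA', '2017-05-16', '2019-01-23'),
 (34, 253893, 'AAA', '2019-01-24', '2019-05-30'),
 (34, 253893, 'AAA', '2019-05-31', '2099-12-31'),
 (34, 251589, 'AAA', '2017-05-16', '2019-01-23'),
 (34, 251589, 'AAA', '2019-01-24', '2019-05-30'),
 (34, 251589, 'AAA', '2019-05-31', '2099-12-31'),
 (34, 285245, 'AAA', '2019-01-24', '2019-05-30'),
 (34, 285245, 'AAA', '2019-05-31', '2099-12-31'),
 (34, 285246, 'AAA', '2019-05-31', '2099-12-31');
""")
# With non-overlapping period rows, "active" is simply the number of
# rows that share the same (group_id, start_date, end_date) period.
rows = con.execute("""
    SELECT group_id, start_date, end_date, COUNT(*) AS active
    FROM periods
    GROUP BY group_id, start_date, end_date
    ORDER BY group_id, start_date
""").fetchall()
print(rows)
```

The original layout is harder precisely because the rows overlap, which is what the comment about restructuring the source table addresses.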
jacaetevha (1 rep)
Mar 21, 2020, 08:20 PM • Last activity: Jun 9, 2025, 09:01 AM
0 votes
1 answers
683 views
Postgresql: partition by number of rows after the current one
I need to calculate the correlation for, say, each 3 consecutive rows. For example, assume there is a table with the following data.

create table a (
  asset_id int,
  time int not null,
  value decimal not null
);

insert into a values
  (1,1,30),(1,2,45),(1,3,30),(1,4,30),(1,5,30),
  (2,1,30),(2,2,40),(2,3,30),(2,4,25),(2,5,25);

select row_number() over() as r, a.asset_id, b.asset_id, a.time, a.value, b.value
from a
join a b on a.time = b.time and a.asset_id
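One common approach (an assumption here, not from the thread) is to self-join the two assets on time and apply PostgreSQL's corr() aggregate as a window function with a ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame. The arithmetic such a rolling corr() would produce can be checked in plain Python over the sample values:

```python
from math import sqrt

# Paired values from the sample insert: asset 1 vs asset 2 at times 1..5.
a1 = [30, 45, 30, 30, 30]
a2 = [30, 40, 30, 25, 25]

def pearson(xs, ys):
    """Pearson correlation; None when either side has zero variance
    (PostgreSQL's corr() likewise returns NULL in that case)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else None

# Rolling 3-row window: the same frame as
# ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, once 3 rows are available.
rolling = [pearson(a1[i-2:i+1], a2[i-2:i+1]) for i in range(2, len(a1))]
print(rolling)
```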
yaugenka (455 rep)
Feb 5, 2022, 12:32 PM • Last activity: Jun 2, 2025, 08:05 PM
1 votes
1 answers
332 views
How to force Redshift/Postgres to aggregate nth_value?
I'm trying to retrieve the 2nd, 3rd, 4th order/transaction from an orders table using a query that looks like this:

select customer_email,
       nth_value(created_at, 1) over (partition by customer_email),
       nth_value(created_at, 2) over (partition by customer_email),
       nth_value(created_at, 3) over (partition by customer_email),
       nth_value(created_at, 4) over (partition by customer_email)
from fact_orders
group by customer_email, created_at
limit 100

My goal is to get the 1st, 2nd, 3rd, and 4th order for each customer. I'm forced to do a group by on customer_email and created_at, which results in a row for each order and date by a customer. The nth_value yields the correct results, but I would prefer a result set that looks like this:

customer_email, first_order, second_order, third_order, fourth_order

with only 1 record per customer. How can I achieve this?
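A standard way to get one row per customer (a sketch, not necessarily the only one) is to number each customer's orders with row_number() and then pivot with conditional aggregation. Illustrated with SQLite via Python's sqlite3 on invented data (the emails and dates are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_orders (customer_email TEXT, created_at TEXT);
INSERT INTO fact_orders VALUES
 ('a@x.com', '2018-01-01'), ('a@x.com', '2018-02-01'),
 ('a@x.com', '2018-03-01'), ('b@x.com', '2018-01-15');
""")
rows = con.execute("""
    WITH numbered AS (
        SELECT customer_email,
               created_at,
               ROW_NUMBER() OVER (PARTITION BY customer_email
                                  ORDER BY created_at) AS n
        FROM fact_orders
    )
    -- pivot: one row per customer, NULL where the nth order doesn't exist
    SELECT customer_email,
           MAX(CASE WHEN n = 1 THEN created_at END) AS first_order,
           MAX(CASE WHEN n = 2 THEN created_at END) AS second_order,
           MAX(CASE WHEN n = 3 THEN created_at END) AS third_order,
           MAX(CASE WHEN n = 4 THEN created_at END) AS fourth_order
    FROM numbered
    GROUP BY customer_email
    ORDER BY customer_email
""").fetchall()
print(rows)
```

The same shape works in Redshift/Postgres; row_number() avoids nth_value()'s frame subtleties entirely.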
john (153 rep)
Jul 10, 2018, 04:43 AM • Last activity: Apr 21, 2025, 11:03 AM
1 votes
1 answers
2674 views
Rolling sum with a floor in Athena/Presto
In Athena, I want to calculate a rolling sum, over a window, that can't go below 0. For example, summing up a column that has the values (1, 2, -1, -2, -1, -2, 1, 2) should give (1, 3, 2, 0, 0, 0, 1, 3). Without the floor constraint, it's easy: SUM(X) OVER (PARTITION BY some_group ORDER BY ordering_col). The same goes if we just wanted to clip all the values at 0 after doing an actual cumulative sum. But I cannot figure out how to use window functions to achieve my desired result in general, though I can make it work for some special cases. Is this even possible?
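This floored running sum is carry-dependent (each row needs the already-clamped previous result), which ordinary window functions cannot express; engines with recursive CTEs can. A sketch of the recurrence in SQLite via Python's sqlite3, on the sample values from the question (an `ord` column is assumed as the ordering key):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE vals (ord INT PRIMARY KEY, x INT);
INSERT INTO vals VALUES (1,1),(2,2),(3,-1),(4,-2),(5,-1),(6,-2),(7,1),(8,2);
""")
rows = con.execute("""
    WITH RECURSIVE floored(ord, s) AS (
        -- s(1) = max(x1, 0); s(n) = max(s(n-1) + xn, 0)
        SELECT ord, MAX(x, 0) FROM vals WHERE ord = 1
        UNION ALL
        SELECT v.ord, MAX(f.s + v.x, 0)
        FROM floored f
        JOIN vals v ON v.ord = f.ord + 1
    )
    SELECT s FROM floored ORDER BY ord
""").fetchall()
print([r[0] for r in rows])
```

(SQLite's two-argument MAX is the scalar maximum.) Whether this translates to your Athena version depends on its recursive-CTE support; otherwise the fallback is usually to compute the clamped sum client-side or in an ETL step.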
Max Gibiansky (111 rep)
Mar 17, 2022, 06:15 PM • Last activity: Apr 7, 2025, 01:09 PM
0 votes
1 answers
470 views
How do I calculate a streak of data and show the data that corresponds to it on MySQL?
I'm currently working on a project with NBA stats and I want to check a player's longest streak of his highest amount of points on MySQL. I'm open to a 5-10 game streak to simplify it. For example: Kevin Durant scores 30, 32, 35, 36, 41 in 5 consecutive games and that's his highest one, how would I go about extracting the game stats, points, the length of the streak AND showing the total amount of points during this streak all together. I've tried using multiple CTEs with window functions, but I confuse myself as I'm doing it. Is there an easy or simple way to do this at all?
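One possible shape for this (a sketch on invented data, with a hypothetical 30-point cut-off) is the gaps-and-islands pattern: flag qualifying games, group consecutive ones by game_no - row_number(), then aggregate each island. In SQLite via Python's sqlite3 (the same SQL works in MySQL 8+):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE games (game_no INT, player TEXT, points INT);
INSERT INTO games VALUES
 (1,'KD',12),(2,'KD',30),(3,'KD',32),(4,'KD',35),(5,'KD',36),
 (6,'KD',41),(7,'KD',18),(8,'KD',33),(9,'KD',34);
""")
row = con.execute("""
    WITH flagged AS (
        SELECT *,
               -- consecutive qualifying games share the same grp value
               game_no - ROW_NUMBER() OVER (PARTITION BY player
                                            ORDER BY game_no) AS grp
        FROM games
        WHERE points >= 30          -- hypothetical "high-scoring" cut-off
    )
    SELECT player,
           MIN(game_no) AS first_game,
           MAX(game_no) AS last_game,
           COUNT(*)     AS streak_len,
           SUM(points)  AS streak_points
    FROM flagged
    GROUP BY player, grp
    ORDER BY streak_len DESC, streak_points DESC
    LIMIT 1
""").fetchall()[0]
print(row)
```

Joining the winning (player, first_game..last_game) range back to the stats table would then recover the per-game lines for that streak.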
thisisforfun (1 rep)
Jul 18, 2023, 05:49 PM • Last activity: Feb 21, 2025, 03:04 PM
0 votes
2 answers
57 views
In PostgreSQL how to get the hourly minimum of a value using a window function?
Let's assume we have the following data with one temperature measurement per second. The span of the data is several years, so there is a considerable number of rows. It is measurement data from a field device. The table has other columns as well, but they are not relevant to the question.

| Time | Temperature_deg_in_C |
| ---- | -------------------- |
| 2024-11-01 00:00:00+00 | 20.1 |
| 2024-11-01 00:00:01+00 | 21.2 |
| 2024-11-01 00:00:02+00 | 21.6 |
| 2024-11-01 00:00:03+00 | 20.2 |
| ... | ... |
| 2026-12-31 23:59:57+00 | 25.4 |
| 2026-12-31 23:59:58+00 | 25.2 |
| 2026-12-31 23:59:59+00 | 25.6 |

I know that I can use a GROUP BY clause to get the averages over one-hour windows between specific start and end times like so:
SELECT to_char("time", 'YYYY-MM-DD HH'), AVG("Temperature_deg_in_C")
    FROM "measurements"
   WHERE "time" BETWEEN '2024-11-01' AND '2024-12-01'
GROUP BY to_char("time", 'YYYY-MM-DD HH')
However, this syntax feels like a hack, and it doesn't quite bend to e.g. 10-second or other more complex window frames (or partitions). Furthermore, I would like to know, specifically, how this is done correctly and performantly using the OVER, WINDOW, PARTITION BY, RANGE, ROWS and similar tools outlined in the [PostgreSQL manual](https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) .

**UPDATE:** Window functions cannot be used to achieve this goal on Postgres, as they do not create "buckets" or "frames" in the way I thought they did. GROUP BY is the way to achieve this. The end goal is to find a solution where the window length and unit of measurement (minutes, hours, days), and the start and end time of the observing period, are flexible but the query body itself stays the same. A parameterized query that is simple to understand, that is.
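For the flexible-bucket goal in the update: PostgreSQL 14+ offers date_bin() for exactly this, and on older versions the same effect comes from grouping by truncated epoch arithmetic, where the bucket width is a single parameter. A portable sketch of the epoch-division idea (SQLite via Python's sqlite3, invented sample rows):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE measurements (time TEXT, temp REAL);
INSERT INTO measurements VALUES
 ('2024-11-01 00:00:00', 20.0), ('2024-11-01 00:00:05', 22.0),
 ('2024-11-01 00:00:11', 30.0), ('2024-11-01 00:00:19', 10.0),
 ('2024-11-01 00:00:21', 40.0);
""")
bucket_seconds = 10  # the window length is just a parameter
rows = con.execute("""
    SELECT datetime((CAST(strftime('%s', time) AS INTEGER)
                     / :w) * :w, 'unixepoch') AS bucket_start,
           MIN(temp) AS min_temp,
           AVG(temp) AS avg_temp
    FROM measurements
    WHERE time >= '2024-11-01' AND time < '2024-12-01'
    GROUP BY bucket_start
    ORDER BY bucket_start
""", {"w": bucket_seconds}).fetchall()
print(rows)
```

Integer-dividing the epoch by the width and multiplying back snaps every timestamp to its bucket start; hours or days are just different values of the one parameter, so the query body never changes.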
Antti Keskinen (3 rep)
Nov 28, 2024, 03:23 PM • Last activity: Nov 29, 2024, 05:52 AM
25 votes
5 answers
80772 views
Using DISTINCT in window function with OVER
I'm trying to migrate a query from Oracle to SQL Server 2014. Here is my query, which works great in Oracle:

select count(distinct A) over (partition by B) / count(*) over () as A_B
from MyTable

Here is the error I got after trying to run this query in SQL Server 2014:

> Use of DISTINCT is not allowed with the OVER clause

Does anyone know what the problem is? Is this kind of query possible in SQL Server? Please advise.
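A frequently cited workaround (worth verifying against your data, and it assumes A contains no NULLs) replaces COUNT(DISTINCT A) OVER (PARTITION BY B) with two DENSE_RANK() calls: ranking A ascending plus ranking it descending overshoots by exactly one, so their sum minus one is the distinct count. Demonstrated in SQLite via Python's sqlite3 (the same expression is valid T-SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MyTable (A TEXT, B TEXT);
INSERT INTO MyTable VALUES
 ('x','g1'), ('x','g1'), ('y','g1'), ('z','g2'), ('z','g2');
""")
rows = con.execute("""
    SELECT A, B,
           -- equals COUNT(DISTINCT A) OVER (PARTITION BY B), given no NULLs
           DENSE_RANK() OVER (PARTITION BY B ORDER BY A)
         + DENSE_RANK() OVER (PARTITION BY B ORDER BY A DESC)
         - 1 AS distinct_A_in_B
    FROM MyTable
    ORDER BY B, A
""").fetchall()
print(rows)
```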
Omri (383 rep)
Jan 11, 2015, 09:21 AM • Last activity: Nov 20, 2024, 07:34 AM
2 votes
2 answers
487 views
How to select blocks with sequential data and aggregate the ids
I have the following table:

  id  |         startDate          |          endDate           | plan_id | planned | machine
------+----------------------------+----------------------------+---------+---------+--------
 2005 | 2019-01-16 08:29:24.872736 | 2019-01-16 08:30:23.529706 |      34 | true    |       6
 2004 | 2019-01-16 08:19:28.011148 | 2019-01-16 08:29:22.680828 |      34 | true    |       6
 2003 | 2019-01-16 08:18:27.074312 | 2019-01-16 08:19:25.753475 |      34 | true    |       6
 2002 | 2019-01-16 08:08:30.206288 | 2019-01-16 08:18:24.856308 |      34 | true    |       6
 2001 | 2019-01-16 08:07:29.163124 | 2019-01-16 08:08:27.949013 |      34 | true    |       6
 2000 | 2019-01-16 07:59:03.221309 | 2019-01-16 08:00:14.654391 |    null | false   |       7
 1999 | 2019-01-16 08:00:00.986367 | 2019-01-16 08:00:03.221309 |    null | false   |       6
 1998 | 2019-01-16 07:57:30.711044 | 2019-01-16 07:59:58.778444 |    null | false   |       6
 1997 | 2019-01-16 07:56:32.466508 | 2019-01-16 07:57:28.489287 |    null | false   |       6
 1996 | 2019-01-16 07:50:06.887349 | 2019-01-16 07:56:30.237725 |    null | false   |       6
 1995 | 2019-01-16 07:46:34.327582 | 2019-01-16 07:50:04.619592 |      33 | true    |       6
 1994 | 2019-01-16 07:45:33.813483 | 2019-01-16 07:46:32.014849 |      33 | true    |       6
 1993 | 2019-01-16 07:24:39.267365 | 2019-01-16 07:39:23.786911 |    null | false   |       6
 1992 | 2019-01-16 07:23:39.646218 | 2019-01-16 07:24:37.093414 |    null | false   |       6
 1991 | 2019-01-16 07:13:41.166337 | 2019-01-16 07:23:37.403375 |    null | false   |       6
 1990 | 2019-01-16 07:12:39.961234 | 2019-01-16 07:13:38.907838 |    null | false   |       6
 1989 | 2019-01-16 07:10:46.984236 | 2019-01-16 07:12:37.647108 |    null | false   |       6
 1988 | 2019-01-15 17:05:59.832834 | 2019-01-15 17:08:21.603931 |      31 | true    |       6
 1987 | 2019-01-15 17:04:59.567046 | 2019-01-15 17:05:57.565188 |      31 | true    |       6
 1986 | 2019-01-15 17:00:01.411266 | 2019-01-15 17:10:57.255158 |      31 | true    |       7

I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying to use a window function, but unfortunately I couldn't work out the logic of it.

The problem here is that since we have different machines, we cannot rely on sequential ids, just on the fact that the endDate of a sequence is very close to the next startDate (it is OK to set a tolerance constant, e.g. 3 seconds). I would like to have a query where the result would be: the min startDate, the max endDate and the IDs of the block. For this sample with machine = 6, it would be:

blockStartDate               | blockEndDate                 | ids
-----------------------------+------------------------------+----------------------------
"2019-01-16 07:50:06.887349" | "2019-01-16 08:00:03.221309" | [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236" | "2019-01-16 07:39:23.786911" | [1989,1990,1991,1992,1993]

Note that the answer, in this case, has sequential IDs, but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the ids become useless.
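A gaps-and-islands sketch of this (SQLite via Python's sqlite3; machine 6's unplanned rows only, timestamps truncated to whole seconds to keep the example short): a new block starts whenever the gap between a row's startDate and the previous row's endDate exceeds the 3-second tolerance.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE runs (id INT, startDate TEXT, endDate TEXT, planned INT, machine INT);
INSERT INTO runs VALUES
 (1999,'2019-01-16 08:00:00','2019-01-16 08:00:03',0,6),
 (1998,'2019-01-16 07:57:30','2019-01-16 07:59:58',0,6),
 (1997,'2019-01-16 07:56:32','2019-01-16 07:57:28',0,6),
 (1996,'2019-01-16 07:50:06','2019-01-16 07:56:30',0,6),
 (1993,'2019-01-16 07:24:39','2019-01-16 07:39:23',0,6),
 (1992,'2019-01-16 07:23:39','2019-01-16 07:24:37',0,6),
 (1991,'2019-01-16 07:13:41','2019-01-16 07:23:37',0,6),
 (1990,'2019-01-16 07:12:39','2019-01-16 07:13:38',0,6),
 (1989,'2019-01-16 07:10:46','2019-01-16 07:12:37',0,6);
""")
rows = con.execute("""
    WITH ordered AS (
        SELECT *,
               LAG(endDate) OVER (PARTITION BY machine
                                  ORDER BY startDate) AS prev_end
        FROM runs
        WHERE planned = 0 AND machine = 6
    ),
    flagged AS (
        -- a new block starts when the gap to the previous run exceeds 3 s
        SELECT *,
               SUM(CASE WHEN prev_end IS NULL
                          OR CAST(strftime('%s', startDate) AS INTEGER)
                           - CAST(strftime('%s', prev_end) AS INTEGER) > 3
                        THEN 1 ELSE 0 END)
                 OVER (ORDER BY startDate) AS block_no
        FROM flagged_src
    )
    SELECT MIN(startDate)   AS blockStartDate,
           MAX(endDate)     AS blockEndDate,
           GROUP_CONCAT(id) AS ids
    FROM flagged
    GROUP BY block_no
    ORDER BY blockStartDate
""".replace("flagged_src", "ordered")).fetchall()
print(rows)
```

Because the grouping key is the time gap, not the id sequence, this keeps working once two machines interleave ids; in PostgreSQL the GROUP_CONCAT would become array_agg and the gap test an interval comparison.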
Attyla Fellipe (23 rep)
Jan 17, 2019, 02:00 PM • Last activity: Nov 14, 2024, 01:44 AM
1 votes
1 answers
3363 views
(Athena/Presto) Window function (lag) is not reading the previous lag row
I have two tables, credit_memos and credit_memo_items, and want to show each event individually, and will thus do an insert (doing a union in this case, which is logically the same). I know the key for both tables and am doing a union with them while sharing the same partition key credit_memo_id that I'll be using for the window. I'm trying to use lag to fill up the rest of the null rows as shown (screenshot omitted). The code for the second column looks as follows:

coalesce(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk

For some reason the third row shown in the picture is not able to interpret the previous lagged row and insert a value in its row as a result. I presume it may have to do with Athena not actually executing the lag function recursively: it takes the previous row in each partition literally and doesn't wait for the previous rows to get computed. Is there a way to grab values recursively as I'm trying to do? Thanks.

Update: Here's the full query:
with
  cme as(
    with
      cm as(
        select
    --      change_log
          row_number() over (partition by id order by updated_at desc) r
        , cast(cm.invoice_id as varchar(99))                         as invoice_nk
        , cast(cm.id as varchar(99))                                 as cm_dd
        , cast(cm.created_at as timestamp)                           as cm_created_at
        , cast(cm.updated_at as timestamp)                           as cm_updated_at
        from "postgres-replica-parquet".credit_memos cm
      )
    select
      case when r = 1 then cm_created_at
           else cm_updated_at end as cm_begin_at
    , null as cmi_begin_at
    , invoice_nk
    , cm_dd
    , cm_created_at
    , cm_updated_at
    from cm
  )
, cmi as(
    with
      ci as(
        select
    --      ci.change_log
          row_number() over (partition by id order by updated_at desc) r
        , cast(ci.credit_memo_id as varchar(99)) as cm_id
        , cast(ci.created_at as timestamp)       as cmi_created_at
        , cast(ci.updated_at as timestamp)       as cmi_updated_at
        from "postgres-replica-parquet".credit_memo_items ci
      )
    select -- row 1 is the snapshot when cdc first started
      null as cm_begin_at
    , case when r = 1 then cmi_created_at
           else cmi_updated_at end as cmi_begin_at
    , null as invoice_nk
    , cm_id as cm_dd
    , null as cm_created_at
    , null as cm_updated_at
    from ci
  )
, combined as(
    with
      all as(
        select * from cme
        union select * from cmi
      )
    select
      greatest(
        coalesce(cm_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
      , coalesce(cmi_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
      ) as event_at
    , *
    from all
  )
select
  event_at
, cm_begin_at as event_credit_memo_last_at
, cmi_begin_at as event_credit_memo_item_last_at
, coalesce(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk
, cm_dd as credit_memo_dd
from combined
And a snapshot of the full result (screenshot omitted): rows 3 and 12 should have the invoice_nk based on the credit memo. cm is the subquery to grab the credit memos, cmi is the subquery to grab the credit memo items, and combined is just there to precompute the order-by key in the window functions. Note that both are pulled from a database that has the full change log (CDC). The end goal is to make one row for each event; I'm defining an event for the final table to be any time either a credit memo or a credit memo item undergoes a change. I would like to be able to snapshot the state of a credit memo combined with the credit memo item by windowing at the event_at for reporting.
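The poster's suspicion is right in one sense: lag() reads the stored column of the previous row, never the previously computed output, so chained fills cannot propagate; this is standard window-function behaviour, not an Athena quirk. A set-based fill-down (a sketch on invented rows, not the poster's schema) instead groups each NULL row with the last non-NULL value by counting non-NULLs, then takes that group's value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (cm_dd TEXT, event_at INT, invoice_nk TEXT);
INSERT INTO events VALUES
 ('cm1', 1, 'inv-9'), ('cm1', 2, NULL), ('cm1', 3, NULL),
 ('cm1', 4, 'inv-7'), ('cm1', 5, NULL);
""")
rows = con.execute("""
    WITH grouped AS (
        -- COUNT(col) counts only non-NULLs, so each non-NULL value
        -- starts a new "fill group" that its trailing NULLs belong to
        SELECT *,
               COUNT(invoice_nk) OVER (PARTITION BY cm_dd
                                       ORDER BY event_at) AS grp
        FROM events
    )
    SELECT event_at,
           MAX(invoice_nk) OVER (PARTITION BY cm_dd, grp) AS invoice_nk
    FROM grouped
    ORDER BY event_at
""").fetchall()
print(rows)
```

This fills arbitrarily long NULL runs in one pass, which is what the chained-lag attempt was reaching for.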
Eugene Choi (11 rep)
Mar 11, 2022, 03:11 PM • Last activity: Nov 13, 2024, 02:01 PM
0 votes
0 answers
27 views
Variable only resets per row when query has a static IN condition
I have a complicated query. I have a bunch of users. I have a ledger of debits and credits. I want to filter a user's subscribers by how long the subscribers have been negative. To do that I find negative subscribers for a given user, get each subscriber's transactions in reverse order, and work backwards until the subscriber's balance is ≥ 0 to find how long ago that was. The query works only if I take the list of subscriber IDs the query returns and add them into the WHERE clause. ##### The Query
-- Return just the info we want
SELECT subscriber_id,
       balance,
       days_negative,
       DATE_SUB(CURRENT_DATE, INTERVAL days_negative DAY) AS days_since
FROM (
    -- Reset @balanced for each subscriber
    SELECT r.subscriber_id,
           balance,
           (@balanced := FALSE),
           (
               -- get date of last transaction before balance >= 0
               SELECT DATEDIFF(CURRENT_DATE, MIN(transaction_time))
               FROM (
                   -- work backwards until balance >= 0
                   SELECT *,
                          ROUND(balance, 2),
                          ROUND(balance - running_total, 2),
                          (@balanced := @balanced OR balance - running_total + amount >= 0) AS balance_reached
                   FROM (
                       -- get transactions with running total
                       SELECT t.transaction_id,
                              t.transaction_time,
                              t.subscriber_id,
                              t.amount,
                              t.category,
                              SUM(t.amount) OVER w AS running_total
                       FROM transactions t
                       WHERE t.subscriber_id = r.subscriber_id
                         AND t.void = 0
                       WINDOW w AS (ORDER BY transaction_time DESC, transaction_id DESC)
                   ) with_running_total
               ) with_balance_reached
               WHERE balance_reached = 0
           ) AS days_negative
    FROM subscribers r
             INNER JOIN user_subscriber_linker usl USING (subscriber_id)
             INNER JOIN subscriber_balance b USING (subscriber_id)
    WHERE usl.user_id = 12345
      AND r.subscriber_id IN (987, 654, 321)
      AND r.void = 0
      AND b.balance < 0
    GROUP BY subscriber_id
) with_each;
This query gives me a result like this:

| renter_id | balance | days_negative | negative_since |
| :--- | :--- | :--- | :--- |
| 987 | -5998.90 | 98 | 2024-08-01 |
| 654 | -2000.00 | 62 | 2024-09-06 |
| 321 | -3774.20 | 79 | 2024-08-20 |

When I remove the line AND r.subscriber_id IN (987, 654, 321), I get this:

| renter_id | balance | days_negative | negative_since |
| :--- | :--- | :--- | :--- |
| 987 | -5998.90 | 98 | 2024-08-01 |
| 654 | -2000.00 | null | null |
| 321 | -3774.20 | null | null |

The fact that the first row is correct leads me to believe @balanced isn't being set for each row. I tried using joins (JOIN (SELECT @balanced := FALSE)) to set the var but that hasn't worked either. How do I get the query to work?
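The deeper issue is that the evaluation order of user-variable assignments within a SELECT is undefined in MySQL (and deprecated in 8.0), so @balanced may never reset per row. The same logic can be expressed without variables: a cumulative MAX over the descending scan marks the point where the balance was last non-negative. A sketch of that rewrite on one invented subscriber (SQLite via Python's sqlite3; the column and alias names mirror the question's query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transactions (subscriber_id INT, t INT, amount REAL);
INSERT INTO transactions VALUES
 (654, 1, 10), (654, 2, -5), (654, 3, -10), (654, 4, -3);
""")
rows = con.execute("""
    WITH backwards AS (
        SELECT *,
               -- running total of this and all later transactions
               SUM(amount) OVER w AS running_total,
               SUM(amount) OVER (PARTITION BY subscriber_id) AS balance
        FROM transactions
        WINDOW w AS (PARTITION BY subscriber_id
                     ORDER BY t DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ),
    flagged AS (
        SELECT *,
               -- balance - running_total + amount is the balance right
               -- after this transaction; the cumulative MAX replaces
               -- @balanced: it stays 1 once we have scanned back to a
               -- point where the balance was still >= 0
               MAX(CASE WHEN balance - running_total + amount >= 0
                        THEN 1 ELSE 0 END)
                 OVER (PARTITION BY subscriber_id
                       ORDER BY t DESC
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                 AS balance_reached
        FROM backwards
    )
    SELECT subscriber_id, MIN(t) AS negative_since
    FROM flagged
    WHERE balance_reached = 0
    GROUP BY subscriber_id
""").fetchall()
print(rows)
```

Here the balance turns negative at transaction t = 3 (10, then 5, then -5), and the query finds exactly that row with no session state, so it behaves the same with or without an IN filter.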
tvanc (101 rep)
Nov 7, 2024, 07:57 PM • Last activity: Nov 7, 2024, 08:43 PM
1 votes
1 answers
765 views
Numbering rows consecutively for a number of tables
I have a number of tables, all containing attributes `aid`, `bid`, `cid`, and `xid` of type integer; other attributes may differ. For every (given) table `T`, I would like to order the rows by `aid, bid, cid` ascending and update column `xid` with incremented values starting at 0. What's the best way to achieve that? My current solution involves:

- selecting table `T`
- opening a cursor over the ordered tuples of the table
- assigning the autoincrement value to `xid`
- inserting the tuple into a temporary table `T_temp`
- deleting all records from `T`
- inserting all records from `T_temp` into `T`

Since the tables have different schemas, I programmed half of the code in PL/pgSQL and half of the code with bash scripting.

**Question 1:** Any comments on how to program that in pure PL/pgSQL?

**Question 2:** Any comments on how to implement that more elegantly?
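One set-based alternative (a sketch, not the only option): assuming no duplicate (aid, bid, cid) keys, the 0-based position of a row equals the number of rows that sort strictly before it, which a single correlated UPDATE can compute with no cursor, temp table, or delete/reinsert. In SQLite via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (aid INT, bid INT, cid INT, xid INT, payload TEXT);
INSERT INTO t (aid, bid, cid, payload) VALUES
 (2, 1, 1, 'd'), (1, 2, 1, 'b'), (1, 1, 1, 'a'), (1, 2, 2, 'c');
""")
# xid = number of rows sorting strictly before this one on (aid, bid, cid);
# with unique keys this is exactly a 0-based row number.
con.execute("""
    UPDATE t
    SET xid = (SELECT COUNT(*) FROM t AS prev
               WHERE (prev.aid, prev.bid, prev.cid) < (t.aid, t.bid, t.cid))
""")
rows = con.execute("SELECT payload, xid FROM t ORDER BY xid").fetchall()
print(rows)
```

Since every table shares the four key columns, a pure PL/pgSQL function could loop over the table names and issue this one statement per table via EXECUTE format(...), which would also answer Question 1 without any bash glue.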
arthur (888 rep)
Dec 13, 2013, 12:31 PM • Last activity: Nov 1, 2024, 02:19 AM
1 votes
1 answers
57 views
Rolling sum with partial value for last day
I can't figure out how to write a query that returns a rolling sum by item based on rules in another table. Below is a table that lists, in chronological order, the stock value of an item on specific days.

Table 1: Stock

| item  | stock | date     |
|-------|-------|----------|
| Blade | 10    | 1/3/2020 |
| Blade | 20    | 1/4/2020 |
| Blade | 30    | 1/5/2020 |
| Blade | 40    | 1/6/2020 |
| Blade | 50    | 1/7/2020 |
| Blade | 60    | 1/8/2020 |
| Blade | 70    | 1/9/2020 |
| Table | 10    | 1/3/2020 |
| Table | 20    | 1/4/2020 |
| Table | 30    | 1/5/2020 |
| Table | 40    | 1/6/2020 |
| Table | 50    | 1/7/2020 |
| Table | 60    | 1/8/2020 |
| Table | 70    | 1/9/2020 |

Another table has two rules for each item on how many days are used to calculate the rolling-sum values.

Table 2: Rule

| item  | rule     | value |
|-------|----------|-------|
| Blade | cum_sum  | 2.5   |
| Blade | lead_sum | 2.5   |
| Table | cum_sum  | 3     |
| Table | lead_sum | 3     |

Output:

- cum_sum: for Blade, date 1/3/2020, the rule is 2.5 and so the value = 10 + 20 + 30 * 0.5
- lead_sum: for Blade, date 1/3/2020, the rule is 2.5 and so the value = 20 + 30 + 40 * 0.5

How do I write the query to consider partial values for the last date?

| item  | stock | date     | cum_sum | lead_sum |
|-------|-------|----------|---------|----------|
| Blade | 10    | 1/3/2020 | 45      | 70       |
| Blade | 20    | 1/4/2020 | 70      | 95       |
| Blade | 30    | 1/5/2020 | 95      | 120      |
| Blade | 40    | 1/6/2020 | 120     | 145      |
| Blade | 50    | 1/7/2020 | 145     | 130      |
| Blade | 60    | 1/8/2020 | 130     | 70       |
| Blade | 70    | 1/9/2020 | 70      | 0        |
| Table | 10    | 1/3/2020 | 60      | 90       |
| Table | 20    | 1/4/2020 | 90      | 120      |
| Table | 30    | 1/5/2020 | 120     | 150      |
| Table | 40    | 1/6/2020 | 150     | 180      |
| Table | 50    | 1/7/2020 | 180     | 130      |
| Table | 60    | 1/8/2020 | 130     | 70       |
| Table | 70    | 1/9/2020 | 70      | 0        |

https://sqlfiddle.com/postgresql/online-compiler?id=c87e6a47-0949-4781-b8b5-3559929a063d
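For a rule value of 2.5 the pattern works out to: full weight for the first rows and half weight for the last one, with missing lead rows treated as 0. A hardcoded sketch of the Blade case (SQLite via Python's sqlite3); generalizing would mean joining the rule table and deriving the weights from the integer and fractional parts of its value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stock (item TEXT, stock REAL, d TEXT);
INSERT INTO stock VALUES
 ('Blade',10,'2020-01-03'),('Blade',20,'2020-01-04'),('Blade',30,'2020-01-05'),
 ('Blade',40,'2020-01-06'),('Blade',50,'2020-01-07'),('Blade',60,'2020-01-08'),
 ('Blade',70,'2020-01-09');
""")
rows = con.execute("""
    SELECT d, stock,
           -- rule 2.5: current row + next row + half of the row after
           stock
         + COALESCE(LEAD(stock, 1) OVER w, 0)
         + 0.5 * COALESCE(LEAD(stock, 2) OVER w, 0) AS cum_sum,
           -- lead_sum shifts the same weights one row forward
           COALESCE(LEAD(stock, 1) OVER w, 0)
         + COALESCE(LEAD(stock, 2) OVER w, 0)
         + 0.5 * COALESCE(LEAD(stock, 3) OVER w, 0) AS lead_sum
    FROM stock
    WINDOW w AS (PARTITION BY item ORDER BY d)
    ORDER BY d
""").fetchall()
print(rows)
```

The COALESCE(..., 0) is what makes the trailing dates degrade gracefully (145, 130, 70, 0 in the expected output) instead of going NULL.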
user1708730 (13 rep)
Oct 5, 2024, 02:09 PM • Last activity: Oct 7, 2024, 02:11 AM
7 votes
1 answers
548 views
OUTPUT clause with window functions
**Is there an undocumented restriction in the OUTPUT clause, or is this a bug?** Given the following table:
CREATE TABLE t1 (SomeId int, flag bit, value int);
I would like to use a calculated value in an UPDATE statement, and then use OUTPUT to output the value. Note that the calculated value is not being used in the SET part, which might have allowed a workaround by outputting the left-hand-side column. The following works fine, it is a perfectly standard updatable subquery (derived table).
UPDATE subT1
SET flag = 1
OUTPUT inserted.SomeValue
FROM (
    SELECT *,
      SomeValue = t1.value + 123
    FROM t1
) subT1;
However using a window function gets a weird error:
UPDATE subT1
SET flag = 1
OUTPUT inserted.Sum
FROM (
    SELECT *,
      Sum = SUM(t1.value) OVER (PARTITION BY t1.SomeId)
    FROM t1
) subT1;
Msg 404 Level 16 State 1 Line 3
The column reference "inserted.Sum" is not allowed because it refers to a base table that is not being modified in this statement.
Which doesn't make sense as we already established that calculated columns can be referred to using inserted. OK let's try using OUTPUT subT1.Sum anyway, rather than inserted.Sum:
Msg 4104 Level 16 State 1 Line 3
The multi-part identifier "subT1.Sum" could not be bound.
which makes sense, since that table reference has been modified, and the docs say: > If the table being modified is also specified in the FROM clause, any reference to columns in that table must be qualified with the INSERTED or DELETED prefix. ____ Meanwhile, if I use a join on a second table:
CREATE TABLE t2 (SomeId int, flag bit, value int);
it works fine
UPDATE t1
SET flag = 1
OUTPUT subT2.Sum
FROM t1
JOIN (
    SELECT t2.*,
      Sum = SUM(t2.value) OVER (PARTITION BY t2.SomeId)
    FROM t2
) subT2 ON subT2.SomeId = t1.SomeId;
dbfiddle **So is the fact that a window function throws that error a bug, or an undocumented restriction?** Having said all that, a join *inside* a subquery is also not allowed, so most likely it is an undocumented restriction.
UPDATE subT1
SET flag = 1
OUTPUT inserted.Sum
FROM (
    SELECT t1.*,
      Sum = t2.value
    FROM t1
    JOIN t2 ON t2.SomeId = t1.SomeId
) subT1;
Msg 404 Level 16 State 1 Line 3
The column reference "inserted.Sum" is not allowed because it refers to a base table that is not being modified in this statement.
Charlieface (17545 rep)
May 7, 2023, 11:18 AM • Last activity: Sep 23, 2024, 06:30 AM
2 votes
2 answers
569 views
Use expression in window function that references columns from the current row
Suppose I have the following query that uses window function:
SELECT id
     , var
     , num
     , SUM(var * num) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS calc
FROM (VALUES
    (1, 0.1, 7),
    (2, 0.7, 1),
    (3, 0.3, 9),
    (4, 0.9, 5),
    (5, 0.5, 3)
) AS t(id, var, num)
And the following result:
id | var | num | calc | explanation
1  | 0.1 | 7   | 0.7  | 0.1*7
2  | 0.7 | 1   | 1.4  | 0.1*7 + 0.7*1
3  | 0.3 | 9   | 4.1  | 0.1*7 + 0.7*1 + 0.3*9
4  | 0.9 | 5   | 7.9  | 0.7*1 + 0.3*9 + 0.9*5
5  | 0.5 | 3   | 8.7  | 0.3*9 + 0.9*5 + 0.5*3
Is it possible to reference the var column _from the outside_ inside the SUM() OVER ()? For example:

id | var | num | calc | sum of f(r.var, w.var, w.num)

Here is a [fiddle][1] where I was able to achieve the result with correlated queries, but I want to use window functions.
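Generally not: the expression inside a window aggregate is evaluated once per frame row and has no handle on the current (outer) row, which is presumably why the correlated-query fiddle works where the window version doesn't. A sketch of the correlated form over the same 2-PRECEDING frame, with a hypothetical f (the question's exact expression is not reproduced here), in SQLite via Python's sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INT, var REAL, num INT);
INSERT INTO t VALUES (1,0.1,7),(2,0.7,1),(3,0.3,9),(4,0.9,5),(5,0.5,3);
""")
rows = con.execute("""
    SELECT id, var, num,
           -- correlated subquery over the same frame as
           -- ROWS BETWEEN 2 PRECEDING AND CURRENT ROW (ids are contiguous);
           -- unlike a window aggregate, t.var (the outer row) is visible.
           -- f is hypothetical: f(r, w) = w.var * w.num if r.var < w.var
           --                              else w.num
           (SELECT ROUND(SUM(CASE WHEN t.var < w.var
                                  THEN w.var * w.num
                                  ELSE w.num END), 1)
            FROM t AS w
            WHERE w.id BETWEEN t.id - 2 AND t.id) AS calc
    FROM t
    ORDER BY id
""").fetchall()
print(rows)
```

In PostgreSQL the same shape can be written as a LATERAL join, which the planner often handles better than a scalar subquery per row.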
Salman Arshad (461 rep)
Nov 20, 2019, 09:23 AM • Last activity: Sep 8, 2024, 12:03 AM
0 votes
1 answers
51 views
PostgreSQL window function: Add value in CASE running sum gets below threshold
For context: I am playing with a filling level sensor, a water tank and a watering pump, and have the following problem:

1. I have an initial filling value for my tank.
2. Each time step, I am subtracting a certain amount (the outflow) from this init value via a window function, in order to get a time series with the filling level.
3. Should that running sum get below a lower threshold, it gets "refilled" to an upper threshold for the refill.
4. From there, I want to keep subtracting until it hits the lower value again, then refill, etc.
5. I would like to sum up the refill amount over the series.

Something like the following (example plot omitted): assuming I start from 10, and refill from 5 as the lower limit to 8 as the upper limit, the table should look something like this, with running_sum being the amount of water in the tank, refill_amount the amount that was needed to refill, and refill_running_sum giving a final, total amount for refilling:
time_step | running_sum | refill_amount | refill_running_sum
----------+-------------+---------------+-------------------
        1 |          10 |             0 |                  0
        2 |           9 |             0 |                  0
        3 |           8 |             0 |                  0
        4 |           7 |             0 |                  0
        5 |           6 |             0 |                  0
        6 |           5 |             3 |                  3   -- refill
        7 |           8 |             0 |                  3
        8 |           7 |             0 |                  3
        9 |           6 |             0 |                  3
       10 |           5 |             3 |                  6   -- refill
       11 |           8 |             0 |                  6
       12 |           7 |             0 |                  6
       13 |           6 |             0 |                  6
       14 |           5 |             3 |                  9   -- refill
       15 |           8 |             0 |                  9
I got as far as the SQL below (dbfiddle here): using a CASE statement, I manage to refill once and get the first refill amount, but I would need a new CTE with a CASE for every single refill. Apologies, I know I'm missing the SQL vocabulary to properly approach this; thank you for any pointers!
/* Generate some mockup series with init+threshold limits */ 
    WITH vals AS ( 
    	SELECT 
    		generate_series(1,30,1) AS time_step
    		,10 AS init_val
    		,8 AS upper_limit_refill
    		,5 AS lower_limit_refill
    		,random() AS random_val
    ), 
    /* Create the window function subtracting a mockup val */
    example_series AS (	
    	SELECT 
    		*	
    		,init_val-sum(random_val) OVER(ORDER BY time_step ASC) AS running_amount
    	FROM vals
    )
    SELECT 
    	time_step
    	,running_amount
        /* I manage to get the amount needed to refill ONCE, but I'd need a new CTE and CASE for each refill like this*/
    	,CASE 
    		WHEN running_amount < lower_limit_refill 
    		THEN upper_limit_refill +  running_amount -sum(random_val) OVER(ORDER BY time_step ASC  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) 
    		ELSE 0 
    	END AS refill_once
    	,CASE 
    		WHEN running_amount < lower_limit_refill 
    		THEN upper_limit_refill - running_amount
    		ELSE 0 
    	END AS refill_to_limit_amount

    FROM example_series
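
The reason a single window function keeps falling short here is that each step's level depends on whether earlier steps already triggered a refill, so the state has to be carried forward row by row. A minimal sketch of that idea with a recursive CTE (hedged: `outflows(time_step, amount)` is a hypothetical table holding the per-step consumption, and the refilled level is shown on the refill row itself, slightly differently from the example table above):

```sql
/* Sketch (Postgres): carry the tank level forward row by row,
   refilling to the upper limit of 8 whenever the level would
   reach the lower limit of 5, and accumulate the refill total. */
WITH RECURSIVE tank AS (
    SELECT 1 AS time_step,
           10::numeric AS level,               -- initial filling value
           0::numeric  AS refill_running_sum
    UNION ALL
    SELECT o.time_step,
           CASE WHEN t.level - o.amount <= 5   -- lower limit hit
                THEN 8::numeric                -- refill to upper limit
                ELSE t.level - o.amount
           END,
           CASE WHEN t.level - o.amount <= 5
                THEN t.refill_running_sum + (8 - (t.level - o.amount))
                ELSE t.refill_running_sum
           END
    FROM tank t
    JOIN outflows o ON o.time_step = t.time_step + 1
)
SELECT * FROM tank ORDER BY time_step;
```

The recursion replaces the per-refill chain of CTEs: each recursive step sees the previous row's (possibly refilled) level, which is exactly the state a window frame cannot provide.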
user22827835 (1 rep)
Aug 26, 2024, 04:43 PM • Last activity: Sep 2, 2024, 03:49 PM
0 votes
1 answers
50 views
Finding id of last child in table structure
I have DB structure like this one: [structure][1]. message_id from_user_id to_user_id subject body date_add parent_message_id 1 1 2 message 1 body 1 2024-07-01 null 2 2 1 message 2 body 2 2024-07-01 1 3 1 2 message 3 body 3 2024-07-01 2 4 2 1 message 4 body 4 2024-07-01 null 5 1 2 message 5 body 5 2...
I have a DB structure like this one: structure.

| message_id | from_user_id | to_user_id | subject   | body   | date_add   | parent_message_id |
|-|-|-|-|-|-|-|
| 1 | 1 | 2 | message 1 | body 1 | 2024-07-01 | null |
| 2 | 2 | 1 | message 2 | body 2 | 2024-07-01 | 1    |
| 3 | 1 | 2 | message 3 | body 3 | 2024-07-01 | 2    |
| 4 | 2 | 1 | message 4 | body 4 | 2024-07-01 | null |
| 5 | 1 | 2 | message 5 | body 5 | 2024-07-01 | null |
| 6 | 1 | 2 | message 6 | body 6 | 2024-07-01 | null |
| 7 | 2 | 1 | message 7 | body 7 | 2024-07-01 | 6    |
| 8 | 1 | 2 | message 8 | body 8 | 2024-07-01 | 7    |

I'm looking for an efficient way to find the last_message_id and the total message count of each thread. Messages connected via parent_message_id form a thread (like in Gmail). For example:

| message_id | last_thread_message_id | total_thread_count |
|-|-|-|
| 1 | 3 | 3 |
| 2 | 3 | 3 |
| 3 | 3 | 3 |
| 4 | 4 | 1 |
| 5 | 5 | 1 |
| 6 | 8 | 3 |
| 7 | 8 | 3 |
| 8 | 8 | 3 |

Thanks for advice, R.
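
One way to approach this (a hedged sketch, not from the question): resolve each message to its thread root with a recursive CTE, then aggregate per root with window functions. It assumes the table is named `messages` and that "last message" means the highest message_id in the thread:

```sql
/* Walk each message up its parent chain; the ancestor whose
   parent_message_id IS NULL is the thread root. */
WITH RECURSIVE chain AS (
    SELECT message_id, message_id AS ancestor_id, parent_message_id
    FROM messages
    UNION ALL
    SELECT c.message_id, m.message_id, m.parent_message_id
    FROM chain c
    JOIN messages m ON m.message_id = c.parent_message_id
),
roots AS (
    SELECT message_id, ancestor_id AS thread_root
    FROM chain
    WHERE parent_message_id IS NULL
)
SELECT message_id,
       MAX(message_id) OVER (PARTITION BY thread_root) AS last_thread_message_id,
       COUNT(*)        OVER (PARTITION BY thread_root) AS total_thread_count
FROM roots
ORDER BY message_id;
```

If "last" should instead mean latest by date_add, the MAX() would need to rank on date_add rather than message_id.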
Robert (141 rep)
Jul 1, 2024, 01:59 PM • Last activity: Jul 2, 2024, 12:00 PM
12 votes
1 answers
1131 views
Is "ROW_NUMBER() OVER(ORDER BY xml.node)" well defined?
(It is more a question of *documentation* rather than *behavior*. It has been migrated from [Stack overflow](https://stackoverflow.com/questions/78665262/) as was suggested there.) While researching the [answer to another question](https://stackoverflow.com/questions/78411293/#78412334) that require...
(It is more a question of *documentation* rather than *behavior*. It has been migrated from [Stack overflow](https://stackoverflow.com/questions/78665262/) as was suggested there.) While researching the [answer to another question](https://stackoverflow.com/questions/78411293/#78412334) that required extracting XML nodes in the original element order, I came across several answers ([here](https://stackoverflow.com/questions/1134075/#30185119) , [here](https://stackoverflow.com/questions/17748012/#17764715) , and [here](https://stackoverflow.com/questions/20249307/#20250521)) that used expressions of the form ROW_NUMBER() OVER (ORDER BY xml.node), and asserted or implied that the assigned row number values would be assigned in XML document order. However, I cannot find anywhere that defines the behavior of ORDER BY xml.node. Although it seems to be allowed in the OVER() clause, the [documentation](https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql) does not specifically mention XML nodes. For example, given:
DECLARE @xml XML = '<root>
    <node>One</node>
    <node>Two</node>
    <node>Three</node>
    <node>Four</node>
</root>'

SELECT
    ROW_NUMBER() OVER(ORDER BY xml.node) AS rn,
    xml.node.value('./text()', 'varchar(255)') AS value
FROM @xml.nodes('*/node') xml(node)
ORDER BY
    ROW_NUMBER() OVER(ORDER BY xml.node)
The following results are returned:

rn | value
---+-------
 1 | One
 2 | Two
 3 | Three
 4 | Four

**Question:** Are these results guaranteed anywhere in the documentation? Is this accepted as guaranteed but undocumented behavior? Or is this another case like ORDER BY (SELECT NULL) that appears to work anecdotally for small, seemingly pre-ordered source data sets, but may eventually fail when scaled up? The reason I am asking is that I would prefer not to recommend a technique whose behavior and reliability are not supported by the documentation. Interestingly, although an XML node is allowed in a windowed ORDER BY, it is not allowed in an ordinary SELECT ... ORDER BY, where ORDER BY xml.node yields the following error:

> Msg 493, Level 16, State 1, Line 7
>
> The column 'node' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.

The above error message does not list the window-function OVER(ORDER BY ...) as an allowed use. [See this dbfiddle](https://dbfiddle.uk/r_mOqNdp) .
T N (231 rep)
Jun 25, 2024, 05:10 PM • Last activity: Jun 28, 2024, 08:04 AM
0 votes
1 answers
41 views
Sort by maximum value per partition
I have a table with data something like below: ``` | ID(pk) | name | count | |--------|------|-------| | id1 | xyz | 98 | | id2 | ptr | 5 | | id3 | xyz | 2 | | id4 | de | 1 | | id5 | ptr | 1 | ``` I want to first have the row with max count then all the rows with same name sorted by count. After tha...
I have a table with data something like below:
| ID(pk) | name | count |
|--------|------|-------|
| id1    | xyz  |    98 |
| id2    | ptr  |     5 |
| id3    | xyz  |     2 |
| id4    | de   |     1 |
| id5    | ptr  |     1 |
I want the row with the max count first, followed by all rows with the same name sorted by count; then the next-highest max count with its name group, and so on. Something like below:
| ID(pk) | name | count |
|--------|------|-------|
| id1    | xyz  |    98 |
| id3    | xyz  |     2 |
| id2    | ptr  |     5 |
| id5    | ptr  |     1 |
| id4    | de   |     1 |
Is something like this possible in Postgres?
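
It should be: Postgres allows window functions in ORDER BY, so the per-name maximum can drive the primary sort directly. A sketch, assuming the table is named `t`:

```sql
SELECT id, name, count
FROM t
ORDER BY MAX(count) OVER (PARTITION BY name) DESC,  -- highest group maximum first
         name,         -- keep same-name rows together when maxima tie
         count DESC;   -- within a name, largest count first
```

On the sample data this yields the xyz rows (group max 98), then ptr (max 5), then de (max 1), each group internally ordered by count descending.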
gmtek (103 rep)
Jun 20, 2024, 11:44 AM • Last activity: Jun 21, 2024, 05:20 AM
1 votes
1 answers
372 views
Getting sum for each day based on most recent timestamp entry for each entity
I have a table that contains an ID, timestamp, and integer. Each row is a "snapshot" of the entity at a given time: **TableA** | ID | timestamp | num | | -- | ---------- | --- | | A | 2024-02-04 | 5 | | B | 2024-02-04 | 3 | | C | 2024-02-07 | 8 | | A | 2024-02-07 | 3 | My goal is to get a table of a...
I have a table that contains an ID, timestamp, and integer. Each row is a "snapshot" of the entity at a given time:

**TableA**

| ID | timestamp  | num |
| -- | ---------- | --- |
| A  | 2024-02-04 | 5   |
| B  | 2024-02-04 | 3   |
| C  | 2024-02-07 | 8   |
| A  | 2024-02-07 | 3   |

My goal is to get a table of all days from the last year and show the sum of num for each day. The sum for each day should look at the most recent timestamp for each entity; if there's no data for a given day, just forward-fill until reaching the next timestamp. So the result would be:

**Goal**

| day        | total |
| ---------- | ----- |
| 2024-02-04 | 8     |
| 2024-02-05 | 8     |
| 2024-02-06 | 8     |
| 2024-02-07 | 14    |

2024-02-05 and 2024-02-06 are filled from the most recent rows because there is no timestamp data for them, and 2024-02-07 is 14 because entity A now has 3 instead of 5 and entity C has been added (3 + 8 + 3). My approach has been to start with a recursive CTE for the days of the last year, and then left join on another CTE that contains the aggregated data.
WITH DateSequence AS (
  SELECT CAST(GETDATE() AS DATE) AS [Date]
  UNION ALL
  SELECT DATEADD(DAY, -1, [Date])
  FROM DateSequence
  WHERE DATEADD(DAY, -1, [Date]) > DATEADD(DAY, -365, GETDATE())
),

Aggregator AS (
  /* for each day in TableA, sum the most recent num columns for each entity */
)

SELECT DateSequence.[Date], Aggregator.total 
FROM DateSequence 
LEFT JOIN Aggregator ON Aggregator.date = DateSequence.[Date]
ORDER BY DateSequence.[Date]
OPTION (MAXRECURSION 365);
The two parts I haven't worked out are the summing and the forward-filling. I'm pretty sure I need to use window functions, and need a way to say: for each day, sum the most recent num value of each entity in **TableA**.
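
The summing and the forward-filling can in fact collapse into one step: for each calendar day, take every entity's most recent snapshot on or before that day and sum those, so days without new rows automatically re-use the previous state. A sketch in that spirit (T-SQL, assuming TableA as shown; DateSequence is the recursive CTE from the question, and this query would stand in for the planned Aggregator join):

```sql
/* For each day, CROSS APPLY ranks each entity's snapshots up to that
   day by recency; rn = 1 keeps only the latest one per entity. */
SELECT d.[Date] AS [day],
       SUM(x.num) AS total
FROM DateSequence AS d
CROSS APPLY (
    SELECT t.num,
           ROW_NUMBER() OVER (PARTITION BY t.ID
                              ORDER BY t.[timestamp] DESC) AS rn
    FROM TableA AS t
    WHERE t.[timestamp] <= d.[Date]   -- snapshots visible on this day
) AS x
WHERE x.rn = 1                        -- latest snapshot per entity
GROUP BY d.[Date]
ORDER BY d.[Date];
```

Recomputing the latest snapshot per day is O(days × rows) in the worst case, which is usually acceptable for a 365-day window; a pre-aggregated running-state table would be the next step if it isn't.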
Timothy Fisher (177 rep)
May 3, 2024, 12:24 AM • Last activity: May 3, 2024, 05:54 AM
0 votes
1 answers
1546 views
Find Consecutive Date in SQL using 2 date columns
I am trying to group by all consecutive patient admission to the table. If the patient re-admitted to the hospital next day( or within 24 hours), then I need to keep only one admission stay per patient. I appreciate your help. 0 |Patient_bed_id |Patientname| Hospital_In_date| Hospital_out_Date| |-|-...
I am trying to group all consecutive patient admissions in the table. If the patient is re-admitted to the hospital the next day (or within 24 hours), I need to keep only one admission stay per patient. I appreciate your help.

| Patient_bed_id | Patientname | Hospital_In_date | Hospital_out_Date |
|-|-|-|-|
| 111 | Lukas | 1/1/2022  | 1/31/2022 |
| 111 | Lukas | 2/1/2022  | 2/28/2022 |
| 111 | Lukas | 3/1/2022  | 3/31/2022 |
| 111 | Lukas | 5/25/2022 | 6/2/2022  |
| 111 | Lukas | 8/1/2022  | 8/20/2022 |
| 111 | Lukas | 8/21/2022 | 9/10/2022 |
| 222 | Jason | 5/1/2022  | 5/3/2022  |
| 222 | Jason | 6/15/2022 | 7/11/2022 |
| 222 | Jason | 7/12/2022 | 7/26/2022 |
| 222 | Jason | 9/13/2022 | 9/15/2022 |
| 222 | Jason | 9/16/2022 | 9/27/2022 |

Final table:

| Patient_bed_id | Patientname | Hospital_In_date | Hospital_out_Date |
|-|-|-|-|
| 111 | Lukas | 1/1/2022  | 3/31/2022 |
| 111 | Lukas | 5/25/2022 | 6/2/2022  |
| 111 | Lukas | 8/1/2022  | 9/10/2022 |
| 222 | Jason | 5/1/2022  | 5/3/2022  |
| 222 | Jason | 6/15/2022 | 7/26/2022 |
| 222 | Jason | 9/13/2022 | 9/27/2022 |
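
This shape of problem is commonly solved as gaps-and-islands. A sketch (T-SQL syntax assumed; `admissions` is a hypothetical table name, and since the columns are dates, "within 24 hours" is approximated as "re-admitted no later than the day after discharge"): flag each row that starts a new stay, turn the flags into stay numbers with a running sum, then collapse each stay:

```sql
/* Step 1: new_stay = 1 when this admission is more than one day after
   the same patient's previous discharge (or is the first row).
   Step 2: a running SUM of the flags numbers the stays.
   Step 3: collapse each stay to its first admission / last discharge. */
WITH flagged AS (
    SELECT *,
           CASE WHEN Hospital_In_date <= DATEADD(DAY, 1,
                       LAG(Hospital_out_Date) OVER (
                           PARTITION BY Patient_bed_id
                           ORDER BY Hospital_In_date))
                THEN 0 ELSE 1
           END AS new_stay
    FROM admissions
),
grouped AS (
    SELECT *,
           SUM(new_stay) OVER (PARTITION BY Patient_bed_id
                               ORDER BY Hospital_In_date) AS stay_no
    FROM flagged
)
SELECT Patient_bed_id, Patientname,
       MIN(Hospital_In_date)  AS Hospital_In_date,
       MAX(Hospital_out_Date) AS Hospital_out_Date
FROM grouped
GROUP BY Patient_bed_id, Patientname, stay_no
ORDER BY Patient_bed_id, MIN(Hospital_In_date);
```

Note that LAG returns NULL for each patient's first row, so the CASE falls through to ELSE 1, correctly starting the first stay.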
Jane (1 rep)
Jun 29, 2022, 03:28 AM • Last activity: Apr 23, 2024, 07:24 AM
Showing page 1 of 20 total questions