Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0 votes · 2 answers · 451 views
postgresql cumulative counts in date range
I'm trying to get the cumulative count of rows (by `group_id`) between two dates that represent a time period where the row was active.
I have a table like this:
group_id | id | type | start_date | end_date
----------+--------+------+------------+------------
33 | 119435 | AAA | 2013-05-21 | 2014-05-19
33 | 15144 | AAA | 2013-05-21 | 2015-05-18
33 | 29393 | AAA | 2013-05-21 | 2016-05-23
33 | 119437 | AAA | 2013-05-21 | 2017-05-15
33 | 62380 | AAA | 2013-05-21 | 2099-12-31
33 | 119436 | AAA | 2013-05-21 | 2099-12-31
33 | 27346 | AAA | 2013-05-21 | 2099-12-31
33 | 28529 | AAA | 2014-05-20 | 2099-12-31
33 | 221576 | AAA | 2015-05-19 | 2099-12-31
33 | 253893 | AAA | 2016-05-24 | 2099-12-31
33 | 251589 | AAA | 2017-05-16 | 2099-12-31
33 | 285245 | AAA | 2019-01-24 | 2099-12-31
34 | 253893 | AAA | 2016-05-24 | 2099-12-31
34 | 251589 | AAA | 2017-05-16 | 2099-12-31
34 | 285245 | AAA | 2019-01-24 | 2099-12-31
34 | 285246 | AAA | 2019-05-31 | 2099-12-31
... and I need to get active counts for each of those date ranges like this:
group_id | start_date | end_date | active
----------+------------+------------+--------
33 | 2013-05-21 | 2014-05-19 | 7
33 | 2013-05-21 | 2015-05-18 | 8
33 | 2013-05-21 | 2016-05-23 | 9
33 | 2013-05-21 | 2017-05-15 | 10
33 | 2013-05-21 | 2099-12-31 | 12
33 | 2013-05-21 | 2099-12-31 | 12
33 | 2013-05-21 | 2099-12-31 | 12
33 | 2014-05-20 | 2099-12-31 | 11
33 | 2015-05-19 | 2099-12-31 | 10
33 | 2016-05-24 | 2099-12-31 | 9
33 | 2017-05-16 | 2099-12-31 | 8
33 | 2019-01-24 | 2099-12-31 | 8
34 | 2016-05-24 | 2099-12-31 | 1
34 | 2017-05-16 | 2099-12-31 | 2
34 | 2019-01-24 | 2099-12-31 | 3
34 | 2019-05-31 | 2099-12-31 | 4
I've tried various combinations of `LAG` and `LEAD`, with and without CTEs, but cannot come up with a solution. Is there a way to do this in a single query? If not a single query, perhaps a combination of queries in a UDF?
**UPDATE**
Per @jjanes' comment below, I believe my source table is set up incorrectly. I think I should create the source table like this instead:
group_id | id | type | start_date | end_date
----------+--------+------+------------+------------
... (skipped group 33) ...
34 | 253893 | AAA | 2016-05-24 | 2017-05-15
34 | 253893 | AAA | 2017-05-16 | 2019-01-23
34 | 253893 | AAA | 2019-01-24 | 2019-05-30
34 | 253893 | AAA | 2019-05-31 | 2099-12-31
34 | 251589 | AAA | 2017-05-16 | 2019-01-23
34 | 251589 | AAA | 2019-01-24 | 2019-05-30
34 | 251589 | AAA | 2019-05-31 | 2099-12-31
34 | 285245 | AAA | 2019-01-24 | 2019-05-30
34 | 285245 | AAA | 2019-05-31 | 2099-12-31
34 | 285246 | AAA | 2019-05-31 | 2099-12-31
With that change in the source data, the outcome of actives (showing only group 34 here) would be like this:
group_id | start_date | end_date | active
----------+------------+------------+--------
34 | 2016-05-24 | 2017-05-15 | 1
34 | 2017-05-16 | 2019-01-23 | 2
34 | 2019-01-24 | 2019-05-30 | 3
34 | 2019-05-31 | 2099-12-31 | 4
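For what it's worth, here is a sketch of one interpretation that reproduces the sample output: `active` is the number of rows in the same group whose period overlaps the current row's period. This assumes PostgreSQL, with `mytable` as a stand-in table name:
```
SELECT t.group_id, t.start_date, t.end_date, c.active
FROM mytable t
CROSS JOIN LATERAL (
    SELECT count(*) AS active
    FROM mytable o
    WHERE o.group_id   = t.group_id
      AND o.start_date <= t.end_date   -- other row starts before this one ends
      AND o.end_date   >= t.start_date -- and ends after this one starts
) c
ORDER BY t.group_id, t.start_date, t.end_date;
```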
jacaetevha
(1 rep)
Mar 21, 2020, 08:20 PM
• Last activity: Jun 9, 2025, 09:01 AM
0 votes · 1 answer · 683 views
Postgresql: partition by number of rows after the current one
I need to calculate correlation for each, say, 3 consecutive rows.
For example, assume there is a table with the following data.
create table a (
asset_id int,
time int not null,
value decimal not null
);
insert into a values
(1,1,30),(1,2,45),(1,3,30),(1,4,30),(1,5,30),
(2,1,30),(2,2,40),(2,3,30),(2,4,25),(2,5,25);
# select row_number() over() as r, a.asset_id, b.asset_id, a.time, a.value, b.value
from a join a b on a.time=b.time and a.asset_id
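A sketch of one reading of the goal, assuming the task is the correlation between two specific assets over a sliding window of three rows (the current row and the two after it, per the title). In PostgreSQL any ordinary aggregate, including `corr()`, can be used as a window function with a frame:
```
SELECT a.time,
       corr(a.value, b.value) OVER (
           ORDER BY a.time
           ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
       ) AS corr_next_3
FROM a
JOIN a b ON b.time = a.time   -- align the two assets on time
WHERE a.asset_id = 1
  AND b.asset_id = 2;
```
The last two rows have fewer than three points in the frame, so `corr()` returns NULL there.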
yaugenka
(455 rep)
Feb 5, 2022, 12:32 PM
• Last activity: Jun 2, 2025, 08:05 PM
1 vote · 1 answer · 332 views
How to force Redshift/Postgres to aggregate nth_value?
I'm trying to retrieve the 2nd, 3rd, 4th order/transaction from an orders table using a query that looks like this:
select
customer_email,
nth_value(created_at, 1) over (partition by customer_email),
nth_value(created_at, 2) over (partition by customer_email),
nth_value(created_at, 3) over (partition by customer_email),
nth_value(created_at, 4) over (partition by customer_email)
from
fact_orders
where
group by
customer_email,
created_at
limit 100
My goal is to get the 1st, 2nd, 3rd, and 4th order for each customer.
I'm forced to do a group by on customer_email and created_at, which results in a row for each order and date by a customer.
The nth_value yields the correct results, but I would prefer a result set that looks like so, with only one record per customer:
customer_email, first_order, second_order, third_order, fourth_order
How can I achieve this?
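A sketch of the usual pivot approach, assuming `created_at` ordering defines the 1st through 4th orders: `ROW_NUMBER` plus conditional aggregation collapses each customer to one row. This should work in both Redshift and Postgres:
```
SELECT customer_email,
       MAX(CASE WHEN rn = 1 THEN created_at END) AS first_order,
       MAX(CASE WHEN rn = 2 THEN created_at END) AS second_order,
       MAX(CASE WHEN rn = 3 THEN created_at END) AS third_order,
       MAX(CASE WHEN rn = 4 THEN created_at END) AS fourth_order
FROM (
    SELECT customer_email, created_at,
           ROW_NUMBER() OVER (PARTITION BY customer_email
                              ORDER BY created_at) AS rn
    FROM fact_orders
) o
WHERE rn <= 4
GROUP BY customer_email;
```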
john
(153 rep)
Jul 10, 2018, 04:43 AM
• Last activity: Apr 21, 2025, 11:03 AM
1 vote · 1 answer · 2674 views
Rolling sum with a floor in Athena/Presto
In Athena, I want to calculate a rolling sum, over a window, that can't go below 0. For example, if summing up a column that has the values `(1, 2, -1, -2, -1, -2, 1, 2)`, I should get `(1, 3, 2, 0, 0, 0, 1, 3)`.
Without the floor constraint, it's easy: `SUM(X) OVER (PARTITION BY some_group ORDER BY ordering_col)`. It would also be easy if we just wanted to clip all the values at 0 after doing an actual cumulative sum. But I cannot figure out how to use window functions to achieve my desired result in general, though I can make it work for some special cases.
Is this even possible?
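One approach that should work without row-by-row recursion rests on an identity: a running sum floored at 0 equals the plain running sum minus the most negative prefix sum seen so far (or minus 0 if none was negative). A sketch, with `t`, `x`, `some_group`, and `ordering_col` as stand-in names:
```
SELECT some_group, ordering_col, x,
       running_sum
       - least(0, min(running_sum) OVER (PARTITION BY some_group
                                         ORDER BY ordering_col)) AS clamped_sum
FROM (
    SELECT some_group, ordering_col, x,
           sum(x) OVER (PARTITION BY some_group
                        ORDER BY ordering_col) AS running_sum
    FROM t
) s;
```
On the sample values this yields (1, 3, 2, 0, 0, 0, 1, 3). Note the identity only holds for a floor of exactly 0 with no ceiling.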
Max Gibiansky
(111 rep)
Mar 17, 2022, 06:15 PM
• Last activity: Apr 7, 2025, 01:09 PM
0 votes · 1 answer · 470 views
How do I calculate a streak of data and show the data that corresponds to it on MySQL?
I'm currently working on a project with NBA stats and I want to check a player's longest streak of his highest amount of points on MySQL. I'm open to a 5-10 game streak to simplify it.
For example: Kevin Durant scores 30, 32, 35, 36, 41 in 5 consecutive games and that's his highest streak. How would I go about extracting the game stats, the points, the length of the streak, AND the total amount of points during this streak, all together?
I've tried using multiple CTEs with window functions, but I confuse myself as I'm doing it. Is there an easy or simple way to do this at all?
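A sketch of one simple framing, assuming MySQL 8+ window functions, a hypothetical table `game_stats(player, game_date, points)`, and a fixed 5-game streak: compute a sliding 5-row sum per player, then pick the window with the highest total.
```
WITH windows AS (
    SELECT player, game_date, points,
           SUM(points) OVER w AS streak_points,
           COUNT(*)    OVER w AS games_in_window
    FROM game_stats
    WINDOW w AS (PARTITION BY player
                 ORDER BY game_date
                 ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING)
)
SELECT player, game_date AS streak_start, streak_points
FROM windows
WHERE games_in_window = 5   -- drop partial windows at the end
ORDER BY streak_points DESC
LIMIT 1;
```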
thisisforfun
(1 rep)
Jul 18, 2023, 05:49 PM
• Last activity: Feb 21, 2025, 03:04 PM
0 votes · 2 answers · 57 views
In PostgreSQL how to get the hourly minimum of a value using a window function?
Let's assume we have the following data with one temperature measurement per second. The span of data is several years so there is a considerable number of rows. It is measurement data from a field device. The table has other columns as well but they are not relevant to the question.
| Time | Temperature_deg_in_C |
| ---- | ------------------- |
| 2024-11-01 00:00:00+00 | 20.1 |
| 2024-11-01 00:00:01+00 | 21.2 |
| 2024-11-01 00:00:02+00 | 21.6 |
| 2024-11-01 00:00:03+00 | 20.2 |
| ... | ... |
| 2026-12-31 23:59:57+00 | 25.4 |
| 2026-12-31 23:59:58+00 | 25.2 |
| 2026-12-31 23:59:59+00 | 25.6 |
I know that I can use a `GROUP BY` clause to get the averages over one-hour windows between specific start and end times like so:
SELECT to_char("time", 'YYYY-MM-DD HH'), AVG("Temperature_deg_in_C")
FROM "measurements"
WHERE "time" BETWEEN '2024-11-01' AND '2024-12-01'
GROUP BY to_char("time", 'YYYY-MM-DD HH')
However, this syntax feels like a hack, and it doesn't quite bend to e.g. 10 second or other more complex window frames (or partitions).
Furthermore, I would like to know, specifically, how this is done correctly and performantly using `OVER`, `WINDOW`, `PARTITION BY`, `RANGE`, `ROWS`, and the similar tools outlined in the [PostgreSQL manual](https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS).
**UPDATE:** Window functions cannot be used to achieve this goal on Postgres, as they do not create "buckets" or "frames" in the way I thought they did. `GROUP BY` is the way to achieve this.
The end goal is to find a solution where the window length and unit of measurement (minutes, hours, days), and the start and end time of the observing period are flexible but the query body itself stays the same. A parameterized query that is simple to understand, that is.
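For the record, a parameterized sketch along those lines, assuming PostgreSQL 14+ where `date_bin()` is available. Only the stride interval and the time bounds change between runs; the query body stays the same:
```
SELECT date_bin(INTERVAL '10 seconds', "time", TIMESTAMPTZ '2000-01-01') AS bucket,
       min("Temperature_deg_in_C") AS min_temp,
       avg("Temperature_deg_in_C") AS avg_temp
FROM "measurements"
WHERE "time" >= TIMESTAMPTZ '2024-11-01'
  AND "time" <  TIMESTAMPTZ '2024-12-01'
GROUP BY bucket
ORDER BY bucket;
```
The third argument is an arbitrary fixed origin the buckets are aligned to; swapping `INTERVAL '10 seconds'` for `INTERVAL '1 hour'` or `INTERVAL '1 day'` changes the bucket width without touching the rest.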
Antti Keskinen
(3 rep)
Nov 28, 2024, 03:23 PM
• Last activity: Nov 29, 2024, 05:52 AM
25 votes · 5 answers · 80772 views
Using DISTINCT in window function with OVER
I'm trying to migrate a query from Oracle to SQL Server 2014.
Here is my query which works great in Oracle:
select
count(distinct A) over (partition by B) / count(*) over() as A_B
from MyTable
Here is the error I got after trying to run this query in SQL Server 2014:
Use of DISTINCT is not allowed with the OVER clause
Does anyone know what the problem is? Is such a query possible in SQL Server?
Please advise.
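A sketch of a commonly cited workaround: `COUNT(DISTINCT A) OVER (PARTITION BY B)` can be emulated with two `DENSE_RANK`s, assuming `A` is never NULL (`DENSE_RANK` counts NULLs as a value, `COUNT(DISTINCT)` does not):
```
SELECT (DENSE_RANK() OVER (PARTITION BY B ORDER BY A)
      + DENSE_RANK() OVER (PARTITION BY B ORDER BY A DESC)
      - 1) * 1.0                    -- distinct A per B; * 1.0 avoids integer division
      / COUNT(*) OVER ()  AS A_B
FROM MyTable;
```
For any row, its dense rank counted from the top plus its dense rank counted from the bottom, minus one, equals the number of distinct values in the partition.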
Omri
(383 rep)
Jan 11, 2015, 09:21 AM
• Last activity: Nov 20, 2024, 07:34 AM
2 votes · 2 answers · 487 views
How to select blocks with sequential data and aggregate the ids
I have the following table:
id(int) startDate(timestamp) endDate(timestamp) plan_id(int) planned(bool) machine(int)
--------------------------------------------------------------------------------------------------------------
2005 '2019-01-16 08:29:24.872736' '2019-01-16 08:30:23.529706' 34 true 6
2004 '2019-01-16 08:19:28.011148' '2019-01-16 08:29:22.680828' 34 true 6
2003 '2019-01-16 08:18:27.074312' '2019-01-16 08:19:25.753475' 34 true 6
2002 '2019-01-16 08:08:30.206288' '2019-01-16 08:18:24.856308' 34 true 6
2001 '2019-01-16 08:07:29.163124' '2019-01-16 08:08:27.949013' 34 true 6
2000 '2019-01-16 07:59:03.221309' '2019-01-16 08:00:14.654391' null false 7
1999 '2019-01-16 08:00:00.986367' '2019-01-16 08:00:03.221309' null false 6
1998 '2019-01-16 07:57:30.711044' '2019-01-16 07:59:58.778444' null false 6
1997 '2019-01-16 07:56:32.466508' '2019-01-16 07:57:28.489287' null false 6
1996 '2019-01-16 07:50:06.887349' '2019-01-16 07:56:30.237725' null false 6
1995 '2019-01-16 07:46:34.327582' '2019-01-16 07:50:04.619592' 33 true 6
1994 '2019-01-16 07:45:33.813483' '2019-01-16 07:46:32.014849' 33 true 6
1993 '2019-01-16 07:24:39.267365' '2019-01-16 07:39:23.786911' null false 6
1992 '2019-01-16 07:23:39.646218' '2019-01-16 07:24:37.093414' null false 6
1991 '2019-01-16 07:13:41.166337' '2019-01-16 07:23:37.403375' null false 6
1990 '2019-01-16 07:12:39.961234' '2019-01-16 07:13:38.907838' null false 6
1989 '2019-01-16 07:10:46.984236' '2019-01-16 07:12:37.647108' null false 6
1988 '2019-01-15 17:05:59.832834' '2019-01-15 17:08:21.603931' 31 true 6
1987 '2019-01-15 17:04:59.567046' '2019-01-15 17:05:57.565188' 31 true 6
1986 '2019-01-15 17:00:01.411266' '2019-01-15 17:10:57.255158' 31 true 7
I have to select the IDs of the blocks of unplanned records for a specific machine. I have been trying to use window functions; unfortunately, I couldn't work out the logic!
The problem here is that since we have different machines, we cannot rely on sequential ids, just that the `endDate` of a sequence is very close to the next `startDate` (it is OK to set a tolerance constant, e.g. 3 seconds).
I would like to have a query where the result would be: the min `startDate`, the max `endDate`, and the IDs of the block. For this sample with `machine = 6`, it would be:
blockStartDate blockEndDate ids
-------------------------------------------------------------------------------
"2019-01-16 07:50:06.887349" "2019-01-16 08:00:03.221309" [1999,1998,1997,1996]
"2019-01-16 07:10:46.984236" "2019-01-16 07:39:23.786911" [1989,1990,1991,1992,1993]
Note that the answer, in this case, has sequential IDs but this is not always the case. I am working on providing real data where 2 machines are producing data at the same time and the ids become useless.
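A gaps-and-islands sketch, assuming PostgreSQL and a stand-in table name `t`: flag a row as the start of a new block when the gap since the previous row's `endDate` exceeds the tolerance, turn the flags into block numbers with a running sum, then aggregate per block.
```
WITH ordered AS (
    SELECT id, "startDate", "endDate",
           LAG("endDate") OVER (ORDER BY "startDate") AS prev_end
    FROM t
    WHERE machine = 6
      AND NOT planned
),
islands AS (
    SELECT *,
           SUM(CASE WHEN prev_end IS NULL
                      OR "startDate" - prev_end > INTERVAL '3 seconds'
                    THEN 1 ELSE 0 END)
             OVER (ORDER BY "startDate") AS grp
    FROM ordered
)
SELECT MIN("startDate")               AS "blockStartDate",
       MAX("endDate")                 AS "blockEndDate",
       ARRAY_AGG(id ORDER BY "startDate") AS ids
FROM islands
GROUP BY grp
ORDER BY "blockStartDate";
```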
Attyla Fellipe
(23 rep)
Jan 17, 2019, 02:00 PM
• Last activity: Nov 14, 2024, 01:44 AM
1 vote · 1 answer · 3363 views
(Athena/Presto) Window function (lag) is not reading the previous lag row
I have two tables, `credit_memos` and `credit_memo_items`, and want to show each event individually and will thus do an insert (doing a union in this case, which is logically the same). I know the key for both tables and am doing a union with them while sharing the same partition key that I'll be using for the window. Note that both are pulled from a database that has the full change log (CDC).
The end goal is to make one row for each event. I'm defining an event for the final table to be any time either the credit memo or the credit memo item undergoes a change, and I would like to be able to snapshot the state of a credit memo combined with the credit memo item by windowing on the credit memo key and the event time.
I'm trying to use lag to fill up the rest of the null rows as shown. The code for the second column looks as follows:
coalesce(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk
(screenshot of the intermediate result omitted)
Rows 3 and 12 should have the invoice_nk based on the credit_memo. For some reason the third row shown in the picture is not able to interpret the previous lagged row and insert a value into its row as a result. I presume it may have something to do with Athena not actually executing the lag function recursively: it takes the previous row in each partition literally and doesn't wait for the previous rows to get computed.
Is there a way to grab values recursively as I'm trying to do?
Thanks.
Update:
Here's the full query. In it, `cm` is the subquery to grab the credit memos, `cmi` is the subquery to grab the credit memo items, and `combined` is just to precompute the order-by key for the window functions:
with
cme as(
with
cm as(
select
-- change_log
row_number() over (partition by id order by updated_at desc) r
, cast(cm.invoice_id as varchar(99)) as invoice_nk
, cast(cm.id as varchar(99)) as cm_dd
, cast(cm.created_at as timestamp) as cm_created_at
, cast(cm.updated_at as timestamp) as cm_updated_at
from "postgres-replica-parquet".credit_memos cm
)
select
case when r = 1 then cm_created_at
else cm_updated_at end as cm_begin_at
, null as cmi_begin_at
, invoice_nk
, cm_dd
, cm_created_at
, cm_updated_at
from cm
)
, cmi as(
with
ci as(
select
-- ci.change_log
row_number() over (partition by id order by updated_at desc) r
, cast(ci.credit_memo_id as varchar(99)) as cm_id
, cast(ci.created_at as timestamp) as cmi_created_at
, cast(ci.updated_at as timestamp) as cmi_updated_at
from "postgres-replica-parquet".credit_memo_items ci
)
select -- row 1 is the snapshot when cdc first started
null as cm_begin_at
, case when r = 1 then cmi_created_at
else cmi_updated_at end as cmi_begin_at
, null as invoice_nk
, cm_id as cm_dd
, null as cm_created_at
, null as cm_updated_at
from ci
)
, combined as(
with
all as(
select * from cme
union select * from cmi
)
select
greatest(
coalesce(cm_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
, coalesce(cmi_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
) as event_at
, *
from all
)
select
event_at
, cm_begin_at as event_credit_memo_last_at
, cmi_begin_at as event_credit_memo_item_last_at
, coalesce(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk
, cm_dd as credit_memo_dd
from combined
And a snapshot of the full result, which I use for reporting: (screenshot omitted)
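A sketch of an alternative that avoids chaining `lag` altogether, assuming Athena/Presto supports the `IGNORE NULLS` null treatment for window value functions (it does in recent engine versions): carry the last non-null `invoice_nk` forward instead of relying on the previous row's already-filled value.
```
select
    event_at
  , coalesce(
        invoice_nk
        -- most recent non-null value earlier in the partition
      , lag(invoice_nk) ignore nulls over (partition by cm_dd order by event_at)
    ) as invoice_nk
  , cm_dd as credit_memo_dd
from combined
```
Unlike chained `lag`, this does not depend on other rows having been "filled in" first, which is the behavior the question runs into.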
Eugene Choi
(11 rep)
Mar 11, 2022, 03:11 PM
• Last activity: Nov 13, 2024, 02:01 PM
0 votes · 0 answers · 27 views
Variable only resets per row when query has a static IN condition
I have a complicated query. I have a bunch of users. I have a ledger of debits and credits. I want to filter a user's subscribers by how long the subscribers have been negative. To do that I find negative subscribers for a given user, get each subscriber's transactions in reverse order, and work backwards until the subscriber's balance is ≥ 0 to find how long ago that was.
The query works only if I take the list of subscriber IDs the query returns and add them into the `WHERE` clause.
##### The Query
-- Return just the info we want
SELECT subscriber_id,
balance,
days_negative,
DATE_SUB(CURRENT_DATE, INTERVAL days_negative DAY) AS days_since
FROM (
-- Reset @balanced for each subscriber
SELECT r.subscriber_id,
balance,
(@balanced := FALSE),
(
-- get date of last transaction before balance >= 0
SELECT DATEDIFF(CURRENT_DATE, MIN(transaction_time))
FROM (
-- work backwards until balance >= 0
SELECT *,
ROUND(balance, 2),
ROUND(balance - running_total, 2),
(@balanced := @balanced OR balance - running_total + amount >= 0) AS balance_reached
FROM (
-- get transactions with running total
SELECT t.transaction_id,
t.transaction_time,
t.subscriber_id,
t.amount,
t.category,
SUM(t.amount) OVER w AS running_total
FROM transactions t
WHERE t.subscriber_id = r.subscriber_id
AND t.void = 0
WINDOW w AS (ORDER BY transaction_time DESC, transaction_id DESC)
) with_running_total
) with_balance_reached
WHERE balance_reached = 0
) AS days_negative
FROM subscribers r
INNER JOIN user_subscriber_linker usl USING (subscriber_id)
INNER JOIN subscriber_balance b USING (subscriber_id)
WHERE usl.user_id = 12345
AND r.subscriber_id IN (987, 654, 321)
AND r.void = 0
AND b.balance < 0
GROUP BY subscriber_id
) with_each;
This query gives me a result like this:
| renter\_id | balance | days\_negative | negative\_since |
| :--- | :--- | :--- | :--- |
| 987 | -5998.90 | 98 | 2024-08-01 |
| 654 | -2000.00 | 62 | 2024-09-06 |
| 321 | -3774.20 | 79 | 2024-08-20 |
When I remove the line `AND r.subscriber_id IN (987, 654, 321)`, I get this:
| renter\_id | balance | days\_negative | negative\_since |
| :--- | :--- | :--- | :--- |
| 987 | -5998.90 | 98 | 2024-08-01 |
| 654 | -2000.00 | null | null |
| 321 | -3774.20 | null | null |
The fact that the first row is correct leads me to believe `@balanced` isn't being set for each row. I tried using joins (`JOIN (SELECT @balanced := FALSE)`) to set the var but that hasn't worked either.
How do I get the query to work?
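For what it's worth, a sketch that sidesteps session variables entirely, assuming MySQL 8+ window functions. The sticky `@balanced` flag becomes a running `MAX` of the per-row condition over the same descending order; the user/void filters from the original query are trimmed for brevity:
```
SELECT subscriber_id,
       DATEDIFF(CURRENT_DATE, MIN(transaction_time)) AS days_negative
FROM (
    SELECT subscriber_id, transaction_time,
           -- sticky flag: 1 from the first row (scanning newest-first)
           -- where the balance was >= 0 before this transaction
           MAX(CASE WHEN balance - running_total + amount >= 0
                    THEN 1 ELSE 0 END)
             OVER (PARTITION BY subscriber_id
                   ORDER BY transaction_time DESC, transaction_id DESC) AS balance_reached
    FROM (
        SELECT t.transaction_id, t.transaction_time, t.subscriber_id,
               t.amount, b.balance,
               SUM(t.amount) OVER (PARTITION BY t.subscriber_id
                                   ORDER BY t.transaction_time DESC,
                                            t.transaction_id DESC) AS running_total
        FROM transactions t
        INNER JOIN subscriber_balance b USING (subscriber_id)
        WHERE t.void = 0
          AND b.balance < 0
    ) with_running_total
) flagged
WHERE balance_reached = 0
GROUP BY subscriber_id;
```
Because no state leaks between rows, this behaves the same whether or not the outer query filters by an `IN` list.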
tvanc
(101 rep)
Nov 7, 2024, 07:57 PM
• Last activity: Nov 7, 2024, 08:43 PM
1 vote · 1 answer · 765 views
Numbering rows consecutively for a number of tables
I have a number of tables, all containing attributes `aid,bid,cid`, and `xid` of type integer; other attributes may differ. For every (given) table `T`, I would like to order the rows according to `aid,bid,cid` ascending and update column `xid` with incremented values starting with 0. What's the best way to achieve that?
My current solution involves:
- selecting table `T`
- opening a cursor over the ordered tuples of the table
- assigning the autoincrement value to `xid`
- inserting the tuple into a temporary table `T_temp`
- deleting all records from `T`
- inserting all records from `T_temp` into `T`
Since the tables have different schemas, I programmed half of the code in PL/pgSQL and half of the code with bash scripting.
**Question 1:** Any comments on how to program that in pure PL/pgSQL?
**Question 2:** Any comments on how to implement that more elegantly?
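On Question 2, a sketch of a single-statement version for one table, assuming PostgreSQL; joining on `ctid` sidesteps the need for a unique key, temp table, or cursor. For many tables with different schemas, the same statement can be generated per table with `EXECUTE format(...)` inside a PL/pgSQL `DO` block.
```
UPDATE T
SET xid = numbered.rn
FROM (
    SELECT ctid AS row_id,
           -- 0-based sequence in (aid, bid, cid) order
           row_number() OVER (ORDER BY aid, bid, cid) - 1 AS rn
    FROM T
) numbered
WHERE T.ctid = numbered.row_id;
```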
arthur
(888 rep)
Dec 13, 2013, 12:31 PM
• Last activity: Nov 1, 2024, 02:19 AM
1 vote · 1 answer · 57 views
Rolling sum with partial value for last day
I can't figure out how to write a query that returns a rolling sum by item based on rules in another table.
Below is a table that lists, in chronological order, the stock value of an item on specific days.
Table 1: Stock
| item | stock | date |
|-------|-------|----------|
| Blade | 10 | 1/3/2020 |
| Blade | 20 | 1/4/2020 |
| Blade | 30 | 1/5/2020 |
| Blade | 40 | 1/6/2020 |
| Blade | 50 | 1/7/2020 |
| Blade | 60 | 1/8/2020 |
| Blade | 70 | 1/9/2020 |
| Table | 10 | 1/3/2020 |
| Table | 20 | 1/4/2020 |
| Table | 30 | 1/5/2020 |
| Table | 40 | 1/6/2020 |
| Table | 50 | 1/7/2020 |
| Table | 60 | 1/8/2020 |
| Table | 70 | 1/9/2020 |
Another table has two rules for each item governing how many days are used to calculate the rolling sum values.
Table 2: Rule
| item | rule | value |
|-------|----------|----------|
| Blade | cum_sum | 2.5 |
| Blade | lead_sum | 2.5 |
| Table | cum_sum | 3 |
| Table | lead_sum | 3 |
Output:
- cum_sum: for Blade, date 1/3/2020, the rule is 2.5 and so the value = 10 + 20 + (30 * 0.5) = 45
- lead_sum: for Blade, date 1/3/2020, the rule is 2.5 and so the value = 20 + 30 + (40 * 0.5) = 70
How do I write the query to consider partial values for the last date?
| item | stock | date | cum_sum | lead_sum |
|-------|-------|----------|---------|-----------|
| Blade | 10 | 1/3/2020 | 45 | 70 |
| Blade | 20 | 1/4/2020 | 70 | 95 |
| Blade | 30 | 1/5/2020 | 95 | 120 |
| Blade | 40 | 1/6/2020 | 120 | 145 |
| Blade | 50 | 1/7/2020 | 145 | 130 |
| Blade | 60 | 1/8/2020 | 130 | 70 |
| Blade | 70 | 1/9/2020 | 70 | 0 |
| | | | | |
| | | | | |
| Table | 10 | 1/3/2020 | 60 | 90 |
| Table | 20 | 1/4/2020 | 90 | 120 |
| Table | 30 | 1/5/2020 | 120 | 150 |
| Table | 40 | 1/6/2020 | 150 | 180 |
| Table | 50 | 1/7/2020 | 180 | 130 |
| Table | 60 | 1/8/2020 | 130 | 70 |
| Table | 70 | 1/9/2020 | 70 | 0 |
https://sqlfiddle.com/postgresql/online-compiler?id=c87e6a47-0949-4781-b8b5-3559929a063d
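A sketch under the assumption that every rule value lies between 2 and 3 (as in the sample), so a fixed three-row window works and only the last row's weight varies; `LEAD` past the end of the partition returns NULL, which `COALESCE` turns into 0 and so reproduces the partial-window behavior at the end of the series. Table names follow Table 1 (`stock`) and Table 2 (`rule`):
```
SELECT s.item, s.stock, s.date,
       s.stock
       + COALESCE(LEAD(s.stock, 1) OVER w, 0)
       + COALESCE(LEAD(s.stock, 2) OVER w, 0) * (c.value - 2) AS cum_sum,
       COALESCE(LEAD(s.stock, 1) OVER w, 0)
       + COALESCE(LEAD(s.stock, 2) OVER w, 0)
       + COALESCE(LEAD(s.stock, 3) OVER w, 0) * (l.value - 2) AS lead_sum
FROM stock s
JOIN rule c ON c.item = s.item AND c.rule = 'cum_sum'
JOIN rule l ON l.item = s.item AND l.rule = 'lead_sum'
WINDOW w AS (PARTITION BY s.item ORDER BY s.date);
```
With rule 2.5 the third row gets weight 0.5; with rule 3 it gets weight 1, which matches both expected output blocks above.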
user1708730
(13 rep)
Oct 5, 2024, 02:09 PM
• Last activity: Oct 7, 2024, 02:11 AM
7 votes · 1 answer · 548 views
OUTPUT clause with window functions
**Is there an undocumented restriction in the `OUTPUT` clause, or is this a bug?**
Given the following table:
CREATE TABLE t1 (SomeId int, flag bit, value int);
I would like to use a calculated value in an `UPDATE` statement, and then use `OUTPUT` to output the value. Note that the calculated value is not being used in the `SET` part, which might have allowed a workaround by outputting the left-hand-side column.
The following works fine; it is a perfectly standard updatable subquery (derived table).
UPDATE subT1
SET flag = 1
OUTPUT inserted.SomeValue
FROM (
SELECT *,
SomeValue = t1.value + 123
FROM t1
) subT1;
However using a window function gets a weird error:
UPDATE subT1
SET flag = 1
OUTPUT inserted.Sum
FROM (
SELECT *,
Sum = SUM(t1.value) OVER (PARTITION BY t1.SomeId)
FROM t1
) subT1;
Msg 404 Level 16 State 1 Line 3
The column reference "inserted.Sum" is not allowed because it refers to a base table that is not being modified in this statement.
Which doesn't make sense, as we already established that calculated columns can be referred to using `inserted`.
OK, let's try using `OUTPUT subT1.Sum` anyway, rather than `inserted.Sum`:
Msg 4104 Level 16 State 1 Line 3
The multi-part identifier "subT1.Sum" could not be bound.
which makes sense, since that table reference has been modified, and the docs say:
> If the table being modified is also specified in the FROM clause, any reference to columns in that table must be qualified with the INSERTED or DELETED prefix.
____
Meanwhile, if I use a join on a second table:
CREATE TABLE t2 (SomeId int, flag bit, value int);
it works fine
UPDATE t1
SET flag = 1
OUTPUT subT2.Sum
FROM t1
JOIN (
SELECT t2.*,
Sum = SUM(t2.value) OVER (PARTITION BY t2.SomeId)
FROM t2
) subT2 ON subT2.SomeId = t1.SomeId;
dbfiddle
**So is the fact that a window function throws that error a bug, or an undocumented restriction?**
Having said all that, a join *inside* a subquery is also not allowed, so most likely it is an undocumented restriction.
UPDATE subT1
SET flag = 1
OUTPUT inserted.Sum
FROM (
SELECT t1.*,
Sum = t2.value
FROM t1
JOIN t2 ON t2.SomeId = t1.SomeId
) subT1;
Msg 404 Level 16 State 1 Line 3
The column reference "inserted.Sum" is not allowed because it refers to a base table that is not being modified in this statement.
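Not an answer to the documentation question, but a possible escape hatch: `MERGE` documents that its `OUTPUT` clause may reference columns of the *source* table, and the source can be a derived table containing the window function. A hypothetical sketch, assuming a key column `id` (not in the original table) so each target row matches exactly one source row:
```
MERGE t1 AS tgt
USING (
    SELECT id,
           [Sum] = SUM(value) OVER (PARTITION BY SomeId)
    FROM t1
) AS src
ON src.id = tgt.id
WHEN MATCHED THEN
    UPDATE SET flag = 1
OUTPUT src.[Sum];
```
Without a row-identifying key the MERGE would fail on multiple matches, so this is a sketch of the mechanism rather than a drop-in replacement.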
Charlieface
(17545 rep)
May 7, 2023, 11:18 AM
• Last activity: Sep 23, 2024, 06:30 AM
2 votes · 2 answers · 569 views
Use expression in window function that references columns from the current row
Suppose I have the following query that uses a window function:
SELECT id
, var
, num
, SUM(var * num) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS calc
FROM (VALUES
(1, 0.1, 7),
(2, 0.7, 1),
(3, 0.3, 9),
(4, 0.9, 5),
(5, 0.5, 3)
) AS t(id, var, num)
And the following result:
id | var | num | calc | explanation
1 | 0.1 | 7 | 0.7 | 0.1*7
2 | 0.7 | 1 | 1.4 | 0.1*7 + 0.7*1
3 | 0.3 | 9 | 4.1 | 0.1*7 + 0.7*1 + 0.3*9
4 | 0.9 | 5 | 7.9 | 0.7*1 + 0.3*9 + 0.9*5
5 | 0.5 | 3 | 8.7 | 0.3*9 + 0.9*5 + 0.5*3
Is it possible to reference the `var` column _from the outside_ inside the `SUM() OVER ()`? For example:
```none
id | var | num | calc | sum of f(r.var, w.var, w.num)
1  | 0.1 | 7   | ...  | ...
```
Here is a fiddle where I was able to achieve the result with correlated queries, but I want to use window functions.
Salman Arshad
(461 rep)
Nov 20, 2019, 09:23 AM
• Last activity: Sep 8, 2024, 12:03 AM
0 votes · 1 answer · 51 views
PostgreSQL window function: Add value in CASE running sum gets below threshold
For context:
I am playing with a filling level sensor, a water tank and a watering pump and have the following problem:
1. I have an initial filling value for my tank.
2. Each time step, I am subtracting a certain amount (the outflow) from this init value via a window function, in order to get a time series with the filling level.
3. Should that running sum get below a lower threshold, it gets "refilled" to an upper threshold for the refill.
4. From there, I want to keep subtracting until it hits the lower value again, then refill, etc.
5. I would like to sum up the refill amount over the series
Something like the following:
Assuming I start from 10, and refill from 5 as the lower limit to 8 as the upper limit, the table should look something like this, with `running_sum` being the amount of water in the tank, `refill_amount` as the amount that was needed to refill, and/or `refill_running_sum` to get a final, total amount for refilling:
time_step | running_sum | refill_amount | refill_running_sum |
----------+-------------+---------------+----------------------|
1 10 0 0
2 9 0 0
3 8 0 0
4 7 0 0
5 6 0 0
6 - - - 5 - - - 3 - - - - 3 - --refill
7 8 0 3
8 7 0 3
9 6 0 3
10 - - - 5 - - - 3 - - - - 6 - --refill
11 8 0 6
12 7 0 6
13 6 0 6
14 - - - 5 - - - 3 - - - - 9 - --refill
15 8 0 9
I got as far as the SQL below (dbfiddle here): using a `CASE` statement, I get to refill the amount once and get the first amount of water to refill. But I would need a CTE with a case for every single refill.
Apologies, as I know I'm missing SQL vocabulary here to properly approach this. Yet thank you for any pointers!
/* Generate some mockup series with init+threshold limits */
WITH vals AS (
SELECT
generate_series(1,30,1) AS time_step
,10 AS init_val
,8 AS upper_limit_refill
,5 AS lower_limit_refill
,random() AS random_val
),
/* Create the window function subtracting a mockup val */
example_series AS (
SELECT
*
,init_val-sum(random_val) OVER(ORDER BY time_step ASC) AS running_amount
FROM vals
)
SELECT
time_step
,running_amount
/* I manage to get the amount needed to refill ONCE, but I'd need a new CTE and CASE for each refill like this*/
,CASE
WHEN running_amount < lower_limit_refill
THEN upper_limit_refill + running_amount -sum(random_val) OVER(ORDER BY time_step ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
ELSE 0
END AS refill_once
,CASE
WHEN running_amount < lower_limit_refill
THEN upper_limit_refill - running_amount
ELSE 0
END AS refill_to_limit_amount
FROM example_series
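Because each row's level depends on whether earlier rows triggered a refill, this is inherently sequential, which is what a recursive CTE expresses. A sketch with the same column names as the mockup, but a fixed outflow instead of `random()` so the result is reproducible; the exact refill trigger (at vs. below the limit) and whether the outflow applies in the first step are easy to adjust:
```
WITH RECURSIVE vals AS (
    SELECT g AS time_step,
           10::numeric AS init_val,
           8::numeric  AS upper_limit_refill,
           5::numeric  AS lower_limit_refill,
           1::numeric  AS outflow          -- fixed outflow instead of random()
    FROM generate_series(1, 15) AS g
),
sim AS (
    SELECT v.time_step,
           v.init_val AS running_sum,
           0::numeric AS refill_running_sum
    FROM vals v
    WHERE v.time_step = 1
  UNION ALL
    SELECT v.time_step,
           CASE WHEN s.running_sum - v.outflow < v.lower_limit_refill
                THEN v.upper_limit_refill    -- refill back to the upper limit
                ELSE s.running_sum - v.outflow END,
           s.refill_running_sum
           + CASE WHEN s.running_sum - v.outflow < v.lower_limit_refill
                  THEN v.upper_limit_refill - (s.running_sum - v.outflow)
                  ELSE 0 END
    FROM sim s
    JOIN vals v ON v.time_step = s.time_step + 1
)
SELECT time_step, running_sum, refill_running_sum
FROM sim
ORDER BY time_step;
```
One recursive step per time step replaces the "one CTE and CASE per refill" chain, however many refills occur.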
user22827835
(1 rep)
Aug 26, 2024, 04:43 PM
• Last activity: Sep 2, 2024, 03:49 PM
0 votes · 1 answer · 50 views
Finding id of last child in table structure
I have a DB structure like this one:
message_id from_user_id to_user_id subject body date_add parent_message_id
1 1 2 message 1 body 1 2024-07-01 null
2 2 1 message 2 body 2 2024-07-01 1
3 1 2 message 3 body 3 2024-07-01 2
4 2 1 message 4 body 4 2024-07-01 null
5 1 2 message 5 body 5 2024-07-01 null
6 1 2 message 6 body 6 2024-07-01 null
7 2 1 message 7 body 7 2024-07-01 6
8 1 2 message 8 body 8 2024-07-01 7
I'm looking for an efficient way to find the last_message_id in each thread and the total message count in each thread. Messages connected together with `parent_message_id` form a message thread (like in Gmail).
For example:
message_id last_thread_message_id total_thread_count
1 3 3
2 3 3
3 3 3
4 4 1
5 5 1
6 8 3
7 8 3
8 8 3
thanks for advice
R.
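A sketch assuming a recursive CTE is available (PostgreSQL or MySQL 8+), a hypothetical table name `messages`, and that the last message in a thread is the one with the highest `message_id`: walk each message up to its thread root, then window over the root.
```
WITH RECURSIVE thread AS (
    -- roots start their own thread
    SELECT message_id, message_id AS root_id
    FROM messages
    WHERE parent_message_id IS NULL
  UNION ALL
    -- children inherit the root of their parent
    SELECT m.message_id, t.root_id
    FROM messages m
    JOIN thread t ON m.parent_message_id = t.message_id
)
SELECT message_id,
       MAX(message_id) OVER (PARTITION BY root_id) AS last_thread_message_id,
       COUNT(*)        OVER (PARTITION BY root_id) AS total_thread_count
FROM thread
ORDER BY message_id;
```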
Robert
(141 rep)
Jul 1, 2024, 01:59 PM
• Last activity: Jul 2, 2024, 12:00 PM
12 votes · 1 answer · 1131 views
Is "ROW_NUMBER() OVER(ORDER BY xml.node)" well defined?
(It is more a question of *documentation* rather than *behavior*. It has been migrated from [Stack overflow](https://stackoverflow.com/questions/78665262/) as was suggested there.)
While researching the [answer to another question](https://stackoverflow.com/questions/78411293/#78412334) that required extracting XML nodes in the original element order, I came across several answers ([here](https://stackoverflow.com/questions/1134075/#30185119), [here](https://stackoverflow.com/questions/17748012/#17764715), and [here](https://stackoverflow.com/questions/20249307/#20250521)) that used expressions of the form `ROW_NUMBER() OVER (ORDER BY xml.node)`, and asserted or implied that the assigned row number values would be assigned in XML document order.
However, I cannot find anywhere that defines the behavior of `ORDER BY xml.node`. Although it seems to be allowed in the `OVER()` clause, the [documentation](https://learn.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql) does not specifically mention XML nodes.
For example, given:
DECLARE @xml XML = '
<root>
    <node>One</node>
    <node>Two</node>
    <node>Three</node>
    <node>Four</node>
</root>'
SELECT
ROW_NUMBER() OVER(ORDER BY xml.node) AS rn,
xml.node.value('./text()', 'varchar(255)') AS value
FROM @xml.nodes('*/node') xml(node)
ORDER BY
ROW_NUMBER() OVER(ORDER BY xml.node)
The following results are returned:
rn | value
----------
1 | One
2 | Two
3 | Three
4 | Four
**Question:** Are these results guaranteed anywhere in the documentation? Is this accepted as guaranteed, but undocumented behavior? Or is this another case like `ORDER BY (SELECT NULL)` that appears to work anecdotally for small, seemingly pre-ordered source data sets, but may eventually fail when scaled up? The reason I am asking is that I would prefer not to recommend use of a technique whose behavior and reliability is not supported by the documentation.
Interestingly, although an XML node is allowed in a windowed `ORDER BY`, it is not allowed in an ordinary `SELECT ... ORDER BY`. When used in an ordinary select order-by clause, `ORDER BY xml.node` yields the following error:
> Msg 493 Level 16 State 1 Line 7
>
> The column 'node' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four XML data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.
The above error message does not list the window-function `OVER(ORDER BY ...)` as an allowed use.
[See this dbfiddle](https://dbfiddle.uk/r_mOqNdp) .
T N
(231 rep)
Jun 25, 2024, 05:10 PM
• Last activity: Jun 28, 2024, 08:04 AM
0 votes · 1 answer · 41 views
Sort by maximum value per partition
I have a table with data something like below:
| ID(pk) | name | count |
|--------|------|-------|
| id1 | xyz | 98 |
| id2 | ptr | 5 |
| id3 | xyz | 2 |
| id4 | de | 1 |
| id5 | ptr | 1 |
I want to first have the row with the max count, then all the rows with the same name sorted by count; after that, the next max count and all the rows with the same name. Something like below:
| ID(pk) | name | count |
|--------|------|-------|
| id1 | xyz | 98 |
| id3 | xyz | 2 |
| id2 | ptr | 5 |
| id5 | ptr | 1 |
| id4 | de | 1 |
Is something like this possible in Postgres?
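It should be; PostgreSQL allows window functions directly in `ORDER BY`. A sketch with `t` as a stand-in table name (`"count"` is quoted only because it collides with the keyword):
```
SELECT id, name, "count"
FROM t
ORDER BY MAX("count") OVER (PARTITION BY name) DESC,  -- group by its peak value
         name,                                        -- tie-break between groups
         "count" DESC;                                -- then within the group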
gmtek
(103 rep)
Jun 20, 2024, 11:44 AM
• Last activity: Jun 21, 2024, 05:20 AM
1 vote · 1 answer · 372 views
Getting sum for each day based on most recent timestamp entry for each entity
I have a table that contains an ID, timestamp, and integer. Each row is a "snapshot" of the entity at a given time:
**TableA**
| ID | timestamp | num |
| -- | ---------- | --- |
| A | 2024-02-04 | 5 |
| B | 2024-02-04 | 3 |
| C | 2024-02-07 | 8 |
| A | 2024-02-07 | 3 |
My goal is to get a table of all days from the last year and show the sum of `num` for each given day. The sum for each day, though, should look at the most recent timestamp for each entity. Then if there's no data for a given day, just forward-fill until reaching the next timestamp. So the result would be:
**Goal**
| day | total |
| ---------- | ----- |
| 2024-02-04 | 8 |
| 2024-02-05 | 8 |
| 2024-02-06 | 8 |
| 2024-02-07 | 14 |
2024-02-05 and 2024-02-06 are filled from the most recent row because there is no timestamp data for them, and 2024-02-07 is 14 because entity A now has "3" instead of "5", and entity "C" has been added (3 + 8 + 3).
My approach has been to start with a recursive CTE for the days of the last year, and then left join on another CTE that contains the aggregated data.
WITH DateSequence AS (
SELECT CAST(GETDATE() AS DATE) AS [Date]
UNION ALL
SELECT DATEADD(DAY, -1, [Date])
FROM DateSequence
WHERE DATEADD(DAY, -1, [Date]) > DATEADD(DAY, -365, GETDATE())
),
Aggregator AS (
/* for each day in TableA, sum the most recent num columns for each entity */
)
SELECT DateSequence.[Date], Aggregator.total
FROM DateSequence
LEFT JOIN Aggregator ON Aggregator.date = DateSequence.[Date]
ORDER BY DateSequence.[Date]
OPTION (MAXRECURSION 365);
The two parts I haven't worked out are the summing and the forward-filling. I'm pretty sure I need to use window functions, and need a way to say: for each day in **TableA**, sum the most recent `num` columns for each entity.
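A sketch of one way to express exactly that, assuming T-SQL as in the question: for each generated day, `OUTER APPLY` picks every entity's most recent row on or before that day and sums it. The forward-filling then falls out for free, since days with no new snapshot simply reuse the latest earlier rows.
```
WITH DateSequence AS (
    SELECT CAST(GETDATE() AS DATE) AS [Date]
    UNION ALL
    SELECT DATEADD(DAY, -1, [Date])
    FROM DateSequence
    WHERE DATEADD(DAY, -1, [Date]) > DATEADD(DAY, -365, GETDATE())
)
SELECT d.[Date], x.total
FROM DateSequence d
OUTER APPLY (
    SELECT SUM(t.num) AS total
    FROM (
        SELECT a.num,
               ROW_NUMBER() OVER (PARTITION BY a.ID
                                  ORDER BY a.[timestamp] DESC) AS rn
        FROM TableA a
        WHERE a.[timestamp] <= d.[Date]   -- only snapshots up to this day
    ) t
    WHERE t.rn = 1                        -- most recent row per entity
) x
ORDER BY d.[Date]
OPTION (MAXRECURSION 365);
```
`OUTER APPLY` (rather than `CROSS APPLY`) keeps days that precede the first snapshot, with a NULL total.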
Timothy Fisher
(177 rep)
May 3, 2024, 12:24 AM
• Last activity: May 3, 2024, 05:54 AM
0 votes · 1 answer · 1546 views
Find Consecutive Date in SQL using 2 date columns
I am trying to group all consecutive patient admissions in the table. If the patient was re-admitted to the hospital the next day (or within 24 hours), then I need to keep only one admission stay per patient. I appreciate your help.
|Patient_bed_id |Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 1/31/2022|
|111 | Lukas | 2/1/2022| 2/28/2022|
|111| Lukas| 3/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 8/20/2022|
|111| Lukas| 8/21/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022 | 7/11/2022|
|222| Jason| 7/12/2022 | 7/26/2022|
|222| Jason| 9/13/2022| 9/15/2022|
|222| Jason| 9/16/2022 |9/27/2022|
Final table
|Patient_bed_id| Patientname| Hospital_In_date| Hospital_out_Date|
|-|-|-|-|
|111| Lukas| 1/1/2022| 3/31/2022|
|111| Lukas| 5/25/2022| 6/2/2022|
|111| Lukas| 8/1/2022| 9/10/2022|
|222| Jason| 5/1/2022| 5/3/2022|
|222| Jason| 6/15/2022| 7/26/2022|
|222| Jason | 9/13/2022| 9/27/2022|
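A gaps-and-islands sketch, assuming SQL Server syntax (the question does not name the DBMS) and a hypothetical table name `admissions`: flag a row as a new stay when it does not start within a day of the previous discharge, turn the flags into stay numbers with a running sum, then collapse each stay.
```
WITH flagged AS (
    SELECT *,
           CASE WHEN DATEDIFF(day,
                              LAG(Hospital_out_Date) OVER (PARTITION BY Patient_bed_id
                                                           ORDER BY Hospital_In_date),
                              Hospital_In_date) <= 1
                THEN 0 ELSE 1 END AS new_stay   -- first row gets 1 (NULL lag)
    FROM admissions
),
islands AS (
    SELECT *,
           SUM(new_stay) OVER (PARTITION BY Patient_bed_id
                               ORDER BY Hospital_In_date
                               ROWS UNBOUNDED PRECEDING) AS stay_id
    FROM flagged
)
SELECT Patient_bed_id, Patientname,
       MIN(Hospital_In_date)  AS Hospital_In_date,
       MAX(Hospital_out_Date) AS Hospital_out_Date
FROM islands
GROUP BY Patient_bed_id, Patientname, stay_id;
```
On the sample data this merges the 1/1–3/31 and 8/1–9/10 stays for Lukas and the 6/15–7/26 and 9/13–9/27 stays for Jason, matching the final table.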
Jane
(1 rep)
Jun 29, 2022, 03:28 AM
• Last activity: Apr 23, 2024, 07:24 AM