Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 votes

1 answers

2674 views

Rolling sum with a floor in Athena/Presto

aggregate window-functions amazon-presto

In Athena, I want to calculate a rolling sum, over a window, that can't go below 0. For example, if summing up a column that has values ```(1, 2, -1, -2, -1, -2, 1, 2)``` I should get ```(1, 3, 2, 0, 0, 0, 1, 3)```. Without the floor constraint, it's easy - ```SUM(X) OVER (PARTITION BY some_group OR...

In Athena, I want to calculate a rolling sum, over a window, that can't go below 0. For example, if summing up a column that has values

(1, 2, -1, -2, -1, -2, 1, 2)

I should get

(1, 3, 2, 0, 0, 0, 1, 3)

. Without the floor constraint, it's easy -

(X) OVER (PARTITION BY some_group ORDER BY ordering_col)

. Or if we just wanted to clip all the values at 0 after doing an actual cumulative sum. But I cannot figure out how to use window functions to achieve my desired result in general, though I can make it work for some special cases. Is this even possible?

Max Gibiansky (111 rep)

Mar 17, 2022, 06:15 PM • Last activity: Apr 7, 2025, 01:09 PM

1 votes

1 answers

3363 views

(Athena/Presto) Window function (lag) is not reading the previous lag row

window-functions amazon-presto

I have two tables, ```credit_memos``` and ```credit_memo_items``` and want to show each event individually and will thus do a insert (doing union in this case which is logically the same). I know the key for both tables and am doing a union with them while sharing the same partition key ```credit_me...

I have two tables,

and

and want to show each event individually and will thus do a insert (doing union in this case which is logically the same). I know the key for both tables and am doing a union with them while sharing the same partition key

that I'll be using for the window. I'm trying to use lag to fill up the rest of the null rows as shown.

The code for the second column looks as follows:

(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk

For some reason the third row shown in the picture is not able to interpret the previous lagged row and insert a value its row as a result. I presume it may have to do something where Athena does not actually execute the lag function recursively and takes the previous row in each partition literally and doesn't wait for the previous rows to get executed. Is there a way to grab values recursively as I'm trying to do? Thanks. Update: Here's the full query:

with
  cme as(
    with
      cm as(
        select
    --      change_log
          row_number() over (partition by id order by updated_at desc) r
        , cast(cm.invoice_id as varchar(99))                         as invoice_nk
        , cast(cm.id as varchar(99))                                 as cm_dd
        , cast(cm.created_at as timestamp)                           as cm_created_at
        , cast(cm.updated_at as timestamp)                           as cm_updated_at
        from "postgres-replica-parquet".credit_memos cm
      )
    select
      case when r = 1 then cm_created_at
           else cm_updated_at end as cm_begin_at
    , null as cmi_begin_at
    , invoice_nk
    , cm_dd
    , cm_created_at
    , cm_updated_at
    from cm
  )
, cmi as(
    with
      ci as(
        select
    --      ci.change_log
          row_number() over (partition by id order by updated_at desc) r
        , cast(ci.credit_memo_id as varchar(99)) as cm_id
        , cast(ci.created_at as timestamp)       as cmi_created_at
        , cast(ci.updated_at as timestamp)       as cmi_updated_at
        from "postgres-replica-parquet".credit_memo_items ci
      )
    select -- row 1 is the snapshot when cdc first started
      null as cm_begin_at
    , case when r = 1 then cmi_created_at
           else cmi_updated_at end as cmi_begin_at
    , null as invoice_nk
    , cm_id as cm_dd
    , null as cm_created_at
    , null as cm_updated_at
    from ci
  )
, combined as(
    with
      all as(
        select * from cme
        union select * from cmi
      )
    select
      greatest(
        coalesce(cm_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
      , coalesce(cmi_begin_at, cast(date_parse('1970-01-01', '%Y-%m-%d') as timestamp))
      ) as event_at
    , *
    from all
  )
select
  event_at
, cm_begin_at as event_credit_memo_last_at
, cmi_begin_at as event_credit_memo_item_last_at
, coalesce(invoice_nk, lag(invoice_nk, 1, '') over (partition by cm_dd order by event_at)) as invoice_nk
, cm_dd as credit_memo_dd
from combined

And a snapshot of the full result:

Rows 3 and 12 should have the invoice_nk based on the credit_memo cm is the subquery to grab the credit memos cmi is the subquery to grab the credit memo items combined is just to precompute the order by key in the window functions Note that both are pulled from a database that has the full change log (CDC). The end goal is to make one row for each event, I'm defining an event for the final table to be any time either credit memo or credit memo item undergoes a change. I would like to be able to snapshot the state of a credit memo combined with the credit memo item by windowing at the

for reporting.

Eugene Choi (11 rep)

Mar 11, 2022, 03:11 PM • Last activity: Nov 13, 2024, 02:01 PM

1 votes

3 answers

158 views

Presto/Mysql join validation

mysql amazon-presto

I have 2 tables, and the CASE statement below should only evaluate to 1/true if the record with the same lead_id exists in both tables. Here I have 2 ways of achieving this but just want to verify if its in fact the most optimal way of achieving this and which one is the better option here? The lead...

                                  I have 2 tables, and the CASE statement below should only evaluate to 1/true if the record with the same lead_id exists in both tables. Here I have 2 ways of achieving this but just want to verify if its in fact the most optimal way of achieving this and which one is the better option here? The lead_id in t1 is the primary key and will always exist. I need to verify if its moved into the 2nd table

**Query 1**

    SELECT
       CASE
          WHEN
             t1.lead_id = t2.lead_id 
          THEN
             1 
          ELSE
             0 
       END
       AS valid 
    FROM
       table1 t1 
       LEFT JOIN
          table2 t2 
          ON t1.lead_id = t2.lead_id

**Query 2**



    SELECT
       CASE
          WHEN
             t2.lead_id IS NOT NULL 
             AND t2.lead_id != 0 
          THEN
             1 
          ELSE
             0 
       END
       AS valid 
    FROM
       table1 t1 
       LEFT JOIN
          table2 t2 
          ON t1.lead_id = t2.lead_id



                                

code-is-life (117 rep)

Apr 28, 2022, 01:13 PM • Last activity: Apr 28, 2022, 06:09 PM

0 votes

2 answers

3170 views

PRESTO Left join using multiple operators

mysql amazon-presto

- I have 2 tables, one listing all our customers and then a 2nd listing every clients list of creditors/obligations - Firstly, I need to return all clients in t1 and then be able to determine if the client owns a vehicle or not by looking at t2 - This query seems to do the trick and works as expecte...

SELECT    t1.first_name,    t1.last_name,    t1.id_number,    CASE
          WHEN
             t2.uid IS NOT NULL 
          THEN
             1 
          ELSE
             0     END    AS vehicle_finance  FROM    db.client t1     LEFT JOIN
          db.obligation t2 
          ON t1.uid = t2.uid
          AND t2.creditor_name LIKE '%Vehicle Finance%'

- But recently I discovered additional variants/keywords to include for the creditor name and this is the updated query I came up with

SELECT
       t1.first_name,
       t1.last_name,
       t1.id_number,
       CASE
          WHEN
             t2.uid IS NOT NULL 
          THEN
             1 
          ELSE
             0 
       END
       AS vehicle_finance 
    FROM
       db.client t1 
       LEFT JOIN
          db.obligation t2 
          ON t1.uid = t2.uid
          AND lower(t2.creditor_name) LIKE '%vehicle finance%' 
          OR lower(t2.creditor_name) LIKE '%vehicle and asset finance%' 
          OR lower(t2.creditor_name) LIKE '%vehicle + asset finance%' 
          OR lower(t2.creditor_name) LIKE '%motor finance%'

- Can I use the operators(AND & OR) like this in a LEFT JOIN after ON? - I am not worried about duplicate rows being returned as the client will have several obligations, I have another process that takes care of the duplicates - Is this query correct and most efficient way of achieving this?

code-is-life (117 rep)

Feb 28, 2022, 06:05 PM • Last activity: Feb 28, 2022, 09:54 PM

3 votes

1 answers

3862 views

Getting the row before a row with a certain value in SQL

window-functions sql-standard hive amazon-presto

I have a table like below where user actions are stored with a timestamp. My goal is to identify the action that happened before a specific action (named reference_action) and count the number of those actions to see which actions happens before the specific actions and how they are distributed. I a...

                                  I have a table like below where user actions are stored with a timestamp. My goal is to identify the action that happened before a specific action (named reference_action) and count the number of those actions to see which actions happens before the specific actions and how they are distributed.

I am aware of window functions like LAG() where I can get the row before a certain row but can't figure out how to include a constraint like WHERE action_name = "reference_action".

The query engine is Presto and the tables are Hive tables but I'm mostly interested in the general SQL approach, therefore that shouldn't matter much.

| session | action_name | timestamp |
| ------- | ----------- | --------- |
| 1       | "some_action" | 1970-01-01 00:01:00 |
| 1       | "some_action" | 1970-01-01 00:02:00 |
| 1       | "some_action" | 1970-01-01 00:03:00 |
| 1       | "desired_action1" | 1970-01-01 00:04:00 |
| 1       | "reference_action" | 1970-01-01 00:05:00 |
| 1       | "some_action" | 1970-01-01 00:06:00 |
| 1       | "some_action" | 1970-01-01 00:07:00 |
| 2       | "some_action" | 1970-01-01 01:23:00 |
| 2       | "some_action" | 1970-01-01 02:34:00 |
| 2       | "desired_action1" | 1970-01-01 03:45:00 |
| 2       | "reference_action" | 1970-01-01 04:56:00 |
| 2       | "some_action" | 1970-01-01 05:58:00 |
| 3       | "some_action" | 1970-01-01 01:23:00 |
| 3       | "some_action" | 1970-01-01 02:34:00 |
| 3       | "desired_action2" | 1970-01-01 03:45:00 |
| 3       | "reference_action" | 1970-01-01 04:56:00 |
| 3       | "some_action" | 1970-01-01 05:58:00 |

The result should look like:

| action | count |
| ------ | ----- |
| "desired_action1" | 2 |
| "desired_action2" | 1 |

There are two rows where "desired_action1" is directly followed by a row with "reference_action", when ordered by timestamp, hence the count being 2. The same logic applies for why the count is 1 for "desired_action2".

The goal is to know what a user did before he made a purchase (purchase = reference_action). To understand what he did before, I want to look up the action that happened before a purchase. Therefore I need to know the action_name in the row before a reference_action. desired_actions have to be counted, reference_actions are just the rows after the actions I want to count and used to determine which values should be counted.
                                

Daniel Müller (133 rep)

May 18, 2021, 12:35 PM • Last activity: May 19, 2021, 10:08 AM

1 votes

1 answers

293 views

How to query from the result of a changed column of a table in postgresql

datetime amazon-presto

I am working with Presto on Amazon Athena. I have a string time column in a table and now I want to change that time to date time type and then query data for selected dates. Is there a direct way to do so? One way I could think of is 1) add a new column 2) insert values into it with converted date...

                                  I am working with Presto on Amazon Athena. I have a string time column in a table and now I want to change that time to date time type and then query data for selected dates.

Is there a direct way to do so? One way I could think of is 

    1) add a new column 
    2) insert values into it with converted date 
    3) Query using the new column

Here I am stuck with the 2nd step with INSERT so need help with that 

      ALTER TABLE "db"."table" 
        ADD COLUMN new_date timestamp
        
    -- Need correction in select statement that I don't understand
        INSERT INTO "db"."table" (new_date)
        (SELECT new_date from_iso8601_date(substr(timestamp,1,10))

Could some one help me with correction and if possible a better way of doing it?
                                

Atihska (129 rep)

Sep 20, 2018, 07:34 PM • Last activity: Oct 5, 2020, 09:08 PM

1 votes

1 answers

2621 views

AWS Athena: syntax for grouping data for week ending on a Saturday

amazon-presto

I'm having some issues with the Presto syntax in grouping data by separate weeks, ending on Saturdays. I have this field called "timestamp," and I've been using the the following to group by weeks. However, I noticed that date_truncs weeks ending on a Monday and using a date_add doesn't necessary gr...

                                  I'm having some issues with the Presto syntax in grouping data by separate weeks, ending on Saturdays. I have this field called "timestamp," and I've been using the the following to group by weeks. However, I noticed that date_truncs weeks ending on a Monday and using a date_add doesn't necessary group all rows correctly.

select date_add('day', 5, date_trunc('week', date(substr(timestamp, 1, 10)))) as week_ending

Any suggestions on how I can group weeks ending on Saturdays? Thanks!

Jamal (11 rep)

Aug 27, 2019, 06:17 AM • Last activity: Oct 3, 2020, 09:06 AM

1 votes

1 answers

1052 views

SQL query - Do I need a full join to produce this combined result?

join query amazon-presto

With two tables structured like this: Table1 Table2 ---------------- ---------------- id, time, x id, time, y, z 1 1 1 1 2 2 2 1 2 2 1 3 3 3 I am trying to join the tables to get a result that looks like this: Result -------------------------------- id, time, x, y, z 1 1 1 1 2 2 2 2 1 3 3 3 I was ab...

                                  With two tables structured like this:

    Table1                       Table2
    ----------------             ----------------
    id, time, x                  id, time, y, z
    1   1     1                  1   2     2  2
    1   2     2                  1   3     3  3

I am trying to join the tables to get a result that looks like this:

    Result
    --------------------------------
    id, time, x, y, z
    1   1     1  
    1   2     2  2  2
    1   3        3  3

I was able to get a partial result with a query like this, but it is returning all of the values from table2 (including undesired ids) along with the joined values of table1.

    select * 
    from Table1 
    full outer join Table2 on Table1.time=Table2.time and Table1.id ='1' and Table2.id ='1'
    order by Table2.time

Does anyone know how to fix? I'm trying to run this query in AWS Athena/Prestodb if that matters.

Thanks for the help!
                                

Jordan (13 rep)

Jan 4, 2019, 05:28 AM • Last activity: Jan 4, 2019, 06:09 PM

Showing page 1 of 8 total questions