Sample Header Ad - 728x90

Getting the row before a row with a certain value in SQL

3 votes
1 answer
3869 views
I have a table like below where user actions are stored with a timestamp. My goal is to identify the action that happened before a specific action (named reference_action) and count the number of those actions to see which actions happens before the specific actions and how they are distributed. I am aware of window functions like LAG() where I can get the row before a certain row but can't figure out how to include a constraint like WHERE action_name = "reference_action". The query engine is Presto and the tables are Hive tables but I'm mostly interested in the general SQL approach, therefore that shouldn't matter much. | session | action_name | timestamp | | ------- | ----------- | --------- | | 1 | "some_action" | 1970-01-01 00:01:00 | | 1 | "some_action" | 1970-01-01 00:02:00 | | 1 | "some_action" | 1970-01-01 00:03:00 | | 1 | "desired_action1" | 1970-01-01 00:04:00 | | 1 | "reference_action" | 1970-01-01 00:05:00 | | 1 | "some_action" | 1970-01-01 00:06:00 | | 1 | "some_action" | 1970-01-01 00:07:00 | | 2 | "some_action" | 1970-01-01 01:23:00 | | 2 | "some_action" | 1970-01-01 02:34:00 | | 2 | "desired_action1" | 1970-01-01 03:45:00 | | 2 | "reference_action" | 1970-01-01 04:56:00 | | 2 | "some_action" | 1970-01-01 05:58:00 | | 3 | "some_action" | 1970-01-01 01:23:00 | | 3 | "some_action" | 1970-01-01 02:34:00 | | 3 | "desired_action2" | 1970-01-01 03:45:00 | | 3 | "reference_action" | 1970-01-01 04:56:00 | | 3 | "some_action" | 1970-01-01 05:58:00 | The result should look like: | action | count | | ------ | ----- | | "desired_action1" | 2 | | "desired_action2" | 1 | There are two rows where "desired_action1" is directly followed by a row with "reference_action", when ordered by timestamp, hence the count being 2. The same logic applies for why the count is 1 for "desired_action2". The goal is to know what a user did before he made a purchase (purchase = reference_action). To understand what he did before, I want to look up the action that happened before a purchase. Therefore I need to know the action_name in the row before a reference_action. desired_actions have to be counted, reference_actions are just the rows after the actions I want to count and used to determine which values should be counted.
Asked by Daniel Müller (133 rep)
May 18, 2021, 12:35 PM
Last activity: May 19, 2021, 10:08 AM