Sample Header Ad - 728x90

How to get the last not-null value in an ordered column of a huge table?

14 votes
3 answers
14302 views
I have the following input: id | value ----+------- 1 | 136 2 | NULL 3 | 650 4 | NULL 5 | NULL 6 | NULL 7 | 954 8 | NULL 9 | 104 10 | NULL I expect the following result: id | value ----+------- 1 | 136 2 | 136 3 | 650 4 | 650 5 | 650 6 | 650 7 | 954 8 | 954 9 | 104 10 | 104 The trivial solution would be join the tables with a `= t1.id GROUP BY t2.id ) SELECT tmp.id, t.value FROM t, tmp WHERE t.id = tmp.lastKnownId; However, the trivial execution of this code would create internally the square of the count of the rows of the input table ( *O(n^2)* ). I expected t-sql to optimize it out - on a block/record level, the task to do is very easy and linear, essentially a for loop ( *O(n)* ). However, on my experiments, the latest MS SQL 2016 can't optimize this query correctly, making this query impossible to execute for a large input table. Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible. Using some memory-backed temporary table could be a good compromise, but I am not sure if it can be run significantly quicker, considered that my example query using subqueries didn't work. I am also thinking on to dig out some windowing function from the t-sql docs, what could be tricked to do what I want. For example, cumulative sum is doing some very similar, but I couldn't trick it to give the latest non-null element, and not the sum of the elements before. The ideal solution would be a quick query without procedural code or temporary tables. Alternatively, also a solution with temporary tables is okay, but iterating the table procedurally is not.
Asked by peterh (2137 rep)
Mar 31, 2019, 05:19 PM
Last activity: Jan 15, 2023, 03:31 PM