How to get the last not-null value in an ordered column of a huge table?

sql-server t-sql null window-functions running-totals

I have the following input:

     id | value 
    ----+-------
      1 |   136
      2 |  NULL
      3 |   650
      4 |  NULL
      5 |  NULL
      6 |  NULL
      7 |   954
      8 |  NULL
      9 |   104
     10 |  NULL

I expect the following result:

     id | value 
    ----+-------
      1 |   136
      2 |   136
      3 |   650
      4 |   650
      5 |   650
      6 |   650
      7 |   954
      8 |   954
      9 |   104
     10 |   104

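The trivial solution would be to join the table with itself on a `>=` relation and then select the `MAX` id in a `GROUP BY`:

    WITH tmp AS (
      SELECT t2.id, MAX(t1.id) AS lastKnownId
      FROM t t1, t t2
      WHERE t1.value IS NOT NULL AND t2.id >= t1.id
      GROUP BY t2.id
    )
    SELECT
      tmp.id, t.value
    FROM t, tmp
    WHERE t.id = tmp.lastKnownId;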

However, executed naively, this query internally produces a number of row pairs proportional to the square of the input row count (*O(n^2)*). I expected T-SQL to optimize this away: at the block/record level the task is simple and linear, essentially a for loop (*O(n)*).

However, in my experiments, the latest MS SQL 2016 can't optimize this query properly, which makes it impossible to run on a large input table.

Furthermore, the query has to run quickly, making a similarly easy (but very different) cursor-based solution infeasible.

Using a memory-backed temporary table could be a good compromise, but I am not sure whether it would run significantly faster, considering that my example query using subqueries didn't work.

I am also thinking of digging some windowing function out of the T-SQL docs that could be tricked into doing what I want. For example, a cumulative sum does something very similar, but I couldn't trick it into giving the latest non-null element instead of the sum of the elements so far.
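To illustrate what I mean by the cumulative sum, here is a minimal sketch. The table name `t` and the columns `id` and `value` are the ones from my query above; the column types and the `runningTotal` alias are assumptions made only for this example.

```sql
-- Assumed setup: column types are a guess, the data is the sample input above.
CREATE TABLE t (id INT PRIMARY KEY, value INT NULL);

INSERT INTO t (id, value) VALUES
  (1, 136), (2, NULL), (3, 650), (4, NULL), (5, NULL),
  (6, NULL), (7, 954), (8, NULL), (9, 104), (10, NULL);

-- Running total ordered by id. SUM ignores NULLs, so every row gets a value,
-- but it is the sum of all non-null values so far, not the latest one.
SELECT
  id,
  value,
  SUM(value) OVER (ORDER BY id ROWS UNBOUNDED PRECEDING) AS runningTotal
FROM t
ORDER BY id;
```

This fills every row, but with the wrong thing: for id 3 and id 4 it returns 786 (136 + 650) where I want 650.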

The ideal solution would be a quick query without procedural code or temporary tables. A solution with temporary tables is also okay, but iterating over the table procedurally is not.

Asked by peterh (2137 rep)

Mar 31, 2019, 05:19 PM
Last activity: Jan 15, 2023, 03:31 PM
