Optimizing SELECT COUNT(DISTINCT) on a table increasing daily

query-performance optimization aws distinct

Let's say we have a table Daily_users with the columns student_id, school_id, grade, timestamp. We collect students' usage data daily, so the table grows every day. Note that there can be multiple rows for the same student: the purpose is to track usage, so a student may log on and off, use different apps, etc. Using this table, we maintain another table called unique_students_to_date with the columns school_id, grade, count.
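To make the note about repeated rows concrete, here is a tiny sketch that uses Python's built-in sqlite3 module as a stand-in for the real database. The question only says the data lives on AWS, so the engine is an assumption, and the sample rows are invented for illustration. It shows why the per-(school_id, grade) figure has to be COUNT(DISTINCT student_id) rather than a plain row count.

```python
import sqlite3

# In-memory SQLite database standing in for the real warehouse
# (assumption: the actual AWS engine is not named in the question).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Daily_users ("
    " student_id INTEGER, school_id INTEGER, grade INTEGER, timestamp TEXT)"
)

# Hypothetical usage rows: student 1 logs on twice on the same day.
rows = [
    (1, 10, 5, "2023-10-12 08:00:00"),
    (1, 10, 5, "2023-10-12 13:30:00"),
    (2, 10, 5, "2023-10-12 09:15:00"),
]
conn.executemany("INSERT INTO Daily_users VALUES (?, ?, ?, ?)", rows)

# Row count vs. unique students for each (school_id, grade) pair.
summary = conn.execute(
    "SELECT school_id, grade, COUNT(*), COUNT(DISTINCT student_id) "
    "FROM Daily_users GROUP BY school_id, grade"
).fetchall()
print(summary)  # [(10, 5, 3, 2)]
```

Three usage rows collapse to two unique students, which is exactly the deduplication the nightly query has to perform.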

The count column is the number of unique students recorded so far for each (school_id, grade) pair. Then, once yesterday has passed, we run the following nightly query to build unique_students_to_date:

SELECT COUNT(DISTINCT student_id) AS count,
       grade,
       school_id
FROM Daily_users
-- only rows logged before today, i.e. up to and including yesterday
WHERE timestamp < CURRENT_DATE
GROUP BY school_id, grade

to get the unique students up to and including yesterday. This is a simple enough query and it does the job. However, I can't help thinking there is a lot of redundancy here. Today's version of Daily_users differs from yesterday's by only one day's worth of data, so when we compute COUNT(DISTINCT student_id), we redo most of the work we already did yesterday. So my question is: can we optimize this, or at least minimize the redundant calculations?
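One common way to avoid re-scanning the whole history, sketched here under assumptions rather than as a confirmed answer: keep a helper table holding every (school_id, grade, student_id) triple seen so far (the name seen_students and the function nightly_refresh are hypothetical), and each night merge in only the previous day's rows. The merge then touches one day of usage data plus a primary-key lookup per new triple. The sketch uses SQLite's INSERT OR IGNORE; other engines spell this differently (for example INSERT ... ON CONFLICT DO NOTHING in PostgreSQL, or MERGE), so treat the exact syntax as engine-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Daily_users (
    student_id INTEGER, school_id INTEGER, grade INTEGER, timestamp TEXT);
-- Hypothetical helper table: one row per student ever seen per (school, grade).
CREATE TABLE seen_students (
    school_id INTEGER, grade INTEGER, student_id INTEGER,
    PRIMARY KEY (school_id, grade, student_id));
""")

def nightly_refresh(conn, day_start, day_end):
    """Merge only rows logged in [day_start, day_end) into seen_students,
    then return unique-student counts per (school_id, grade)."""
    # The primary key makes already-seen triples a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO seen_students (school_id, grade, student_id) "
        "SELECT DISTINCT school_id, grade, student_id FROM Daily_users "
        "WHERE timestamp >= ? AND timestamp < ?",
        (day_start, day_end),
    )
    # seen_students has one row per unique student, not one per usage event.
    return conn.execute(
        "SELECT school_id, grade, COUNT(*) AS count FROM seen_students "
        "GROUP BY school_id, grade ORDER BY school_id, grade"
    ).fetchall()

# Day 1 and day 2 usage (hypothetical data); student 1 appears on both days.
conn.executemany("INSERT INTO Daily_users VALUES (?, ?, ?, ?)", [
    (1, 10, 5, "2023-10-12 08:00:00"),
    (2, 10, 5, "2023-10-12 09:00:00"),
])
day1 = nightly_refresh(conn, "2023-10-12", "2023-10-13")

conn.executemany("INSERT INTO Daily_users VALUES (?, ?, ?, ?)", [
    (1, 10, 5, "2023-10-13 10:00:00"),
    (3, 10, 6, "2023-10-13 11:00:00"),
])
day2 = nightly_refresh(conn, "2023-10-13", "2023-10-14")

# Full recompute with the original query, for comparison.
full = conn.execute(
    "SELECT school_id, grade, COUNT(DISTINCT student_id) FROM Daily_users "
    "WHERE timestamp < '2023-10-14' GROUP BY school_id, grade "
    "ORDER BY school_id, grade"
).fetchall()
print(day1, day2, full)
```

The helper table still grows, but only when a new student appears for a (school_id, grade) pair, not with every usage event, and the incremental result matches the full recompute. If approximate counts were acceptable, distinct-count sketches such as HyperLogLog are another route, but their availability and syntax depend on which AWS service holds the data.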

Asked by dezdichado (101 rep)

Oct 13, 2023, 09:59 PM
Last activity: Oct 14, 2023, 06:15 AM
