Optimizing SELECT COUNT(DISTINCT) on a table increasing daily

0 votes
0 answers
293 views
Let's say we have a table Daily_users, which has the columns student_id, school_id, grade, timestamp. We collect usage data of students daily, so the table grows daily (note that there can be multiple rows for the same student: the purpose is to track usage, so a student could be logging on and off, using different apps, etc.). Using this table, we maintain another table called unique_students_to_date, which has the columns school_id, grade, count. The count is the number of unique students we have recorded so far for a pair (school_id, grade). Each night, once the previous day has ended, we run the following query to create the table unique_students_to_date:
SELECT COUNT(DISTINCT(student_id)) AS count,
       grade,
       school_id
FROM Daily_users
WHERE timestamp < CURRENT_DATE  -- today's date, so rows up to yesterday inclusive
GROUP BY school_id, grade
to get the unique students up to and including yesterday. This is a simple enough query and it does the job. However, I can't help thinking there is a lot of redundancy here. Today's Daily_users differs from yesterday's version by only one day's worth of data, so each time we run COUNT(DISTINCT(student_id)) we redo most of the calculation we did yesterday. So my question is: can we optimize this to at least minimize the redundant calculations?
Asked by dezdichado (101 rep)
Oct 13, 2023, 09:59 PM
Last activity: Oct 14, 2023, 06:15 AM