What exactly is being cached when opening/querying a SQLite database
1 vote · 1 answer · 4121 views
I was asked to improve existing code to query SQLite databases. The original code made a lot of separate calls to the database and filtered the results in Python. Instead, I opted to re-write the database creation and put the filtering logic in the SQL query.
I ran benchmarks on databases of different sizes. Comparing with the original implementation, I found that the average query time (n=3) was a lot faster in the new implementation (3s vs. 46 **minutes**). I suspected that this was a caching issue, but I wasn't sure of its origin. Between every query I closed the database connection, deleted any lingering Python variables, and ran gc, but the out-of-this-world speedup persisted. Then I found that it was likely the system that was caching something. Indeed, when I clear the system's cache after every iteration with `echo 3 > /proc/sys/vm/drop_caches`, the performance is much more in line with what I expected (a 2-5x speed increase compared to an 80.000x speed increase).
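For reference, the measurement setup described above can be sketched roughly as follows. This is a minimal sketch, not my actual benchmark code: the names `time_query` and `drop_system_caches` are made up for illustration, and dropping the system cache only works on Linux when running as root.

```python
import sqlite3
import statistics
import subprocess
import time


def drop_system_caches():
    """Flush dirty pages, then ask Linux to drop its caches (needs root).

    Equivalent to: sync; echo 3 > /proc/sys/vm/drop_caches
    """
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as fh:
        fh.write("3\n")


def time_query(db_path, sql, params=(), n=3, cold=False):
    """Return the mean wall-clock time of `sql` over `n` runs.

    A new connection is opened for every run and closed afterwards.
    With cold=True the system cache is dropped before each run.
    """
    timings = []
    for _ in range(n):
        if cold:
            drop_system_caches()
        con = sqlite3.connect(db_path)
        try:
            start = time.perf_counter()
            con.execute(sql, params).fetchall()  # fetch everything so the query fully runs
            timings.append(time.perf_counter() - start)
        finally:
            con.close()
    return statistics.mean(timings)
```

Calling `time_query(path, sql, cold=False)` gives the "as-is" number, and `cold=True` gives the number with the cache cleared before every run.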
The almost philosophical issue that I have now is what I should report as an improvement: the cached performance (as-is) or the non-cached performance (explicitly clearing the cache before queries). (I'll likely report both, but I am still curious about what is being cached.) I think it comes down to the question of what is actually being cached. In other words: does the caching represent a real-world scenario or not?
I would think that if the database or its indices are cached, then the fast default performance is a good representation of the real world as it would be applicable to new, unseen queries. However, if specific queries are cached instead, then the cached performance does not reflect on unseen queries.
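One way I could test this hypothesis is to warm the cache with one query and then time a different query that has not been run since the cache was cleared. The sketch below assumes Linux and uses `os.posix_fadvise` with `POSIX_FADV_DONTNEED` as a best-effort, non-root way to evict a single file from the page cache; the kernel is free to ignore that advice, so `drop_caches` remains the more reliable option. The function names are hypothetical.

```python
import os
import sqlite3
import time


def evict_file_from_page_cache(path):
    """Advise the kernel to drop cached pages of one file (best effort).

    Returns False if the platform does not support the call.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be dropped, so write them out first
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # length 0 = whole file
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def timed(db_path, sql, params=()):
    """Time one query on a fresh connection."""
    con = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        con.execute(sql, params).fetchall()
        return time.perf_counter() - start
    finally:
        con.close()


def unseen_query_check(db_path, warmup_sql, unseen_sql):
    """Compare an unseen query on a cold cache vs. after a different warm-up query."""
    evict_file_from_page_cache(db_path)
    cold = timed(db_path, unseen_sql)

    evict_file_from_page_cache(db_path)
    timed(db_path, warmup_sql)  # warm the cache with a different query
    after_warmup = timed(db_path, unseen_sql)

    return {"cold": cold, "after_warmup": after_warmup}
```

If `after_warmup` comes out much lower than `cold`, the warm-up query must have left something behind that the unseen query can reuse, which would point to cached database or index data and not cached query results. The effect should depend on how much of the file the two queries have in common.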
Note: this might be an unimportant detail but I have found that the impact of this caching is especially noticeable when using fts5 virtual tables!
TL;DR: when the system is caching queries to SQLite, what exactly is it caching, and does that positively impact new, unseen queries?
If it matters: Ubuntu 20.04 with sqlite3.
Asked by Bram Vanroy
(183 rep)
Aug 7, 2022, 12:02 PM
Last activity: Aug 7, 2022, 12:30 PM