Cleanup / Prune Unreferenced Data in Dimension Tables

2 votes
2 answers
340 views
We have a star schema data warehouse running on MySQL 5.6. We keep a rolling 18 months of data in our fact tables, which are partitioned by month. We have a number of dynamic dimension tables that are referenced by multiple fact tables. However, we have no easy way to remove rows from the dimension tables once they are no longer referenced by any fact table.

A quick summary looks like this:

dim_url - 1B rows - 360GB
fact_ranks - 2.3B rows - 240GB
fact_seen - 2.8B rows - 295GB

Currently we are attempting to use a combination of Percona Archiver and triggers to generate "used dimension keys" tables, so that we can do the process online. We then use the key table to build a new dimension that contains only referenced rows. However, we have been unable to complete this process in production, and we estimate it could take up to a month.

This has to be a common problem with a more elegant solution.
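For illustration only, here is a toy-scale sketch of the rebuild approach described above: collect the referenced keys, build a new dimension from them, then swap it in. It uses Python's built-in sqlite3 as a stand-in for MySQL. The column names (`url_id`, `url`, `position`, `seen_on`), the `used_keys` table, and the `prune_dimension` helper are hypothetical and not taken from the actual schema. It does not attempt the online, trigger-based part of the process, and says nothing about how this behaves at billions of rows.

```python
import sqlite3


def prune_dimension(conn, dim, key, fact_tables):
    """Rebuild `dim`, keeping only rows whose `key` appears in a fact table.

    Table and column names are interpolated directly, so this sketch
    assumes trusted, hard-coded identifiers.
    """
    cur = conn.cursor()

    # 1. Collect every distinct dimension key still referenced by any fact.
    #    (SQLite syntax; MySQL would use INSERT IGNORE instead.)
    cur.execute("DROP TABLE IF EXISTS used_keys")
    cur.execute(f"CREATE TABLE used_keys ({key} INTEGER PRIMARY KEY)")
    for fact in fact_tables:
        cur.execute(
            f"INSERT OR IGNORE INTO used_keys "
            f"SELECT DISTINCT {key} FROM {fact} WHERE {key} IS NOT NULL"
        )

    # 2. Build a new dimension containing only the referenced rows.
    #    CREATE TABLE ... AS SELECT does not copy keys or indexes;
    #    a real rebuild would recreate them.
    cur.execute(
        f"CREATE TABLE {dim}_new AS "
        f"SELECT d.* FROM {dim} d JOIN used_keys u ON u.{key} = d.{key}"
    )

    # 3. Swap the new dimension into place and clean up.
    cur.execute(f"ALTER TABLE {dim} RENAME TO {dim}_old")
    cur.execute(f"ALTER TABLE {dim}_new RENAME TO {dim}")
    cur.execute(f"DROP TABLE {dim}_old")
    cur.execute("DROP TABLE used_keys")
    conn.commit()


def demo():
    """Create a tiny schema, prune it, and return the surviving url_ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_url (url_id INTEGER PRIMARY KEY, url TEXT);
        CREATE TABLE fact_ranks (url_id INTEGER, position INTEGER);
        CREATE TABLE fact_seen (url_id INTEGER, seen_on TEXT);
        INSERT INTO dim_url VALUES
            (1, 'a.example'), (2, 'b.example'),
            (3, 'c.example'), (4, 'd.example');
        INSERT INTO fact_ranks VALUES (1, 10), (1, 12), (3, 5);
        INSERT INTO fact_seen VALUES (3, '2016-12-01'), (4, '2016-12-02');
        """
    )
    prune_dimension(conn, "dim_url", "url_id", ["fact_ranks", "fact_seen"])
    rows = conn.execute("SELECT url_id FROM dim_url ORDER BY url_id")
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(demo())
```

In this toy data, `url_id` 2 is the only dimension row with no referencing fact row, so it is the one dropped by the rebuild.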
Asked by Shaun (31 rep)
Dec 20, 2016, 02:21 PM
Last activity: Jul 14, 2022, 01:05 AM