Sample Header Ad - 728x90

Normalisation with an Underlying Hierarchy

0 votes
1 answer
155 views
I'm designing a new database schema, and every time I do, I like to check my assumptions on normalisation. Something I haven't been able to find a definitive answer for here or by searching the web is how to handle multi-level hierarchies. Best way to explain is by example. **Source Data** ref, state, city, suburb, population ------------------------- 1, ABC, x, qwe, 1234 2, ABC, y, rty, 1456 3, DEF, z, uio, 2000 If I were just normalising that at face value, I'd create four tables - Residents, States, Cities, Suburbs. However, let's add the stipulation that there's a hierarchy of State->City->Suburb. Now in order to preserve referential integrity, it should look something like: states ------- stateid, statename cities ------- cityid, stateid, cityname suburbs ------- suburbid, cityid, suburbname The question I haven't been able to answer is what should the Residents table look like? The sensible option is: ref, stateid, cityid, suburbid, population and the slightly less sensible option is: ref, suburbid, population So when it comes to normal forms, I don't think the second answer is a valid normal form. Even though suburbid seems like a valid super key, it would involve traversing upward through the hierarchy to retrieve city and state names. But then, I'm not sure the first option is valid either, because there's redundancy throughout the hierarchy - I can get the state name three ways (Residents.stateid, Cities.stateid, Suburbs.cityid->Cities.stateid) From a functional standpoint, the joins would be something like: --First Option SELECT [...] FROM Residents AS r JOIN States AS s ON s.stateid = r.stateid JOIN Cities AS c ON c.cityid = r.cityid JOIN Suburbs AS b ON b.suburbid = r.suburbid --Second Option SELECT [...] FROM Residents AS r JOIN Suburbs AS b ON b.suburbid = r.suburbid JOIN Cities AS c ON c.cityid = b.cityid JOIN States AS s ON s.stateid = c.stateid What I'm hoping to understand is many faceted: - Is there a better approach from a normalisation perspective? - Is there a better approach from a performance perspective? (Assume MSSQL if relevant) - Is there a reason to favour normal forms over performance, given that the underlying schema already preserves referential integrity? - Are there other options I should consider? Have I got it all wrong? Thanks in advance!
Asked by Vocoder (117 rep)
Jul 3, 2023, 06:39 AM
Last activity: Jul 23, 2025, 07:03 AM