Normalisation with an Underlying Hierarchy
0
votes
1
answer
155
views
I'm designing a new database schema, and every time I do, I like to check my assumptions on normalisation. Something I haven't been able to find a definitive answer for here or by searching the web is how to handle multi-level hierarchies. Best way to explain is by example.
**Source Data**
ref, state, city, suburb, population
-------------------------
1, ABC, x, qwe, 1234
2, ABC, y, rty, 1456
3, DEF, z, uio, 2000
If I were just normalising that at face value, I'd create four tables - Residents, States, Cities, Suburbs. However, let's add the stipulation that there's a hierarchy of State->City->Suburb.
Now in order to preserve referential integrity, it should look something like:
states
-------
stateid, statename
cities
-------
cityid, stateid, cityname
suburbs
-------
suburbid, cityid, suburbname
The question I haven't been able to answer is what should the Residents table look like? The sensible option is:
ref, stateid, cityid, suburbid, population
and the slightly less sensible option is:
ref, suburbid, population
So when it comes to normal forms, I don't think the second answer is a valid normal form. Even though suburbid seems like a valid super key, it would involve traversing upward through the hierarchy to retrieve city and state names.
But then, I'm not sure the first option is valid either, because there's redundancy throughout the hierarchy - I can get the state name three ways (Residents.stateid, Cities.stateid, Suburbs.cityid->Cities.stateid)
From a functional standpoint, the joins would be something like:
--First Option
SELECT [...]
FROM Residents AS r
JOIN States AS s ON s.stateid = r.stateid
JOIN Cities AS c ON c.cityid = r.cityid
JOIN Suburbs AS b ON b.suburbid = r.suburbid
--Second Option
SELECT [...]
FROM Residents AS r
JOIN Suburbs AS b ON b.suburbid = r.suburbid
JOIN Cities AS c ON c.cityid = b.cityid
JOIN States AS s ON s.stateid = c.stateid
What I'm hoping to understand is many faceted:
- Is there a better approach from a normalisation perspective?
- Is there a better approach from a performance perspective? (Assume MSSQL if relevant)
- Is there a reason to favour normal forms over performance, given that the underlying schema already preserves referential integrity?
- Are there other options I should consider? Have I got it all wrong?
Thanks in advance!
Asked by Vocoder
(117 rep)
Jul 3, 2023, 06:39 AM
Last activity: Jul 23, 2025, 07:03 AM
Last activity: Jul 23, 2025, 07:03 AM