Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes

1 answers

155 views

Normalisation with an Underlying Hierarchy

I'm designing a new database schema, and every time I do, I like to check my assumptions on normalisation. Something I haven't been able to find a definitive answer for here or by searching the web is how to handle multi-level hierarchies. Best way to explain is by example.

**Source Data**

    ref, state, city, suburb, population
    -------------------------
    1, ABC, x, qwe, 1234 
    2, ABC, y, rty, 1456
    3, DEF, z, uio, 2000 

If I were just normalising that at face value, I'd create four tables - Residents, States, Cities, Suburbs. However, let's add the stipulation that there's a hierarchy of State->City->Suburb.

Now in order to preserve referential integrity, it should look something like:

    states
    -------
    stateid, statename
    
    cities
    -------
    cityid, stateid, cityname
    
    suburbs
    -------
    suburbid, cityid, suburbname

The question I haven't been able to answer is what should the Residents table look like? The sensible option is:

    ref, stateid, cityid, suburbid, population

and the slightly less sensible option is:

    ref, suburbid, population

So when it comes to normal forms, I don't think the second answer is a valid normal form. Even though suburbid seems like a valid super key, it would involve traversing upward through the hierarchy to retrieve city and state names.

But then, I'm not sure the first option is valid either, because there's redundancy throughout the hierarchy - I can get the state name three ways (Residents.stateid, Cities.stateid, Suburbs.cityid->Cities.stateid)

From a functional standpoint, the joins would be something like:

    --First Option
    SELECT [...] 
    FROM Residents AS r
    JOIN States AS s ON s.stateid = r.stateid
    JOIN Cities AS c ON c.cityid = r.cityid
    JOIN Suburbs AS b ON b.suburbid = r.suburbid
    
    --Second Option
    SELECT [...] 
    FROM Residents AS r
    JOIN Suburbs AS b ON b.suburbid = r.suburbid
    JOIN Cities AS c ON c.cityid = b.cityid
    JOIN States AS s ON s.stateid = c.stateid

What I'm hoping to understand is many-faceted:

 - Is there a better approach from a normalisation perspective?
 - Is there a better approach from a performance perspective? (Assume MSSQL if relevant)
 - Is there a reason to favour normal forms over performance, given that the underlying schema already preserves referential integrity?
 - Are there other options I should consider? Have I got it all wrong?

Thanks in advance!
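
A minimal sketch of the second option, assuming SQLite (through Python's `sqlite3`) as a stand-in for MSSQL. The view name `resident_locations` is hypothetical and not part of the schema above. `Residents` stores only `suburbid`, so a row cannot pair a suburb with the wrong city or state, and the view hides the upward traversal from callers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE states  (stateid INTEGER PRIMARY KEY, statename TEXT NOT NULL UNIQUE);
CREATE TABLE cities  (cityid INTEGER PRIMARY KEY,
                      stateid INTEGER NOT NULL REFERENCES states(stateid),
                      cityname TEXT NOT NULL);
CREATE TABLE suburbs (suburbid INTEGER PRIMARY KEY,
                      cityid INTEGER NOT NULL REFERENCES cities(cityid),
                      suburbname TEXT NOT NULL);
-- Second option: residents references only the lowest level of the hierarchy.
CREATE TABLE residents (ref INTEGER PRIMARY KEY,
                        suburbid INTEGER NOT NULL REFERENCES suburbs(suburbid),
                        population INTEGER NOT NULL);
-- Hypothetical view that performs the Suburb -> City -> State joins once.
CREATE VIEW resident_locations AS
SELECT r.ref, s.statename, c.cityname, b.suburbname, r.population
FROM residents AS r
JOIN suburbs AS b ON b.suburbid = r.suburbid
JOIN cities  AS c ON c.cityid   = b.cityid
JOIN states  AS s ON s.stateid  = c.stateid;

INSERT INTO states    VALUES (1, 'ABC'), (2, 'DEF');
INSERT INTO cities    VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
INSERT INTO suburbs   VALUES (1, 1, 'qwe'), (2, 2, 'rty'), (3, 3, 'uio');
INSERT INTO residents VALUES (1, 1, 1234), (2, 2, 1456), (3, 3, 2000);
""")

# Reproduces the flat source data from the normalised tables.
rows = conn.execute("SELECT * FROM resident_locations ORDER BY ref").fetchall()
```

Whether the extra joins matter for performance depends on data volume and indexing, which this sketch does not measure.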
                                

Vocoder (117 rep)

Jul 3, 2023, 06:39 AM • Last activity: Jul 23, 2025, 07:03 AM

0 votes

2 answers

2353 views

While Loop - Parent/Child Tree

mysql hierarchy recursive

I'm trying to recursively loop through and return all `child_id`'s that have the root element of `9`.

**The structure:**

    ->9
    ->->8
    ->->->17
    ->->22
    ->->->11

**Parent Child Link Table:**

    +----+-----------+----------+
    | id | parent_id | child_id |
    +----+-----------+----------+
    |  1 |         9 |        8 |
    |  2 |         8 |       17 |
    |  3 |         8 |       33 |
    |  4 |         8 |       18 |
    |  5 |         9 |       22 |
    |  6 |        22 |       11 |
    |  7 |        22 |        4 |
    |  8 |         3 |        5 |
    +----+-----------+----------+

**Procedure (so far):**

    BEGIN
    
    DECLARE x INT(11);
    
    SET x = 0;
    SET @elements = "";
    SET @node = _root_; -- 9
    SET @child_count = count_children(@node); -- function returning the child count of @node
    SET @children = get_children(@node); -- function returning the child id's of @node
    
    -- check IF node has children
    WHILE x <= @child_count DO
        SET @elements = CONCAT(@elements,x,',');
        SET x = x + 1;
    END WHILE;
    SELECT @elements;
    
    END

**Desired Output:** [8,17,33,18,22,11,4]


**Question:** How can I modify my procedure to be able to return all child_id's of the parent?
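
One way to get the desired set without a hand-written loop is a recursive common table expression, which MySQL accepts from version 8.0 onward. The sketch below is an assumption-laden illustration, not the procedure above: it uses SQLite through Python's `sqlite3` as a stand-in engine, and the table name `link` and the helper `all_children` are hypothetical. It makes no promise about the order in which descendants are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE link (id INTEGER PRIMARY KEY, parent_id INTEGER, child_id INTEGER)"
)
conn.executemany(
    "INSERT INTO link VALUES (?, ?, ?)",
    [(1, 9, 8), (2, 8, 17), (3, 8, 33), (4, 8, 18),
     (5, 9, 22), (6, 22, 11), (7, 22, 4), (8, 3, 5)],
)

DESCENDANTS = """
WITH RECURSIVE descendants(child_id) AS (
    -- anchor: direct children of the root
    SELECT child_id FROM link WHERE parent_id = :root
    UNION ALL
    -- recursive step: children of rows found so far
    SELECT l.child_id
    FROM link AS l
    JOIN descendants AS d ON l.parent_id = d.child_id
)
SELECT child_id FROM descendants
"""

def all_children(root):
    """Return every descendant child_id of root (order not guaranteed)."""
    return [row[0] for row in conn.execute(DESCENDANTS, {"root": root})]

result = all_children(9)
```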
                                

Jordan Davis (101 rep)

Jun 5, 2017, 09:53 PM • Last activity: Jul 21, 2025, 10:10 PM

0 votes

0 answers

47 views

Conceptual data modelling inheritance notation

database-design hierarchy

I am drawing the conceptual data model between `vehicle`, `car` and `motorbike`. As far as I know:

**Hierarchy**. The simple "is a" feature.

**Disjunction** means that each occurrence of the superentity should belong to only one subentity. Its notation is a "d" inside the triangle.

**Total restriction** means that each occurrence of the superentity (vehicle) should be a subentity (car or motorbike). Its notation is a double line. In this case I'm not sure what to put inside the triangle.

Are all of them valid? Which of these notations are **incorrect**?

user69507 (1 rep)

Mar 11, 2025, 01:59 PM

1 votes

2 answers

551 views

Hierarchical data with versioning

database-recommendation hierarchy

I have a domain where I deal with data which has a parent/child relationship, of arbitrary depth.

Also, full time traversal needs to be enabled, to show the state of the data at a specific point in time. 

Currently I am looking at two different types of databases, and was curious as to which would be more appropriate, or how I would overcome some issues in either of the solutions. 

Any links to articles regarding this information would also be appreciated.

Solution 1: rdbms

 + good fit for time versioning with a solution like temporal tables in sql server (although the db needs to be open source and free, so I would have to add this to something like PostgreSQL)

 - not that great a fit for arbitrary hierarchical data; need to implement adjacency list/nested set

Solution 2: graphdb 

 + natural fit for hierarchical data

 - I don’t know a proper way to model time versioning in a way which is performant.

So I am looking for some feedback on the advantages/disadvantages of either database type, and how to overcome some of the shortcomings of either.

I personally was leaning towards a graphdb solution, where I add a start_time and end_time attribute to all nodes and relationships, but I am not sure about the performance and if there are any better ways to get time versioning.

The two main considerations are of course performance and simplicity of the query.

I realize this is a rather open question; I am merely looking to see if I am overthinking this or perhaps failing to take some other positives/negatives into account.

moi2877 (19 rep)

Apr 17, 2020, 07:06 AM • Last activity: Mar 7, 2025, 04:06 AM

2 votes

1 answers

614 views

SSAS Cube for tracking changes in parent child relationship over time

ssas hierarchy cube

I would like to build an SSAS cube which tracks how objects in a graph whose edges represent a "belongs to" relationship change over time (daily). There are two components to the change:

 1. which object belongs to which
 2. attributes of each object.

Here is my schema:

    fact.Edge:
        the_date date
        parent_id int
        child_id int

    fact.Vertex:
        the_date date
        id int
        attribute1 int
        attribute2 int
        ...
        attributen int

    dim.attribute{1...n}:
       id int
       value1 nvarchar(64)
       value2 nvarchar(64)
       ...
       valuem nvarchar(64)

These tables get new data once daily. If nothing changes, then there are two copies of the exact same data in the two fact tables with sequential dates.

I would like to know if it is possible to define a parent child hierarchy in SSAS based on the fact.Edge table referencing itself (via child_id->parent_id) but also only when the_date = the_date. 

I am new to SSAS, but it seems only one attribute can be the parent attribute. Are there any workarounds?

Additionally, is it possible to treat the vertex table as two "fact" related dimensions -- ie parent_vertex and child_vertex? Or else do I need to include edges with either a null parent_id or null child_id and choose the other to have the only vertex reference?

If my questions don't quite make sense (likely due to my limited SSAS experience), is there an example cube definition that demonstrates best practices for this case?

I'd appreciate any insights you might have!
                                

Jonny (121 rep)

Sep 29, 2015, 11:31 PM • Last activity: Feb 14, 2025, 12:01 PM

0 votes

2 answers

79 views

Schema to model user authorization in hierarchical data

mysql role hierarchy access-control

I'm trying to add the ability for a user to be authorized with a given role on a contract/site/experiment.

More specifically, I want the ability for a user to have role A on a contract but at the same time role B on a specific site of this same contract and possibly a role C on an experiment of this site.

My first idea was a join table between User and Role with a third column storing the id of either a contract, site or experiment.

This solution has many problems in my opinion:
 - I need a row for the contract, each authorized site and each authorized experiment. It can grow very large very quickly.
 - No integrity. A nonexistent contract can be referenced.
 - If I want to know all the sites and experiments of a contract where a user is authorized, I have to first query all sites of this contract, then all the experiments of these sites, and then find all the rows referencing those ids. It seems a bit hacky.

I feel like this solution could work, but I was wondering if there were other ways. Maybe some pattern I am not aware of?

PS: I am using MySQL v8

Renaud Aubert (111 rep)

Oct 18, 2024, 01:55 PM • Last activity: Oct 25, 2024, 10:19 AM

2 votes

1 answers

3639 views

How to query all entities associated with all children (recursive) of a node in SQL?

mysql performance hierarchy recursive mysql-8.0

This question is about SQL in general. Answering specifically for MySQL would be helpful but not necessary.

---

Ok, I’m having trouble putting this into words… so bear with me.

Say I have a tree of things (I’ll call them nodes), with a table that looks something like this (the structure can be changed; this is just a simple version):

    +---------+-----------+
    | node_id | parent_id |
    +---------+-----------+
    | 1       | NULL      |
    | 2       | NULL      |
    | 3       | 1         |
    | 4       | 1         |
    | 5       | 5         |
    | 7       | 2         |
    | 8       | 5         |
    +---------+-----------+

Then I have a table of entities. Each entity is associated with a certain node. For example (again, the structure can be changed):

    +-----------+---------+
    | entity_id | node_id |
    +-----------+---------+
    | 1         | 1       |
    | 2         | 1       |
    | 3         | 4       |
    | 4         | 7       |
    | ...many more rows   |
    +-----------+---------+

This structure could represent a lot of things, e.g. each entity is a movie, while each node is a genre (but there can be unlimited levels of sub-genres).

So retrieving all the entities for a given node is simple; just running a query on the entities table for a specified node_id.

Here’s my question: how would I query the entities table for all entities associated with a given node *and* all its child nodes (every level, recursively)? In the movies example, I want to find all movies for a particular genre and all its sub-genres, and their sub-genres, etc.

I mentioned the word recursive, but that doesn’t mean that the query should be recursive, but conceptually it is. The query should be as fast as possible. The structure of the tables may need to be changed as well.

Thanks for your help!
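
A minimal sketch of one common approach, a recursive common table expression that collects the subtree and then joins it to the entities. It assumes SQLite through Python's `sqlite3` as a stand-in (MySQL 8.0 also supports `WITH RECURSIVE`); the table names `nodes` and `entities` and the helper `entities_under` are hypothetical. The sample rows are the ones listed above, including node 5, which lists itself as its parent, so the CTE uses `UNION` rather than `UNION ALL` to stop on cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes    (node_id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, node_id INTEGER);
INSERT INTO nodes VALUES
    (1, NULL), (2, NULL), (3, 1), (4, 1), (5, 5), (7, 2), (8, 5);
INSERT INTO entities VALUES (1, 1), (2, 1), (3, 4), (4, 7);
""")

SUBTREE_ENTITIES = """
WITH RECURSIVE subtree(node_id) AS (
    SELECT :root                       -- the starting node itself
    UNION                              -- UNION drops repeats, so cycles terminate
    SELECT n.node_id
    FROM nodes AS n
    JOIN subtree AS s ON n.parent_id = s.node_id
)
SELECT e.entity_id
FROM entities AS e
JOIN subtree AS s ON s.node_id = e.node_id
ORDER BY e.entity_id
"""

def entities_under(root):
    """Entities attached to root or to any node below it."""
    return [row[0] for row in conn.execute(SUBTREE_ENTITIES, {"root": root})]
```

For example, `entities_under(1)` covers nodes 1, 3 and 4. If read speed matters more than write simplicity, precomputed structures such as a closure table or materialized paths trade extra storage for a non-recursive lookup.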
                                

Luke (143 rep)

Mar 7, 2019, 07:59 AM • Last activity: Aug 28, 2024, 05:47 AM

0 votes

0 answers

84 views

Modeling hierarchical data for multiple entities, inherited properties, and varying hierarchy

database-design hierarchy tree

I am trying to model hierarchical data using a RDBMS but can't quite find the right solution. We are capturing information about storage levels in a warehouse system. There are 8 possible storage levels, each with a parent-child relationship to levels above and below. Stock is then associated with t...

    Warehouse 1
      Level 1
        Level 2
          Level 3
            Levels 4-7...
              Level 8 -> associated with stock

You will always have Level 8, but you may not always have every parent Level. Here is the added complexity that is confusing me:

 1. Levels can be skipped, making the hierarchy ragged as opposed to balanced. For example, a Level 6 might have Level 4 as a parent and skip Level 5.
 2. The parent to a Level MUST be a level above in the hierarchy, i.e., Level 5 cannot have Level 6 as a parent; it can only be Levels 4 or less.
 3. Child Levels inherit the properties of their parent. This means that if we designate a Level for "Receiving", this same property will be inherited by all child Levels.
 4. Levels can be disabled by a parent. This means in a structure where you have Level 1 -> Level 2 -> Level 8, the user may later disable Level 2, making the hierarchy now Level 1 -> Level 8, and we will need to readjust the parent of all Level 8 entries in the lineage.
 5. As a reminder, we always end up at Level 8.

I have researched a lot but am still struggling with some things. Here are the open items I'm not sure about.

 1. One route I can think of is to put all Levels in the same table, and prepare a seemingly standard adjacency list model where each Level points to its parent. I'm not sure if this is the best approach due to the added constraints based on Level Type; i.e., Levels cannot choose lower Levels as a parent, and also some parents may have disabled certain child Levels. We also need to manage inherited properties at each Level based on the properties of its parent.
 2. I'm leaning towards placing each Level in a separate table, and then transforming the ragged hierarchy into a balanced hierarchy by using dummy Levels (unseen to the user) when they are skipped. In this way, Level 3 is always the parent to Level 4, for example, even if the user selects Level 2 as the parent of Level 4. This was a solution I read about in this SE post.

If you have better ideas or insights to share, please do. I'm new at this and eager to learn, so I apologize if some of my questions appear to be basic.

Thomas (1 rep)

Aug 2, 2024, 05:50 PM • Last activity: Aug 2, 2024, 05:54 PM

0 votes

2 answers

7154 views

System/database design for comments/replies and upvotes at scale

postgresql database-design high-availability hierarchy relations

Recently I started discovering a topic around "How to design db schema for storing structures similar to Instagram/Facebook/Reddit comments?".

After extensive research, I was able to find a bunch of different answers on SO, SE, Medium articles, etc. However, all of these articles were pretty basic and always pointed to the Closure table pattern, which I used once back in the day.
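
For reference, a minimal sketch of what the Closure table pattern usually looks like. It assumes SQLite through Python's `sqlite3` rather than PostgreSQL, and the names `comments`, `comment_closure`, `add_comment` and `replies_under` are hypothetical. Every comment gets one row per ancestor (plus a depth-0 row for itself), so a whole subtree can be read with a single non-recursive query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE comment_closure (
    ancestor   INTEGER NOT NULL REFERENCES comments(id),
    descendant INTEGER NOT NULL REFERENCES comments(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_comment(body, parent_id=None):
    """Insert a comment and its closure rows in one transaction."""
    with conn:
        new_id = conn.execute(
            "INSERT INTO comments (body) VALUES (?)", (body,)
        ).lastrowid
        # Every comment is its own ancestor at depth 0.
        conn.execute(
            "INSERT INTO comment_closure VALUES (?, ?, 0)", (new_id, new_id)
        )
        if parent_id is not None:
            # Copy all ancestors of the parent (including the parent itself),
            # one level deeper, as ancestors of the new comment.
            conn.execute(
                """INSERT INTO comment_closure (ancestor, descendant, depth)
                   SELECT ancestor, ?, depth + 1
                   FROM comment_closure
                   WHERE descendant = ?""",
                (new_id, parent_id),
            )
    return new_id

def replies_under(comment_id):
    """All descendants of a comment with their relative depth."""
    return conn.execute(
        """SELECT descendant, depth FROM comment_closure
           WHERE ancestor = ? AND depth > 0
           ORDER BY depth, descendant""",
        (comment_id,),
    ).fetchall()

top = add_comment("top-level comment")
reply = add_comment("reply", parent_id=top)
nested = add_comment("reply to the reply", parent_id=reply)
```

The cost shows up on writes: each new comment adds one closure row per ancestor, so deep threads grow the closure table faster than the comment table.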

I implemented a comments/replies system only once, a few years ago, using PostgreSQL, and the product is no longer in production, so I don't know how my solution would scale in a data-intensive environment. 

Therefore, I decided to ask a specific question with specific requirements and constraints, so I could probably get a hint from someone who had this experience in production!

Here we go with two different tasks:

**Task 1**

*Requirements:*
 - When I open a post I see only the first level of comments. In particular, 50 most liked comments are ordered ascending by the number of likes.
 - For every top-level comment: if a comment has only one reply - display this reply too.
 - For every top-level comment: if a comment has multiple replies, display only the one which is the most liked.
 - When the user clicks on "more replies": Display replies in descending order by their created_datetime.
 - The max depth is 2: Only the top-level comment and replies to it can exist. Replies to lower-level comments(depth == 0) should always be displayed near their parents. The only thing that distinguishes them is just a mention of a user you reply to, like @ on instagram.

*Questions (only related to the design of a relational database with a Closure table):*
 - What problems did you face in **production** with it, and how did you fix them? What would you recommend to people who are just starting with this; what should they spend their time on at the beginning to prevent a cascade of mess in the future?
 - Is there a better pattern with an RDBMS nowadays for this purpose?


Let's imagine the system grows. We are not talking about thousands of requests, but about hundreds of thousands of comments and replies to them. E.g. some celebrity posts a message and then all the fans start replying, having conversations, etc. This results in a lot of rows in both the comment and closure tables. Our queries that group by number of likes start getting much slower on some posts, causing long-running transactions, which cause a ton of mess and probably even downtime.
Again, that's what it looks to me could happen if we just use a closure table. But what really happens? I'm curious to hear stories from people who had problems with it in really data-intensive applications.
E.g. we can shard the table somehow, right? Or for really big posts we could cache a lot of stuff, right?

**Task 2**
 - The main difference to the first one: When I open a post I see 50 most liked comments but with all their children. Meaning I fetch the whole tree for these 50 first comments. Depth is not limited.

*Questions (only related to the design of a relational database with a Closure table):*
 - Should we simplify the logic and become less ambitious, going with business requirements similar to the ones in Task 1 (no infinite depth, so comment trees can grow only in width)? I assume that otherwise it is almost impossible to scale such business logic when there are millions or billions of comments.
 - If the answer to the first question is no, how does the magic happen then? (I don't believe that such product requirements could scale while the infrastructure stays profitable; costs would grow exponentially, IMO.)



**General questions to be answered first:**
 - Is a relational database still a case for such a problem nowadays? I don't know much about graph databases, but wouldn't it be optimal to store such hierarchical data there? Probably I just need to discover graph databases deeper, so please feel free to link the related articles. Doesn't seem I found them in a week, so I would definitely need help with finding the right materials :) 


**To sum up:**
I understand that my questions may seem pretty vague, but they are also quite complex and require the knowledge of someone who had this experience. Meanwhile, I am also quite opinionated on some topics (like growth of costs/sharding/caching) and that's why it is even more difficult for me to compile the opinion - I wanna have more thoughts gathered, not only mine.

In case you think an extensive answer would take too much of your time - please give me just short answers like yes or no and just link all the resources you think could really help me to build my opinion on this topic. 
Sharing your real production experience of working with such systems would be really helpful and appreciated! 

Thanks!
                                

IDobrodushniy (1 rep)

Sep 10, 2022, 05:06 PM • Last activity: Mar 28, 2024, 08:24 AM

0 votes

1 answers

365 views

What is the best way in Postgres to inherit values from parent row recursively if not provided?

postgresql recursive hierarchy

Is there a pattern in SQL where a child row inherits empty (null?) values from a parent? E.g., given the following 'chickens' table (sorry):

| id | parent_id | name           | flys  | noise   | weight | egg_color | special_feathers |
|----|-----------|----------------|-------|---------|--------|-----------|------------------|
| 1  |           | chicken        | true  | clucks  | 5lbs   | white     | false            |
| 2  | 1         | maran          |       |         |        | brown     |                  |
| 3  | 1         | silkie         | false | squeeks | 4lbs   |           |                  |
| 4  | 3         | silkie frizzle |       |         |        |           | true             |

I want to easily be able to do:

    select * from chickens where egg_color = 'white'

and have it return all the chickens except for the maran. I don't want to explicitly set the values in child rows because I want to be able to update the parent data and have it updated everywhere.

I think this can be accomplished via a recursive view; however, I am curious if there are any other solutions. I am also not sure how to handle explicitly empty states other than with an empty string (which won't work for non-string fields). This might not be something we need to accommodate, but I want to see if anyone has ideas.
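
The recursive view mentioned above might look roughly like the following sketch. It assumes SQLite through Python's `sqlite3` instead of Postgres (both accept `WITH RECURSIVE` inside a view), carries only two of the columns to stay short, and uses a hypothetical view name `chickens_resolved`. Each child row takes `COALESCE(own value, resolved parent value)`, so values flow down from the nearest ancestor that sets them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chickens (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT,
                       egg_color TEXT, special_feathers INTEGER);
INSERT INTO chickens VALUES
    (1, NULL, 'chicken',        'white', 0),
    (2, 1,    'maran',          'brown', NULL),
    (3, 1,    'silkie',         NULL,    NULL),
    (4, 3,    'silkie frizzle', NULL,    1);

CREATE VIEW chickens_resolved AS
WITH RECURSIVE resolved(id, name, egg_color, special_feathers) AS (
    -- anchor: root rows keep their own values
    SELECT id, name, egg_color, special_feathers
    FROM chickens WHERE parent_id IS NULL
    UNION ALL
    -- recursive step: fall back to the parent's already-resolved values
    SELECT c.id, c.name,
           COALESCE(c.egg_color, r.egg_color),
           COALESCE(c.special_feathers, r.special_feathers)
    FROM chickens AS c
    JOIN resolved AS r ON c.parent_id = r.id
)
SELECT * FROM resolved;
""")

white = [row[0] for row in conn.execute(
    "SELECT name FROM chickens_resolved WHERE egg_color = 'white' ORDER BY id"
)]
```

Because the view is computed at query time, updating the parent row changes what every inheriting child reports. A NULL here always means "inherit", so an explicitly empty value would need its own marker, which this sketch does not attempt.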

Sabrina Leggett (101 rep)

Mar 21, 2024, 02:23 AM • Last activity: Mar 23, 2024, 05:22 AM

0 votes

0 answers

268 views

Recursive CTE hierarchy snowflake -- join/expand outward (columns) instead of rows?

cte recursive hierarchy snowflake

I'm doing a dependency mapping explosion (like parts explosion).

I did a typical recursive CTE with a union all.

It looks like


    with CTE as
    
    ( select abc from myTable where start_point = X
    union all
    select abc from CTE join myTable where myTable.parent = CTE.child
    )
    select * from CTE

... However this ends up with a list like

    Root -> Child 1
    Root -> Child 2
    Root -> Child 3
    Child 1 -> 1 Grandchild 1
    Child 2 -> 2 Grandchild 1
    Child 2 -> 2 Grandchild 2

I'd prefer it looked like

    Root -> Child 1 -> 1 Grandchild 1
    Root -> Child 2 -> 2 Grandchild 1
    Root -> Child 2 -> 2 Grandchild 2
    Root -> Child 3

It's like ... I need a recursive join, not a recursive union -- but when I replace the union with a join, I can't quite get it to work. Any ideas?
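
One way to get the preferred shape while keeping the recursive union is to carry an accumulated path through the recursion and keep only the rows that end at a leaf. The sketch below is an illustration under assumptions: it runs on SQLite through Python's `sqlite3` rather than Snowflake, it builds one path string instead of separate columns, and the table `my_table(parent, child)` with its sample rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (parent TEXT, child TEXT);
INSERT INTO my_table VALUES
    ('Root', 'Child 1'), ('Root', 'Child 2'), ('Root', 'Child 3'),
    ('Child 1', '1 Grandchild 1'),
    ('Child 2', '2 Grandchild 1'), ('Child 2', '2 Grandchild 2');
""")

LEAF_PATHS = """
WITH RECURSIVE walk(child, path) AS (
    -- anchor: edges leaving the start point, path begins with both ends
    SELECT child, parent || ' -> ' || child
    FROM my_table WHERE parent = :start
    UNION ALL
    -- recursive step: extend the path by one more child
    SELECT t.child, w.path || ' -> ' || t.child
    FROM walk AS w
    JOIN my_table AS t ON t.parent = w.child
)
SELECT path
FROM walk AS w
-- keep only paths whose last element has no children of its own
WHERE NOT EXISTS (SELECT 1 FROM my_table AS t WHERE t.parent = w.child)
ORDER BY path
"""

paths = [row[0] for row in conn.execute(LEAF_PATHS, {"start": "Root"})]
```

If separate columns are required, the path can be split afterwards, or each level can be carried as its own column up to a fixed maximum depth.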


                                

user45867 (1739 rep)

Mar 14, 2024, 11:22 PM

0 votes

1 answers

236 views

SQL Pattern to get "root/ start" of uneven hierarchy dataset

join recursive hierarchy snowflake

I feel this is a common problem and I've seen it in some SQL challenges even but for the life of me, cannot think clearly about a solution.

Say you have an uneven hierarchy. Elements that belong to other elements but you don't know the top.

Let's say it's a Company Org Chart to keep it simple (really it's task dependencies but eh).

So there's a table. Employee Name and Boss Name.

    Employee name: Bob .... Boss Name: Dora
    Employee name: Dora .... Boss Name: Kim

And on and on. In my case there is an added piece of information. One person only reports to one person ever. One-to-one relationship.

There are N elements at the top of the chain that have Name: Whoever Boss: Null.

So I was doing something as follows:

    select b1.employee_name, b1.boss_name
    from boss_table b1
    left join boss_table b2 on b1.boss_name = b2.employee_name
    left join boss_table b3 on b2.boss_name = b3.employee_name

And on and on to attempt to find the 'Root Boss' or 'Top Boss' of each employee.
However some of these nested elements are VERY deep -- I don't want to do 20 joins ... or at least type them out --
I feel a recursive function is the obvious answer, but can't figure it out -- thoughts?
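
A minimal sketch of the recursive approach, assuming SQLite through Python's `sqlite3` as a stand-in for Snowflake and using only the three sample rows implied above (Bob, Dora, Kim). Each row of the CTE remembers the employee it started from while it climbs one boss at a time; the row whose boss is NULL names the root.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE boss_table (employee_name TEXT PRIMARY KEY, boss_name TEXT);
INSERT INTO boss_table VALUES ('Bob', 'Dora'), ('Dora', 'Kim'), ('Kim', NULL);
""")

ROOT_BOSS = """
WITH RECURSIVE chain(employee_name, current_name, boss_name) AS (
    -- anchor: every employee starts at themselves
    SELECT employee_name, employee_name, boss_name FROM boss_table
    UNION ALL
    -- recursive step: move up to the current person's boss
    SELECT c.employee_name, b.employee_name, b.boss_name
    FROM chain AS c
    JOIN boss_table AS b ON b.employee_name = c.boss_name
)
SELECT employee_name, current_name AS root_boss
FROM chain
WHERE boss_name IS NULL          -- the top of each chain
ORDER BY employee_name
"""

roots = dict(conn.execute(ROOT_BOSS).fetchall())
```

This replaces the hand-typed chain of self-joins with one join that repeats as many times as the deepest chain requires. It assumes the data has no reporting cycles; a cycle would make the recursion run without end.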
                                

user45867 (1739 rep)

Nov 14, 2023, 03:28 PM • Last activity: Nov 14, 2023, 05:12 PM

0 votes

1 answers

628 views

What is the most straightforward way to create subtypes and supertypes at the same time?

database-design insert hierarchy relations subtypes

I hope this isn't too basic a question, but I am hoping that the DB engineers here will help me out. Do relational DBs usually have a straightforward way of doing the following?

 1. Create a supertable with a UUID as the primary key.
 2. Create several subtables, each of which has as its primary key the primary key from the supertable as a foreign key.
 3. When I add a row to a subtable, a corresponding row is created in the supertable, probably by automatically creating the supertable row first before creating a new row in the subtable with it as a foreign key.

In my case, I am designing a database that will store several types of post (text, video, image). Each (sub)type of post has unique attributes, and thus should get its own table. However, I want all posts to be organized by a GUID in a supertable that contains attributes shared by all post subtypes. I've been able to set up these tables, but I dislike the fact that I cannot enter into a subtable a post of a given subtype and automatically populate the supertable.

It's been suggested that I just create both supertable and subtable entries with one command like this:

    with new_post as (
      insert into posts (name) values ('My new video post')
      returning id
    )
    insert into videos (guid)
    select id
    from new_post;

However, this seems awkward. I know there are lots of ways to set up relations and backfill columns in other tables, yet I cannot seem to find an example of this particular type of backfilling relationship. I happen to be using Postgres via SQLAlchemy, but an answer in general terms about how this problem would be approached in any DB would be welcome.
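
In general terms, one engine-neutral alternative is to generate the key in the application and write both rows inside a single transaction, hidden behind one helper. The sketch below assumes SQLite through Python's `sqlite3` rather than Postgres with SQLAlchemy; the column `duration_seconds` and the function `create_video_post` are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE posts  (id TEXT PRIMARY KEY, name TEXT NOT NULL);
-- The subtype's primary key is also a foreign key to the supertype.
CREATE TABLE videos (guid TEXT PRIMARY KEY REFERENCES posts(id),
                     duration_seconds INTEGER);
""")

def create_video_post(name, duration_seconds):
    """Create the supertype and subtype rows together, or neither."""
    post_id = str(uuid.uuid4())   # key generated client-side
    with conn:                    # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO posts (id, name) VALUES (?, ?)", (post_id, name)
        )
        conn.execute(
            "INSERT INTO videos (guid, duration_seconds) VALUES (?, ?)",
            (post_id, duration_seconds),
        )
    return post_id

video_id = create_video_post("My new video post", 90)
```

Callers see a single operation, and the foreign key still guarantees that no video exists without its post.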

Logos Masters (3 rep)

Jul 22, 2023, 06:39 PM • Last activity: Jul 23, 2023, 03:56 PM

0 votes

1 answers

31 views

Database for inheritance hierarchy queries

hierarchy

I'm looking for a way to store hierarchical data in a way that will facilitate "inheritance" queries that coalesce upwards in the hierarchy. Simple example:

    Level 1, property x = 123
     - Level 2, property x = null
     - - Level 3, property x = null

What is the effective value of property x at level 3? Answer: 123 (because it coalesces up to level 1).

Is there a database engine that is purpose-built for this type of structure? This can be done relationally, or possibly in a graph, with some effort, but I am trying to find an engine natively optimized for this.
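
For comparison, the relational version of this lookup can be sketched as an upward walk from one node that returns the nearest non-null value. The example assumes SQLite through Python's `sqlite3`; the table `levels` and the helper `effective_x` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levels (id INTEGER PRIMARY KEY, parent_id INTEGER, x INTEGER);
INSERT INTO levels VALUES (1, NULL, 123), (2, 1, NULL), (3, 2, NULL);
""")

EFFECTIVE_X = """
WITH RECURSIVE ancestry(id, parent_id, x, distance) AS (
    -- anchor: the node being asked about
    SELECT id, parent_id, x, 0 FROM levels WHERE id = :node
    UNION ALL
    -- recursive step: climb to the parent, one level further away
    SELECT l.id, l.parent_id, l.x, a.distance + 1
    FROM levels AS l
    JOIN ancestry AS a ON l.id = a.parent_id
)
SELECT x FROM ancestry
WHERE x IS NOT NULL
ORDER BY distance            -- nearest ancestor with a value wins
LIMIT 1
"""

def effective_x(node):
    """Value of x at node, inherited from the closest ancestor that sets it."""
    row = conn.execute(EFFECTIVE_X, {"node": node}).fetchone()
    return row[0] if row else None
```

With the three rows above, `effective_x(3)` climbs through level 2 to level 1 and finds 123 there.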

Ouananiche (101 rep)

Jul 7, 2023, 05:23 PM • Last activity: Jul 7, 2023, 05:55 PM

-1 votes

1 answers

38 views

Modeling combinations (options and dependencies)

postgresql database-design hierarchy

How would you model the following data? I have a solution I wrote in a separate answer. Don't forget to read it.

Context
-----------------

It's for a website. Speed is of the essence.

Data
-------

* You have objects that are linked to many tables describing their properties, such as an object__tag table. Any object can be paired with a color.
* Thing is, objects can be made of objects. I call them superobjects when they assume that container role (as objects, they can be part of other superobjects). Objects can be shuffled in all sorts of unpredictable ways because it is less about how they are made than how they are sold. It is therefore wrong to assume that if an object is contained inside some superobject A, and superobject A is in turn contained inside another superobject B, then superobject B inherits all objects from superobject A.
* In a superobject, multiple objects can be grouped together into a set of options. One option can be made of multiple objects (most of the time, just one). You can define the minimum and maximum number of options (a range) that can be chosen from a set (most of the time, just one). It's possible to have a NONE choice, i.e. you don't choose any option.
* Any object can have 0 (common), 1 (common), or more (very rare) dependencies.

Note: the bit about sets of options having a min-max selection can safely be ignored, as it can be converted into single-selection options. Example: options A, B, C with zero to two selections is, in the end, just A, B, C, AB, AC, BC, NONE. The bit about NONE can also safely be ignored, as I am building a search engine which doesn't care about the absence of a certain option.

Dummy example
---------

Think of cars (don't get too worked up on that example as it's a dummy one and I don't know a thing about cars). You can buy one red, green, or yellow (3 options). You can choose manual or automatic transmission (2 options). If you buy the manual transmission, you can choose hubcaps (hubcaps are dependent on manual transmission).

Tree representation
--------------

Think of "( )" as radio buttons. Empty line breaks separate sets of options. X indicates mandatory nodes (unless their parent is not chosen). "(n-m)" represents the min-max range for a set of options.

**Car A (superobject)**

(1-1) Car A + Green
(1-1) Car A + Yellow
(1-1) Car A + Red

(x) Engine V2000

(1-1) Manual transmission
│ (x) Speakers
│
│ (1-2) 20-inch wheels
│ (1-2) 18-inch wheels + Hubcaps
│
│ (0-1) Leather seats + Black
│ (0-1) Vinyl seats + Yellow {requires Car A + Yellow}
│
(1-1) Automatic transmission
│ (x) Leather seats + Blue
│
│ (x) 20-inch wheels

A user can customize their car however they want.

- Yellow car A + Engine V2000 + Manual transmission + Speakers + 20-inch wheels + Yellow vinyl seats is one legal combination.
- Green car A + Engine V2000 + Automatic transmission + Blue leather seats + 20-inch wheels is another one.
- Yellow car A + Engine V2000 + Automatic transmission + 18-inch wheels + Hubcaps *is an illegal combination.*

It doesn't matter in which order the transmission and the engine appear in the tree. They can switch places, which is why this is not a proper hierarchy. You go down the tree and must check every branch until you reach the last leaf, choosing options as necessary, before jumping back to the trunk, until you reach the roots.

Purpose
----------

I don't really care at the moment how to render the tree above, as that doesn't appear to be a big challenge. What I care about is building a performant search engine to find superobjects given columns of objects (subqueries) selected by joining objects with their related tables, such as `object__tag`. Given the previous example, when I search for Green car A + Vinyl seats, I must not find superobject Car A. Of course, I will not search for one specific object "vinyl seats" but for the tag "vinyl seats", which will give me **a column of thousands of objects**. Likewise, I will search for color "green" linked to objects of class "car", and this time it will return **a column of object-color pairs**. The combination of two or more such subqueries must return the superobjects that fit the constraints. Because of that, the model must stay normalized, as I believe you can't hit an array of ids or an ltree with such subqueries.

Problem
-----------

The hierarchy part is not as hard to model as it seems. Each superobject is pretty flat. If you think in terms of paths (as in materialized paths), after putting the objects at the root one after the other, you don't get many paths. With the above example, you only have three paths because of the transmission.

* Green/Yellow/Red Car A > Engine V2000 > Manual transmission > Speakers > 20-inch wheels/18-inch wheels + Hubcaps > Black Leather seats
* Yellow Car A > Engine V2000 > Manual transmission > Speakers > 20-inch wheels/18-inch wheels + Hubcaps > Yellow vinyl seats
* Green/Yellow/Red Car A > Engine V2000 > Automatic transmission > Blue leather seats > 20-inch wheels

In fact, seven, as

> (1-2) 20-inch wheels
> (1-2) 18-inch wheels + Hubcaps

will be modeled in my solution as

* (Manual transmission > Speakers >) 20-inch wheels
* (Manual transmission > Speakers >) 18-inch wheels > Hubcaps
* (Manual transmission > Speakers >) 20-inch wheels > 18-inch wheels > Hubcaps

as I can't add multiple objects to a node. I have to expand any set that allows more than one selection into further branches.

Now, the big problem is that each new set of options makes the hierarchy grow exponentially. If you have sets of 3, 4, and 5 options, the tree will have 60 paths (3 × 4 × 5). Add another set of 3 and it's 180. I estimate that some freak superobjects could spawn more than 1000 possibilities. That is insane just to check a few sets of options! It only takes a few freak exceptions I never imagined to clog my database, meaning I would have to limit the number of combinations a superobject can have to be on the safe side. And how do you even begin to be efficient when you throw four columns of object.id at such a hierarchy to find paths that hit a combination? If I have to recursively check hundreds of paths that don't even have a predictable order just to find a list of superobjects, I don't see my website becoming very performant.
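To make the arithmetic in the question concrete, here is a minimal Python sketch (not the asker's model). It expands a min-max option set into single-choice options, as described in the note on ranges, and counts paths as the product of the set sizes. The function names `expand_range` and `count_paths` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def expand_range(options, lo, hi):
    """Turn a (lo-hi) option set into single-choice options.

    Each result is a tuple of the options selected together; the empty
    tuple stands for the NONE choice.
    """
    expanded = []
    for size in range(lo, hi + 1):
        expanded.extend(combinations(options, size))
    return expanded


def count_paths(set_sizes):
    """Number of paths when one option is picked from every set."""
    return prod(set_sizes)


if __name__ == "__main__":
    # A, B, C with zero to two selections -> A, B, C, AB, AC, BC, NONE
    print(expand_range(["A", "B", "C"], 0, 2))
    # Sets of 3, 4 and 5 options, then one more set of 3
    print(count_paths([3, 4, 5]), count_paths([3, 4, 5, 3]))
```

Applied to the (1-2) wheel set from the tree, `expand_range` yields the three branches listed in the question, which is where the step from three paths to seven comes from.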

Some_user (61 rep)

Jun 2, 2023, 10:11 AM • Last activity: Jun 4, 2023, 08:55 PM

1 votes

0 answers

221 views

How to write query to link complex Parent-Child relationships

sql-server t-sql cte recursive hierarchy

We have a table that has a DefectID field and a PreviousDefectID field, which link defect records to each other as they change through time. I need to link these fields together by creating a view with a third column, maybe called GroupID, so that each chain receives its own GroupID. There are three different types of cases that need to be solved.

1. Simple case of A->B->C. In the view, all three of these records would receive the same GroupID.
2. Two or more DefectIDs come from the same PreviousDefectID. In this case, the view would have a different GroupID for each new branch that is created. This can keep happening over more than one generation.
3. A DefectID has two or more PreviousDefectIDs (comma-separated).

So, the following table:

    CREATE TABLE #Temp (
        ID VARCHAR(50),
        PreviousID VARCHAR(50))

    INSERT INTO #Temp (ID, PreviousID)
    VALUES ('A', NULL),
           ('B', 'A'),
           ('C', 'B'),
           ('D', NULL),
           ('E', 'D'),
           ('F', 'D'),
           ('G', 'D'),
           ('H', 'E'),
           ('I', 'E'),
           ('K', NULL),
           ('L', NULL),
           ('M', NULL),
           ('N', 'L,M'),
           ('O', 'K,N')

Would result in a view that looks like the following table:

    CREATE TABLE #Final (
        GroupID INT,
        ID VARCHAR(50),
        PreviousID VARCHAR(50))

    INSERT INTO #Final (GroupID, ID, PreviousID)
    VALUES (1, 'A', NULL),
           (1, 'B', 'A'),
           (1, 'C', 'B'),
           (2, 'D', NULL),
           (3, 'D', NULL),
           (4, 'D', NULL),
           (2, 'E', 'D'),
           (3, 'F', 'D'),
           (4, 'G', 'D'),
           (7, 'D', NULL),
           (7, 'E', 'D'),
           (7, 'H', 'E'),
           (7, 'I', 'E'),
           (5, 'K', NULL),
           (5, 'L', NULL),
           (5, 'M', NULL),
           (5, 'N', 'L,M'),
           (5, 'O', 'K,N')

Note, there are more records in the resulting view than in the table, because when a single DefectID leads to multiple other DefectIDs, there is a new branch created for each chain and each gets their own GroupID number. I know this is a complicated question, any help at all would be appreciated. I am attaching what I have so far, but it isn't quite correct, and doesn't yet address the third type of situation.

    select ID,
           PreviousID, count(PreviousID) over (partition by ID, PreviousID) as 'cnt', null
    from #temp temp
    union all
    (
    select temp.ID, temp.PreviousID, null, row_number () over (partition by temp.ID, temp.PreviousID order by temp.PreviousID) as 'rn'
    from #temp temp inner join #temp temp1 on temp.ID = temp1.PreviousID
    )

    order by 1
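A possible first step toward the third case, sketched in Python rather than T-SQL, is to normalise the comma-separated PreviousID values into one (ID, PreviousID) edge per predecessor and then look for the branch points of the second case. This is only a sketch under that assumption and does not assign GroupIDs; the names `split_edges` and `branch_points` are hypothetical. In T-SQL, `STRING_SPLIT` with `CROSS APPLY` (SQL Server 2016 and later) could play the same role as the split here.

```python
from collections import defaultdict

# Sample rows copied from the #Temp table in the question.
ROWS = [
    ("A", None), ("B", "A"), ("C", "B"),
    ("D", None), ("E", "D"), ("F", "D"), ("G", "D"),
    ("H", "E"), ("I", "E"),
    ("K", None), ("L", None), ("M", None),
    ("N", "L,M"), ("O", "K,N"),
]


def split_edges(rows):
    """Return one (id, previous_id) pair per predecessor.

    Rows with a NULL PreviousID (chain starts) are kept as (id, None).
    """
    edges = []
    for row_id, previous in rows:
        if previous is None:
            edges.append((row_id, None))
            continue
        for prev in previous.split(","):
            edges.append((row_id, prev.strip()))
    return edges


def branch_points(edges):
    """Map each previous ID that has more than one successor to its successors."""
    children = defaultdict(list)
    for row_id, prev in edges:
        if prev is not None:
            children[prev].append(row_id)
    return {prev: kids for prev, kids in children.items() if len(kids) > 1}


if __name__ == "__main__":
    edges = split_edges(ROWS)
    print(edges)
    print(branch_points(edges))
```

With the edges in this shape, each row has a single predecessor, which is the form a recursive CTE over ID and PreviousID would need.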

MM3000 (11 rep)

Apr 10, 2023, 11:07 PM • Last activity: Apr 14, 2023, 01:02 AM

-1 votes

1 answers

144 views

Creating a hierarchy from text

sql-server sql-server-2012 hierarchy


I'm trying to create a hierarchy from a text field I have in my table.

Visually it is relatively easy to see who is parent and child.

For example: (*see a screenshot of the table results*)

    1M
    -----1M5
    ----------1M5.5445
    ----------1M5.6
    AC
    -----AC
    ----------AC.1
    ----------AC.2
    ----------AC.3
    AD
    -----AD.1
    -----AD.2
    -----AD.3


Note that the dot is not always present in the string, and its position is not constant.

How can I proceed in order to build this hierarchy of parent, child which can have several levels?



Microsoft SQL Server 2012 (SP2-GDR) (KB3194719) - 11.0.5388.0 (X64)
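One way to read the sample is that a code's parent is the longest other code that is a proper prefix of it. The Python sketch below works under that assumption only, and it also assumes the codes are unique (the sample lists AC at two levels, which this does not handle). The function name `assign_parents` is hypothetical.

```python
def assign_parents(codes):
    """Map each code to the longest other code that is a proper prefix of it.

    Codes with no such prefix are treated as roots and map to None.
    """
    known = set(codes)
    parents = {}
    for code in codes:
        parent = None
        # Try the longest proper prefix first, then shorter ones.
        for length in range(len(code) - 1, 0, -1):
            if code[:length] in known:
                parent = code[:length]
                break
        parents[code] = parent
    return parents


if __name__ == "__main__":
    sample = ["1M", "1M5", "1M5.5445", "1M5.6", "AD", "AD.1", "AD.2", "AD.3"]
    for code, parent in assign_parents(sample).items():
        print(code, "->", parent)
```

Because the match is on prefixes rather than on the dot, it does not depend on the dot being present. The resulting parent/child pairs are the adjacency-list form that a recursive query can then walk level by level.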
                                

Olivier (1 rep)

Mar 22, 2023, 03:41 PM • Last activity: Mar 27, 2023, 10:39 AM

1 votes

0 answers

237 views

MySQL: Hierarchical Organization of Tables

mysql database-design tablespaces hierarchy

There are several SE Q&As regarding how to hierarchically organize tables in MySQL:

- [How to use a naming convention for large databases?](https://stackoverflow.com/questions/2603120/how-to-use-a-naming-convention-for-large-databases) Asked 12 years, 11 months ago
- [How can I organize a glut of mysql tables?](https://stackoverflow.com/questions/4217157/how-can-i-organize-a-glut-of-mysql-tables) Asked 12 years, 3 months ago
- [Grouping tables within a MySQL database](https://stackoverflow.com/questions/4465468/grouping-tables-within-a-mysql-database) Asked 12 years, 2 months ago
- [Is it possible to organize tables or others object in folder?](https://stackoverflow.com/questions/20226247/is-it-possible-to-organize-tables-or-others-object-in-folder) Asked 9 years, 3 months ago
- [How to organize MySQL tables names](https://stackoverflow.com/questions/33028910/how-to-organize-mysql-tables-names) Asked 7 years, 4 months ago
- [MySQL Logical grouping of tables](https://dba.stackexchange.com/questions/173417/mysql-logical-grouping-of-tables) Asked 5 years, 9 months ago
- [MySQL namespace in create table](https://stackoverflow.com/questions/57885126/mysql-namespace-in-create-table) Asked 3 years, 5 months ago

Most answers revolve around something like:

- There is no concept of hierarchy in MySQL (i.e. no folders, containers, or namespaces for the databases).
- One solution is to keep related tables together by naming conventions (e.g. use the same prefix `hr_` for tables primarily related to Human Resources).
- One hack is to use different databases for different sets of tables. However, this can create more problems than it is worth.
- "I have not found a need for an extra layer of grouping.", _a long time MySQL user_

### Questions

1. Notice that the Q&A links above are considerably old (3~13yo). Are the aforementioned answers still up to date?
2. Regarding the "hack" of using different databases for different sets of tables: what are the pros and cons?
3. Some databases have namespaces (e.g. PostgreSQL) which can be used like:

        SELECT * FROM articles;
        SELECT * FROM articles.comments;
        SELECT * FROM articles.author;

   Is there any reason why MySQL doesn't have any such features? How do MySQL's devs expect users to "organize a glut of mysql tables"?

Jeron Baffom (11 rep)

Mar 4, 2023, 09:24 PM • Last activity: Mar 5, 2023, 11:21 AM

0 votes

1 answers

43 views

SQL query for category hierarchy validations

sql-server query hierarchy


I need to add validation on category creation.

**CASE 1:** `parentId` should be valid if supplied

**CASE 2:** the `name` must not duplicate that of a sibling

I have this table: **(categories)**

    id | parentId | name
    ---|----------|-------
    1  | NULL     | CatA
    2  | 1        | CatA.1

(Note: My parent-child hierarchy can go up to the nth level)

Now in the above scenario what should not be allowed is:

 1. I cannot provide an invalid parentId
 2. I cannot create a category with name: CatA where parentId = null
 3. I cannot create a category where name: CatA.1 where parentId = 1

I am working in Node.js, so I need to return these two validation errors:

1) The provided parentId is invalid
2) Duplicate name detected

I want to achieve this using a single optimized SQL query.
I can use if/else statements afterwards, based on the query response.
But it is really important to me that I use a single query, and that the query is as optimized as possible.

What I tried so far is:

    SELECT TOP 1
        parentId,
        name,
        CASE
            WHEN name = 'CatA.2' THEN 1  -- a sibling already uses the candidate name
            ELSE 0
        END AS sortOrder
    FROM
        categories
    WHERE
        parentId = 1
    ORDER BY
        sortOrder DESC

The issue with my query is that it doesn't cover all the scenarios.

Can anyone help me with the query?
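One possible shape for such a single query is sketched below: two `CASE WHEN EXISTS` flags, one per validation, with a NULL-safe comparison on `parentId` so that root categories are compared with each other. To keep the example self-contained it runs against an in-memory SQLite database from Python instead of SQL Server from Node.js; that substitution is an assumption made only for the demonstration, and in T-SQL the `:parent_id` and `:name` placeholders would become `@` parameters. The names `VALIDATE_SQL`, `validate`, and `demo_connection` are hypothetical.

```python
import sqlite3

VALIDATE_SQL = """
SELECT
    CASE
        WHEN :parent_id IS NULL THEN 1
        WHEN EXISTS (SELECT 1 FROM categories WHERE id = :parent_id) THEN 1
        ELSE 0
    END AS parent_ok,
    CASE
        WHEN EXISTS (
            SELECT 1
            FROM categories
            WHERE name = :name
              AND (parentId = :parent_id
                   OR (parentId IS NULL AND :parent_id IS NULL))
        ) THEN 1
        ELSE 0
    END AS duplicate_name
"""


def demo_connection():
    """Build an in-memory table holding the two sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories (id INTEGER PRIMARY KEY, parentId INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?, ?)",
        [(1, None, "CatA"), (2, 1, "CatA.1")],
    )
    return conn


def validate(conn, parent_id, name):
    """Return (parent_ok, duplicate_name) for a proposed category."""
    row = conn.execute(VALIDATE_SQL, {"parent_id": parent_id, "name": name}).fetchone()
    return bool(row[0]), bool(row[1])


if __name__ == "__main__":
    conn = demo_connection()
    print(validate(conn, None, "CatA"))
    print(validate(conn, 1, "CatA.1"))
    print(validate(conn, 99, "CatB"))
```

The calling code can then map `parent_ok = 0` to "The provided parentId is invalid" and `duplicate_name = 1` to "Duplicate name detected".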
                                

StormTrooper (103 rep)

Jan 22, 2023, 03:36 PM • Last activity: Jan 22, 2023, 07:28 PM

0 votes

3 answers

369 views

Information sources for multiple hierarchy trees in a single table

database-theory hierarchy tree


I need some quality sources - books, websites etc. - to educate myself about putting multiple hierarchy trees in a single table using SQL.

I'm looking for some good theoretical and practical information.

Wozilla (1 rep)

Sep 22, 2012, 06:28 PM • Last activity: Aug 15, 2022, 06:19 PM
