Modeling combinations (options and dependencies)
-1
votes
1
answer
38
views
How would you modelize the following data? I have a solution I wrote in a separate answer. Don't forget to read it.
Context
-----------------
It's for a website. Speed is of the essence.
Data
-------
* You have objects that are linked to many tables describing their properties, such as an object__tag table. Any object can be paired with a color.
* Thing is, objects can be made of objects. I call them superobjects when they assume that container role (as objects, they can be part of other superobjects). Objects can be shuffled in all sorts of unpredictable ways because it is less about how they are made than how they are sold. It is therefore useless to think that if one object is contained inside some superobject A, if that superobject A will be contained inside another superobject B, superobject B will inherit all objects from superobject A.
* In a superobject, multiple objects can be grouped together into a set of options. One option can be made of multiple objects (majority of the time, just one). You can choose how many options minimum and maximum (aka a range) you can choose from a set (majority of the time, just one). It's possible to have a NONE choice aka you don't choose any option.
* Any object can have 0 (common), 1 (common), or more (very rare) dependencies.
Note: the bit about sets of options having a min-max selection should be ignored safely has it can be converted in single-selection options. Example: options A, B, C with zero to two selections is, in the end, just A, B, C, AB, AC, BC, NONE. The bit about NONE should also be safely ignored as I am building a search engine which doesn't care about the absence of a certain option.
Dummy example
---------
Think of cars (don't get too worked up on that example as it's a dummy one and I don't know a thing about cars).
You can buy one red, green, or yellow (3 options). You can choose manual or automatic transmission (2 options).
If you buy the manual transmission, you can choose hubcaps (hubcaps are dependent on manual transmission).
Tree representation
--------------
Think of "( )" as radio buttons. Empty line breaks are separating sets of options. X indicates mandatory nodes (unless their parent is not chosen). "(n-m)" represents the min-max range for a set of options.
**Car A (superobject)**
(1-1) Car A + Green
(1-1) Car A + Yellow
(1-1) Car A + Red
(x) Engine V2000
(1-1) Manual transmission
│ (x) Speakers
│
│ (1-2) 20-inch wheels
│ (1-2) 18-inch wheels + Hubcaps
│
│ (0-1) Leather seats + Black
│ (0-1) Vinyl seats + Yellow {requires Car A + Yellow}
│
(1-1) Automatic transmission
│ (x) Leather seats + Blue
│
│ (x) 20-inch wheels
A user can customize their car however they want.
- Yellow car A + Engine V2000 + Manual transmission + Speakers + 20-inch wheels + Yellow vinyl seats is one legal combination.
- Green car A + Engine V2000 + Automatic transmission + Blue leather seats + 20-inch wheels is another one.
- Yellow car A + Engine V2000 + Automatic transmission + 18-inch wheels + Hubcaps *is an illegal combination.*
It doesn't matter in which order the transmission or the engine appear in the tree. They can switch place and that is why this is not a proper hierarchy. You go down the tree and must check every branch until you reach the last leaf, choosing options as necessary, before jumping back on the trunk until you reach the roots.
Purpose
----------
I don't really care at the moment as of how to render that tree above as it doesn't appear to be a big challenge. What I care is building a performant search engine to find superobjects given columns of objects (subqueries) selected through joining objects with their related tableS such as object__tag.
Given the previous example, when I search for Green car A + Vinyl seats, I must not find superobject Car A.
Of course, you have to understand that I will not search for one specific object "vinyl seats" but for tag "vinyl seats" which will give me **a column of thousands of objects**. And same, I will search for color "green" linked to object of class "car" and, again, it will return this time **a column of pairs object-color**. The combination of these two or much more subqueries must return superobjects that fit the constraints. Because of that, the model must stay normalized as I believe you can't hit an array of id, or a ltree with such subqueries.
Problem
-----------
The hierarchy part is not as hard to modelize as it seems. Each superobject is pretty flat. If you think in term of paths (like in materialized paths), after putting the objects at the root one after the other, you don't get many paths.
With the above example, you only have three paths because of the transmission.
* Green/Yellow/Red Car A > Engine V2000 > Manual transmission > Speakers > 20-inch wheels/18-inch wheels + Hubcaps > Black Leather seats
* Yellow Car A > Engine V2000 > Manual transmission > Speakers > 20-inch wheels/18-inch wheels + Hubcaps > Yellow vinyl seats
* Green/Yellow/Red Car A > Engine V2000 > Automatic transmission > Blue leather seats > 20-inch wheels
In fact, seven, as
> (1-2) 20-inch wheels
> (1-2) 18-inch wheels + Hubcaps
will be modelized in my solution as
* (Manual transmission > Speakers >) 20-inch wheels
* (Manual transmission > Speakers >) 18-inch wheels > Hubcaps
* (Manual transmission > Speakers >) 20-inch wheels > 18-inch wheels > Hubcaps
as I can't add multiple objects to a node. I have to expand the more-than-one-selectable-option into further branches.
Now, the big problem is that each new set of options makes the hierarchy grows exponentially. If you have 3 options, then 4, then 5, the tree will have 60 paths. If you add another one of 3 it's 180. I estimated that you have some freak superobjects that could spawn more than 1000 possibilities. It's insane just to check few sets of options! I just need some freak exceptions I never imagined would exist to clog my database, meaning I would have to limit the number of combinations a superobject could have to be on the safe side.
And how do you even begin being efficient if it's to throw at such a hierarchy four columns of object.id to find paths that hit a combination?? If I have to check recursively hundreds of paths that do not even have a predictable order just to find a list of superobjects, I don't see my website becoming very performant.
Asked by Some_user
(61 rep)
Jun 2, 2023, 10:11 AM
Last activity: Jun 4, 2023, 08:55 PM
Last activity: Jun 4, 2023, 08:55 PM