EAV or JSONB column for dynamic data that needs good indexing
I have some data in a Postgres database that does not fit into a static schema. There is a table "content" and each piece of content has a user-defined type. Each type can have different user-defined fields. Within a single type those fields are always the same though. You can assume that a typical database has ~5-10 types with each having ~5-25 fields.
My original approach was to store the data in a JSONB column, like this:
{
  "my_field": "foo",
  "another_field": "bar",
  "some_number_field": 5,
  "a_bool_field": true
}
In other words, the data is stored as key/value pairs: each field has a string id that is used as the key, and the value is stored using the JSON type that matches the field's type. Of course, you have to know whether the specific field you are querying is a number or a string, but that information is stored elsewhere in the DB for all content types and their fields.
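A minimal Python sketch of that arrangement, assuming the per-type field metadata can be loaded into a dictionary that maps each field key to one of the hypothetical type names "string", "number" or "bool". It checks every value against its declared type before the document is written to the JSONB column, so a number field never ends up holding a string.

```python
import json

# Hypothetical field registry for one content type; in the real system this
# information lives in separate tables describing each type and its fields.
FIELD_TYPES = {
    "my_field": "string",
    "another_field": "string",
    "some_number_field": "number",
    "a_bool_field": "bool",
}


def _matches(value, field_type):
    """Return True if a Python value fits the declared field type."""
    if field_type == "string":
        return isinstance(value, str)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "bool":
        return isinstance(value, bool)
    raise ValueError(f"unknown field type: {field_type}")


def build_document(values, field_types=FIELD_TYPES):
    """Validate values against the registry and return the JSON text to store."""
    for key, value in values.items():
        if key not in field_types:
            raise KeyError(f"unknown field: {key}")
        if not _matches(value, field_types[key]):
            raise TypeError(f"{key} must be of type {field_types[key]}")
    return json.dumps(values, sort_keys=True)
```

Calling `build_document({"some_number_field": True})` raises `TypeError`, which is the kind of mismatch that would otherwise only surface later as a failed cast in a query.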
This is indexed with a GIN index using jsonb_path_ops and can then be queried with the @> containment operator. This works nicely for checking equality, but it doesn't help with any other kind of comparison.
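The index and query described above can be sketched in Python as follows, assuming a table `content` with a JSONB column named `data` (both names hypothetical) and a driver such as psycopg that uses `%s` placeholders. The filter is serialized with `json.dumps` so it is passed as a single parameter and cast to `jsonb` in SQL.

```python
import json

# GIN index supporting @> containment (table and column names are assumptions).
GIN_INDEX_DDL = (
    "CREATE INDEX IF NOT EXISTS content_data_path_idx "
    "ON content USING GIN (data jsonb_path_ops)"
)


def containment_query(filters):
    """Build an equality-only query that the jsonb_path_ops GIN index can serve.

    `filters` maps field keys to the exact values to match,
    e.g. {"my_field": "foo", "a_bool_field": True}.
    """
    sql = "SELECT id FROM content WHERE data @> %s::jsonb"
    params = (json.dumps(filters, sort_keys=True),)
    return sql, params
```

A row matches only if every key/value pair in the filter is present in its stored document, which is why this approach covers equality checks and nothing more.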
The problem is that I also need to support more advanced queries, specifically ones that use the > and < operators. Part of the reason is that I'm adding timestamps as a field type, and queries that restrict a timestamp to a range are a very common use case there.
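For contrast, a range condition cannot be expressed as a containment check; the value has to be extracted and cast. The Python sketch below builds such a query. It assumes the hypothetical `content` table with `data` and `type_id` columns, field keys limited to a conservative identifier pattern (so they can be inlined as SQL literals without escaping), and, as a further assumption, timestamp fields stored in the JSON as numeric Unix epoch seconds so they compare as `numeric`. The `jsonb_path_ops` GIN index cannot serve conditions like these.

```python
import re

# Hypothetical rule: field keys must match this pattern, so they can be
# inlined into SQL as literals without any escaping.
SAFE_KEY = re.compile(r"[a-z][a-z0-9_]{0,62}")

# SQL cast per comparable field type. Timestamps are assumed to be stored
# as numeric epoch seconds, so they are compared as numeric too.
CASTS = {"number": "numeric", "timestamp": "numeric"}


def field_expr(field_key, field_type):
    """Return the SQL expression that extracts and casts one JSONB field."""
    if not SAFE_KEY.fullmatch(field_key):
        raise ValueError(f"unsafe field key: {field_key!r}")
    if field_type not in CASTS:
        raise ValueError(f"range queries unsupported for {field_type!r}")
    return f"((data->>'{field_key}')::{CASTS[field_type]})"


def range_query(type_id, field_key, field_type, lower, upper):
    """Build a half-open range filter lower <= value < upper on one field."""
    expr = field_expr(field_key, field_type)
    sql = (
        "SELECT id FROM content "
        f"WHERE type_id = %s AND {expr} >= %s AND {expr} < %s"
    )
    return sql, (type_id, lower, upper)
```

The half-open range (`>=` lower, `<` upper) keeps boundary values from being counted twice when adjacent ranges are queried.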
As far as I understand, such range queries can't be supported generically with a JSONB column: there is no index comparable to the GIN index that would serve them across arbitrary fields. So I see only two ways to handle this:
- dynamically create the right indexes for specific fields
- store the data in an EAV pattern with columns for different data types like timestamp, int, ...
The first option, creating indexes from the running application based on user input, is a bit unorthodox, though I think it could be done reasonably safely in this particular case. The tenants are separated by schemas, and I'd use partial indexes that are created concurrently to avoid locking the tables.
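A sketch of the DDL the first option could generate, in Python. It assumes the hypothetical `content` table with `data` and `type_id` columns in each tenant schema, schema names and field keys limited to a safe identifier pattern, and timestamp fields stored as numeric epoch seconds: PostgreSQL rejects index expressions that use non-IMMUTABLE functions, and a text-to-`timestamptz` cast is not immutable, whereas a text-to-`numeric` cast is. One partial expression index is built per (content type, field) pair. `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so the statement would have to be executed on an autocommit connection.

```python
import hashlib
import re

# Hypothetical naming rule: schema names and field keys must match this
# pattern so they can be inlined into DDL without escaping.
SAFE_NAME = re.compile(r"[a-z][a-z0-9_]{0,62}")

# Timestamps are assumed to be stored as epoch seconds, so both map to numeric.
CASTS = {"number": "numeric", "timestamp": "numeric"}


def partial_index_ddl(schema, type_id, field_key, field_type):
    """Return CREATE INDEX CONCURRENTLY DDL for one field of one content type."""
    for name in (schema, field_key):
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if field_type not in CASTS:
        raise ValueError(f"expression index not supported for {field_type!r}")
    if not isinstance(type_id, int) or isinstance(type_id, bool):
        raise TypeError("type_id must be an int")
    # A short deterministic hash keeps the name well under the 63-byte
    # identifier limit and gives repeated calls the same index name.
    digest = hashlib.sha1(f"{type_id}:{field_key}".encode()).hexdigest()[:12]
    index_name = f"content_{type_id}_{digest}_idx"
    expr = f"((data->>'{field_key}')::{CASTS[field_type]})"
    # The index is created in the same schema as its table, so the name
    # needs no schema qualification.
    return (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {schema}.content ({expr}) "
        f"WHERE type_id = {type_id}"
    )
```

The planner only considers such an index when the query's WHERE clause implies `type_id = 3` (for that example) and the cast expression is written exactly as in the index. If a concurrent build fails, it leaves an INVALID index behind that `IF NOT EXISTS` will skip rather than rebuild, so failed builds would need to be detected and dropped.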
The second option, an entity-attribute-value (EAV) system, feels icky, but it does seem to be a valid solution for this particular requirement. However, everyone seems to strongly advise against EAV whenever this comes up, so I'm not sure how problematic this solution would be.
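For comparison, a sketch of the EAV layout from the second option, in Python. The table name `content_value`, its typed columns and the type names are all hypothetical. Each field becomes one row in which exactly one typed value column is filled, so an ordinary B-tree index on `(field_key, value_ts)` or `(field_key, value_num)` can serve range queries on any field without per-field DDL.

```python
from datetime import datetime, timezone

# Hypothetical EAV table: one row per (content, field), one typed column used.
EAV_DDL = """
CREATE TABLE content_value (
    content_id bigint NOT NULL,
    field_key  text   NOT NULL,
    value_text text,
    value_num  numeric,
    value_bool boolean,
    value_ts   timestamptz,
    PRIMARY KEY (content_id, field_key)
);
CREATE INDEX content_value_num_idx ON content_value (field_key, value_num);
CREATE INDEX content_value_ts_idx ON content_value (field_key, value_ts);
"""

# Position of each field type's value among the four typed columns.
COLUMN_INDEX = {"string": 0, "number": 1, "bool": 2, "timestamp": 3}

# Range query on a timestamp field, which content_value_ts_idx can serve.
TS_RANGE_SQL = (
    "SELECT content_id FROM content_value "
    "WHERE field_key = %s AND value_ts >= %s AND value_ts < %s"
)


def to_eav_rows(content_id, values, field_types):
    """Turn one content item's field values into EAV row tuples.

    Each tuple is (content_id, field_key, value_text, value_num,
    value_bool, value_ts) with only the column for the field's type set.
    """
    rows = []
    for key in sorted(values):
        field_type = field_types[key]
        value = values[key]
        if field_type == "timestamp":
            # timestamptz needs an aware datetime; normalize it to UTC.
            if not isinstance(value, datetime) or value.tzinfo is None:
                raise TypeError(f"{key} needs a timezone-aware datetime")
            value = value.astimezone(timezone.utc)
        typed = [None, None, None, None]
        typed[COLUMN_INDEX[field_type]] = value
        rows.append((content_id, key, *typed))
    return rows
```

The trade-off is the usual EAV one: reading a whole content item back means combining several rows, and nothing in this sketch stops a field from being written to the wrong typed column.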
Am I missing any solutions? What are my options for storing this kind of flexible data in a way that still allows fast comparison queries on types like timestamps and numbers?
Asked by Fabian (115 rep)
Jan 4, 2024, 02:08 PM
Last activity: Jul 7, 2024, 05:03 AM