Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1
votes
1
answers
168
views
Filtering a table on multiple columns: can an EAV support table be the best solution for improving search performance?
I have **a table with 10~15 columns** (some varchar, some int, some double, and some date). **The table can contain tens of millions of records.**
**Users can filter on every column using any filter operator** (equals, starts with, contains, ends with, greater than, less than, ...) and can set multiple filters at once (e.g. filter on column1 AND column2 AND column5 AND column8).
**I would like to implement a solution that optimizes search response time while keeping a good compromise on insert performance.**
**Solution 1**:
Create an index on each column.
I don't think this is the best solution, because it compromises insert performance and still doesn't cover every combination of filters.
**Solution 2**:
Use an EAV support table that contains the column data allowed for filtering.
```
CREATE TABLE FlatTable
(
    TableId UNIQUEIDENTIFIER NOT NULL,
    Column1 VARCHAR(64),
    Column2 INTEGER,
    Column3 DATETIME,
    Column4 FLOAT,
    ...
    Column10 VARCHAR(512),
    CONSTRAINT PK_FlatTable PRIMARY KEY (TableId)
)

CREATE TABLE EavTable
(
    EavTableId UNIQUEIDENTIFIER NOT NULL,
    TableId UNIQUEIDENTIFIER NOT NULL,
    PropertyName VARCHAR (512) NOT NULL,
    StringValue VARCHAR (512),
    StringValueReverse AS REVERSE(StringValue),
    StringValueFullText VARCHAR(MAX),
    NumericValue MONEY,
    DateValue DATETIME,
    CONSTRAINT PK_EavTable PRIMARY KEY (EavTableId),
    CONSTRAINT FK_EavTable FOREIGN KEY (TableId) REFERENCES FlatTable (TableId) ON DELETE CASCADE
)

CREATE UNIQUE INDEX UX_EavTable ON EavTable (TableId, StringValue, PropertyName)
CREATE UNIQUE INDEX UX2_EavTable ON EavTable (TableId, StringValueReverse, PropertyName)
CREATE UNIQUE INDEX UX3_EavTable ON EavTable (TableId, NumericValue, PropertyName)
CREATE UNIQUE INDEX UX4_EavTable ON EavTable (TableId, DateValue, PropertyName)

CREATE FULLTEXT INDEX ON EavTable(StringValueFullText)
    KEY INDEX PK_EavTable
    WITH STOPLIST = OFF;
```
**Remarks**:
The *StringValueReverse* column is used to filter string values with the "ends with" operator: it resolves the query with a LIKE 'reverseValue%' filter and performs an index seek on index UX2_EavTable.
The *StringValueFullText* column contains the string value split into multiple words, allowing the fulltext index to be used in some circumstances when the filter operator is "contains".
The *NumericValue* column contains the numeric value of the property, if it is a number, allowing an index seek on index UX3_EavTable.
The *DateValue* column contains the datetime value of the property, if it is a date, allowing an index seek on index UX4_EavTable.
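For illustration, a single FlatTable row fans out into one EavTable row per filterable property. A hypothetical example with made-up values (StringValueReverse is a computed column, so it is never inserted explicitly):

```
DECLARE @id UNIQUEIDENTIFIER = NEWID();

INSERT INTO FlatTable (TableId, Column1, Column2)
VALUES (@id, 'value', 42);

-- String property: raw value plus its word-split fulltext form
INSERT INTO EavTable (EavTableId, TableId, PropertyName, StringValue, StringValueFullText)
VALUES (NEWID(), @id, 'Property1', 'value', 'value');

-- Numeric property: goes into the typed column instead
INSERT INTO EavTable (EavTableId, TableId, PropertyName, NumericValue)
VALUES (NEWID(), @id, 'Property4', 42);
```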
**With this schema I can filter on multiple columns using multiple EXISTS conditions, with an INDEX SEEK for all filters** (except some cases of filtering with the "contains" operator).
For example:
```
SELECT *
FROM FlatTable
WHERE EXISTS (SELECT *
              FROM EavTable
              WHERE TableId = FlatTable.TableId
                AND PropertyName = 'Property1'
                AND StringValue LIKE 'value%'
             )
  AND EXISTS (SELECT *
              FROM EavTable
              WHERE TableId = FlatTable.TableId
                AND PropertyName = 'Property2'
                AND StringValueReverse LIKE 'eulav%'
             )
  AND EXISTS (SELECT *
              FROM EavTable
              WHERE TableId = FlatTable.TableId
                AND PropertyName = 'Property3'
                AND CONTAINS(StringValueFullText, '"value*"')
             )
  AND EXISTS (SELECT *
              FROM EavTable
              WHERE TableId = FlatTable.TableId
                AND PropertyName = 'Property4'
                AND NumericValue > 100
             )
  AND EXISTS (SELECT *
              FROM EavTable
              WHERE TableId = FlatTable.TableId
                AND PropertyName = 'Property5'
                AND DateValue > '20240101'
             )
```
How would you evaluate this solution?
Is there a better one?
Thanks,
Claudio
(11 rep)
Dec 25, 2023, 07:27 PM
• Last activity: Jul 26, 2025, 10:03 AM
0
votes
1
answers
163
views
How to store a bunch of columns when you don't know how many there will be, the columns change, and new columns are introduced
The source of the data is the internet: random bots that submit values to my Apache server.
Periodically new columns appear, but they are not always present.
The other problem is that you then have a bunch of NULLs, which you don't want to store at all if possible.
Later, I would statistically analyze the data, but we are just trying to store it here.
Currently there are 99 columns; each record populates a minimum of 27 and a max of 59 of them, and the max will probably go up slowly.
I am currently using a single master table, but the downside is that there are tons of NULLs stored in each record. Lots of wasted space. Presently the most common value is 64 NULLs per row.
So I had an idea, and was wondering if anyone can improve on it.
You need a master table, and every submitted column has a lookup table with an index and a value.
```
master
index  num_columns  table_index
1      27           1
```
There would be tables named, for simplicity, tables27 through table100.
table27 would be:
```
index, columns_names_csv, col1_index, col2_index, ..., col27_index
```
table28 would be much the same, except:
```
index, columns_names_csv, col1_index, col2_index, ..., col28_index
```
All the indexes would be integers used to point to the correct values in the corresponding lookup table.
Since they would all be integers there would be no need to worry about data types.
So in addition to the master table and the 99 (and growing) lookup tables, you would have tables27 through, currently, table59, though that number could eventually increase.
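A sketch of the proposed layout as DDL, using the names from the description above; the lookup-table name is illustrative, and there would be one such lookup table per submitted column:

```
CREATE TABLE master (
  `index`      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  num_columns  TINYINT UNSIGNED NOT NULL,  -- how many columns this submission had
  table_index  INT UNSIGNED NOT NULL       -- row id inside the matching tablesNN
);

-- One lookup table per submitted column, holding the distinct values
CREATE TABLE lst_http_user_agent (
  `index` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value   VARCHAR(512) NOT NULL UNIQUE
);

-- One "shape" table per column count, e.g. for 27 populated columns
CREATE TABLE tables27 (
  `index`           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  columns_names_csv TEXT NOT NULL,          -- which 27 columns these are
  col1_index        INT UNSIGNED NOT NULL,  -- points into the matching lookup table
  col2_index        INT UNSIGNED NOT NULL
  -- ... and so on through col27_index
);
```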
Is there a better way to do this?
For fun here's the table
+-----------------------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| lstREMOTE_ADDR | int(11) | NO | | NULL | |
| lstCONTENT_LENGTH | int(11) | YES | | NULL | |
| lstREQUEST_TIME_FLOAT | datetime(4) | NO | | NULL | |
| lstUNIQUE_ID | varchar(128) | NO | | NULL | |
| lstSERVER_PORT | int(5) | NO | | NULL | |
| lstREMOTE_PORT | int(5) | NO | | NULL | |
| lstREDIRECT_UNIQUE_ID | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_REQUEST_METHOD | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_STATUS | int(10) unsigned | YES | | NULL | |
| lstHTTP_HOST | int(10) unsigned | YES | | NULL | |
| lstHTTP_USER_AGENT | int(10) unsigned | YES | | NULL | |
| lstHTTP_ACCEPT | int(10) unsigned | YES | | NULL | |
| lstHTTP_ACCEPT_LANGUAGE | int(10) unsigned | YES | | NULL | |
| lstHTTP_ACCEPT_ENCODING | int(10) unsigned | YES | | NULL | |
| lstHTTP_CONNECTION | int(10) unsigned | YES | | NULL | |
| lstHTTP_COOKIE | int(10) unsigned | YES | | NULL | |
| lstHTTP_UPGRADE_INSECURE_REQUESTS | int(10) unsigned | YES | | NULL | |
| lstHTTP_CACHE_CONTROL | int(10) unsigned | YES | | NULL | |
| lstPATH | tinyint(3) unsigned | YES | | NULL | |
| lstSERVER_SIGNATURE | tinyint(3) unsigned | YES | | NULL | |
| lstSERVER_SOFTWARE | int(10) unsigned | YES | | NULL | |
| lstSERVER_NAME | int(10) unsigned | YES | | NULL | |
| lstSERVER_ADDR | int(10) unsigned | YES | | NULL | |
| lstDOCUMENT_ROOT | tinyint(3) unsigned | YES | | NULL | |
| lstREQUEST_SCHEME | int(10) unsigned | YES | | NULL | |
| lstCONTEXT_PREFIX | int(10) unsigned | YES | | NULL | |
| lstCONTEXT_DOCUMENT_ROOT | int(10) unsigned | YES | | NULL | |
| lstSERVER_ADMIN | int(10) unsigned | YES | | NULL | |
| lstSCRIPT_FILENAME | tinyint(3) unsigned | YES | | NULL | |
| lstREDIRECT_URL | int(10) unsigned | YES | | NULL | |
| lstGATEWAY_INTERFACE | int(10) unsigned | YES | | NULL | |
| lstSERVER_PROTOCOL | int(10) unsigned | YES | | NULL | |
| lstREQUEST_METHOD | int(10) unsigned | YES | | NULL | |
| lstQUERY_STRING | int(10) unsigned | YES | | NULL | |
| lstREQUEST_URI | int(10) unsigned | YES | | NULL | |
| lstSCRIPT_NAME | tinyint(3) unsigned | YES | | NULL | |
| lstPHP_SELF | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_spider | int(10) unsigned | YES | | NULL | |
| lstspider | int(10) unsigned | YES | | NULL | |
| lstHTTP_PRAGMA | int(10) unsigned | YES | | NULL | |
| lstHTTP_FROM | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_QUERY_STRING | int(10) unsigned | YES | | NULL | |
| lstHTTP_REFERER | int(10) unsigned | YES | | NULL | |
| lstHTTP_DNT | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_REQUESTED_WITH | int(10) unsigned | YES | | NULL | |
| lstCONTENT_TYPE | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_ERROR_NOTES | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_FORWARDED_FOR | int(10) unsigned | YES | | NULL | |
| lstHTTP_IF | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_HTTPS | int(10) unsigned | YES | | NULL | |
| lstREDIRECT_SSL_TLS_SNI | int(10) unsigned | YES | | NULL | |
| lstHTTPS | int(10) unsigned | YES | | NULL | |
| lstSSL_TLS_SNI | int(10) unsigned | YES | | NULL | |
| lstPHP_AUTH_USER | int(10) unsigned | YES | | NULL | |
| lstPHP_AUTH_PW | int(10) unsigned | YES | | NULL | |
| lstHTTP_AMP_CACHE_TRANSFORM | int(10) unsigned | YES | | NULL | |
| lstHTTP_ACCEPT_CHARSET | int(10) unsigned | YES | | NULL | |
| lstHTTP_CLIENT_IP | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_REMOTE_IP | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_ORIGINATING_IP | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_REMOTE_ADDR | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_REAL_IP | int(10) unsigned | YES | | NULL | |
| lstHTTP_REVERSE_VIA | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_VARNISH | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_UA_COMPATIBLE | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_POWERED_BY | int(10) unsigned | YES | | NULL | |
| lstHTTP_TE | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_PIPER_ID | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_UCBROWSER_UA | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_WAP_PROFILE | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_EBO_UA | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_OPERAMINI_FEATURES | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_OPERAMINI_PHONE | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_OPERAMINI_ROUTE | int(10) unsigned | YES | | NULL | |
| lstHTTP_DEVICE_STOCK_UA | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_OPERAMINI_PHONE_UA | int(10) unsigned | YES | | NULL | |
| lstHTTP_FORWARDED | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_ACCEL_INTERNAL | int(10) unsigned | YES | | NULL | |
| lstHTTP_WAP_CONNECTION | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_CONTENT_OPT | int(10) unsigned | YES | | NULL | |
| lstHTTP_KEEP_ALIVE | int(10) unsigned | YES | | NULL | |
| lstHTTP_APPID | int(10) unsigned | YES | | NULL | |
| lstHTTP_CLIENTID | int(10) unsigned | YES | | NULL | |
| lstHTTP_DID | int(10) unsigned | YES | | NULL | |
| lstHTTP_PLATFORM | int(10) unsigned | YES | | NULL | |
| lstHTTP_RISKUDID | int(10) unsigned | YES | | NULL | |
| lstHTTP_SIGNTYPE | int(10) unsigned | YES | | NULL | |
| lstHTTP_TRACKERID | int(10) unsigned | YES | | NULL | |
| lstHTTP_WORKSPACEID | int(10) unsigned | YES | | NULL | |
| lstHTTP_VIA | int(10) unsigned | YES | | NULL | |
| lstHTTP_PROXY_CONNECTION | int(10) unsigned | YES | | NULL | |
| lstHTTP_DEPTH | int(10) unsigned | YES | | NULL | |
| lstHTTP_TRANSLATE | int(10) unsigned | YES | | NULL | |
| lstHTTP_MIME_VERSION | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_VERMEER_CONTENT_TYPE | int(10) unsigned | YES | | NULL | |
| lstHTTP_X_CNECTION | int(10) unsigned | YES | | NULL | |
| lstHTTP_EXPECT | int(10) unsigned | YES | | NULL | |
| lstHTTP_IF_MODIFIED_SINCE | int(10) unsigned | YES | | NULL | |
+-----------------------------------+---------------------+------+-----+---------+----------------+
cybernard
(151 rep)
Aug 24, 2019, 03:29 PM
• Last activity: Jul 12, 2025, 10:01 AM
0
votes
1
answers
183
views
EAV, multiple left join query optimisation
Any assistance would be greatly welcomed. Pardon my ignorance, but I am wondering if anyone else has had to deal with bad EAV performance issues. I have a particular query which takes a long time. What is the best way to increase its speed?
```
SELECT * FROM
(
    SELECT f.*, p.`timestamp`
    FROM tObjectValues ov
    INNER JOIN tObjectProperties op ON ov.propertyId = op.id
    INNER JOIN tObjects o ON op.objectId = o.id
    INNER JOIN files f
        ON SUBSTRING(ov.value, CHAR_LENGTH('cache://content//')) = f.filename
    LEFT JOIN publishFiles pf ON f.id = pf.fileId
    LEFT JOIN publishes p ON p.id = pf.publishId
        AND o.appId = p.appId
        AND p.type = 'l'
    WHERE o.status = 1
      AND op.status = 1
      AND ov.status = 1
      AND ov.value LIKE 'cache://content/%'
      AND o.appId = 2
    ORDER BY p.timestamp DESC
    LIMIT 0, 10000000
) s
GROUP BY s.id
```
I have run EXPLAIN to delve further into the issue, but I was wondering if anyone who has had a similar problem could help with the optimisation.

funnypeople
(1 rep)
Mar 20, 2020, 08:35 AM
• Last activity: Jul 9, 2025, 06:05 PM
0
votes
2
answers
213
views
I would like to get unique rows based on email address (MySQL)
I can work on basic queries, but have been having a bit of trouble with the way WordPress stores its database information. I have been using the query below to download all form submissions for contests that we run. I usually download the results and then sort what I need using Excel or Google Sheets. Our last bunch of contests have had a lot of submissions, and I would like to take the following query one step further:
```
SELECT b.id, b.date_created, a.lead_id, a.form_id,
    max(case when a.field_number = 13 then value end) as FirstName,
    max(case when a.field_number = 14 then value end) as LastName,
    max(case when a.field_number = 15 then value end) as Address,
    max(case when a.field_number = 16 then value end) as Address2,
    max(case when a.field_number = 17 then value end) as City,
    max(case when a.field_number = 18 then value end) as Province,
    max(case when a.field_number = 19 then value end) as PostalCode,
    max(case when a.field_number = 20 then value end) as Country,
    max(case when a.field_number = 4 then value end) as PhoneNumber,
    max(case when a.field_number = 3 then value end) as EmailAddress,
    max(case when a.field_number = 44 then value end) as Question,
    max(case when a.field_number LIKE 42.1 then value end) as ClientNewsletter,
    max(case when a.field_number LIKE 27.1 then value end) as LocalNewsletter
FROM wp_rg_lead_detail a
LEFT JOIN wp_rg_lead b
    ON a.lead_id = b.id
WHERE a.form_id = 45
GROUP BY a.lead_id;
```
This gets me my full list of entries. I would like to be able to generate three more result sets.
From this current result I would like to get unique entries by **EmailAddress**.
The two new lists I would like are based on whether **ClientNewsletter** is selected, and likewise **LocalNewsletter**.
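For what it's worth, a sketch of the "unique by EmailAddress" step: wrap the pivot in a derived table and group it again, keeping for example the earliest entry id per address (abridged here to the email column only):

```
SELECT EmailAddress, MIN(id) AS first_entry_id, COUNT(*) AS entries
FROM (
    -- the full pivot query from above goes here; only EmailAddress is shown
    SELECT b.id,
           max(case when a.field_number = 3 then value end) as EmailAddress
    FROM wp_rg_lead_detail a
    LEFT JOIN wp_rg_lead b ON a.lead_id = b.id
    WHERE a.form_id = 45
    GROUP BY a.lead_id
) entries
GROUP BY EmailAddress;
```

The two newsletter lists would be the same query with a HAVING filter on the pivoted ClientNewsletter or LocalNewsletter column.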
Any help is appreciated.
kaykills
(1 rep)
Jan 8, 2019, 03:14 PM
• Last activity: Jun 15, 2025, 02:04 AM
1
votes
1
answers
1565
views
Database Design - Custom Attributes / Columns
So I'm trying to figure out the best way to go about handling this, database-design-wise. This is a rough idea of how I THINK it should go, but I'm not really sure.
If I have orders with their own necessary attributes that a user can create for all of their orders, and if an order can be composed of items, and those items can also have their own custom attributes (not necessarily the same attributes as the order), what makes the most sense?
```
Orders
======
id
client_id

Items
=====
id
user_id
name

Attributes
==========
id
user_id
is_order_attr
is_item_attr
name
type

Orders_Attributes
=================
id
order_id
attribute_id
value

Items_Attributes
================
id
item_id
attribute_id
value
```
1. Have an Attributes table for Orders and an Attributes table for Items, subsequently storing the values in other tables, say Orders_Attributes_Values and Items_Attributes_Values?
2. Store everything in a JSON data type column in some kind of Orders_Configs table, and then store the values in additional tables with the JSON data type?
3. Something completely different?
smb
(143 rep)
Aug 24, 2018, 04:16 AM
• Last activity: Jun 1, 2025, 12:03 AM
0
votes
1
answers
300
views
MYSQL Multiple table optimization for advanced search
I have a `users` table with the basic information of each user:
```
CREATE TABLE `users` (
  `user_id` int(11) NOT NULL AUTO_INCREMENT,
  `account_type` varchar(10) NOT NULL,
  `email` varchar(255) NOT NULL,
  `password` varchar(500) NOT NULL,
  `registration_date` timestamp NULL DEFAULT NULL,
  `account_status` tinyint(2) NOT NULL DEFAULT '0',
  PRIMARY KEY (`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
```
On the other hand, I have a `user_data` table in which we put all other data:
```
CREATE TABLE `user_data` (
  `data_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `name` varchar(50) NOT NULL,
  `value` text,
  PRIMARY KEY (`data_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1
```
Basically, the data is stored as `name` => `value` pairs.
I fetch the data in the following fashion:
```
SELECT email, ud_desc.value AS description, ud_lang.value AS language
FROM users
JOIN user_data AS ud_desc
    ON ud_desc.user_id = users.user_id
    AND ud_desc.name = 'description'
JOIN user_data AS ud_lang
    ON ud_lang.user_id = users.user_id
    AND ud_lang.name = 'language'
```
I made that second table this way because there is a ton of different data for each user, some of it depending on `users.account_type`.
Also, my client keeps adding, removing and changing different kinds of data all the time.
So I needed something quite flexible, and this is the best idea I have come up with so far from a developer's point of view. But I fear that this so-called "best idea" is just a faint dream and will be too slow in the long term.
Know also that I might have to perform `LIKE` and `MATCH ... AGAINST` queries on the large volume of data that `user_data` will hold.
Also, note that I am not very familiar with indexes. At this point, I have no idea what is most efficient, given that the data is fetched based on two columns: `user_data.user_id` and `user_data.name`.
Should I make both of those columns (`user_id` and `name`) indexes? Should `user_id` be a foreign key (for the `users` table JOIN)?
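A minimal sketch of the indexing being asked about; note the foreign key additionally requires `users` to be InnoDB rather than MyISAM:

```
-- Composite index matching the lookup pattern (user_id, name):
ALTER TABLE user_data
    ADD INDEX idx_user_data_user_name (user_id, name);

-- Foreign key (both tables must be InnoDB):
ALTER TABLE user_data
    ADD CONSTRAINT fk_user_data_user
    FOREIGN KEY (user_id) REFERENCES users (user_id);
```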
Is there a way to improve either my database model or my queries?
I thank you all in advance. I am a bit lost at the moment and need some fresh points of view on this matter. Don't hesitate to tell me if you need more details.
Jake
(1 rep)
Aug 2, 2018, 12:49 PM
• Last activity: May 2, 2025, 06:03 PM
0
votes
1
answers
302
views
EAV table question
What's the best design for taxonomies in EAV tables?
```
taxonomy
    id       autoincrement
    name     unique

terms
    id       autoincrement
    tax_id   references taxonomy(id)
    name     text unique
    value    text

relationships
    id       autoincrement
    term_id  references terms(id)
    post_id  references posts(id)
```
vs a single table:
```
relationships
    post_id     references posts(id)
    tax_name    text
    term_name   text
    term_value  text
```
I think that if I use a single table the DB will use slightly more space, because it stores the taxonomy name and term name for each record (relationship). But would it be faster for queries that select records based on multiple terms and term values?
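For concreteness, a "records matching multiple terms" query against the three-table design could be written as stacked EXISTS tests; the term names and values here are hypothetical:

```
SELECT p.id
FROM posts p
WHERE EXISTS (SELECT 1 FROM relationships r
              JOIN terms t ON t.id = r.term_id
              WHERE r.post_id = p.id AND t.name = 'color' AND t.value = 'red')
  AND EXISTS (SELECT 1 FROM relationships r
              JOIN terms t ON t.id = r.term_id
              WHERE r.post_id = p.id AND t.name = 'size' AND t.value = 'XL');
```

The single-table variant would replace each join with a direct test on tax_name/term_name/term_value, trading the extra join for wider rows.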
user61637
(1 rep)
Mar 18, 2015, 07:33 PM
• Last activity: May 2, 2025, 11:07 AM
1
votes
2
answers
1641
views
Mysql: Create a view with multiple self joins without duplicates in result
Just to clarify: I can't change the table structure, so please leave out the "you should change your tables to this and that" answers, thank you.
So I have a table **entities_attributes_values** where an entity has a lot of attributes and a value for each attribute; basically imagine 3 fields:
- entity_id
- entity_attributes_id
- value
Because every entity attribute and its value is on its own row, getting several values at once is not so easy. I was thinking of multiple self joins, and because this query will be very common I created a view, which is built with this query:
```
SELECT
    `L1`.`entity_id`,
    `L1`.`value` AS 'company_id',
    `L2`.`value` AS 'entity_name',
    `P`.`value` AS 'person_name',
    `L4`.`value` AS 'establishment_id',
    `L5`.`value` AS 'department_id'
FROM `entities_attributes_values` `L1`
LEFT JOIN `entities_attributes_values` `L2`
    ON `L1`.`entity_id` = `L2`.`entity_id`
    AND `L2`.`entity_attributes_id` = 1
LEFT JOIN `entities_attributes_values` `L3`
    ON `L1`.`entity_id` = `L3`.`entity_id`
    AND `L3`.`entity_attributes_id` = 3
LEFT JOIN `persons_attributes_values` `P`
    ON `L3`.`value` = `P`.`core_persons_id`
    AND `P`.`core_persons_attributes_id` = 4
LEFT JOIN `entities_attributes_values` `L4`
    ON `L1`.`entity_id` = `L4`.`entity_id`
    AND `L4`.`entity_attributes_id` = 12
LEFT JOIN `entities_attributes_values` `L5`
    ON `L1`.`entity_id` = `L5`.`entity_id`
    AND `L5`.`entity_attributes_id` = 13
WHERE `L1`.`entity_attributes_id` = 2
```
So this works, but I have one problem: I get "duplicate" values. It's not really duplication, but **the point is that in my view I want every entity to be only one row with all its attribute values**; instead I get this:

As you can see, the first three results are not good for me; I only need the fourth one, where I have all my data about one entity.
Thank you in advance for any help!
Mr. Sam
(111 rep)
Jun 24, 2015, 10:08 AM
• Last activity: Apr 17, 2025, 09:07 PM
1
votes
2
answers
1358
views
Store parameters of various types
I am a (poor, young) scientist working to develop a database to allow a project to scale up. I am experimenting with MySQL and have had very encouraging experiences so far, but I am still novice and am coming at this from perhaps an unusual perspective.
Below is a diagram of the relevant parts of my schema, as I have it modelled right now. The fields are essentially placeholders at this point, but in general I would like to leave room for documentation of just about every unique element in the database (except probably not for jobs, but I digress). There are a couple of qualities that I am shooting for:
1. Valid parameters are algorithm dependent, and we may add new algorithms at any time. I'd like to be able to register a new algorithm, and associate it with particular parameters.
2. Because new algorithms may require new parameters, I should be able to register new parameters as well.
3. A particular experiment will use one algorithm. Jobs are unique parametrizations of an experiment/algorithm. Through the relation of an experiment and an algorithm, I can validate the parameters submitted along with a particular job.
My point of confusion, given this schema, is how to store the values of the parameters for a particular job. Right now, as a placeholder, I have a `values` column in the `job_has_parameter` table, but this is not going to work. Some parameters are integers, others doubles, still others character strings. Some algorithms may even take lists as parameters... but I'm willing to punt on that for now. At any rate, a single `value` column is not going to work.
Does this schema make sense at all? If it does make sense, generally speaking, what would be a reasonable way to handle storing the parameter values? I welcome criticism; I am under no delusion that I am on the right track here, but it's an effort.

----------

**Edit:** I think that this schema is inadvertently adopting an entity-attribute-value style. I read that this is a bad thing: https://www.simple-talk.com/sql/t-sql-programming/avoiding-the-eav-of-destruction/ Perhaps someone can help set me on a better path?

Chris Cox
(43 rep)
Sep 10, 2015, 03:08 PM
• Last activity: Feb 9, 2025, 12:06 AM
0
votes
1
answers
46
views
Again EAV - flexible DB design to allow user added properties
What is the problem? I want to make a materials database. The user should be able to create different material types and create materials of those types. But different material types have different material properties, and new materials may have new properties.
Example:
Material 1: Oil
- Viscosity
- Density
- Melting Point
Material 2: Metal
- Density
- Melting Point
- Ductility
Material 3: Cement
- Fineness of cement
- Soundness of cement
- Strength
- Setting time
- Heat of Hydration
Material X: xxxxx
You can see that different material types sometimes have the same properties and sometimes different ones. Some material types have few properties, others many. The properties may be of completely different data types, like STRING, FLOAT, INT, etc.
Adding a new material type should not require developing new code, nor an admin creating or changing the DB/table schema.
So I need an approach where a user can create a new material type in the application, create new properties in the application, and create the relationship between both, meaning he defines (via the relationship) which properties a material type has.
When then creating a material, he selects the material type and fills out only the properties that material has.
Under the DB normalization rules, material type, properties and material would be 3 different tables. There would also be relation tables which create the relations between the tables and hold the values of the properties.
What would be the best approach? My ideas (a sketch of idea 2 follows the list):
1. EAV with all its associated problems; values are stored in STRING format, whatever they are.
2. EAV with the difference that the table has a separate column for each of the main data types (STRING, FLOAT, BOOLEAN, BLOB), and a data-type column defines which column the data is stored in.
3. Each material type gets its OWN table (this custom table's name is stored in a column/field of the Materialtype table). The created table has fields/columns with the right data type for each property [maybe many different tables].
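A minimal sketch of idea 2, with hypothetical names: one value column per base type, plus a discriminator saying which one is populated:

```
CREATE TABLE property_value (
    material_id   INT NOT NULL,
    property_id   INT NOT NULL,
    data_type     VARCHAR(10) NOT NULL,  -- 'STRING' | 'FLOAT' | 'BOOL' | 'BLOB'
    string_value  VARCHAR(255) NULL,
    float_value   DOUBLE NULL,
    bool_value    BOOLEAN NULL,
    blob_value    BLOB NULL,
    PRIMARY KEY (material_id, property_id)
);
```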
Are there other options and what would you recommend?
Oliver R.
(1 rep)
Dec 23, 2024, 05:05 PM
• Last activity: Dec 24, 2024, 12:51 PM
0
votes
3
answers
105
views
MySQL where clause with conditions based on other columns
I apologise for the vagueness of the title, but I couldn't find a better way.
There's this MySQL table which consists of entity config mapping in a flattened structure.
```
CREATE TABLE `entity_config` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `entity_id` bigint NOT NULL,
  `config_id` bigint NOT NULL,
  `config_value` bigint NOT NULL,
  PRIMARY KEY (`id`),
  KEY `entityconfig_entity_id_config_id_config_value` (`entity_id`,`config_id`,`config_value`),
  KEY `idx_config_id_config_value` (`config_id`,`config_value`)
) ENGINE=InnoDB AUTO_INCREMENT=17382 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
```
Apart from the `id`, there are 2 composite indexes, on (`entity_id`, `config_id`, `config_value`) and (`config_id`, `config_value`).
We have to support filter/search queries based on both `config_id` and `config_value` in the following way:
- Find all `entity_id`s that have `config_id=10` and `config_value=1000`, and also `config_id=20` and `config_value=5000`.
The simplest way to do this is via joins
```
SELECT ec1.entity_id
FROM entity_config ec1
INNER JOIN entity_config ec2 ON ec1.entity_id = ec2.entity_id
WHERE ec1.config_id = 10
  AND ec1.config_value = 1000
  AND ec2.config_id = 20
  AND ec2.config_value = 5000
```
One join is fine, but the more configs the customer wants to search on, the more joins we have to keep adding; it can get to the point where we're joining the table with itself 10 times just because the customer applied 10 filters.
I tried to use `CASE WHEN ... THEN` inside the `WHERE` clause, like so:
```
SELECT entity_id FROM entity_config
WHERE (
    CASE WHEN config_id = 10 THEN config_value = 1000
         WHEN config_id = 20 THEN config_value = 5000
    END
)
```
But this is just an `OR` condition, which gives `entity_id`s that have EITHER the 1st config combo or the 2nd one.
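For comparison, one common pattern for this kind of "match all pairs" requirement is relational division with GROUP BY/HAVING; a sketch, not guaranteed to be faster, but it avoids stacking one self-join per filter:

```
SELECT entity_id
FROM entity_config
WHERE (config_id, config_value) IN ((10, 1000), (20, 5000))
GROUP BY entity_id
HAVING COUNT(DISTINCT config_id) = 2;  -- must match every listed pair
```

Adding a filter then means adding one row constructor to the IN list and bumping the HAVING count, rather than adding another join.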
Does anyone know any better or more efficient way of solving this? Thank you!
Sidharth Samant
(203 rep)
Nov 14, 2024, 05:39 AM
• Last activity: Nov 16, 2024, 02:16 AM
1
votes
2
answers
94
views
Given that they require schema modification locks, why are sparse columns considered an alternative to EAV?
_Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance_ is a very reliable source of good database design ideas. As an alternative to Entity-Attribute-Value it suggests using dynamic SQL to allow your users to add new sparse columns to a table.
To me, this idea stinks. Adding new columns requires a schema modification lock. This is a very serious lock and I would rather my users not have the ability to obtain it.
Is there any property of sparse columns that makes allowing users to take such a serious lock not as terrible as it would be for other types of column?
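For reference, what the suggested dynamic SQL would ultimately execute is an ordinary ALTER TABLE, which still requires a Sch-M lock even though adding a nullable sparse column is a metadata-only change; the table and column names here are hypothetical:

```
-- Still takes a schema modification (Sch-M) lock on dbo.Customer:
ALTER TABLE dbo.Customer
    ADD FavoriteColor varchar(20) SPARSE NULL;
```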
J. Mini
(1225 rep)
Oct 19, 2024, 11:34 PM
• Last activity: Oct 21, 2024, 12:32 PM
0
votes
1
answers
137
views
How to build an efficient EAV model over JSONB in Postgresql
## Context
Hello, we have a project where we integrate with a ton of different CRMs. That means that when a new client comes in, we get all their data from their CRM and store it in our own PostgreSQL database. This makes having fixed schemas for our application pretty hard.
## Requirements
Our app adds geographical capabilities to all the CRMs we integrate with by adding a map. Over that map, we then built some features on top. The base needs of these features are:
- Text search: common full text search.
- Filtering / colorising: filtering by range (numeric, datetime), by specific values of a picklist, by location on the map (we have a tile server for this), etc. We also support filtering by null or filled values; for example, show me all records where field `field` is null or not.
- Insert speed is kind of important, because we have to import hundreds of thousands of records when integrating a new company. But these imports are less and less frequent, because we are building an event-based update mechanism to stay up to date with the CRMs' data.
## Current solution
We chose some time ago to fully implement an entity-atribute-value (EAV) data model. This seemed the best idea given the shape of our data (more or less shapeless, due to how many different objects you have in the CRMs). The structure is more or less the usual in an EAV model:
- One table to store some core fields of the entity (id, creator, created date and so on)
- 5 tables for the 5 different built-in types we support (number, string, picklist, datetime, address).
This seemed great some time ago, but now we are having some difficulties when adding new features to the system. For instance:
- Full text search: this is somewhat hard to implement for some of the field types that we support (because we need to make some data redundant).
- Adding new field types: putting all the information of a complex field type into one field only forces us to create some pretty weird code.
- The tile server we have built to implement the filtering needs to run some pretty big and error-prone queries to combine filters on several fields.
- The model is hard for new joiners to the project to understand.
## Possible new solution
We were thinking about reducing the complexity of the project by combining schemas and JSONB. Basically, just store all records in one table:
- Some fixed columns in the table with the most used values.
- A JSONB column with all the fields coming from the CRM, stored there to be queried by the filter/colorising functionality. We would use a GIN index to support full text search.
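A minimal sketch of that single-table layout; all names here are hypothetical:

```
CREATE TABLE record (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    crm_object text NOT NULL,         -- which CRM object type the row came from
    created_at timestamptz NOT NULL,  -- example of a frequently used fixed column
    fields     jsonb NOT NULL         -- everything else from the CRM
);

-- Supports containment filters such as: fields @> '{"status": "open"}'
CREATE INDEX record_fields_gin ON record USING gin (fields jsonb_path_ops);
```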
I have some concerns about the filtering speed. Customers are used to that feature working pretty fast, and I'm unsure how slow it will get if we store everything in JSONB. Right now we don't have a crazy number of records, so I was thinking that this solution could let us support more clients, and once we have more clients, move the filtering part to a reverse-index solution like Elasticsearch, while leaving all the other functionality in PostgreSQL.
What would you recommend in this case? Is everything clear? Please let me know if I need to clarify something!
Right now we have those five type tables, but they could be reduced by:
- Combining the picklist table into the string one.
- Casting all datetimes to strings (we store everything in UTC and we only care about range filters; given that everything is in UTC, alphabetical comparisons can be used to implement the datetime range filter).
Antonio Gamiz Delgado
(111 rep)
Aug 26, 2024, 08:54 AM
• Last activity: Aug 26, 2024, 10:04 AM
1
votes
1
answers
1124
views
EAV or JSONB column for dynamic data that needs good indexing
I have some data in a Postgres database that does not fit into a static schema. There is a table "content" and each piece of content has a user-defined type. Each type can have different user-defined fields. Within a single type those fields are always the same though. You can assume that a typical database has ~5-10 types with each having ~5-25 fields.
My original approach to store this was to use JSONB and store the data like this:
```
{
    "my_field": "foo",
    "another_field": "bar",
    "some_number_field": 5,
    "a_bool_field": true
}
```
So the data is key/value pairs, where each field has a string id used as the key and the value is stored with the type of the field. Of course you have to know whether the specific field you are querying is a number or a string, but that information is stored elsewhere in the DB for all content types and their fields.
This is indexed with a GIN index using jsonb_path_ops, and can then be queried using the `@>` containment operator. This works pretty nicely for checking equality, but doesn't support any other cases.
The problem is that I also need to support more advanced queries here, specifically ones that require the `>` and `<` operators. Part of this is because I'm adding timestamps as a field type, and queries that restrict a range based on a timestamp are a very common use case.
As far as I understand, this is not possible to do in a generic way using a JSONB column; there is no index similar to the GIN index that would allow these kinds of queries. So I see only two ways to handle this:
- dynamically create the right indexes for specific fields
- store the data in a EAV pattern with columns for different data types like timestamp, int, ...
The first option, creating the indexes from the running application based on user input, is a bit unorthodox, though I think it would be possible to do this reasonably safely in this particular case. The tenants are separated by schemas, and I'd use partial indexes that are created concurrently to avoid locking the tables.
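For illustration, such a dynamically created index could look like the sketch below, assuming a `content` table with a `data` jsonb column and a `type_id` discriminator (all names hypothetical). A numeric cast is immutable and so indexable directly; ISO-8601 UTC timestamps stored as text compare correctly as plain strings, so a text expression suffices for those:

```
-- Range filters on a numeric field of one content type
CREATE INDEX CONCURRENTLY content_price_idx
    ON content (((data->>'price')::numeric))
    WHERE type_id = 42;

-- Range filters on a timestamp kept as an ISO-8601 UTC string
CREATE INDEX CONCURRENTLY content_created_idx
    ON content ((data->>'created_at'))
    WHERE type_id = 42;
```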
The second option with an entity attribute value system feels icky, but it does seem to be a valid solution for this particular requirement. But everyone seems to strongly advise against EAV every time this comes up, so I'm not entirely sure how problematic this solution would be.
Am I missing any solutions here? What are my options here to store this kind of flexible data in a way that still allows fast comparison queries on types like timestamps and numbers?
Fabian
(115 rep)
Jan 4, 2024, 02:08 PM
• Last activity: Jul 7, 2024, 05:03 AM
0
votes
3
answers
1454
views
MariaDB querying is slow in a table of 2.6M records with indexes
TLDR: I have a table with a lot of records and my queries are slow. I have indexes, but the queries still take a long time to execute. I need help figuring out what the problem is, or at least a pointer in the right direction.
Long story below:
I have a table of products with roughly 2.6M records, and its size is set to grow. The table is denormalized and has around 100-odd columns to handle the attributes of the product objects. Products are categorized with a category name. Each category can have around 5-20 attribute columns. I am using MariaDB 11.0.2 running on Ubuntu on a 4-CPU-core Linode instance with 8 GB of memory.
Each record stores its own attributes in the respective columns, and while querying, my code picks the right columns for the attributes being sorted or filtered on. I kept the table denormalized to avoid large joins and to speed up sorting and filtering. I need to filter and sort based on a lot of these columns, sometimes at the same time.
I tried indexes with single keys and multiple keys, and was able to get speed improvements in certain cases. When multiple columns are used, multi-column indexes are used; but when columns from different indexes are used, the indexes are not used and a full table scan is done. I tried to force an index, but the query still took around 20 seconds.
For example, counting the number of products meeting two WHERE conditions takes around 9 seconds.
There is a limit of 16 columns per index and a key size of 3072 bytes (IIRC). If I try to filter or sort across keys from multiple indexes, the operation resorts to a full table scan and takes around 20 s to complete. Can the mentioned limits be increased using configuration entries? There are no frequently accessed columns to build a multi-column index on just those columns; almost all columns are equally likely to be used.
For more context, the table hosts electronic products, each with about 20 or more attributes. I know MySQL is capable of handling 100M records, so please point me in the right direction.
The schema of the table is given below.
MariaDB [productsdb]> desc products;
https://pastebin.com/GAedK5kq
I have the following indexes on the table as well. The indexes starting with name 'abcd' was created for testing and will not be used in production.
MariaDB [productsdb]> show indexes from products;
https://pastebin.com/TLHuShMZ
I tried to issue a SELECT query on the products table using multiple columns in the WHERE clause:
```
MariaDB [productsdb]> SELECT count(`products`.`id`) FROM `products`
    WHERE (products.main_category_id = 439
       AND attr_3 IN ('400mW','250mW')
       AND attr_9 IN ('7'));
+------------------------+
| count(`products`.`id`) |
+------------------------+
|                    190 |
+------------------------+
1 row in set (8.949 sec)
```
With regard to the above query, I have indexes on the main_category_id, attr_3, attr_9 and id columns. I do not have a multi-column index comprising these columns; this specific query would be faster if I had a multi-column index containing all of them.
But I have so many attributes; how will I be able to cover every permutation and combination?
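For what it's worth, the single composite index that would serve this particular query (the equality column first, then the two IN columns) would look something like the following, though, as noted, a different combination of filters would need a different index:

```
ALTER TABLE products
    ADD INDEX idx_cat_attr3_attr9 (main_category_id, attr_3, attr_9);
```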
Ajith Kumar
(11 rep)
Jul 24, 2023, 04:46 AM
• Last activity: Jul 28, 2023, 10:14 AM
2
votes
1
answers
103
views
Database design suggestion for improving existing design
**Background**
I am working on an IoT application that deals with sensors, their configurations and their readings. Following are the requirements of the system,
1. Each sensor can support multiple features. Example: Temperature, Humidity, etc.
2. The features supported by a sensor are identified by the model linked to the sensor.
3. A model can be linked to multiple sensors.
4. The features linked to a model are user configurable.
5. A feature can be linked to more than one models.
Following is the ER diagram of my current database design,
**Problem:**
In the current design, I store the configuration and readings as different columns in the Sensors table, for example MinTemperature, MaxTemperature, Temperature, etc. This leads to a huge number of columns in the Sensors table. Not all config and reading fields are required for each sensor; for example, only the sensors which support the Temperature feature need the Temperature config and readings.
The current count of columns in my table is 35, which I think is huge. Also, supporting a new feature in the future means changing the database schema to add new config and reading columns. Hence I want to improve this design.
**Solution Tried:**
To solve this I came up with an EAV pattern to store config and readings as key-value pairs in a table. But I found out that it is an anti-pattern and its use is not recommended. Following is the class diagram of the EAV approach.
Here, the Properties table contains the keys of all the configs and readings. I have mapped them against the features using the FeatureProperties table and used its Id to store the values of the properties in the SensorFeatureProperties table. This is done to enforce the constraint that a sensor must have values for its required properties only.
There is functionality that is possible in the existing design but not in the new one. There are certain readings that I want for every sensor but still want to be user-configurable, so that the user can toggle their appearance in the UI.
Example: MACAddress. The MACAddress is needed for every sensor for internal purposes, but it still needs to be user-configurable to toggle its UI appearance. This was supported in the existing design, as every reading was stored as a column. Now I have changed the design to store readings in an EAV table, but I don't want to store MACAddress in an EAV table. This seems like a violation of the design, since some features follow the design and some don't.
Please provide me with suggestions on how I can improve my existing design.


Hem Bhagat
(121 rep)
Jun 14, 2023, 07:27 AM
• Last activity: Jun 15, 2023, 08:15 AM
0
votes
1
answers
1085
views
Query product and product variants based on properties
I'm trying to build a filter system for an e-commerce application. The following schema is given:
http://sqlfiddle.com/#!9/18b93
* A product has many variants
* A variant has many property_values
* A product has many property_values as well
* A variant "inherits" the property_values of the related product
I cannot figure out the query to filter out all product variants that have, for example, property_id 1 = Red, property_id 2 = S and property_id 3 = Cotton. In this case, property_ids 1 and 2 are inherited from the product. The query should also be flexible enough to be extended with any number of additional filters.
Can this be queried in a performant fashion? Would it make more sense to denormalize the data and keep an "index" table built from these values?
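The fiddle's schema isn't reproduced here, so every table and column name below is an assumption; but the shape of one possible query is one EXISTS per filter, each accepting a property value attached either to the variant or to its parent product:

```
SELECT v.id
FROM variants v
JOIN products p ON p.id = v.product_id
WHERE EXISTS (SELECT 1 FROM property_values pv
              WHERE pv.property_id = 1 AND pv.value = 'Red'
                AND (pv.variant_id = v.id OR pv.product_id = p.id))
  AND EXISTS (SELECT 1 FROM property_values pv
              WHERE pv.property_id = 2 AND pv.value = 'S'
                AND (pv.variant_id = v.id OR pv.product_id = p.id))
  AND EXISTS (SELECT 1 FROM property_values pv
              WHERE pv.property_id = 3 AND pv.value = 'Cotton'
                AND (pv.variant_id = v.id OR pv.product_id = p.id));
```

Each additional filter is one more EXISTS block, which keeps the query mechanically extensible; OR-joined ownership checks like this do tend to defeat index seeks, which is the usual argument for the denormalized "index" table.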
Any help is greatly appreciated!
rqk991
(13 rep)
Aug 29, 2018, 06:54 AM
• Last activity: Mar 19, 2023, 04:01 PM
1
votes
0
answers
2443
views
Choosing between Entity-Attribute-Value or JSONB representations
I must store multiple types of documents in my Postgres database. I know the schema of each particular document type, I know what fields will be present. So, in theory, I could create a separate table for each document type.
In practice, however, the number of different document types is far too high to consider the table-for-each-type approach. It seems to me that the alternatives are either some kind of EAV representation, or a single table with columns like `(document_id, document_type, jsonb_content)`.
The documents will be immutable—I won't edit fields inside a document.
I'll often need to run `GROUP BY` aggregations and general analytic operations over documents of a given type.
Would EAV or [JSONB](https://www.postgresql.org/docs/current/datatype-json.html) be better for my scenario?
Daniel Díaz Carrete
(113 rep)
Feb 5, 2023, 10:04 PM
1
votes
1
answers
68
views
Getting the ID when sorting by a joined table while also grouping
I have a small query for getting products based on variants (colors, materials), like:
```
SELECT
    ANY_VALUE(id) AS id,
    parent_id,
    COUNT(id),
    color,
    material,
    MIN(price) AS priceMin
FROM
(
    SELECT
        products.id,
        products.parent_id,
        products.price,
        colors.variant_option_id AS color,
        materials.variant_option_id AS material
    FROM products
    LEFT JOIN products_variant_options AS colors
        ON products.id = colors.product_id
        AND colors.variant_id = 1 # Colors
    LEFT JOIN products_variant_options AS materials
        ON products.id = materials.product_id
        AND materials.variant_id = 2 # Materials
    WHERE
        products.status = 1
) AS product_variants
GROUP BY
    parent_id,
    color,
    material
ORDER BY priceMin
```
You can see everything at https://www.db-fiddle.com/f/vfAfRWoo2vHKB6S1phE2dt/1
You see I need to get the `id` of the product.
When I use `MAX`/`MIN` or `ANY_VALUE` for this, the row with the selected price does not return the correct ID: look at the product with `price=8`, which returns `id=2` instead of `id=4`.
I know the functions `MAX`, `MIN` and the awful `ANY_VALUE` aren't a good idea. I tried `ANY_VALUE` with `OVER()` but without effect.
How can I sort by prices from joined tables and get the right ID while using GROUP BY?
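If the MySQL/MariaDB version in use supports window functions (MySQL 8.0+ / MariaDB 10.2+), one common fix is ROW_NUMBER() per group ordered by price, keeping only the first row; a sketch reusing the inner subquery from above:

```
SELECT id, parent_id, color, material, price AS priceMin
FROM (
    SELECT pv.*,
           ROW_NUMBER() OVER (PARTITION BY parent_id, color, material
                              ORDER BY price) AS rn
    FROM (
        -- the inner product_variants subquery from above
        SELECT products.id, products.parent_id, products.price,
               colors.variant_option_id AS color,
               materials.variant_option_id AS material
        FROM products
        LEFT JOIN products_variant_options AS colors
            ON products.id = colors.product_id AND colors.variant_id = 1
        LEFT JOIN products_variant_options AS materials
            ON products.id = materials.product_id AND materials.variant_id = 2
        WHERE products.status = 1
    ) AS pv
) ranked
WHERE rn = 1
ORDER BY priceMin;
```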
kicaj
(123 rep)
Jan 31, 2023, 03:51 PM
• Last activity: Feb 3, 2023, 07:41 AM
1
votes
1
answers
214
views
Product variants in EAV
I prepared a small EAV table for product variants.
```
CREATE TABLE products (
    id int auto_increment,
    parent_id int default null,
    name varchar(30) not null,
    PRIMARY KEY (id)
);

CREATE TABLE products_eav (
    id int auto_increment,
    product_id int,
    attr varchar(30),
    val varchar(30),
    PRIMARY KEY (id)
);
```
I would like to return products grouped by `parent_id` (always) and, depending on the selected attributes, sometimes "red products", sometimes "red cotton products", etc.
```
SELECT
    MAX(id) AS id,
    parent_id,
    color,
    material,
    COUNT(id)
FROM (
    SELECT
        products.id,
        products.parent_id,
        colors.val AS color,
        materials.val AS material
    FROM products
    JOIN products_eav AS colors ON products.id = colors.product_id
    JOIN products_eav AS materials ON products.id = materials.product_id
    WHERE
        products.parent_id IS NOT NULL
        AND colors.attr = 'Colors'
        AND materials.attr = 'Material'
) AS product_variants
GROUP BY
    parent_id,
    color,
    material
```
When each product in the `products_eav` table has the same attributes, everything works fine; but when just one product has one attribute fewer and the query wants to display products grouped by two attributes, that product is ignored, which is expected.
Look at this example: https://www.db-fiddle.com/f/hyhUMMtidU1vxnmCiZXZYk/5
The product "Pink pants" has no material attribute and is ignored in the results.
How can I change the query to return the "pants" in the results too (I mean, matching one condition should be enough)?
*Imagine this is a product listing page: when the user has selected filters like colors and materials, he should still see the pink pants.*
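A sketch of the usual fix for this: make the attribute joins LEFT JOINs and move the attr tests into the join conditions, so a missing attribute yields NULL instead of eliminating the row:

```
SELECT
    MAX(id) AS id,
    parent_id,
    color,
    material,
    COUNT(id)
FROM (
    SELECT
        products.id,
        products.parent_id,
        colors.val AS color,
        materials.val AS material
    FROM products
    -- LEFT JOIN keeps products that lack the attribute; the attr filter
    -- lives in the ON clause so it no longer turns the join into an INNER one
    LEFT JOIN products_eav AS colors
        ON products.id = colors.product_id AND colors.attr = 'Colors'
    LEFT JOIN products_eav AS materials
        ON products.id = materials.product_id AND materials.attr = 'Material'
    WHERE products.parent_id IS NOT NULL
) AS product_variants
GROUP BY parent_id, color, material;
```

With this shape, "Pink pants" comes back as a group with material = NULL rather than disappearing.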
kicaj
(123 rep)
Jan 25, 2023, 10:31 AM
• Last activity: Jan 25, 2023, 01:35 PM
Showing page 1 of 20 total questions