MySQL: how to implement a text search without delimiter
0
votes
1
answer
291
views
I have a table
table1
that contains several fields, some textual and other non-textual, on which I do search queries. Let's focus on some text fields, which I simply call field1
, field2
, field3
.
The table contains about 1.5 million records and the queries are not so fast, because when I have to search on these fields, I'm doing in fullscan mode.
I give you a concrete example:
SELECT *
FROM table1
WHERE [some conditions]
AND (field1 LIKE "%str1%" OR field2 LIKE "%str1%" or field3 LIKE "%str1%")
AND (field1 LIKE "%str2%" OR field2 LIKE "%str2%" or field3 LIKE "%str2%")
AND (field3 LIKE "%str3%")
Point out that str1
, str2
, etc ... can be a word or a part of a word (for example "Table" or "Tabl").
Considering how the query is build, I don't see any sense to define any index on field1
, field2
, field3
: they would not bring any advantage, just an useless waste of resources in my opionion.
So, I have evaluated the FULLTEXT
indexes of MySQL and the query MATCH() AGAINST()
: this is not good, because in my case it works only when exactly **field1=word1** or **field2=word2**, while often the fields contain concatenated words without delimiters (for example field1 contains the value **"HOUSETABLERAIN"** and not **"HOUSE TABLE RAIN"**).
What can I do to improve the execution time? My question includes changes to the query and the edit of data structure as well.
I thought two solutions, but I'm not convinced of either.
**Solution A)**
Introduce another table table2
(table1_id INT
, field VARCHAR
, string VARCHAR
); for each record in table1
, insert all possible substrings (confining itself to combinations with at least 4 characters).
Practical example:
If in table1 there is the record
id | field1
45 | HOUSETABLERAIN
then in table2 there will be
table1_id | field | string
45 | field1 | HOUS
45 | field1 | HOUSE
45 | field1 | HOUSET
45 | field1 | HOUSETA
45 | field1 | HOUSETAB
45 | field1 | HOUSETABL
45 | field1 | HOUSETABLE
45 | field1 | HOUSETABLER
45 | field1 | HOUSETABLERA
45 | field1 | HOUSETABLERAI
45 | field1 | HOUSETABLERAIN
45 | field1 | OUSE
45 | field1 | OUSET
45 | field1 | OUSETA
...
Afterwards I could define an index on table2.string
and I should have some advantages; furthermore, I could also eliminate a part of the fields in table1
, because I would no longer search on them.
But I fear it's madness ... even before testing it.
**Solution B)**
Add a TEXT
field for the search and enter all possible combinations separated by the blank space " "
, then define a FULLTEXT
index. The problem are the stopwords
that are not considered by the MATCH AGAINST
query.
I'd like to have some opinions and maybe some better ideas.
Thank you.
Asked by boldfra23
(1 rep)
Jan 23, 2019, 07:40 PM
Last activity: May 16, 2025, 09:04 PM
Last activity: May 16, 2025, 09:04 PM