How is encoding done in SQL database tables for pattern matching?
Are strings in table columns stored as raw bit patterns or as Unicode? Does the database engine load them into memory in a different representation (or character set) from the one persisted on disk, for better performance?
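"Bit patterns" and "Unicode" are not really alternatives. Unicode assigns each character a code point, and an encoding such as UTF-8 or UTF-16 turns those code points into the bytes that are actually stored. Which encoding a column uses depends on the engine and on its character-set settings. The minimal Python sketch below is not tied to any particular database, and "Citroën" is just a hypothetical non-ASCII value. It shows that the same text has different byte lengths under different encodings, and that the character count need not equal the byte count.

```python
# Sketch: the same text has different byte ("bit pattern") representations
# depending on which encoding is used to store it.
samples = ["FordEscort97", "Citroën"]  # "Citroën" is a hypothetical non-ASCII value


def byte_lengths(text):
    """Return the character count and the byte count in two common encodings."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }


for s in samples:
    # Show the sizes plus the raw UTF-8 bytes in hex.
    print(s, byte_lengths(s), s.encode("utf-8").hex(" "))
```

ASCII-only values like the ones in carTable take one byte per character in UTF-8 but two in UTF-16, while "ë" needs two bytes even in UTF-8.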
I want to know this for the purposes of index creation and writing efficient queries. A SELECT statement returns the rows that match its WHERE clause.
Here is an example query:
select name from carTable where name = 'FordEscort97'
Here is the carTable:
name columnA columnB columnC
FordEscort91
FordEscort92
FordEscort93
FordEscort94
FordEscort95
FordEscort96
FordEscort97
FordEscort98
Would queries perform faster if the column were designed to store the year at the left (start) of each string? (Assume the year is the most distinguishing part of the string.)
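To make the question concrete, here is a toy model in Python. It is a hypothetical sketch, not how any specific engine works. It assumes a full scan that compares the search key against every stored value with a naive left-to-right comparison that stops at the first differing character. It compares the year-last layout from the question against a hypothetical year-first layout. Real engines typically use byte-wise (memcmp-style) or collation-aware comparison functions. With a B-tree index on `name`, a lookup compares the key against only roughly log2(n) index entries rather than every row, so the layout probably matters much less than whether an index can be used at all.

```python
# Hypothetical model: a full scan comparing the key against every stored value
# with a naive left-to-right, early-exit character comparison.


def compare(a, b):
    """Return (is_equal, characters_examined) for a left-to-right compare."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return False, examined  # stop at the first differing character
    return len(a) == len(b), examined


def scan_cost(values, key):
    """Total characters examined when comparing key against every value."""
    return sum(compare(v, key)[1] for v in values)


years = range(91, 99)
year_last = [f"FordEscort{y}" for y in years]   # layout from the question
year_first = [f"{y}FordEscort" for y in years]  # hypothetical year-first layout

print(scan_cost(year_last, "FordEscort97"))
print(scan_cost(year_first, "97FordEscort"))
```

In this model the year-last layout examines 96 characters in total, because every comparison runs through the shared "FordEscort9" prefix. The year-first layout examines 26, because most mismatches are detected at the second character.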
If strings are matched character by character, left to right, that could be more resource-intensive than comparing hashes (bit patterns) that represent the strings. If the pattern in the WHERE clause were a substring with a wildcard, each comparison would be even more CPU-intensive. Characters in the left part of a value might start out matching the pattern and then fail, and a second potential match could begin later, in the right part of the string. Recognizing a mismatch early would avoid much of that work. However, I don't know for sure that matching is actually done character by character.
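The wildcard case can be sketched the same way. The function below is a hypothetical, naive substring search, roughly what a predicate like `LIKE '%Escort97%'` implies per row when no index can help. Real engines may use smarter search algorithms. "EscortEscort97" is an invented value chosen to show a partial match in the left part of the string followed by a real match later. Hashing cannot replace this step: a hash of the whole value says nothing about whether it contains a given substring, and computing a hash reads every character anyway.

```python
# Hypothetical model of a naive substring search, i.e. the per-row work a
# LIKE '%Escort97%' predicate implies when no index can be used.


def naive_contains(text, pattern):
    """Return (found, characters_examined), retrying at each start offset."""
    examined = 0
    for start in range(len(text) - len(pattern) + 1):
        for offset, ch in enumerate(pattern):
            examined += 1
            if text[start + offset] != ch:
                break  # partial match failed; retry from the next start offset
        else:
            return True, examined  # every pattern character matched
    return False, examined


for value in ["FordEscort91", "FordEscort97", "EscortEscort97"]:
    print(value, naive_contains(value, "Escort97"))
```

In the third value, the first "Escort" matches six characters before failing, so that value costs 20 character checks versus 12 for the other two.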
This also has implications for creating indexes on unique columns. The likelihood that the strings in a column are unique has to be considered, but how the data in the column is designed *can* be arbitrary. Could an index provide a greater benefit if the primary column's values were distinguished mostly by their leftmost characters rather than their rightmost ones? Does it depend on which SQL database you are using (e.g., Oracle, MySQL, Postgres)? Sometimes underlying data is stored in hexadecimal format or as bit patterns.
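Whether and how an index helps is engine-specific. As one hedged illustration, the sketch below uses SQLite through Python's built-in `sqlite3` module. The index name `idx_carTable_name` and the TEXT column types are assumptions, not part of the question. `EXPLAIN QUERY PLAN` shows whether the planner uses the B-tree index on `name`. An equality lookup can search the index directly. A pattern with a leading wildcard, such as `LIKE '%97'`, cannot seek into a B-tree ordered by the whole string, so the planner falls back to a scan. The exact plan wording varies between SQLite versions. Oracle, MySQL and PostgreSQL have their own EXPLAIN output and rules.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE carTable (name TEXT, columnA TEXT, columnB TEXT, columnC TEXT)"
)
conn.executemany(
    "INSERT INTO carTable (name) VALUES (?)",
    [(f"FordEscort{y}",) for y in range(91, 99)],
)
# Hypothetical index name; a B-tree ordered by the full string value.
conn.execute("CREATE INDEX idx_carTable_name ON carTable (name)")

exact_sql = "SELECT name FROM carTable WHERE name = 'FordEscort97'"
wildcard_sql = "SELECT name FROM carTable WHERE name LIKE '%97'"

exact_plan = plan_details(conn, exact_sql)        # expected: a SEARCH using the index
wildcard_plan = plan_details(conn, wildcard_sql)  # expected: a SCAN, no index seek
exact_rows = conn.execute(exact_sql).fetchall()
wildcard_rows = conn.execute(wildcard_sql).fetchall()

print(exact_plan)
print(wildcard_plan)
```

Both queries return the same row. Only the equality predicate lets SQLite search the index, which compares the key against a small number of index entries instead of every row.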
Asked by Yousef
(41 rep)
May 23, 2015, 08:03 PM
Last activity: Oct 6, 2015, 03:33 AM