What is the optimal data type for an MD5 field?
57
votes
3
answers
55605
views
We are designing a system that is known to be read-heavy (on the order of tens of thousands of reads per minute).
* There is a table
names
that serves as a sort of central registry. Each row has a text
field representation
and a unique key
that is an MD5 hash of that representation
.1 This table currently has tens of millions of records and is expected to grow into the billions over the lifetime of the application.
* There are dozens of other tables (of highly varying schemas and record counts) that make reference to the names
table. Any given record in one of these tables is guaranteed to have a name_key
, which is functionally a foreign key to the names
table.
1: Incidentally, as you might expect, records in this table are immutable once written.
For any given table other than the names
table, the most common query will follow this pattern:
SELECT list, of, fields
FROM table
WHERE name_key IN (md5a, md5b, md5c...);
I would like to optimize for read performance. I suspect that my first stop should be to minimize the size of the indices (though I wouldn't mind being proven wrong there).
**The Question:**
What is/are the optimal data types for the key
and name_key
columns?
Is there a reason to use hex(32)
over bit(128)
? BTREE
or GIN
?
Asked by bobocopy
(765 rep)
Sep 16, 2015, 05:00 PM
Last activity: Jul 11, 2024, 11:35 PM
Last activity: Jul 11, 2024, 11:35 PM