Sample Header Ad - 728x90

What is the optimal data type for an MD5 field?

57 votes
3 answers
55605 views
We are designing a system that is known to be read-heavy (on the order of tens of thousands of reads per minute). * There is a table names that serves as a sort of central registry. Each row has a text field representation and a unique key that is an MD5 hash of that representation.1 This table currently has tens of millions of records and is expected to grow into the billions over the lifetime of the application. * There are dozens of other tables (of highly varying schemas and record counts) that make reference to the names table. Any given record in one of these tables is guaranteed to have a name_key, which is functionally a foreign key to the names table. 1: Incidentally, as you might expect, records in this table are immutable once written. For any given table other than the names table, the most common query will follow this pattern: SELECT list, of, fields FROM table WHERE name_key IN (md5a, md5b, md5c...); I would like to optimize for read performance. I suspect that my first stop should be to minimize the size of the indices (though I wouldn't mind being proven wrong there). **The Question:** What is/are the optimal data types for the key and name_key columns? Is there a reason to use hex(32) over bit(128)? BTREE or GIN?
Asked by bobocopy (765 rep)
Sep 16, 2015, 05:00 PM
Last activity: Jul 11, 2024, 11:35 PM