Sample Header Ad - 728x90

What data is duplicated when MySQL/MariaDB BLOB columns are copied?

1 vote
1 answer
319 views
Let table_1 be created as follows: CREATE TABLE table_1 ( id INT AUTO_INCREMENT PRIMARY KEY, some_blob BLOB ); Let table_2 be created as follows: CREATE TABLE table_2 ( id INT AUTO_INCREMENT PRIMARY KEY, some_blob BLOB ); What I want to know is, after I run this table-copying query INSERT INTO table_2 (id, some_blob) SELECT id, some_blob FROM table_1; will the actual text within each some_blob field of the table_1 table be duplicated and stored on disk, or will the DB have only duplicated pointers to the disk locations containing the BLOB data? One argument for why BLOB copying must involve the duplication of actual content reasons as follows: > Duplication of BLOB content is necessary because changes to BLOB data in table_1 should not also take place in table_2. If only the disk pointers were duplicated then content changes in one table would be reflected in the other table, which violates the properties of a correct copy operation. Now I present an alternative method that the DB could implement to satisfy this copy operation. This alternative shows **the above argument is not necessarily true**. The DB could only duplicate disk pointers during the execution of the given INSERT statement, then whenever an UPDATE occurs which seeks to modify the BLOB data in one of the tables, the DB would only then allocate more space on disk to store the new data which is part of the UPDATE query. A BLOB data segment then is only deleted when there no longer exists any disk pointers to it, and a particular BLOB data segment could potentially have many disk pointers pointing to it. So which of these strategies does MySQL/MariaDB use when executing the given INSERT statement, or does it use a different strategy? ### Why I am asking this question Currently I am running a couple of UPDATE queries which are copying large amounts of BLOB data from one table to another in the same database (over 10 million rows of BLOB data). The queries have been running for a while. I am curious about whether the performance is so slow because some of the columns I am comparing are poorly indexed, because these queries are literally copying over the content instead of disk pointers, or perhaps because of both of these reasons. I use an INSERT in the question's example because this simplifies the concept of database internals that I am trying to understand.
Asked by AjaxLeung (111 rep)
Jul 25, 2018, 09:40 PM
Last activity: Jul 25, 2018, 10:40 PM