I am processing data tables with varying numbers of columns and rows, represented by JSON documents with one array per column. The format of a document is
{
"column_1": ["value_1_1", "value_1_2", ..., "value_1_n"],
"column_2": ["value_2_1", "value_2_2", ..., "value_2_n"],
...,
"column_m": ["value_m_1", "value_m_2", ..., "value_m_n"],
}
The number of columns m is typically in the lower tens, while the number of values n lies in the lower millions. Values are either small integers or short strings, and individual data tables stored as text files are 100-200 MB in size.
Documents are stored in a jsonb column in PostgreSQL:
Table "data"
************
Column | Type | Nullable
--------------+---------+----------
dat_id | integer | not null
dat_document | jsonb | not null
Documents are typically served "as is" to an application or with a simple filter, e.g.,
select col2, col7, col9
from (
    select jsonb_array_elements(dat_document->'column_1') as col1,
           jsonb_array_elements(dat_document->'column_2') as col2,
           jsonb_array_elements(dat_document->'column_5') as col7,
           jsonb_array_elements(dat_document->'column_9') as col9
    from data
    where dat_id = 20
) as sub
where sub.col1::text like '%@yahoo.com%';
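To compare execution times between the two types, one option (a sketch, reusing dat_id = 20 from the example above) is to run the query under EXPLAIN ANALYZE, which reports actual run time and buffer usage:

explain (analyze, buffers)
select jsonb_array_elements(dat_document->'column_1') as col1
from data
where dat_id = 20;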
The documents are stored as jsonb following recommendations from the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype-json.html). However, when I use the simpler json type instead in the table *data*, I do not notice a significant difference in execution time for simple queries like the one above. On the other hand, the documents seem to take roughly 50% more disk space as jsonb than as json, according to pg_column_size on the same document.
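For reference, the size comparison can be reproduced along these lines (a sketch: casting the stored jsonb through text yields a json value of the same document; note that pg_column_size on a computed expression measures the uncompressed, un-TOASTed datum):

select pg_column_size(dat_document)             as jsonb_bytes,
       pg_column_size(dat_document::text::json) as json_bytes
from data
where dat_id = 20;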
Is there any advantage to storing my documents as jsonb instead of json in this case?
Asked by monomeric
(21 rep)
May 16, 2025, 01:09 PM
Last activity: May 19, 2025, 09:33 AM