
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answers
2038 views
PostgreSQL get json key name after comparing values of keys
I have table which has a column of type json with name "demographic". I want to compare keys(predefined by me) by their values and the result of that should be the name of the key and not the value which later i will use to save to another column's rows. The json columns that i have and want to compare values looks like this:
{
   "he":{
      "he_vakiy":29384,
      "he_naiset":14803,
      "he_miehet":14581,
      "he_kika":1485,
      "he_0_2":850,
      "he_3_6"
   }
}
I want to compare, for example, "he_naiset" and "he_miehet", and the one that has the highest value should return the name of the key. I've tried something like this:
select greatest(demographics->'he'->>'he_miehet', demographics->'he'->>'he_naiset') as greatest
from demographics;
This compares the values and returns the highest value, but I don't know how I can get the name of the key. I also tried something like this:
with cte as (
    select *,
           genre.key as genre,
           row_number() over (partition by id order by value desc) as ord
    from base.maakunta
    cross join lateral json_each_text(base.maakunta.demographic->'he') genre
)
select genre, value, ord from cte where ord = 1;
But this one compares all the keys inside the JSON.
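A minimal sketch building on that second attempt (table and column names are taken from it; the ::numeric cast is an assumption about the stored values): restricting json_each_text to the two predefined keys keeps only those keys in the comparison, and ordering by the numeric value rather than its text form avoids string ordering.
with cte as (
    select m.id,
           genre.key as genre,
           genre.value,
           row_number() over (partition by m.id order by genre.value::numeric desc) as ord
    from base.maakunta m
    cross join lateral json_each_text(m.demographic->'he') genre
    where genre.key in ('he_naiset', 'he_miehet')
)
select id, genre, value
from cte
where ord = 1;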
eko (101 rep)
Mar 20, 2022, 02:48 PM • Last activity: Aug 5, 2025, 05:03 PM
5 votes
2 answers
1054 views
What does 'b' stand for in PostgreSQL's JSONB data type?
While not a technical question per se, I'm curious about the actual meaning of _b_ in PostgreSQL's jsonb data type. I skimmed through the official documentation but I couldn't figure it out. As far as I can tell, there is no explicit mention of what _b_ actually means. One would assume that it stands for _binary_, since that is one of the main differences between jsonb and plain-old json. Most articles and LLM-generated responses take for granted that it means _binary_. I've found articles, however, that refer to it as _better_. According to the official documentation, jsonb is more than just a binary encoding of a JSON object, therefore _better_ might be a slightly more fitting description. Does anybody know the story behind jsonb's naming?
Sotosoul (103 rep)
Jul 17, 2025, 11:23 AM • Last activity: Jul 21, 2025, 08:09 PM
1 votes
1 answers
61 views
SELECT with array values on WHERE using postgres
I'm using a query to update an object array inside a **jsonb** column. Example data:
[
  {
    "_id": "68696e0a3aab2f9ff9c40679",
    "altura": 1,
    "comprimento": 1,
    "largura": 1,
    "peso": 1,
    "valor": 1
  },
  {
    "_id": "6869744b44829f42ccdbb32c",
    "altura": 2,
    "comprimento": 2,
    "largura": 2,
    "peso": 2,
    "valor": 2
  }
]
Using one ID, this works perfectly:
UPDATE
	objetos o
SET
	itens = o.itens - (
		SELECT
			i.id::int - 1
		FROM
			jsonb_array_elements(o.itens) WITH ORDINALITY i(v, id)
		WHERE
			i.v->'_id' = '6869744b44829f42ccdbb32c'
		LIMIT 1
	)
WHERE
	_id = ${_id}
RETURNING
	_id,
	updated_at;
It deletes an entry containing _id = 6869744b44829f42ccdbb32c.

I have tried to delete entries using an ARRAY of ids, for example ['68696e0a3aab2f9ff9c40679', '6869744b44829f42ccdbb32c'], but I get:
operator does not exist: jsonb = text
I tried adding this in the WHERE clause: i.v->'_id' = ANY(ARRAY['68696e0a3aab2f9ff9c40679', '6869744b44829f42ccdbb32c']), and also IN, but IN does not return any rows. How can I compare i.v->'_id' to the elements of an array? Something like: ['68696e0a3aab2f9ff9c40679', '6869744b44829f42ccdbb32c'].includes(i.v->'_id')
References:
1. https://stackoverflow.com/a/10738459/2741415
2. https://dba.stackexchange.com/a/315124/321838
3. https://stackoverflow.com/a/75053441/2741415
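A minimal sketch of one way to express this, assuming the table and columns from the question (objetos, itens, updated_at) and keeping the ${_id} placeholder as-is: read each element's _id as text with ->> so it can be compared against a text array, and rebuild the array keeping only the elements that are not in the list.
UPDATE objetos o
SET itens = COALESCE(
        -- ->> returns text, so it can be compared with = ANY / <> ALL against a text[]
        (SELECT jsonb_agg(elem)
         FROM jsonb_array_elements(o.itens) AS elem
         WHERE elem->>'_id' <> ALL (ARRAY['68696e0a3aab2f9ff9c40679',
                                          '6869744b44829f42ccdbb32c'])),
        '[]'::jsonb)
WHERE _id = ${_id}
RETURNING _id, updated_at;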
flourigh (145 rep)
Jul 5, 2025, 07:34 PM • Last activity: Jul 10, 2025, 12:46 AM
3 votes
1 answers
373 views
Why does a Json Path inequality over an array not work?
Given the following jsonb array, I want to filter it only for certain objects.
[
  {
    "id": "123"
  },
  {
    "id": "456"
  },
  {
    "id": "789"
  }
]
I can use the function jsonb_path_query_array with the path strict $[*] ? (@.id == $idToSearch[*]) and a variables parameter of {"idToSearch":["123","456"]} to filter by the id parameter. The logic being:
* $[*] breaks out the array
* ? is a filter predicate...
* ... where the id property is equal to ...
* ... any value in the idToSearch array variable.

I get this straightforward result:
[
  {
    "id": "123"
  },
  {
    "id": "456"
  }
]
If however, I invert the == in the path query to strict $[*] ? (@.id != $idToSearch[*]) then instead of getting the correct result of
[
  {
    "id": "789"
  }
]
instead I just get all the objects again. As a workaround, I can use a negated equals strict $[*] ? (! (@.id == $idToSearch[*])) to get the correct result. **Why is the query not returning the correct result?** Is this a bug? If so, what's the cause and has it been filed? If it's not a bug, where am I going wrong in the path query, and what would be the correct way to express it? ____
create table test (
  _id int,
  items jsonb
);

insert into test values (1, '[
  {
    "id": "123"
  },
  {
    "id": "456"
  },
  {
    "id": "789"
  }
]');

select
  jsonb_path_query_array(
    items,
    'strict $[*] ? (@.id != $idToSearch[*])',
    '{"idToSearch":["123","456"]}')
from test;
**dbfiddle**
Charlieface (17545 rep)
Jul 7, 2025, 01:01 PM • Last activity: Jul 7, 2025, 02:39 PM
0 votes
1 answers
206 views
How to append 2 default values to a JSONB column in Postgres SQL 14?
Previously I had 4 default values for a jsonb column in my table cities.
create_table "cities", force: :cascade do |t| t.jsonb "config", default: {"A"=>5, "B"=>7, "C"=>10, "D"=>10} Now I want to add 2 more default values "E"=10 and "F"=0 to this column without changing the values of A,B,C and D. Note that the values of A,B,C and D have been updated differently for different cities. So I just want to append the default values of E and F to the column without changing the existing ones of A,B,C and D. I cant't quite figure out the UPDATE query for this. I am quite new to this so its a little confusing. Thanks in advance for your help.
Priyanshu0007 (1 rep)
Dec 13, 2022, 06:07 AM • Last activity: Jun 29, 2025, 10:07 PM
2 votes
1 answers
217 views
JSONB vs JSON for column-oriented data in PostgreSQL
I am processing data tables with varying numbers of columns and rows, represented by JSON documents with one array per column. The format of a document is
{
    "column_1": ["value_1_1", "value_1_2", ..., "value_1_n"],
    "column_2": ["value_2_1", "value_2_2", ..., "value_2_n"],
    ...,
    "column_m": ["value_m_1", "value_m_2", ..., "value_m_n"],
}
The number of columns m is typically in the lower tens, while the number of values n lies in the lower millions. Values are either small integers or short strings, and individual data tables stored as text files are 100-200 MB in size. Documents are stored in a jsonb column in PostgreSQL:
Table "data"
             ************

    Column    |  Type   | Nullable 
--------------+---------+----------
 dat_id       | integer | not null
 dat_document | jsonb   | not null
Documents are typically served "as is" to an application or with a simple filter, e.g.,
select col2, col7, col9
from (
        select jsonb_array_elements(dat_document->'column_1') as col1,
               jsonb_array_elements(dat_document->'column_2') as col2,
               jsonb_array_elements(dat_document->'column_5') as col7,
               jsonb_array_elements(dat_document->'column_9') as col9
        from data
        where dat_id = 20
    ) as sub
where sub.col1::text like '%@yahoo.com';
The documents are stored as jsonb following recommendations from the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype-json.html) . However, when using the simpler json type instead in the table *data*, I do not notice a significant drop in execution time for simple queries like the one above. On the other hand, my documents seem to take roughly 50% more disk space as jsonb vs json, according to pg_column_size on the same document. Is there any advantage to storing my documents as jsonb instead of json in this case?
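For reference, a minimal sketch of the size comparison described above, storing the same document once as json and once as jsonb so that pg_column_size measures both as they sit on disk (the temp table name is made up):
CREATE TEMP TABLE size_test (as_json json, as_jsonb jsonb);

INSERT INTO size_test
SELECT dat_document::json, dat_document
FROM data
WHERE dat_id = 20;

SELECT pg_column_size(as_json)  AS json_size,
       pg_column_size(as_jsonb) AS jsonb_size
FROM size_test;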
monomeric (21 rep)
May 16, 2025, 01:09 PM • Last activity: May 19, 2025, 09:33 AM
1 votes
1 answers
2194 views
Set difference for jsonb
The *hstore* type in Postgres provides an [operation to calculate the set difference](https://www.postgresql.org/docs/14/hstore.html#HSTORE-OP-TABLE) (EXCEPT) of two records. Is there something similar for jsonb? The best I've got is SELECT * FROM jsonb_each('{"a":1, "b":2}'::jsonb) EXCEPT SELECT * FROM jsonb_each('{"a":1}'::jsonb), but I don't know how to convert it back to jsonb. I want to use this to calculate the difference (the changed values) between OLD and NEW in a trigger.
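A minimal sketch of folding that EXCEPT result back into a single jsonb value with jsonb_object_agg; the COALESCE makes an empty difference come back as {} instead of NULL:
SELECT COALESCE(jsonb_object_agg(key, value), '{}'::jsonb) AS diff
FROM (
    SELECT * FROM jsonb_each('{"a":1, "b":2}'::jsonb)
    EXCEPT
    SELECT * FROM jsonb_each('{"a":1}'::jsonb)
) AS changed(key, value);
-- returns {"b": 2}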
jo-so (71 rep)
Mar 15, 2022, 09:49 PM • Last activity: Mar 27, 2025, 08:20 PM
2 votes
0 answers
308 views
like_regex vs jsonb_build_object performance
I'm working with Postgres 14.8 and using JSONB to store data. The queries I'm working with perform filtering and joining based on values in this JSONB column named data. We need to perform JOINs to get the data we need. The setup is simple: we have one table that is partitioned based on one of our columns, and we then JOIN rows in different partitions. Each partition has the same number of rows in our tests, around 8 million. We'd like to have one place to INSERT rows, and partitioning improved our performance. I'm using a GIN index like this:
CREATE INDEX ix_tab_data ON public.tab USING gin (data jsonb_path_ops);
And a btree index for pagination:
CREATE INDEX ix_tab_created_timestamp ON public.tab USING btree (created_timestamp DESC);
They're then inherited by the partitions. Most queries look something like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data @> jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @> jsonb_build_object('StreetName', 'Test')
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10000001275.74..10000001829.33 rows=500 width=136) (actual time=193.677..293.630 rows=500 loops=1)
  Buffers: shared hit=106258
  ->  Nested Loop  (cost=10000000168.54..149189116161.91 rows=125713639119 width=136) (actual time=0.333..293.527 rows=1500 loops=1)
        Buffers: shared hit=106258
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.012..7.544 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=168.11..15972.39 rows=14692 width=373) (actual time=0.032..0.032 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @> jsonb_build_object('StreetName', 'Test')))
              Rows Removed by Index Recheck: 0
              Heap Blocks: exact=1502
              Buffers: shared hit=97504
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..164.44 rows=14692 width=0) (actual time=0.030..0.030 rows=0 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @> jsonb_build_object('StreetName', 'Test')))
                    Buffers: shared hit=96002
Planning:
  Buffers: shared hit=2
Planning Time: 0.175 ms
Execution Time: 293.705 ms
GIN indexes are used and the query plans look fine. From what I saw online, this @> jsonb_build_object method is the one most commonly recommended. However, when we ran tests by sending many queries, we found they don't perform as fast as we'd like, so we've started working on improvements. Since jsonb_build_object creates a new JSONB and then we check for containment, we tried using a json_path instead. So the query looked like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data@>jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @? '$.StreetName ? (@ == "Test")'
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10000001273.23..10000001825.58 rows=500 width=136) (actual time=189.903..287.950 rows=500 loops=1)
  Buffers: shared hit=106258
  ->  Nested Loop  (cost=10000000168.54..148874820513.06 rows=125713639119 width=136) (actual time=0.207..287.849 rows=1500 loops=1)
        Buffers: shared hit=106258
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.007..7.470 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=168.11..15935.66 rows=14692 width=373) (actual time=0.031..0.031 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ == "Test")'::jsonpath))
              Rows Removed by Index Recheck: 0
              Heap Blocks: exact=1502
              Buffers: shared hit=97504
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..164.43 rows=14692 width=0) (actual time=0.030..0.030 rows=0 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ == "Test")'::jsonpath))
                    Buffers: shared hit=96002
Planning:
  Buffers: shared hit=2
Planning Time: 0.141 ms
Execution Time: 288.016 ms
Performance was the same as with jsonb_build_object. We've tried using like_regex instead of == in the json_path and surprisingly it was fastest. Current query looks like this:
SELECT User.data->>'Name' AS User_Name
, User.data->>'Surname' AS User_Surname
, User.data->>'Gender' AS User_Gender
, Address.data->>'StreetName' AS Address_StreetName
 FROM public.tab_user AS User
 JOIN public.tab_address AS Address ON Address.data@>jsonb_build_object('Id', User.data->'AddressId')
 WHERE
  Address.data @? '$.StreetName ? (@ like_regex "Test")'
 ORDER BY User.created_timestamp DESC
 LIMIT 500
 OFFSET 1000;
Query plan:
Limit  (cost=10050199374.23..10051799360.17 rows=500 width=136) (actual time=40.426..61.151 rows=500 loops=1)
  Buffers: shared hit=61138
  ->  Nested Loop  (cost=10046999402.34..402290157972457.19 rows=125713639119 width=136) (actual time=0.054..61.049 rows=1500 loops=1)
        Buffers: shared hit=61138
        ->  Index Scan using tab_user_created_timestamp_idx on tab_user user  (cost=0.43..9720087.72 rows=8556337 width=205) (actual time=0.008..6.836 rows=8728 loops=1)
              Buffers: shared hit=8754
        ->  Bitmap Heap Scan on tab_address address  (cost=46999401.91..47015169.46 rows=14692 width=373) (actual time=0.006..0.006 rows=0 loops=8728)
              Recheck Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ like_regex "Test")'::jsonpath))
              Rows Removed by Index Recheck: 1
              Heap Blocks: exact=8744
              Buffers: shared hit=52384
              ->  Bitmap Index Scan on tab_address_data_idx  (cost=0.00..46999398.24 rows=14692 width=0) (actual time=0.003..0.003 rows=1 loops=8728)
                    Index Cond: ((data @> jsonb_build_object('Id', (user.data -> 'AddressId'::text))) AND (data @? '$."StreetName"?(@ like_regex "Test")'::jsonpath))
                    Buffers: shared hit=43640
Planning:
  Buffers: shared hit=2
Planning Time: 0.171 ms
Execution Time: 61.214 ms
All queries return the same rows but like_regex is fastest - which is counterintuitive. Around 16% of rows in public.tab_address have the StreetName we're looking for in our queries. The more JSONB filters in WHERE the bigger the performance difference. We are currently looking for an explanation of this behavior. From what I saw online, regex searches tend to be more resource-intensive. Maybe when combined with the jsonb_path_ops index, Postgres can optimize the path traversal and quickly identify relevant records before applying the regex filter? It seems like query #2 should be faster than a regex search. I guess the performance depends on how the jsonb_path_ops index handles equality checks versus pattern matching. Does someone have an idea why like_regex is faster than jsonb_build_object and == in json_path?
MRzeczkowski (21 rep)
Dec 12, 2023, 12:40 PM • Last activity: Feb 9, 2025, 06:02 PM
1 votes
1 answers
1146 views
How to create a JSON object from table rows containing key-value pairs?
I am trying to form a jsonb object (not array) from the table (t1) below:

| id | key    | value     |
|----|--------|-----------|
| 1  | hello  | world     |
| 1  | dns    | 192.2.8.0 |
| 1  | people | 1000      |

I have tried
SELECT 
  jsonb_agg(('{"' || key || '" : "' || value || '"}')::jsonb) as kv_pair 
FROM t1 GROUP BY id
This yields [{"hello":"world"}, {"dns":"192.2.8.0"}, {"people":1000}]. How do I flatten/concatenate the array so the output is just a single object, like so: {"hello":"world", "dns":"192.2.8.0", "people":1000}? It is important that I have a single object for an Alembic migration. Thanks in advance.
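A minimal sketch of the aggregation this is reaching for: jsonb_object_agg builds a single object per id directly from the key/value columns, so no string concatenation or casting is needed (with a text value column the numbers come through as JSON strings):
SELECT jsonb_object_agg(key, value) AS kv_pair
FROM t1
GROUP BY id;
-- yields {"dns": "192.2.8.0", "hello": "world", "people": "1000"}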
thesunnyscientist (121 rep)
Mar 15, 2022, 11:21 PM • Last activity: Nov 1, 2024, 07:11 PM
0 votes
0 answers
53 views
Are multi column indexes for JSONB columns better?
we are using PostgreSQL and we have a table we are using JSONB due to the unstructured nature of the data. We are integrating with a ton of different CRMs, so a part of the records data is pretty different from client to client. The schema is something like this:
CREATE TABLE records (
    "client_id" INTEGER NOT NULL,
    "fields" jsonb NOT NULL
)
For every client_id, fields has a given set of keys in the JSON. So we have something like:
- client_id=1: thousands of rows with JSON with the same keys (but different values).
- client_id=...: thousands of rows with JSON with the same keys (but different values).
- client_id=n: thousands of rows with JSON with the same keys (but different values).

Right now we have two different indexes:
- One on client_id
- Another on fields

Since once you know the client_id you also know the keys in the fields JSON, we were wondering whether the GIN index on fields would be more efficient if we created it as a multi-column index on (client_id, fields). Why do we think this? Because maybe Postgres would only consider the shape of the JSON for a given client_id in the first column of the multi-column index, instead of considering the whole column across the whole table. We still have not benchmarked this, but we are curious whether PostgreSQL takes things like this into consideration for multi-column indexes. **Note**: it seems that Postgres, by default, does not create cross-column stats on multi-column indexes ([source](https://stackoverflow.com/a/77554834)).
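A minimal sketch of what such a multi-column index could look like; it assumes the btree_gin contrib extension (not mentioned in the question), which is what lets a plain integer column such as client_id take part in a GIN index next to the jsonb column:
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- hypothetical index name
CREATE INDEX records_client_fields_gin
    ON records USING gin (client_id, fields jsonb_path_ops);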
Antonio Gamiz Delgado (111 rep)
Sep 4, 2024, 08:53 AM • Last activity: Sep 4, 2024, 09:09 AM
0 votes
1 answers
137 views
How to build an efficient EAV model over JSONB in Postgresql
## Context

Hello, we have a project where we integrate with a ton of different CRMs. That means that when a new client comes in, we get all their data from their CRM and store it in our own PostgreSQL database. This means that having fixed schemas for our application is pretty hard.

## Requirements

Our app adds geographical capabilities to all the CRMs we integrate with by adding a map. On top of that map we then built some features. The base needs of these features are:
- Text search: common full text search.
- Filtering / colorising: filtering by range (numeric, datetime), specific values of a picklist, location on the map (we have a tile server for this), etc. We also support filtering by null or filled values. For example, show me all records where a given field is null or not.
- Insert speed is kind of important, because we have to import hundreds of thousands of records when integrating a new company. But these imports are less and less frequent because we are building an event-based update mechanism to stay up to date with the CRMs' data.

## Current solution

We chose some time ago to fully implement an entity-attribute-value (EAV) data model. This seemed the best idea given the shape of our data (more or less shapeless, due to how many different objects you have in the CRMs). The structure is more or less the usual in an EAV model:
- One table to store some core fields of the entity (id, creator, created date and so on)
- 5 tables for the 5 different built-in types we support (number, string, picklist, datetime, address).

This seemed great some time ago, but now we are having some difficulties when adding new features to the system. For instance:
- Full text search: this is somewhat hard to implement for some of the field types that we support (because we need to make some data redundant).
- Adding new field types: putting all the information of a complex field type into one field only forces us to create some pretty weird code.
- The tile server we have built to implement the filtering needs to run some pretty big and error-prone queries to combine filters over several fields.
- The model is hard to understand for new joiners to the project.

## Possible new solution

We were thinking about reducing the complexity of the project by combining schemas and JSONB. Basically, just store all records in one table:
- Some fixed columns in the table with the most used values.
- A jsonb column with all the fields coming from the CRM, stored there to be queried by the filter/colorising functionality.

We would use a GIN index to support full text search. I have some concerns about the filtering speed. Customers are used to that feature working pretty fast, and I'm unsure how slow it will get if we store everything in JSONB format. Right now we don't have a crazy amount of records, so I was thinking that this solution could get us to support more clients, and once we have more clients, move the filtering part to a reverse-index solution like Elasticsearch, while leaving all the other functionality working with PostgreSQL.

What would you recommend in this case? Is everything clear? Please let me know if I need to clarify something! Right now we have these five field types, but they could be reduced by:
- Combining the picklist table into the string one.
- Casting all datetimes to strings (we store everything in UTC and we only care about range filters; given that everything is in UTC, alphabetical comparisons can be used to implement the datetime range filter).
Antonio Gamiz Delgado (111 rep)
Aug 26, 2024, 08:54 AM • Last activity: Aug 26, 2024, 10:04 AM
1 votes
1 answers
116 views
Optimize query by telling Postgres to scan records from the most recent to the oldest
I am using Postgres 12 and in my app I have a table that I am using to store specific events that contain information about things that happened outside of the system and related to some records in my DB. The table looks like this:
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    eventable_type VARCHAR(255) NOT NULL,
    eventable_id BIGINT NOT NULL,
    type VARCHAR(255) NOT NULL,
    data JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX index_events_on_eventable ON events (eventable_type, eventable_id);
For example: a meeting was booked in Google calendar. An event is created in this table with the details of what happened and the record is associated with the internal representation of the meeting in the DB. The data attribute contains the details of the event that also contain a unique id like:
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "created", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "updated", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 1234, 'GoogleCalendarEvent', '{"action": "deleted", "GoogleId": "abcdef1234"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "created", "GoogleId": "dsfsdf2343"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "updated", "GoogleId": "dsfsdf2343"}'::jsonb);
INSERT INTO events (eventable_type, eventable_id, type, data) VALUES ('MyInternalEvent', 5678, 'GoogleCalendarEvent', '{"action": "deleted", "GoogleId": "dsfsdf2343"}'::jsonb);
I query the events table like:
SELECT * FROM events WHERE events.type = 'GoogleCalendarEvent' AND (data->>'GoogleId' = 'abcdef1234') LIMIT 1;
In terms of cardinality of operations, **the number of writes is approximately 3 times the number of reads**. That is: we write more than we read. The table has around 3 million rows and is growing rapidly; about 300k rows are added to the table every day. At the moment we only store one other type of event in the table, let's call it GoogleEmailEvent. Filtering by GoogleCalendarEvent would return roughly 50% of the records in the table. Filtering by GoogleId would normally return fewer than 10 records, but we really only need 1 because they are all associated with the same "Eventable", as you can see in the example inserts.

I want to improve the execution time of the query. I have thought about:
- adding an index WHERE data->>'GoogleId' IS NOT NULL, but I am worried about slowing down writes
- storing data->>'GoogleId' in a separate table together with the id of the event to allow for fast retrieval. How effective would this be? This would also slow down writes somewhat.
- indexing created_at and using that in the query to narrow down the records somehow

Important detail: the vast majority of the time (99% of the time or more) the matching event is one that has been inserted into the table recently (say, within the last 10 minutes). Can I take advantage of this detail to speed up the query? Would adding ORDER BY Id DESC LIMIT 1 do the trick?
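A minimal sketch of the first option listed above, as a partial expression index (the index name is an assumption); it only covers rows of the event type being queried, which keeps it small and limits the write overhead to inserts of that type:
CREATE INDEX index_events_on_google_id
    ON events ((data->>'GoogleId'))
    WHERE type = 'GoogleCalendarEvent';

-- the existing lookup can then use the index as written
SELECT * FROM events
WHERE type = 'GoogleCalendarEvent'
  AND data->>'GoogleId' = 'abcdef1234'
LIMIT 1;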
Perennialista (113 rep)
Aug 17, 2024, 05:42 PM • Last activity: Aug 21, 2024, 09:52 AM
0 votes
1 answers
93 views
How to Index/Query JSONB Columns with Arbitrary Keys for Efficient Membership Queries in PostgreSQL?
In Postgres, how do I properly index/query jsonb columns that only contain key-value pairs where the keys are arbitrary, to speed up membership operations? I have a table structured like this:
CREATE TABLE "assets" (
  "asset_id" text NOT NULL,
  "customer_id" text NOT NULL,
  "customer_asset_id" test NOT NULL,
  "attributes" jsonb NOT NULL,
  CONSTRAINT "asset_pk" PRIMARY KEY ("asset_id"),
  CONSTRAINT "asset_uc" UNIQUE ("tenant_id", "customer_asset_id")
);
This table is the driving table for all queries. A typical query (with query parameters resolved) looks like this:
SELECT
  "asset_id",
  "customer_id",
  "attributes",
  "issues"."opened_ts",
  "issues"."issue_title"
FROM
  "assets"
JOIN
/*
  Multiple assets can be assigned to the same issue.

  The insight_ids are Postgres UUIDs.
*/
  "asset_issue" USING ("asset_id") 
JOIN 
  "issues" USING ("issue_id", "customer_id")
WHERE
  "customer_id" = 'someCustomerId'

/*
  This block is generated dynamically based on what asset attributes the customer wishes to filter on.
  The key names and the values are supplied via query parameters.
  Keys and values are properly escaped JSON strings.
*/
AND 
(
    "assets"."attributes"->'someCustomerKey1' IN ('"someCustomerKey1Value1"', '"someCustomerKey1Value2"')
  AND
    "assets"."attributes"->'someCustomerKey2' IN ('"someCustomerKey2Value1"', '"someCustomerKey2Value2"')
)

/*
  Additional predicates have been elided.
  Note that they are mostly temporal range predicates: 
AND
 (COALESCE(:openedTsLowerBound, null) IS NULL OR "insight"."opened_ts" >= :openedTsLowerBound)
AND
  (COALESCE(:openedTsUpperBound, null) IS NULL OR "insight"."opened_ts" < :openedTsUpperBound)
)
*/

ORDER BY
  "issues"."opened_ts" DESC;
How can I index the attributes column (or additional columns) effectively to improve the performance of such queries, considering that the keys within the jsonb data are arbitrary? Is IN even the best way to go about this? Would adding extended statistics help? Note that customer asset counts range greatly from a handful to tens of thousands. So the number of issues per customer ranges greatly into hundreds of thousands per customer.
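One possible direction, sketched under assumptions the question does not state (the index name is made up, and rewriting the dynamic block as containment checks is only one option): a jsonb_path_ops GIN index accelerates the @> operator only, so each IN list would become an OR of containment predicates.
CREATE INDEX "assets_attributes_gin"
    ON "assets" USING gin ("attributes" jsonb_path_ops);

-- the dynamically generated filter block, expressed as containment checks
SELECT "asset_id"
FROM "assets"
WHERE "customer_id" = 'someCustomerId'
  AND ("attributes" @> '{"someCustomerKey1": "someCustomerKey1Value1"}'
    OR "attributes" @> '{"someCustomerKey1": "someCustomerKey1Value2"}')
  AND ("attributes" @> '{"someCustomerKey2": "someCustomerKey2Value1"}'
    OR "attributes" @> '{"someCustomerKey2": "someCustomerKey2Value2"}');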
Jeff (111 rep)
Jul 12, 2024, 03:23 PM • Last activity: Jul 13, 2024, 12:06 AM
1 votes
1 answers
1124 views
EAV or JSONB column for dynamic data that needs good indexing
I have some data in a Postgres database that does not fit into a static schema. There is a table "content" and each piece of content has a user-defined type. Each type can have different user-defined fields. Within a single type those fields are always the same though. You can assume that a typical database has ~5-10 types with each having ~5-25 fields. My original approach to store this was to use JSONB and store the data like this:
{
  "my_field": "foo",
  "another_field": "bar",
  "some_number_field": 5,
  "a_bool_field": true
}
So, key/value pairs where each field has a string id used as the key and the value stored with the type of the field. Of course you have to know whether the specific field you are querying is a number or a string, but that information is stored elsewhere in the DB for all content types and their fields. This is indexed with a GIN index using jsonb_path_ops and can then be queried using the @> containment operator. This works pretty nicely for checking equality, but doesn't support any other cases.

The problem is that I also need to support more advanced queries here, specifically some that require the > and < operators. Part of this is because I'm adding timestamps as a type for the fields, and queries that restrict the range based on a timestamp are a very common use case there. As far as I understand, this is not possible to do in a generic way using a JSONB column; there is no index similar to the GIN index that would allow these kinds of queries. So I see only two ways to handle this:
- dynamically create the right indexes for specific fields
- store the data in an EAV pattern with columns for different data types like timestamp, int, ...

The first option, creating the indexes in the running application based on user input, is a bit unorthodox, though I think it would be possible to do this reasonably safely in this particular case. The tenants are separated by schemas, and I'd use partial indexes that are created concurrently to avoid locking the tables. The second option, an entity-attribute-value system, feels icky, but it does seem to be a valid solution for this particular requirement. But everyone seems to strongly advise against EAV every time this comes up, so I'm not entirely sure how problematic this solution would be. Am I missing any solutions here? What are my options to store this kind of flexible data in a way that still allows fast comparison queries on types like timestamps and numbers?
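A minimal sketch of the first option for the "some_number_field" key from the example above; the table and column names are assumptions, since the question shows no DDL. A btree expression index on the cast value supports > and < (a text-to-numeric cast is immutable and can be indexed directly; timestamps would need an immutable wrapper function, since text-to-timestamp casts depend on session settings):
-- hypothetical table "content" with a jsonb column "data"
CREATE INDEX CONCURRENTLY content_some_number_field_idx
    ON content (((data->>'some_number_field')::numeric))
    WHERE data ? 'some_number_field';

-- range queries on that field can then use the partial index
SELECT *
FROM content
WHERE data ? 'some_number_field'
  AND (data->>'some_number_field')::numeric > 3;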
Fabian (115 rep)
Jan 4, 2024, 02:08 PM • Last activity: Jul 7, 2024, 05:03 AM
1 votes
2 answers
1086 views
Does LZ4 compression have an effect on JSONB columns?
After doing some experimentation, it appears as though COMPRESSION lz4 doesn't have any effect on JSONB columns (whereas it does for JSON, and TEXT). Is that indeed the case? If so, why is that? I could dive into the Postgresql mailing lists and/or source code to try and find the answer, but hoping someone who already knows the answer can chime in.
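A minimal sketch of that kind of experiment (the table and column names are made up); pg_column_compression shows which method, if any, was actually applied to a stored value, and nothing gets compressed at all below the TOAST threshold of roughly 2 kB:
CREATE TABLE lz4_test (
    doc jsonb COMPRESSION lz4,
    txt text  COMPRESSION lz4
);

-- a large, highly compressible value in both columns
INSERT INTO lz4_test
SELECT to_jsonb(repeat('abc', 5000)), repeat('abc', 5000);

SELECT pg_column_compression(doc) AS doc_method,
       pg_column_compression(txt) AS txt_method,
       pg_column_size(doc)        AS doc_size,
       pg_column_size(txt)        AS txt_size
FROM lz4_test;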
user295232 (13 rep)
Jul 3, 2024, 01:51 AM • Last activity: Jul 4, 2024, 01:31 AM
0 votes
1 answers
1346 views
postgres reverse jsonb array
Postgres 13. I have a table with a jsonb array and I need to reverse the array. Any idea how I can do that?
CREATE OR REPLACE FUNCTION array_reverse(anyarray) RETURNS anyarray AS $$
SELECT ARRAY(
    SELECT $1[i]
    FROM generate_subscripts($1,1) AS s(i)
    ORDER BY i DESC
);
$$ LANGUAGE 'sql' STRICT IMMUTABLE;

create table route (
   session_id serial primary key, 
   name varchar(20), 
   data jsonb
);

insert into route(name, data)
VALUES('NY-LA',
'["{\"x\":40.83382,\"y\":-75.051552}", "{\"x\":33.740055,\"y\":-118.199982}"]'
);

insert into route(name, data)
VALUES('NY-CH',
'["{\"x\":40.83382,\"y\":-75.051552}", "{\"x\":39.740055,\"y\":-90.199982}"]');

SELECT * FROM route;
What I need is to reverse the array in the data column, e.g.:
data->0->'x' from route where id = 1;
should return 39.740055. I found the array_reverse() function (shown above) suggested for Postgres, but I do not understand how to convert jsonb into an ARRAY so that I can use the array_reverse(array) function. Thank you.
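A minimal sketch of doing the reversal directly in jsonb, without converting to a native array first: expand the elements WITH ORDINALITY and re-aggregate them in descending order (names taken from the table above).
SELECT jsonb_agg(elem ORDER BY idx DESC) AS reversed
FROM route,
     jsonb_array_elements(data) WITH ORDINALITY AS t(elem, idx)
WHERE session_id = 1;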
Daniel Kniaz (101 rep)
Sep 13, 2022, 12:00 PM • Last activity: Feb 25, 2024, 08:06 PM
-1 votes
1 answers
46 views
POSTGRES : Data import
DB Version: 13.x
1. Source table schema:
CREATE TABLE mig_bus_data (
          mig_id   SERIAL,
          mig_data JSONB
);
2. Target table schema:
CREATE TABLE BUSINESS
       id   UNIQUE SERIAL,
      data1  ,
      data2  ,
      .
      .
3. All indexes on the target table dropped
4. Data being imported in blocks of 100 records in parallel using:

Process 1
CURSOR 
     select mig_data from mig_bus_data LIMIT 100  OFFSET 0
Process 2
CURSOR 
     select mig_data from mig_bus_data LIMIT 100  OFFSET 100
   .
   .
Notes:
1. After dropping the indexes and doing step 4, there is no significant performance gain. Any pointers to increase speed would help.
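One variation on the batching in step 4, as a sketch (whether it helps here is untested): range scans on the serial key avoid the cost of skipping rows that LIMIT ... OFFSET pays on later batches.
-- worker 1
SELECT mig_data FROM mig_bus_data WHERE mig_id >   0 AND mig_id <= 100;
-- worker 2
SELECT mig_data FROM mig_bus_data WHERE mig_id > 100 AND mig_id <= 200;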
CSP (11 rep)
Feb 20, 2024, 10:48 PM • Last activity: Feb 21, 2024, 07:33 PM
1 votes
1 answers
291 views
will null-ing JSONB column in postgres table before performing ALTER TABLE DROP of that column help?
I changed the table structure and I am about to remove an old JSONB column from a table of about 2.5M rows. The JSONB makes this table really big - pg_total_relation_size: 408GB. Performing ALTER TABLE ... DROP COLUMN ...; was so heavy that it blocked the whole DB and caused 502s in most of the services. QUESTION: if I set that JSONB column to NULL before performing ALTER TABLE DROP COLUMN, will it go faster? Or is there any other option, excluding the one of copying all the columns except the one I'm aiming to drop into a separate table and swapping names afterwards?
andilabs (697 rep)
Feb 8, 2024, 05:31 PM • Last activity: Feb 8, 2024, 05:55 PM
0 votes
1 answers
2241 views
Computing hashes in PostgreSQL for JSONB columns
From PostgreSQL docs : > jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept. I have a jsonb column that contains some data. I extract specific key data (which is also json) and hash it using sha256, something like:
SELECT sha256(to_jsonb(rule_element) :: TEXT :: BYTEA);
Where rule_element was extracted from the original data. Given that it does not maintain order of keys and the original column of jsonb may be updated in the future, I do not think it is safe to compute hashes as they might change. Can I compute consistent hashes for jsonb columns? (99% Sure you can't, 1% hope to make my work easier) The hashes would be used for unique identification purposes, basically acting as a unique key for different types of JSON compositions.
VIAGC (103 rep)
Jan 17, 2024, 05:27 AM • Last activity: Jan 17, 2024, 11:28 AM
0 votes
1 answers
135 views
User-defined aggregate function to sum jsonb values
I have an agg function that sums decimals. Business logic omitted for brevity:
CREATE OR REPLACE FUNCTION calculate_snapshot_internal_decimal(_this_value DECIMAL,
                                                               _next_value DECIMAL,
                                                               OUT _code DECIMAL)
    LANGUAGE plpgsql AS
$func$
BEGIN

    _code = _this_value + _next_value;

END
$func$;
Now I need the same for JSONB, where the key is a string and the value is always a BigDecimal. _this_value:
{"weight": "50", "height":"170", "size": "2"}
_next_value
{"weight": "40", "height":"20"}
aggregated result:
{"weight": "90", "height":"190", "size": "2"}
What could I try?
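A minimal sketch of the jsonb counterpart (the function name is an assumption, mirroring the decimal one above): expand both objects with jsonb_each_text, sum the values per key as numerics, and fold the result back into jsonb, rendering the sums as strings to match the example output.
CREATE OR REPLACE FUNCTION calculate_snapshot_internal_jsonb(_this_value JSONB,
                                                             _next_value JSONB,
                                                             OUT _code JSONB)
    LANGUAGE plpgsql AS
$func$
BEGIN

    SELECT COALESCE(jsonb_object_agg(key, total::text), '{}'::jsonb)
      INTO _code
      FROM (
          SELECT key, SUM(value::numeric) AS total
            FROM (
                SELECT key, value FROM jsonb_each_text(_this_value)
                UNION ALL
                SELECT key, value FROM jsonb_each_text(_next_value)
            ) AS pairs
           GROUP BY key
      ) AS summed;

END
$func$;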
Capacytron (137 rep)
Oct 3, 2023, 01:27 PM • Last activity: Nov 29, 2023, 05:22 AM
Showing page 1 of 20 total questions