Sample Header Ad - 728x90

Slow queries on JSON_TABLE based view on mariaDB

1 vote
1 answer
68 views
We a running an instance of Apache syncope with around 130k users (ca. 33k with active or pending state). When searching for users by attributes we see long query times often over 50 seconds. One example query might be
select * from user_search_attr 
where schema_id = 'familyName' 
and stringvalue = 'Bergmann'
user_search_attr is a view defined as follows:
CREATE OR REPLACE
ALGORITHM = UNDEFINED VIEW user_search_attr AS
select
    u.id AS any_id,
    attrs.schema_id AS schema_id,
    attrs.booleanvalue AS booleanvalue,
    attrs.datevalue AS datevalue,
    attrs.doublevalue AS doublevalue,
    attrs.longvalue AS longvalue,
    attrs.stringvalue AS stringvalue
from
    (SyncopeUser u
left join JSON_TABLE(coalesce(u.plainAttrs, '[]'), '$[*]' 
   COLUMNS (schema_id varchar(255) PATH '$.schema', 
   NESTED PATH '$.values[*]' 
   COLUMNS (booleanvalue int(11) PATH '$.booleanValue', 
            datevalue varchar(32) PATH '$.dateValue', 
            doublevalue double PATH '$.doubleValue', 
            longvalue bigint(20) PATH '$.longValue', 
            stringvalue varchar(255) PATH '$.stringValue'))) attrs on
    (1 = 1))
where
    attrs.schema_id is not null
    and (attrs.booleanvalue is not null
        or attrs.datevalue is not null
        or attrs.doublevalue is not null
        or attrs.longvalue is not null
        or attrs.stringvalue is not null);
As you can see the data comes from the table SyncopeUser which is defined as follows:
CREATE TABLE SyncopeUser (
  id varchar(36) NOT NULL,
  creationContext varchar(255) DEFAULT NULL,
  creationDate datetime(3) DEFAULT NULL,
  creator varchar(255) DEFAULT NULL,
  lastChangeContext varchar(255) DEFAULT NULL,
  lastChangeDate datetime(3) DEFAULT NULL,
  lastModifier varchar(255) DEFAULT NULL,
  status varchar(255) DEFAULT NULL,
  changePwdDate datetime(3) DEFAULT NULL,
  cipherAlgorithm varchar(20) DEFAULT NULL,
  failedLogins int(11) DEFAULT NULL,
  lastLoginDate datetime(3) DEFAULT NULL,
  mustChangePassword int(11) DEFAULT NULL,
  password varchar(255) DEFAULT NULL,
  passwordHistory text DEFAULT NULL,
  securityAnswer varchar(255) DEFAULT NULL,
  suspended int(11) DEFAULT NULL,
  token text DEFAULT NULL,
  tokenExpireTime datetime(3) DEFAULT NULL,
  username varchar(255) DEFAULT NULL,
  plainAttrs longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
              DEFAULT NULL CHECK (json_valid(plainAttrs)),
  REALM_ID varchar(36) DEFAULT NULL,
  SECURITYQUESTION_ID varchar(36) DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY U_SYNCPSR_USERNAME (username),
  UNIQUE KEY SyncopeUser_username (username),
  KEY SECURITYQUESTION_ID (SECURITYQUESTION_ID),
  KEY SyncopeUser_realm_id (REALM_ID),
  CONSTRAINT SyncopeUser_ibfk_1 FOREIGN KEY (REALM_ID) 
           REFERENCES Realm (id),
  CONSTRAINT SyncopeUser_ibfk_2 FOREIGN KEY (SECURITYQUESTION_ID)
           REFERENCES SecurityQuestion (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;
The relevant column plainAttrs contains the users attibutes as JSON string. One example might be [{"uniqueValue":{"stringValue":"123456789"},"schema":"lockSystemId"},{"values":[{"stringValue":"Bergmann"}],"schema":"familyName"}]. The following indexes are defined on SyncopeUser:
Table      |Non_unique|Key_name            |Seq_in_index|Column_name        |Collation|Cardinality|Sub_part|Packed|Null|Index_type|Comment|Index_comment|Ignored|
-----------+----------+--------------------+------------+-------------------+---------+-----------+--------+------+----+----------+-------+-------------+-------+
SyncopeUser|         0|PRIMARY             |           1|id                 |A        |     149635|        |      |    |BTREE     |       |             |NO     |
SyncopeUser|         0|U_SYNCPSR_USERNAME  |           1|username           |A        |     149635|        |      |YES |BTREE     |       |             |NO     |
SyncopeUser|         0|SyncopeUser_username|           1|username           |A        |     149635|        |      |YES |BTREE     |       |             |NO     |
SyncopeUser|         1|SECURITYQUESTION_ID |           1|SECURITYQUESTION_ID|A        |          1|        |      |YES |BTREE     |       |             |NO     |
SyncopeUser|         1|SyncopeUser_realm_id|           1|REALM_ID           |A        |          1|        |      |YES |BTREE     |       |             |NO     |
As most normal users have over 15 attributes the user_search_attr view contains over 2M rows. The servers are VMs with currently 8 GB of RAM and 6 CPUs but this can be changed. We are currently running version 11.4.7 of MariaDB in a galera cluster of 3 Servers. Syncope is only connected to one of the servers as this cluster is also used for other applications and we don't want to influnce these other applications. Here are some of the IMHO relevant configuration options:
[mysqld]
aria_pagecache_buffer_size = 32M
basedir = /usr
bind-address = ::
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
innodb_adaptive_hash_index = OFF
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 1
innodb_buffer_pool_size = 4096M
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 16MB
innodb_log_file_size = 512M
key_buffer_size = 1M
log-error = /var/log/mariadb/mariadb.log
log_queries_not_using_indexes = OFF
long_query_time = 4.0
max_allowed_packet = 128M
max_binlog_size = 100M
max_connections = 400
max_heap_table_size = 256M
performance_schema = ON
query_cache_limit = 16M
query_cache_size = 0
query_cache_type = OFF
skip-external-locking
skip-name-resolve = 1
slow_query_log = 0
slow_query_log_file = /var/log/mariadb/slow.log
sort_buffer_size = 4M

table_definition_cache = 800
table_open_cache = 1000
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 256M
wsrep_on = ON
wsrep_provider = /usr/lib64/galera-4/libgalera_smm.so
As most queries only search for active or pending users one option that I have not tried is indexing or partitioning the table for the status column. Currently we are not in the position of changing the design of the database or even switching to another IDM sollution. We hope to find optimazion potential on the DB layer or maybe some improvements on some queries.
Asked by Clemens Bergmann (111 rep)
Aug 3, 2025, 06:44 AM
Last activity: Aug 5, 2025, 08:07 AM