Sample Header Ad - 728x90

Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answers
285 views
How to call a function in regex_replace with matches?
I'd like to use some processing function instead of `replacement text` in `regex_replace` function. Is there possible to use a function with matches as arguments, like: SELECT REGEXP_REPLACE( description, '\[([is]?:?)(\d+)\]', CONCAT_WS(' ', \1, \2), 'gi' ) FROM some_table WHERE id = 123; Of course,...
I'd like to use some processing function instead of replacement text in regex_replace function. Is there possible to use a function with matches as arguments, like: SELECT REGEXP_REPLACE( description, '\[([is]?:?)(\d+)\]', CONCAT_WS(' ', \1, \2), 'gi' ) FROM some_table WHERE id = 123; Of course, backreferences \1 and \2 do not work here, but is there some way to use variables instead of them? Background: description may contain more than one string with a pattern like ` or [i:12345] or [s:12345] and I'd like to process such matches into advanced strings, so I thought the best way is to write a custom function and call it with matches of regex_replace`. I just did not found in the docs, how could I use matches as variable/arguments. Is this possible or what is a better way to accomplish my goal? **EDIT** I show description field before and after: Lorem ipsum dolor sit amet , consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et magna aliqua. should be after replacements like: Lorem ipsum dolor sit amet "one-two-three", consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et "two-three-four" magna aliqua.
w.k (141 rep)
Oct 13, 2020, 10:08 PM • Last activity: May 20, 2025, 05:00 AM
149 votes
8 answers
246879 views
Pattern matching with LIKE, SIMILAR TO or regular expressions
I had to write a simple query where I go looking for people's name that start with a B or a D: SELECT s.name FROM spelers s WHERE s.name LIKE 'B%' OR s.name LIKE 'D%' ORDER BY 1 I was wondering if there is a way to rewrite this to become more performant. So I can avoid `or` and / or `like`?
I had to write a simple query where I go looking for people's name that start with a B or a D: SELECT s.name FROM spelers s WHERE s.name LIKE 'B%' OR s.name LIKE 'D%' ORDER BY 1 I was wondering if there is a way to rewrite this to become more performant. So I can avoid or and / or like?
Lucas Kauffman (1835 rep)
Jan 15, 2012, 11:24 AM • Last activity: May 20, 2025, 01:30 AM
0 votes
1 answers
304 views
Snowflake - extracting country/ state names - RegEx or Array?
I have a bunch of sloppy Geo text in a field. However the country/ state names or codes are relatively clean. It'll say Poland 23 or Illinois Remote or Frogballs, Germany or TX, AL, AK. I have a finite list of country names/ codes ... and US 50 state names, codes. I'm trying to figure out the best w...
I have a bunch of sloppy Geo text in a field. However the country/ state names or codes are relatively clean. It'll say Poland 23 or Illinois Remote or Frogballs, Germany or TX, AL, AK. I have a finite list of country names/ codes ... and US 50 state names, codes. I'm trying to figure out the best way to convert the "trash STATENAME trash" into a clean state name or country name. I'm thinking either go the array route STRTOK_TO_ARRAY(location_field) - which will convert the string to 'word items' in an array. But I'm not sure the best function to extract a matching 'item' within an array. Array_contains() merely is true/ false. Not "Poland". Maybe regex is better for this purpose? Something like regexp_like(location_field,country_list|country_list,'i'). Only issue here is that -- only want to match countries/ states that are a "word" (preceding or trailing space) -- not "AL" for Alabama when it's part of portugAL for instance.
user45867 (1739 rep)
Jun 5, 2023, 09:32 PM • Last activity: May 14, 2025, 08:00 AM
1 votes
2 answers
66 views
How to extract certificate attributes from a single column and split into many?
In my table I've a field (let say `cert_attr`) which stores [Certificate X.509 Attributes](https://docs.oracle.com/cd/E24191_01/common/tutorials/authz_cert_attributes.html). Here is the example of 3 rows (each line corresponds to a field): "CN=User1, OU=Eng, O=Company Ltd, L=D4, S=Dublin, C=IE" "CN=...
In my table I've a field (let say cert_attr) which stores [Certificate X.509 Attributes](https://docs.oracle.com/cd/E24191_01/common/tutorials/authz_cert_attributes.html) . Here is the example of 3 rows (each line corresponds to a field): "CN=User1, OU=Eng, O=Company Ltd, L=D4, S=Dublin, C=IE" "CN=User2, OU=Eng, O=Company Ltd, L=D2, S=Dublin, C=IE" "OU=Eng, O=Company Ltd" And I'm trying to split the value of a field into separate columns using SELECT in the following way: SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "CN=", -1), ", ", 1) as CN, SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "OU=", -1), ", ", 1) as OU, SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "O=", -1), ", ", 1) as O, SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "L=", -1), ", ", 1) as L, SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "ST=", -1), ", ", 1) as ST, SUBSTRING_INDEX(SUBSTRING_INDEX(cert_attr, "C=", -1), ", ", 1) as C FROM mytable which works, however there is an issue for the rows which are missing some attributes. So in the case where the attribute is missing in the field's string, I expect the column to be empty, but it returns the whole string instead. The first two row examples are working as expected, which returns the following columns correctly: | CN | OU | O | L | S. | C | | ----- | --- | ----------- | -- | ------ | -- | | User1 | Eng | Company Ltd | D4 | Dublin | IE | | User2 | Eng | Company Ltd | D2 | Dublin | IE | The problem is with the 3rd row example, which I expect to return an empty string when the substring pattern is not found: | CN | OU | O | L | S. | C | | ---------- | --- | ----------- | ---------- | ---------- | ---------- | | OU=Eng,... | Eng | Company Ltd | OU=Eng,... | OU=Eng,... | OU=Eng,... | but instead the whole string is returned. **Question:** Is there any way to return an empty string when SUBSTRING_INDEX() fails to find the substring? Or maybe there is some other function (like a regular expression) or another workaround? My goal is to extract the data into TSV file by having these attributes in separate columns with valid values: mysql mytable cert_attributes.tsv
kenorb (475 rep)
Oct 23, 2020, 07:10 PM • Last activity: Mar 22, 2025, 03:28 PM
2 votes
2 answers
226 views
(non-breaking space) is not considered whitespace by Postgres?
String-functions in Postgres do not recognize non-breaking-space as whitespace, neither when trimming nor when using regular expressions: ~~~pgsql select 'x' || test || 'x' , 'x' || trim(test) || 'x' , 'x' || regexp_replace(test, '\s+', '') || 'x' from (values (''), (' '), ('   y&#160...
String-functions in Postgres do not recognize non-breaking-space as whitespace, neither when trimming nor when using regular expressions: ~~~pgsql select 'x' || test || 'x' , 'x' || trim(test) || 'x' , 'x' || regexp_replace(test, '\s+', '') || 'x' from (values (''), (' '), ('   y   '), ('s s s s') ) as foo(test) ~~~ (Not sure if the non-breaking ( ) survives in the above code, but the two last rows contain nbsp.) Onecompiler SQL demonstration Is it a Postgres-thing not to handle this, or is it a bug? I know of char(160) for nbsp, but would prefer en general-purpose way to strip all whitespace. Collation in use is en_US.utf8
Rune (123 rep)
Feb 27, 2025, 11:57 AM • Last activity: Mar 5, 2025, 02:00 PM
3 votes
3 answers
10527 views
Oracle regexp_like square brackets inside character set
This questions is in regards to Oracle, pl/sql, and the regexp_like function. I am trying to build a character set that will match on all typical special characters. My character set currently looks like: pattern := '[-~`!@#$%^&*\(\)\\{}_+=|''";:,./?]+'; I would like to add the square brackets to th...
This questions is in regards to Oracle, pl/sql, and the regexp_like function. I am trying to build a character set that will match on all typical special characters. My character set currently looks like: pattern := '[-~`!@#$%^&*\(\)\\{}_+=|''";:,./?]+'; I would like to add the square brackets to this character set, however, whatever I try to add ']' is not working. Here is a simple example that illustrates the problem: select case when regexp_like('w]ord', '[\]]+') then 'true' else 'false' end from dual; This returns false, meaning it did not match the ']' character. Curiously, I can get the '[' character to match because this returns true: select case when regexp_like('w[ord', '[\[]+') then 'true' else 'false' end from dual;
Elijah W. Gagne (705 rep)
Aug 6, 2015, 09:13 PM • Last activity: Feb 25, 2025, 05:46 AM
0 votes
3 answers
2870 views
REGEXP_SUBSTR to split delimited values
I have a field called `FREQUENCY` and it has values like `'10,0;30'`. How do I write a regular expression to get the result like this: FREQUENCY 10 0 30
I have a field called FREQUENCY and it has values like '10,0;30'. How do I write a regular expression to get the result like this: FREQUENCY 10 0 30
Jayachandran Jayaseelan (1 rep)
Mar 23, 2021, 10:40 AM • Last activity: Feb 19, 2025, 01:00 AM
3 votes
1 answers
603 views
Regular Expression Integrity Constraints in SQLite3
In a SQLite3 CREATE TABLE statement, can I use a regular expression to enforce a particular constraint? Specifically, I am trying to require that an inputted URL is valid. I know there are other ways of doing this besides using regular expressions, but because of the structure of my project, this is...
In a SQLite3 CREATE TABLE statement, can I use a regular expression to enforce a particular constraint? Specifically, I am trying to require that an inputted URL is valid. I know there are other ways of doing this besides using regular expressions, but because of the structure of my project, this is the method I need.
Colin Soleim (31 rep)
Feb 21, 2013, 08:50 PM • Last activity: Dec 27, 2024, 10:00 PM
0 votes
3 answers
1414 views
Remove spaces and replace characters with regexp_replace()
I wish to ... 1. ... remove spaces 2. ... delete apostrophes 3. ... replace 'é' and 'è' with 'e' I use the function `regexp_replace()`. For the moment, I can delete the spaces but poorly. Indeed, when the attribute contains several spaces only one is deleted. I can't process *1.*, *2.*, an...
I wish to ... 1. ... remove spaces 2. ... delete apostrophes 3. ... replace 'é' and 'è' with 'e' I use the function regexp_replace(). For the moment, I can delete the spaces but poorly. Indeed, when the attribute contains several spaces only one is deleted. I can't process *1.*, *2.*, and *3.* at the same time. Is this possible? Below is a link to my code: https://dbfiddle.uk/22ODtpNS
fcka (125 rep)
May 25, 2024, 07:51 PM • Last activity: May 27, 2024, 05:44 PM
1 votes
1 answers
636 views
Regex help in Snowflake
I am trying to write a regex that can replace '[', ']' and '].' with '_' and in cases where ']' is the last character it should be replaced with '' but I am struggling to come up with a regex for it that works in all cases because Snowflake does not support lookahead & lookbehind. My question is, ha...
I am trying to write a regex that can replace '[', ']' and '].' with '_' and in cases where ']' is the last character it should be replaced with '' but I am struggling to come up with a regex for it that works in all cases because Snowflake does not support lookahead & lookbehind. My question is, has anyone tried/achieved to do this before? or is it impossible and I should just give up?? Eg: - look_behind --> look_behind_0 - look_behind_positive --> look_behind_0_positive_1
Pepe Silvia (11 rep)
Apr 14, 2024, 11:20 PM • Last activity: Apr 15, 2024, 11:40 PM
0 votes
1 answers
495 views
Optimizing regex based query in sqlite
I'm using an sqlite database to store manually created labels for some data automatically queried from a live system. The data from the live system consists primarily of an address, comprised of 3 parts. Let's use URLs an example, the three parts being the protocol, the domain and the path. Initiall...
I'm using an sqlite database to store manually created labels for some data automatically queried from a live system. The data from the live system consists primarily of an address, comprised of 3 parts. Let's use URLs an example, the three parts being the protocol, the domain and the path. Initially, I would load a couple 100k worth of addresses into a table with each field of the address being a column and together building the primary key. The labels are then in additional columns.
CREATE TABLE OldWebsites (
    protocol VARCHAR (255) NOT NULL,
    domain   VARCHAR (255) NOT NULL,
    path     VARCHAR (255) NOT NULL,
    label1   INTEGER,
    label2   TEXT,
    CONSTRAINT address PRIMARY KEY (
        protocol,
        domain,
        path
    )
);
I found myself repeating labels over and over based on certain patterns that the address would match. Since I would always extend this table with new data and remove old data, this became too much of a hazzle, so I tried a different approach, namely just loading the existing addresses into one table and then have other tables for the data where I would write regex matchers for the address components
CREATE TABLE Websites (
    protocol VARCHAR (255) NOT NULL,
    domain   VARCHAR (255) NOT NULL,
    path     VARCHAR (255) NOT NULL,
    CONSTRAINT address PRIMARY KEY (
        protocol,
        domain,
        path
    )
);

CREATE TABLE Label1 (
    protocol_re VARCHAR (255) NOT NULL,
    domain_re   VARCHAR (255) NOT NULL,
    path_re     VARCHAR (255) NOT NULL,
    label1   INTEGER
    CONSTRAINT address_matcher PRIMARY KEY (
        protocol_re,
        domain_re,
        path_re
    )
);

CREATE TABLE Label2 (
    protocol_re VARCHAR (255) NOT NULL,
    domain_re   VARCHAR (255) NOT NULL,
    path_re     VARCHAR (255) NOT NULL,
    label2   TEXT,
    CONSTRAINT address_matcher PRIMARY KEY (
        protocol_re,
        domain_re,
        path_re
    )
);
Assume that I have already (using other queries) guaranteed, that there is exactly one match in each label table for each address in the Websites table. I would now like to write a query that reconstructs a table like the original OldWebsites one by matching labels and automatically queried data. Something like this
SELECT Websites.*,
       Label1.label1,
       Label2.label2
  FROM Websites
       JOIN
       Label1 ON (Websites.protocol REGEXP '^' || Label1.protocol_re || '$' AND 
                  Websites.domain REGEXP '^' || Label1.domain_re || '$' AND 
                  Websites.path REGEXP '^' || Label1.path_re || '$') 
       JOIN
       Label2 ON (Websites.protocol REGEXP '^' || Label2.protocol_re || '$' AND 
                  Websites.domain REGEXP '^' || Label2.domain_re || '$' AND 
                  Websites.path REGEXP '^' || Label2.path_re || '$');
Now.. this is really slow, especially for more label tables, using PCRE sqlite3 extension for the REGEXP function. I would like to know if there's way to optimize this query using either parallelization (the query should run ideally from python) or using the knowledge that there is exactly 1 match in each Label table. From my understanding, multiple inner joins should take at most the sum of the individual joins, correct? Perhaps indexes are also helpful, but I have only a basic idea of what they are and no idea whether they would be of help here.
Xaser (115 rep)
Feb 4, 2021, 03:48 PM • Last activity: Feb 21, 2024, 04:01 PM
0 votes
1 answers
208 views
Query for converting a column of numbers matching a regex in MySQL
I have a column which contains string values of the format e.g `AB12345678` - two letters followed by 8 numeric digits. I need to write a MySQL query to insert a `5` before each set of 4 digits thus `AB12345678` becomes `AB5123455678`. I'm close, the following works for one value in the column, but...
I have a column which contains string values of the format e.g AB12345678 - two letters followed by 8 numeric digits. I need to write a MySQL query to insert a 5 before each set of 4 digits thus AB12345678 becomes AB5123455678. I'm close, the following works for one value in the column, but not for multiple values - all values are written with the same first value encountered it seems.
SELECT @twoeight := os_grid_ref FROM projects WHERE os_grid_ref REGEXP '[A-Z]{2}[0-9]{4}[0-9]{4}';
SELECT @prefix1 := SUBSTR(@twoeight,1,2);
SELECT @part1 := SUBSTR(@twoeight,3,4);
SELECT @part2 := SUBSTR(@twoeight,7,4);
UPDATE projects SET os_grid_ref=CONCAT(@prefix1,'5',@part1,'5',@part2) WHERE os_grid_ref REGEXP '[A-Z]{2}[0-9]{4}[0-9]{4}';
What do I need to adjust so that the query can modify each value in the column individually. Thank you.
therobyouknow (133 rep)
Nov 21, 2023, 09:34 PM • Last activity: Nov 21, 2023, 09:58 PM
2 votes
2 answers
538 views
Return part of a string with variable length
I have data in a table column like below: ``` host=0.0.0.0 port=5432 dbname=database_name user=pglogicaluser host=0.0.0.0 port=5432 dbname=database_name2 user=pglogicaluser ``` I want to write a query to get the database names only like below: ``` database_name database_name2 ``` I could come up wit...
I have data in a table column like below:
host=0.0.0.0 port=5432 dbname=database_name user=pglogicaluser
host=0.0.0.0 port=5432 dbname=database_name2 user=pglogicaluser
I want to write a query to get the database names only like below:
database_name
database_name2
I could come up with something like:
select substring(column_name, '.+dbname=(.*)$') from table_name;
However, I couldn't figure how to stop the extraction before the user keyword starts.
Sayad Xiarkakh (532 rep)
Mar 21, 2022, 06:28 PM • Last activity: Sep 3, 2023, 04:21 PM
1 votes
2 answers
2104 views
Replace every character in each word, except the first two characters
Due to privacy concerns, I need to hide personal information, while remaining somewhat understandable. I propose to achieve that by hiding everything except the first two characters. The word length should stay the same: ```lang-none FirstName SecondName LastName -> Fi****** Se******** La****** ```...
Due to privacy concerns, I need to hide personal information, while remaining somewhat understandable. I propose to achieve that by hiding everything except the first two characters. The word length should stay the same:
-none
FirstName SecondName LastName -> Fi****** Se******** La******
My attempt is working, but not 100%:
SELECT regexp_replace('Firstname Secondname Lastname', '\y(\S)(\S)','\1' 
       || repeat('*', length('\2')),'g');
Output:
-none
F**rtname S**condname L**stname
With only first character it is easier, and it works, but I need the second character too:
SELECT regexp_replace('Firtname Secondname Lastname', '\Y\w', '*','g');
Output:
-none
F******* S********* L*******
[Demo](https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/7139)
oren (13 rep)
Feb 10, 2023, 03:25 PM • Last activity: Aug 20, 2023, 08:45 AM
0 votes
0 answers
103 views
Counting Rows From a Value in a SQL text box - please please can anyone help?
I just can't find an example of this anywhere and hoping someone can help. I have a query which returns a list of care services and other fields based on the care service table values and calculated expressions. One of the calculated expressions for each row is as follows: ``` =iif(Fields!NoCareInsp...
I just can't find an example of this anywhere and hoping someone can help. I have a query which returns a list of care services and other fields based on the care service table values and calculated expressions. One of the calculated expressions for each row is as follows:
=iif(Fields!NoCareInspection.Value=0,1,0)+
iif(Fields!NoCareInspection.Value>2,1,0)+
iif((Fields!IncidentsPrevMth.Value-(Fields!NoofIncidentsLast12.Value/12)>0 AND 
 Fields!IncidentsPrev2ndMth.Value-(Fields!NoofIncidentsLast12.Value/12)>0),1,0)+
iif(Fields!ConcernsPer100Last12.Value-Avg(Fields!ConcernsPer100Last12.Value, "DataSet2")>0,1,0)+
iif(Fields!CurrManagerStatus.Value  "Registered",1,0)+
iif((Fields!ManagerApplicationsLast24.Value+Fields!AbsencesLast24.Value)>2,1,0)+
iif((Fields!ENFMeetingsLast24.Value+Fields!FTCsLast24.Value+Fields!INsLast24.Value+Fields!NOPsLast24.Value+Fields!UPsLast24.Value)>0,1,0)+
iif(Fields!IncidentsPrev3Mths.Value =0,1,0)+
iif(Fields!NoofAFIsLast12.Value-Avg(Fields!NoofAFIsLast12.Value, "DataSet2")>0,1,0)+
iif(Fields!RPNotReg.Value >0,1,0)
This evaluates a number of field values from the care homes table and nicely calculates a risk score for each care home (row) -this works a charm. Now what I need is a summary table to count how many rows had a total score of 0,1,2 and so forth. What I have tried is different permutations of this kind of statement:
=SUM(iif(iif(Fields!NoCareInspection.Value=0,1,0)+
iif(Fields!NoCareInspection.Value>2,1,0)+
iif((Fields!IncidentsPrevMth.Value-(Fields!NoofIncidentsLast12.Value/12)>0 AND
Fields!IncidentsPrev2ndMth.Value-(Fields!NoofIncidentsLast12.Value/12)>0),1,0)+
iif(Fields!ConcernsPer100Last12.Value-Avg(Fields!ConcernsPer100Last12.Value, "DataSet2")>0,1,0)+
iif(Fields!CurrManagerStatus.Value  "Registered",1,0)+
iif((Fields!ManagerApplicationsLast24.Value+Fields!AbsencesLast24.Value)>2,1,0)+
iif((Fields!ENFMeetingsLast24.Value+Fields!FTCsLast24.Value+Fields!INsLast24.Value+Fields!NOPsLast24.Value+Fields!UPsLast24.Value)>0,1,0)+
iif(Fields!IncidentsPrev3Mths.Value =0,1,0)+
iif(Fields!NoofAFIsLast12.Value-Avg(Fields!NoofAFIsLast12.Value, "DataSet2")>0,1,0)+
iif(Fields!RPNotReg.Value >0,1,0)=1,1,0))
I thought it was going to run but got this message: 'Inner aggregates cannot specify a dataset scope' I have spent a few days trying to untangle this - any help would be immensely appreciated.
Paula Morrison (11 rep)
Jul 3, 2023, 10:58 AM • Last activity: Aug 8, 2023, 09:00 AM
2 votes
3 answers
167 views
Find "naked" varchars
It is clear that omitting the length from a varchar is [a bad thing][1]. Unfortunately I am now working with a code base where this has happened. Extensively. I would like to correct this. The first step will be to find the occurrences. This is where I need help. Searches on various web engines usin...
It is clear that omitting the length from a varchar is a bad thing . Unfortunately I am now working with a code base where this has happened. Extensively. I would like to correct this. The first step will be to find the occurrences. This is where I need help. Searches on various web engines using all the synonyms I can think of return no authoritative answer. I'm asking for - additional test cases which I've missed - a comprehensive, canonical way of finding declarations without a length Any technology which is typically available on a Windows development environment (SSMS, Powershell, .Net etc.) is good. Answers employing more niche technologies would be interesting for the wider community but less so for me personally. ### Tests ### Since the four data types in question - char, nchar, varchar and nvarchar - all end with the characters C-H-A-R I use this alone in the tests below. This saves bloating the list and makes adding further tests simpler. It'll be easy enough to copy-paste-replace should the need arise.
-- These are all legal; the regex must not return these
char(9)
char (9)    		-- with a space
char	(9)			-- with a tab
char	 	 (9)	-- tab space tab space
char(max)
char
(9)                 -- a new line between type and length

character(9)
CAST(999 AS character(9))

char varying(9)
character varying(9)
CAST(999 AS char varying(9))
CAST(999 AS character varying(9))


-- These also are legal; ugly, but legal
[char](9)
[char] (9)    		-- with a space
[char]	(9)			-- with a tab
[char]	 	 (9)	-- tab space tab space
[char](max)
[char]
(9)                 -- a new line between type and length

-- The type can also be delimited by double-quote
"char"(9)
-- All the tests using square brackets should be duplicated with other delimiters.

[character](9)
CAST(999 AS [character](9))

-- SQL Server 2022 throws an error for [character varying]
-- Msg 243, Level 16, State 1, Line 15
-- Type character varying is not a defined system type.


-- These are business terms which the regex should not return
characteristic
charge
chart

-- These are valid SQL but missing the length. These are what the search should return
char;
char ;		-- a space
char	;	-- a tab
char,
char ,
char = 'lorem'
cast(9 as char)
convert(char, 9)

[char];
[char] ;		-- a space
[char]	;	-- a tab
[char],
[char] ,
[char] = 'lorem'
cast(9 as [char])
convert([char], 9)

character
CAST(999 AS character)

char varying
character varying
CAST(999 AS char varying)
CAST(999 AS character varying)
Michael Green (25265 rep)
Jul 12, 2023, 12:32 PM • Last activity: Jul 20, 2023, 03:14 AM
4 votes
3 answers
8193 views
Postgres - Return default value if regex match fails
I'd like to attempt a regex match, and return `null` if it fails. The following query attempts to find the first number in a string. The result ignores the entry with text `'blah'`. I'd prefer it to return a `null` value instead. This question is probably less about regex and more about set algebra....
I'd like to attempt a regex match, and return null if it fails. The following query attempts to find the first number in a string. The result ignores the entry with text 'blah'. I'd prefer it to return a null value instead. This question is probably less about regex and more about set algebra. My hunch is that there is an elegant way to do it, without needing to left join anything, though googling is proving futile. with test_data as ( select 'abc 123' as txt union select 'abc 456' as txt union select 'blah' as txt ) select txt, (regexp_matches(txt, '\d+')) as first_num from test_data
ryantuck (173 rep)
Jun 19, 2018, 04:19 PM • Last activity: Apr 24, 2023, 02:04 AM
0 votes
1 answers
1136 views
How to get exact match when the column used for filtering is comma-separated list in postgresql
I have a column which will have values like: ```none FC FCW FC,FCW FX,FC ``` My input used for filtering should give correct result (match the exact value in the column) ex: ```none FC like 'FC,FCW' --> true FC like 'FCW' --> false ``` when I pass FC--> only 1st, 3rd and last record should be return...
I have a column which will have values like:
FC
FCW
FC,FCW
FX,FC
My input used for filtering should give correct result (match the exact value in the column) ex:
FC like 'FC,FCW' --> true
FC like 'FCW'  --> false
when I pass FC--> only 1st, 3rd and last record should be returned. (i.e, only FCW should not be matched) Splitting the column using regexsplit and then checking the value is one way of doing. Since the table data is huge, this way takes lot of time. Is there any other simple way of doing this operation?
Ross (1 rep)
Jan 19, 2023, 09:08 AM • Last activity: Jan 19, 2023, 09:24 AM
5 votes
1 answers
1107 views
PostgreSQL - Index efficiently on REGEX_REPLACE()
I have a query which is designed to loop and search addresses for duplicates, the query uses REGEX_REPLACE. I am trying to index on the regex as on doing an explain and its doing a sequential scan on the user_property table with a filter on the regex ``` EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) wi...
I have a query which is designed to loop and search addresses for duplicates, the query uses REGEX_REPLACE. I am trying to index on the regex as on doing an explain and its doing a sequential scan on the user_property table with a filter on the regex
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  with user_detail AS (
        SELECT user_id,
            max(user_property_value) FILTER (WHERE user_property_type_id = 6 ) AS FIRST_NAME,
            max(user_property_value) FILTER (WHERE user_property_type_id = 7 ) AS LAST_NAME,
            max(TO_DATE(user_property_value, 'YYYY-MM-DD')) FILTER (WHERE user_property_type_id = 8 ) AS DOB,
            max(user_property_value) FILTER (WHERE user_property_type_id = 33 ) AS BIRTH_NUMBER
        FROM PUBLIC.user_property cp
        JOIN PUBLIC.user c using (user_id)
        WHERE c.user_group_id= '38'
        AND cp.user_property_is_active
        GROUP BY user_id
    ),
    duplicate as (
        SELECT COALESCE(MAX(
                CASE WHEN REGEXP_REPLACE((address_line1), E'\\_|\\W','','g') = 'Flat 25 Arliss Court 24'
                AND (
                    COALESCE(REGEXP_REPLACE((address_line2), E'\\_|\\W','','g'), '') = ''
                    OR REGEXP_REPLACE((address_line2), E'\\_|\\W','','g') = 'Calderon Road'
                )
                AND REGEXP_REPLACE((address_place), E'\\_|\\W','','g') = 'Dartford'
                AND address_country_code = 'GB'
            THEN 1 ELSE 0 END), 0) AS dup_name_address,
            COALESCE(MAX(CASE WHEN REGEXP_REPLACE(UPPER(address_postcode), E'\\_|\\W','','g') = 'WD17 1JY' THEN 1 ELSE 0 END), 0) AS dup_name_postcode
        FROM
            user_detail cd
        LEFT JOIN PUBLIC.address ad ON cd.user_id = ad.user_id
        WHERE  (
          (REGEXP_REPLACE(UPPER(cd.FIRST_NAME), E'\\_|\\W', '', 'g') = 'Clyde'
                  AND REGEXP_REPLACE(UPPER(cd.LAST_NAME), E'\\_|\\W', '', 'g') = 'Len')
            OR
                  (REGEXP_REPLACE(UPPER(cd.LAST_NAME), E'\\_|\\W', '', 'g') = 'Clyde'
                  AND REGEXP_REPLACE(UPPER(cd.FIRST_NAME), E'\\_|\\W', '', 'g') = 'Len')
            )
            AND cd.user_id != '2589384'
    ), dup_dob_address AS (
        SELECT
            COALESCE(MAX(CASE WHEN
                (cd.DOB IS NOT NULL AND cd.DOB = '1982-06-14 00:00:00') OR (cd.BIRTH_NUMBER IS NOT NULL AND cd.BIRTH_NUMBER = null )
             THEN 1 ELSE 0 END), 0) AS dob
        FROM
            user_detail cd
        LEFT JOIN PUBLIC.address ad ON cd.user_id = ad.user_id
        WHERE (
                REGEXP_REPLACE((address_line1), E'\\_|\\W','','g') = 'Flat 25 Arliss Court 24'
                AND (
                    COALESCE(REGEXP_REPLACE((address_line2), E'\\_|\\W','','g'), '') = ''
                    OR REGEXP_REPLACE((address_line2), E'\\_|\\W','','g') = 'Calderon Road'
                )
                AND REGEXP_REPLACE((address_place), E'\\_|\\W','','g') = 'Dartford'
                AND address_country_code = 'GB'
            )
        AND cd.user_id != '2589384'
    )
    SELECT * FROM duplicate, dup_dob_address;
Explain result:
Nested Loop  (cost=492738.45..492738.50 rows=1 width=12) (actual time=7589.136..7590.933 rows=1 loops=1)
  Output: (COALESCE(max(CASE WHEN ((regexp_replace((ad.address_line1)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Flat 25 Arliss Court 24'::text) AND ((COALESCE(regexp_replace((ad.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text), ''::text) = ''::text) OR (regexp_replace((ad.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Calderon Road'::text)) AND (regexp_replace((ad.address_place)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Dartford'::text) AND ((ad.address_country_code)::text = 'GB'::text)) THEN 1 ELSE 0 END), 0)), (COALESCE(max(CASE WHEN (regexp_replace(upper((ad.address_postcode)::text), '\\_|\\W'::text, ''::text, 'g'::text) = 'WD17 1JY'::text) THEN 1 ELSE 0 END), 0)), (COALESCE(max(CASE WHEN (((cd_1.dob IS NOT NULL) AND (cd_1.dob = '1982-06-14'::date)) OR ((cd_1.birth_number IS NOT NULL) AND NULL::boolean)) THEN 1 ELSE 0 END), 0))
  Buffers: shared hit=931500 read=103761
  CTE user_detail
    ->  Finalize HashAggregate  (cost=423105.99..426854.87 rows=374888 width=104) (actual time=6110.633..6172.107 rows=115625 loops=1)
          Output: cp.user_id, max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 6)), max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 7)), max(to_date((cp.user_property_value)::text, 'YYYY-MM-DD'::text)) FILTER (WHERE (cp.user_property_type_id = 8)), max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 33))
          Group Key: cp.user_id
          Buffers: shared hit=908203 read=103761
          ->  Gather  (cost=335007.31..413733.79 rows=749776 width=104) (actual time=6024.383..6062.501 rows=115625 loops=1)
                Output: cp.user_id, (PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 6))), (PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 7))), (PARTIAL max(to_date((cp.user_property_value)::text, 'YYYY-MM-DD'::text)) FILTER (WHERE (cp.user_property_type_id = 8))), (PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 33)))
                Workers Planned: 2
                Workers Launched: 2
                Buffers: shared hit=908203 read=103761
                ->  Partial HashAggregate  (cost=334007.31..337756.19 rows=374888 width=104) (actual time=6017.847..6037.215 rows=38542 loops=3)
                      Output: cp.user_id, PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 6)), PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 7)), PARTIAL max(to_date((cp.user_property_value)::text, 'YYYY-MM-DD'::text)) FILTER (WHERE (cp.user_property_type_id = 8)), PARTIAL max((cp.user_property_value)::text) FILTER (WHERE (cp.user_property_type_id = 33))
                      Group Key: cp.user_id
                      Buffers: shared hit=908203 read=103761
                      Worker 0: actual time=6017.372..6035.986 rows=37261 loops=1
                        Buffers: shared hit=292969 read=33275
                      Worker 1: actual time=6012.321..6032.378 rows=40788 loops=1
                        Buffers: shared hit=320593 read=35787
                      ->  Nested Loop  (cost=1630.78..321001.76 rows=520222 width=30) (actual time=48.770..5900.888 rows=434730 loops=3)
                            Output: cp.user_id, cp.user_property_value, cp.user_property_type_id
                            Buffers: shared hit=908203 read=103761
                            Worker 0: actual time=45.466..5905.504 rows=420402 loops=1
                              Buffers: shared hit=292969 read=33275
                            Worker 1: actual time=44.758..5889.927 rows=459654 loops=1
                              Buffers: shared hit=320593 read=35787
                            ->  Parallel Bitmap Heap Scan on public.user c  (cost=1630.22..22201.58 rows=48268 width=4) (actual time=26.536..39.410 rows=38542 loops=3)
                                  Output: c.user_id, c.currency_code, c.user_group_id, c.user_created_on, c.user_status_id, c.user_max_credit, c.user_last_updated_on, c.user_version
                                  Recheck Cond: (c.user_group_id = 38)
                                  Heap Blocks: exact=2249
                                  Buffers: shared hit=6896 read=319
                                  Worker 0: actual time=22.735..35.486 rows=37261 loops=1
                                    Buffers: shared hit=2303
                                  Worker 1: actual time=22.766..36.418 rows=40788 loops=1
                                    Buffers: shared hit=2343
                                  ->  Bitmap Index Scan on idx_user_user_group_id  (cost=0.00..1601.26 rows=115844 width=0) (actual time=33.224..33.224 rows=115625 loops=1)
                                        Index Cond: (c.user_group_id = 38)
                                        Buffers: shared hit=1 read=319
                            ->  Index Scan using idx_user_id_user_property on public.user_property cp  (cost=0.56..5.51 rows=68 width=30) (actual time=0.036..0.150 rows=11 loops=115625)
                                  Output: cp.user_id, cp.user_property_type_id, cp.user_property_created_on, cp.user_property_is_active, cp.user_property_value, cp.user_property_upper_value, cp.user_property_version
                                  Index Cond: (cp.user_id = c.user_id)
                                  Buffers: shared hit=901307 read=103442
                                  Worker 0: actual time=0.038..0.156 rows=11 loops=37261
                                    Buffers: shared hit=290666 read=33275
                                  Worker 1: actual time=0.034..0.142 rows=11 loops=40788
                                    Buffers: shared hit=318250 read=35787
  ->  Aggregate  (cost=19766.95..19766.96 rows=1 width=8) (actual time=6882.602..6882.605 rows=1 loops=1)
        Output: COALESCE(max(CASE WHEN ((regexp_replace((ad.address_line1)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Flat 25 Arliss Court 24'::text) AND ((COALESCE(regexp_replace((ad.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text), ''::text) = ''::text) OR (regexp_replace((ad.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Calderon Road'::text)) AND (regexp_replace((ad.address_place)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Dartford'::text) AND ((ad.address_country_code)::text = 'GB'::text)) THEN 1 ELSE 0 END), 0), COALESCE(max(CASE WHEN (regexp_replace(upper((ad.address_postcode)::text), '\\_|\\W'::text, ''::text, 'g'::text) = 'WD17 1JY'::text) THEN 1 ELSE 0 END), 0)
        Buffers: shared hit=908203 read=103761
        ->  Nested Loop Left Join  (cost=0.42..19766.22 rows=21 width=110) (actual time=6882.596..6882.597 rows=0 loops=1)
              Output: ad.address_line1, ad.address_line2, ad.address_place, ad.address_country_code, ad.address_postcode
              Buffers: shared hit=908203 read=103761
              ->  CTE Scan on user_detail cd  (cost=0.00..19681.62 rows=19 width=4) (actual time=6882.595..6882.595 rows=0 loops=1)
                    Output: cd.user_id, cd.first_name, cd.last_name, cd.dob, cd.birth_number
                    Filter: ((cd.user_id  2589384) AND (((regexp_replace(upper(cd.first_name), '\\_|\\W'::text, ''::text, 'g'::text) = 'Clyde'::text) AND (regexp_replace(upper(cd.last_name), '\\_|\\W'::text, ''::text, 'g'::text) = 'Len'::text)) OR ((regexp_replace(upper(cd.last_name), '\\_|\\W'::text, ''::text, 'g'::text) = 'Clyde'::text) AND (regexp_replace(upper(cd.first_name), '\\_|\\W'::text, ''::text, 'g'::text) = 'Len'::text))))
                    Rows Removed by Filter: 115625
                    Buffers: shared hit=908203 read=103761
              ->  Index Scan using address_idx_01 on public.address ad  (cost=0.42..4.44 rows=1 width=114) (never executed)
                    Output: ad.address_line1, ad.address_line2, ad.address_place, ad.address_country_code, ad.address_postcode, ad.user_id
                    Index Cond: (ad.user_id = cd.user_id)
  ->  Aggregate  (cost=46116.63..46116.64 rows=1 width=4) (actual time=706.525..707.941 rows=1 loops=1)
        Output: COALESCE(max(CASE WHEN (((cd_1.dob IS NOT NULL) AND (cd_1.dob = '1982-06-14'::date)) OR ((cd_1.birth_number IS NOT NULL) AND NULL::boolean)) THEN 1 ELSE 0 END), 0)
        Buffers: shared hit=23297
        ->  Hash Join  (cost=36282.83..46116.62 rows=1 width=36) (actual time=706.520..707.934 rows=0 loops=1)
              Output: cd_1.dob, cd_1.birth_number
              Hash Cond: (cd_1.user_id = ad_1.user_id)
              Buffers: shared hit=23297
              ->  CTE Scan on user_detail cd_1  (cost=0.00..8434.98 rows=373014 width=40) (actual time=0.002..0.003 rows=1 loops=1)
                    Output: cd_1.user_id, cd_1.first_name, cd_1.last_name, cd_1.dob, cd_1.birth_number
                    Filter: (cd_1.user_id  2589384)
              ->  Hash  (cost=36282.82..36282.82 rows=1 width=4) (actual time=706.499..707.911 rows=0 loops=1)
                    Output: ad_1.user_id
                    Buckets: 1024  Batches: 1  Memory Usage: 8kB
                    Buffers: shared hit=23297
                    ->  Gather  (cost=1000.00..36282.82 rows=1 width=4) (actual time=706.496..707.907 rows=0 loops=1)
                          Output: ad_1.user_id
                          Workers Planned: 2
                          Workers Launched: 2
                          Buffers: shared hit=23297
                          ->  Parallel Seq Scan on public.address ad_1  (cost=0.00..35282.72 rows=1 width=4) (actual time=701.969..701.970 rows=0 loops=3)
                                Output: ad_1.user_id
                                Filter: (((ad_1.address_country_code)::text = 'GB'::text) AND (regexp_replace((ad_1.address_line1)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Flat 25 Arliss Court 24'::text) AND (regexp_replace((ad_1.address_place)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Dartford'::text) AND ((COALESCE(regexp_replace((ad_1.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text), ''::text) = ''::text) OR (regexp_replace((ad_1.address_line2)::text, '\\_|\\W'::text, ''::text, 'g'::text) = 'Calderon Road'::text)))
                                Rows Removed by Filter: 295033
                                Buffers: shared hit=23297
                                Worker 0: actual time=699.642..699.644 rows=0 loops=1
                                  Buffers: shared hit=7331
                                Worker 1: actual time=700.498..700.499 rows=0 loops=1
                                  Buffers: shared hit=7984
Planning Time: 17.292 ms
Execution Time: 7601.934 ms
https://explain.depesz.com/s/cbmv#html I looked at a similar post regarding using the pg_trgm extension but made no difference when trying to index.
create index concurrently on address using gin (address_place gin_trgm_ops);
But The size of the user_property table is approx 2.5 million rows with the size of the address table also very small < 1.7m rows. Is there an efficient way to index on Regex_replace? or would a redesign of the query be needed? Any help much appreciated.
rdbmsNoob (459 rep)
Dec 9, 2022, 09:58 PM • Last activity: Dec 15, 2022, 01:18 AM
0 votes
1 answers
2801 views
postgres error: invalid regular expression: brackets [] not balanced
On the postgres command line, I tried this: select regexp_matches('file?device_id[]=555', 'device_id\[\]=(\d+)', 'g') as dev_id; I get this error: ERROR: invalid regular expression: brackets [] not balanced When I run this same query in HeidiSQL it returns with no errors. I don't understand why the...
On the postgres command line, I tried this: select regexp_matches('file?device_id[]=555', 'device_id\[\]=(\d+)', 'g') as dev_id; I get this error: ERROR: invalid regular expression: brackets [] not balanced When I run this same query in HeidiSQL it returns with no errors. I don't understand why the CLI thinks the brackets are unbalanced when I'm escaping them. Just for testing, I tried anywhere from 0 to 3 backslashes to escape the brackets, and none of those worked.
raphael75 (244 rep)
Dec 7, 2021, 03:16 PM • Last activity: Nov 4, 2022, 04:03 PM
Showing page 1 of 20 total questions