Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
4 votes, 1 answer, 537 views
PostgreSQL unaccent and full text search for Arabic/Persian
I'm using an application that uses PostgreSQL as its database and it uses the `unaccent` extension to normalize text. I want to improve its search functionality by modifying the `unaccent.rules` file. I edit `/usr/share/postgresql/16/tsearch_data/unaccent.rules` and add some rules for the Arabic Unicode block (U+0600 to U+06FF):
۰ 0
۱ 1
۲ 2
۳ 3
۴ 4
۵ 5
۶ 6
۷ 7
۸ 8
۹ 9
َ
and it's working fine.
SELECT unaccent('سَلام ۱۳۲');
unaccent
----------
سلام 123
(1 row)
# Problem 1:
The problem is with the Zero Width Non-Joiner (ZWNJ - U+200C). It should be replaced with a space (U+0020):
سَلامعلیکم -> سلام علیکم
### What I Tried:
I tried these rows, but they either didn't work or gave an error:
- `"" " "`: invalid syntax: more than two strings in unaccent rule (warning) + it didn't work.
- `" "`: invalid syntax: more than two strings in unaccent rule (warning) + it didn't work.
- `\u200C \u0020`: suggested by ChatGPT, but it didn't work.
- `\u200C " "`: suggested by ChatGPT, but it didn't work.
> Note.1: In the first two lines above, there is an invisible ZWNJ character, which is shown as in VIM, but it's not visible in this post.
>
> Note.2: I didn't add all these lines at the same time, I tried them one by one.
>
> Note.3: There is no other rule for ZWNJ in the unaccent.rules file.
# Problem 2:
Is there a way to add a new rule file instead of editing the default one? I can't edit the application source code and change the queries.
Does adding something like /usr/share/postgresql/16/tsearch_data/arabic.stop or /usr/share/postgresql/16/tsearch_data/arabic.rules and restarting the service make PostgreSQL understand it? Is it required to run some query to reload the file? Is it required to change the way search is requested from the application?
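A possible route for Problem 2, sketched with hedged assumptions: the unaccent dictionary template accepts a RULES parameter naming a file in tsearch_data (without the .rules suffix), so a separate arabic.rules file can be used without editing the default one. The dictionary name arabic_unaccent below is made up, and whether the application can benefit without query changes depends on how it calls unaccent.
```sql
-- A sketch, not a verified fix: create a dictionary that reads
-- /usr/share/postgresql/16/tsearch_data/arabic.rules (the name arabic_unaccent is made up).
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE TEXT SEARCH DICTIONARY arabic_unaccent (
    TEMPLATE = unaccent,
    RULES    = 'arabic'   -- file name without the .rules suffix
);

-- The two-argument form of unaccent() selects the dictionary explicitly:
SELECT unaccent('arabic_unaccent', 'سَلام ۱۳۲');
```
New sessions generally re-read the rules file the first time the dictionary is used, so a full server restart is usually not needed after editing it.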
M.A. Heshmat Khah
(145 rep)
Oct 8, 2024, 04:55 PM
• Last activity: Oct 8, 2024, 08:10 PM
0 votes, 3 answers, 1415 views
Remove spaces and replace characters with regexp_replace()
I wish to ...
1. ... remove spaces
2. ... delete apostrophes
3. ... replace 'é' and 'è' with 'e'
I use the function regexp_replace(). For the moment, I can delete the spaces, but poorly: when the attribute contains several spaces, only one is deleted.
I can't process *1.*, *2.*, and *3.* at the same time. Is this possible?
Below is a link to my code:
https://dbfiddle.uk/22ODtpNS
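A sketch of doing all three in one expression; the table and column names are placeholders rather than the ones in the fiddle. The multiple-spaces problem comes from regexp_replace defaulting to a single replacement unless the 'g' flag is passed, whereas translate() always processes every occurrence.
```sql
-- Hedged sketch (mytable/name are placeholders): translate() does all three in one pass,
-- mapping é→e and è→e and deleting the characters with no counterpart in the "to" string
-- (the space and the apostrophe).
SELECT translate(name, 'éè ''', 'ee') AS cleaned
FROM   mytable;

-- The same with regexp_replace: the 'g' flag is what removes every space, not just the first.
SELECT translate(regexp_replace(name, '[ '']', '', 'g'), 'éè', 'ee') AS cleaned
FROM   mytable;
```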
fcka
(125 rep)
May 25, 2024, 07:51 PM
• Last activity: May 27, 2024, 05:44 PM
1 vote, 3 answers, 1449 views
How to speed up string cleanup function?
I need to cleanup a string, so that certain ASCII code characters are left out of the string, and others are replaced.
I am new to Postgres. My function ufn_cie_easy() performs way too slowly:
DECLARE
letter char = '';
str_result TEXT = '';
x integer;
y integer;
asc_code int;
BEGIN
y:=1;
x:=char_length(arg);
LOOP
letter=substring(arg from y for 1);
asc_code=ascii(letter);
IF (asc_code BETWEEN 47 and 58) or (asc_code BETWEEN 65 and 90) or (
asc_code BETWEEN 97 and 122) THEN
str_result := str_result || letter;
ELSIF (asc_code BETWEEN 192 and 197) THEN
str_result := str_result || 'A';
ELSIF (asc_code BETWEEN 200 and 203) THEN
str_result := str_result || 'E';
ELSIF (asc_code BETWEEN 204 and 207) THEN
str_result := str_result || 'I';
ELSIF (asc_code BETWEEN 210 and 214) OR (asc_code=216) THEN
str_result := str_result || 'O';
ELSIF (asc_code BETWEEN 217 and 220) THEN
str_result := str_result || 'U';
ELSIF (asc_code BETWEEN 224 and 229) THEN
str_result := str_result || 'a';
ELSIF (asc_code BETWEEN 232 and 235) THEN
str_result := str_result || 'e';
ELSIF (asc_code BETWEEN 236 and 239) THEN
str_result := str_result || 'i';
ELSIF (asc_code BETWEEN 242 and 246) OR (asc_code=248) THEN
str_result := str_result || 'o';
ELSIF (asc_code BETWEEN 249 and 252) THEN
str_result := str_result || 'u';
ELSE
CASE asc_code
WHEN 352 THEN str_result := str_result || 'S';
WHEN 338 THEN str_result := str_result || 'OE';
WHEN 381 THEN str_result := str_result || 'Z';
WHEN 353 THEN str_result := str_result || 's';
WHEN 339 THEN str_result := str_result || 'oe';
WHEN 382 THEN str_result := str_result || 'z';
WHEN 162 THEN str_result := str_result || 'c';
WHEN 198 THEN str_result := str_result || 'AE';
WHEN 199 THEN str_result := str_result || 'C';
WHEN 208 THEN str_result := str_result || 'D';
WHEN 209 THEN str_result := str_result || 'N';
WHEN 223 THEN str_result := str_result || 'ss';
WHEN 230 THEN str_result := str_result || 'ae';
WHEN 231 THEN str_result := str_result || 'c';
WHEN 241 THEN str_result := str_result || 'n';
WHEN 376 THEN str_result := str_result || 'Y';
WHEN 221 THEN str_result := str_result || 'Y';
WHEN 253 THEN str_result := str_result || 'y';
WHEN 255 THEN str_result := str_result || 'y';
ELSE str_result := str_result;
END CASE;
END IF;
y:=y+1;
exit when y=x+1;
END LOOP;
return str_result;
END;
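A set-based sketch rather than a tuned version of the loop, assuming the unaccent extension can be installed and that the goal is "keep ASCII letters and digits, fold common accented Latin letters". The function name ufn_cie_fast is made up, and the mappings from unaccent.rules differ in corner cases (e.g. '/', ':', ß, Æ, Œ) from the hand-written branches above.
```sql
-- Hedged sketch, not a drop-in replacement for ufn_cie_easy():
-- one unaccent() call plus a regexp that keeps only ASCII letters and digits.
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE OR REPLACE FUNCTION ufn_cie_fast(arg text)  -- hypothetical name
RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT regexp_replace(unaccent(arg), '[^0-9A-Za-z]', '', 'g');
$$;

-- Example: SELECT ufn_cie_fast('Crème brûlée #1');  -- 'Cremebrulee1'
```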
W. Smets
(11 rep)
Jul 28, 2016, 08:04 AM
• Last activity: May 26, 2024, 09:06 PM
3 votes, 3 answers, 3648 views
How to query efficiently from Postgres to select special words?
Let's say that I have a table called words with very many records. Columns are id and name.
In the words table I have, for example: 'systematic', 'سلام', 'gear', 'synthesis', 'mysterious', etc.
**NB: we have utf8 words, too.**
How to query efficiently to see which words include the letters 's', 'm' and 'e' (all of them)?
The output would be: systematic, mysterious
I have no idea how to do such a thing. It should be efficient because our server would suffer otherwise.
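A straightforward sketch using the table from the question. Single-character containment tests cannot use a btree index, and a pg_trgm index needs patterns of at least three characters to narrow anything, so on a large table this stays close to a sequential scan.
```sql
-- Hedged sketch: three independent containment tests, combined with AND.
SELECT name
FROM   words
WHERE  name LIKE '%s%'
  AND  name LIKE '%m%'
  AND  name LIKE '%e%';
```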
Alireza
(3676 rep)
Dec 17, 2013, 08:55 AM
• Last activity: Feb 19, 2024, 09:45 PM
0 votes, 2 answers, 594 views
PostgreSQL's unaccent function is unable to remove accents (diacritic signs) from Japanese characters like 'ド'
The PostgreSQL out-of-the-box unaccent function is unable to remove accents (diacritic signs) when a character has more than one diacritic. The character 'ド', which is failing normalisation, has 2 diacritic signs in it.
I tried to check the latest version of https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules, but it still does not have a mapping for Japanese characters like 'ド'. Can someone please guide us on how we can resolve this issue?
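A workaround sketch rather than a change to unaccent: for a known, small set of characters, translate() can strip the dakuten; the character list below is illustrative only. Alternatively, custom lines such as mapping 'ド' to 'ト' can go into a separate rules file loaded through a dictionary created with the unaccent template's RULES parameter.
```sql
-- Hedged sketch: handle a handful of specific precomposed katakana directly
-- (the character list is illustrative, not complete).
SELECT translate('ドキュメント', 'ドダヂヅデ', 'トタチツテ');  -- → 'トキュメント'
```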
shailesh Totale
(1 rep)
Nov 28, 2023, 01:11 PM
• Last activity: Nov 30, 2023, 10:47 AM
6 votes, 1 answer, 23508 views
'ERROR: text search dictionary "unaccent" does not exist' during CREATE INDEX?
I'm running PostgreSQL 9.3 on Mac OS X Yosemite.
I try to create an unaccent lowercase trigram index. To achieve it I did this:
mydb=# CREATE EXTENSION pg_trgm SCHEMA public VERSION "1.1";
CREATE EXTENSION unaccent SCHEMA public;
ALTER FUNCTION unaccent(text) IMMUTABLE;
CREATE EXTENSION
CREATE EXTENSION
ALTER FUNCTION
Then I tried to create the index:
mydb=# CREATE INDEX author_label_hun_gin_trgm ON address
USING gin (public.unaccent(lower(label_hun)) gin_trgm_ops);
ERROR: text search dictionary "unaccent" does not exist
... and got this error. If I try to list the available text search dictionaries, the unaccent dictionary seems to be there:
mydb=# \dFd
List of text search dictionaries
Schema | Name | Description
------------+-----------------+-----------------------------------------------------------
pg_catalog | danish_stem | snowball stemmer for danish language
pg_catalog | dutch_stem | snowball stemmer for dutch language
pg_catalog | english_stem | snowball stemmer for english language
pg_catalog | finnish_stem | snowball stemmer for finnish language
pg_catalog | french_stem | snowball stemmer for french language
pg_catalog | german_stem | snowball stemmer for german language
pg_catalog | hungarian_stem | snowball stemmer for hungarian language
pg_catalog | italian_stem | snowball stemmer for italian language
pg_catalog | norwegian_stem | snowball stemmer for norwegian language
pg_catalog | portuguese_stem | snowball stemmer for portuguese language
pg_catalog | romanian_stem | snowball stemmer for romanian language
pg_catalog | russian_stem | snowball stemmer for russian language
pg_catalog | simple | simple dictionary: just lower case and check for stopword
pg_catalog | spanish_stem | snowball stemmer for spanish language
pg_catalog | swedish_stem | snowball stemmer for swedish language
pg_catalog | turkish_stem | snowball stemmer for turkish language
public | unaccent |
Any idea what could be wrong here?
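A sketch of the commonly suggested workaround, with the usual caveats: the one-argument unaccent(text) resolves the text search dictionary named "unaccent" through search_path at call time, which is the likely reason it can fail here even though \dFd lists the dictionary. The two-argument form names the dictionary explicitly; the wrapper name f_unaccent is an assumption.
```sql
-- Hedged sketch: pin the dictionary by schema-qualified name and wrap it so the
-- expression is usable in an index (the standard caveat about declaring a STABLE
-- function IMMUTABLE applies).
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$ SELECT public.unaccent('public.unaccent'::regdictionary, $1) $$;

CREATE INDEX author_label_hun_gin_trgm ON address
    USING gin (public.f_unaccent(lower(label_hun)) gin_trgm_ops);
```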
Balázs E. Pataki
(95 rep)
Apr 8, 2015, 03:10 PM
• Last activity: Aug 1, 2023, 06:25 PM
3 votes, 1 answer, 1913 views
Trigram similarity (pg_trgm) with German umlauts
I try to figure out how to improve Postgres 10.6 pg_trgm queries with German umlauts (äöü). In German, 'ö' can be written as 'oe'. But beware: not every 'oe' can be written as 'ö'.
CREATE TABLE public.names
(name text COLLATE pg_catalog."default");
CREATE INDEX names_idx
ON public.names USING gin (name COLLATE pg_catalog."default" gin_trgm_ops);
SHOW LC_COLLATE; -- de_DE.UTF-8
When I use the [similarity()](https://www.postgresql.org/docs/10/pgtrgm.html#id-1.11.7.41.5) function to query the similarity for ***'Schoenstraße'***:
SELECT
name,
similarity (name, 'Schoenstraße') AS similarity,
show_trgm (name)
FROM
names
WHERE
name % 'Schoenstraße'
ORDER BY
similarity DESC;
I get the following result:
Name similarity show_trgm
Schyrenstraße 0.588235 {0x9a07c3,0xde3801,"" s"","" sc"",chy,ens,hyr,nst,ren,sch,str,tra,0x76a40e,yre}
Schönstraße 0.5625 {0x9a07c3,0xde3801,0xf00320,0x095f29,"" s"","" sc"",0x6deea5,nst,sch,str,tra,0x76a40e}
dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_10&fiddle=58f8436a5096a779f358572e615bf11f)
Is there anything I can do to improve that or do I need to replace all umlauts in the DB?
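One hedged approach is to expand umlauts the German way on both sides of the comparison and index the expanded expression; de_expand below is a made-up helper, not part of pg_trgm.
```sql
-- Hedged sketch: fold ä/ö/ü/ß to ae/oe/ue/ss before trigram comparison.
CREATE OR REPLACE FUNCTION de_expand(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$ SELECT replace(replace(replace(replace(replace(replace(replace(
       $1, 'ä','ae'), 'ö','oe'), 'ü','ue'), 'Ä','Ae'), 'Ö','Oe'), 'Ü','Ue'), 'ß','ss') $$;

CREATE INDEX names_de_idx ON public.names USING gin (de_expand(name) gin_trgm_ops);

SELECT name,
       similarity(de_expand(name), de_expand('Schoenstraße')) AS sim
FROM   names
WHERE  de_expand(name) % de_expand('Schoenstraße')
ORDER  BY sim DESC;
```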
Stephan
(143 rep)
Jul 22, 2020, 09:35 AM
• Last activity: Aug 1, 2023, 05:52 PM
2 votes, 0 answers, 503 views
Postgresql full text search without diacritics (effective unaccent)
I have a similar question to https://dba.stackexchange.com/questions/176471/postgres-full-text-search-with-unaccent-and-inflection-conjugation-etc .
I want to use FTS but in "unaccent" mode. I'll explain with an example. The "base" word in my dictionary is "príloha" only (no unaccented version whatsoever). Now I want to match the word "priloha" (notice the "i") with the base word in the dictionary.
I've created the text search configuration with unaccent filtering as below:
CREATE TEXT SEARCH DICTIONARY public.slovak_hunspell(
TEMPLATE = ispell,
DictFile = sk_SK,
AffFile = sk_SK,
StopWords = sk_SK
);
CREATE TEXT SEARCH CONFIGURATION public.slovak (
COPY = pg_catalog.english
);
ALTER TEXT SEARCH CONFIGURATION public.slovak
ALTER MAPPING
FOR
asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH
unaccent, slovak_hunspell, simple
;
But the problem with this is that the unaccented string is passed down to "slovak_hunspell", and because there is no unaccented word in this dictionary, no lexeme is returned. What I would really need is to apply unaccent to the words from the "slovak_hunspell" dictionary.
The TL;DR from the link above is that it can't be done without some drawbacks/restrictions.
The only solution that I could think of is to use 2 dictionaries, one with the diacritics etc. and another, unaccented one, i.e. plain ASCII, and then use the unaccented one for "asciiword" etc. and the other one for "word" etc. Would this work? Or are there some other alternatives?
I've searched a lot but haven't found anything useful, only the link at the beginning of this post.
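A sketch of the two-dictionary idea from the last paragraph, with a large assumption flagged up front: sk_SK_ascii stands for an ASCII-folded copy of the ispell files that would have to be generated separately, since no such files ship with PostgreSQL.
```sql
-- Hedged sketch of the two-dictionary idea; sk_SK_ascii is hypothetical.
CREATE TEXT SEARCH DICTIONARY public.slovak_hunspell_ascii (
    TEMPLATE  = ispell,
    DictFile  = sk_SK_ascii,
    AffFile   = sk_SK_ascii,
    StopWords = sk_SK
);

-- Tokens that are already plain ASCII go through the folded dictionary ...
ALTER TEXT SEARCH CONFIGURATION public.slovak
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH slovak_hunspell_ascii, simple;

-- ... while accented tokens are unaccented first and then looked up in the same folded dictionary.
ALTER TEXT SEARCH CONFIGURATION public.slovak
    ALTER MAPPING FOR word, hword, hword_part
    WITH unaccent, slovak_hunspell_ascii, simple;
```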
Trigve
(21 rep)
Oct 2, 2022, 12:21 PM
4 votes, 1 answer, 5142 views
"Function unaccent(text) does not exist" in update trigger
I've created a trigger for table `x1` to update a column `y` with the expression `to_tsvector(unaccent(x1.col1 || ' ' || x2.col1))`. This trigger function throws:
> function unaccent(text) does not exist
Why would this function not exist when a trigger is called, but exist when executed manually? I'm using Supabase to manage this database.
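A sketch of the usual remedy, with illustrative names rather than the asker's real schema: schema-qualify the call as public.unaccent(...) so function resolution no longer depends on search_path, which is frequently restricted inside trigger functions, SECURITY DEFINER contexts, and Supabase's API roles.
```sql
-- Hedged sketch (x1, y, col1 are illustrative stand-ins for the real schema):
-- the schema-qualified public.unaccent call is the point of the example.
CREATE OR REPLACE FUNCTION x1_tsv_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.y := to_tsvector('simple', public.unaccent(coalesce(NEW.col1, '')));
    RETURN NEW;
END
$$;

CREATE TRIGGER x1_tsv_trg
    BEFORE INSERT OR UPDATE ON x1
    FOR EACH ROW EXECUTE FUNCTION x1_tsv_update();
```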
Alb
(142 rep)
May 22, 2022, 04:12 PM
• Last activity: May 22, 2022, 10:02 PM
6 votes, 1 answer, 12544 views
Creating a case-insensitive and accent/diacritics insensitive search on a field
I would like to know if it makes sense to create an index combining two functions while using Full Text Search: lower(name) and f_unaccent(name), where f_unaccent is just my wrapper to make the unaccent function immutable.
I do have an index working on f_unaccent(name) using varchar_pattern_ops. My question is: would an index that combines the lower and unaccent functions be triggered by a full text search on lower(f_unaccent(name))?
I don't know if the lower function would be useful working with Full Text Search algorithms.
Rubén_ic
(63 rep)
Jun 22, 2017, 03:45 PM
• Last activity: Feb 28, 2020, 12:18 PM
3 votes, 1 answer, 4789 views
'text search dictionary "unaccent" does not exist' entries in postgres log, supposedly during automatic analyze
I have many such entries in the postgresql main log, ever since upgrading to Postgres 10:
2018-03-28 08:51:00.281 CEST ERROR: text search dictionary "unaccent" does not exist
2018-03-28 08:51:00.281 CEST CONTEXT: automatic analyze of table "dbname.public.periodical"
I do use unaccent in many indexes in all tables (by making unaccent immutable (I know it is not the recommended way, but it is so historically)).
If I manually run vacuumdb dbname --table periodical or VACUUM periodical or VACUUM ANALYZE periodical, it does not generate such an error (neither on the command line, nor in the log).
Does anyone have a tip how I could stop these messages and what their real cause may be?
PostgreSQL version: 10.3 (Debian 10.3-1.pgdg80+1)
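A diagnostic sketch, based on the assumption that this is the 10.3-era behaviour change in which autovacuum/autoanalyze workers run with a restricted search_path: index expressions calling the one-argument unaccent(text) then cannot resolve the "unaccent" dictionary, while a manual VACUUM ANALYZE in a normal session still can. The query only lists candidate index definitions; it changes nothing.
```sql
-- Hedged diagnostic: index definitions that call the one-argument unaccent(),
-- i.e. the ones that resolve the "unaccent" dictionary through search_path.
SELECT schemaname, tablename, indexname, indexdef
FROM   pg_indexes
WHERE  indexdef LIKE '%unaccent(%'
  AND  indexdef NOT LIKE '%regdictionary%';
```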
P.Péter
(911 rep)
Mar 28, 2018, 07:19 AM
• Last activity: Dec 10, 2018, 12:01 PM
2 votes, 1 answer, 578 views
Custom unaccent rules in heroku
We have an application running on Heroku and we want to add a search feature on a field that contains Greek characters, and we want to make it accent-agnostic.
So the idea is to use PostgreSQL's [unaccent](https://www.postgresql.org/docs/9.1/static/unaccent.html) functionality. The problem is that by default Greek rules are not included on Heroku and we somehow need to add them (we already have a rules file).
Has anyone managed to add custom rules files on Heroku, and if yes, could you please share how?
George Karanikas
(123 rep)
Sep 17, 2017, 01:42 PM
• Last activity: Jan 3, 2018, 05:03 PM
6 votes, 2 answers, 5478 views
Postgres full text search with unaccent and inflection (conjugation, etc.)
I want to be able to search unaccented phrases in an inflected (Polish) language in Postgres.
Say, if a document contains robiłem, the lexeme should be robić (the infinitive). Its forms are robię, robił, robiła and so on. I want to be able to find it, for example, with the phrase robie, which is unaccented robię.
**What I did** is I started out with a perfectly well working Polish text search config:
CREATE TEXT SEARCH DICTIONARY polish_ispell (
TEMPLATE = pg_catalog.ispell,
dictfile = 'polish', afffile = 'polish', stopwords = 'polish' );
Then I tried to extend it to include unaccent:
create extension unaccent;
create text search configuration polish_unaccented (copy = polish);
ALTER TEXT SEARCH CONFIGURATION polish_unaccented ALTER MAPPING FOR hword,
hword_part, word WITH unaccent, polish_ispell, simple;
Sadly, lexemes are not created correctly with this config:
select to_tsvector('polish_unaccented' ,'robił');
'robil':1
The lexeme should of course be:
'robić':1
So the query below can't return true (and that's what I need, I think):
select to_tsvector('polish_unaccented','robić') @@ to_tsquery('polish_unaccented','robie');
I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word.
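A diagnostic sketch: ts_debug shows which dictionary in the chain produced each lexeme, which makes it visible that robił is folded to robil by the unaccent filter before the ispell dictionary ever sees it.
```sql
-- Hedged diagnostic: inspect the dictionary chain token by token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM   ts_debug('polish_unaccented', 'robił');
```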
Cheers
Tomek
(61 rep)
Jun 16, 2017, 02:42 PM
• Last activity: Nov 15, 2017, 04:04 AM
3 votes, 1 answer, 2154 views
Error during pg_restore: text search dictionary "unaccent" does not exist
I'm trying to move data between servers. I recreated the whole database structure, checked that the same users exist and that the unaccent extension that we use is enabled for all schemas of the target database.
When I try to run:
pg_restore -h %server_host% -a -c --disable-triggers -d %db_name% -U %user_name% 2014-12-09.custom
I get the following error:
> pg_restore: [archiver (db)] COPY failed for table "contracts":
> ERROR: text search dictionary "unaccent" does not exist
What am I missing here?
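A first diagnostic step, sketched: confirm which schema the dictionary actually lives in on the target server, since an expression evaluated during the COPY (an index, trigger or default calling unaccent) may be looking it up by an unqualified name.
```sql
-- Hedged diagnostic: where does the "unaccent" dictionary live on the target?
-- (psql's \dFd shows the same listing.)
SELECT n.nspname AS schema, d.dictname
FROM   pg_ts_dict d
JOIN   pg_namespace n ON n.oid = d.dictnamespace
WHERE  d.dictname = 'unaccent';
```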
jezzarax
(181 rep)
Dec 9, 2014, 08:56 AM
• Last activity: Dec 29, 2015, 02:47 PM
2 votes, 1 answer, 3445 views
PostgreSQL translate special characters?
**Description:**
PostgreSQL 9.3
String: **'ì ằ ú ề'**
Desired Result: **'i a u e'**
My code:
select translate ('ì ằ ú ề', 'ìằúề', 'iaue') ; -- it works. Result: i a u e
**Question:**
If I use it this way, I have to define a manual translation between 'ìằúề' and 'iaue'. Is there a better solution?
Ref: PG Document
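A hedged alternative to a hand-maintained translate() list: the unaccent extension ships a rules file that already maps most Latin letters with diacritics, though coverage of Vietnamese precomposed characters such as 'ằ' depends on the PostgreSQL version.
```sql
-- Hedged sketch, assuming the unaccent extension may be installed:
CREATE EXTENSION IF NOT EXISTS unaccent;
SELECT unaccent('ì ằ ú ề');   -- expected: 'i a u e'
```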
Luan Huynh
(2010 rep)
Dec 28, 2015, 10:57 AM
• Last activity: Dec 29, 2015, 04:19 AM
Showing page 1 of 15 total questions