Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
4 votes, 1 answer, 537 views
PostgreSQL unaccent and full text search for Arabic/Persian
I'm using an application that uses PostgreSQL as its database and it uses the `unaccent` extension to normalize text. I want to improve its search functionality by modifying the `unaccent.rules` file. I edit `/usr/share/postgresql/16/tsearch_data/unaccent.rules` and add some rules for the Arabic Unicode block (U+0600 to U+06FF):
۰ 0
۱ 1
۲ 2
۳ 3
۴ 4
۵ 5
۶ 6
۷ 7
۸ 8
۹ 9
َ
and it's working fine.
SELECT unaccent('سَلام ۱۳۲');
unaccent
----------
سلام 123
(1 row)
# Problem 1:
The problem is with the Zero Width Non-Joiner (ZWNJ - U+200C). It should be replaced with a space (U+0020):
سَلامعلیکم -> سلام علیکم
### What I Tried:
I tried these rows, but they either didn't work or gave an error:
- `"" " "`: invalid syntax: more than two strings in unaccent rule (warning) + it didn't work.
- `" "`: invalid syntax: more than two strings in unaccent rule (warning) + it didn't work.
- `\u200C \u0020`: suggested by ChatGPT, but it didn't work.
- `\u200C " "`: suggested by ChatGPT, but it didn't work.
> Note.1: In the first two lines above, there is an invisible ZWNJ character, which is shown as in VIM, but it's not visible in this post.
>
> Note.2: I didn't add all these lines at the same time, I tried them one by one.
>
> Note.3: There is no other rule for ZWNJ in the unaccent.rules file.
# Problem 2:
Is there a way to add a new rule file instead of editing the default one? I can't edit the application source code and change the queries.
Does adding something like /usr/share/postgresql/16/tsearch_data/arabic.stop or /usr/share/postgresql/16/tsearch_data/arabic.rules and restarting the service make PostgreSQL understand it? Is it required to run some query to reload the file? Is it required to change the way search is requested from the application?
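A possible route for Problem 2, sketched with hedged assumptions: the unaccent dictionary template accepts a RULES parameter naming a file in tsearch_data (without the .rules suffix), so a separate arabic.rules file can be used without editing the default one. The dictionary name arabic_unaccent below is made up, and whether the application can benefit without query changes depends on how it calls unaccent.
```sql
-- A sketch, not a verified fix: create a dictionary that reads
-- /usr/share/postgresql/16/tsearch_data/arabic.rules (the name arabic_unaccent is made up).
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE TEXT SEARCH DICTIONARY arabic_unaccent (
    TEMPLATE = unaccent,
    RULES    = 'arabic'   -- file name without the .rules suffix
);

-- The two-argument form of unaccent() selects the dictionary explicitly:
SELECT unaccent('arabic_unaccent', 'سَلام ۱۳۲');
```
New sessions generally re-read the rules file the first time the dictionary is used, so a full server restart is usually not needed after editing it.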
M.A. Heshmat Khah
(145 rep)
Oct 8, 2024, 04:55 PM
• Last activity: Oct 8, 2024, 08:10 PM
0 votes, 3 answers, 1415 views
Remove spaces and replace characters with regexp_replace()
I wish to ...
1. ... remove spaces
2. ... delete apostrophes
3. ... replace 'é' and 'è' with 'e'
I use the function regexp_replace(). For the moment, I can delete the spaces, but poorly: when the attribute contains several spaces, only one is deleted.
I can't process *1.*, *2.*, and *3.* at the same time. Is this possible?
Below is a link to my code:
https://dbfiddle.uk/22ODtpNS
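A sketch of doing all three in one expression; the table and column names are placeholders rather than the ones in the fiddle. The multiple-spaces problem comes from regexp_replace defaulting to a single replacement unless the 'g' flag is passed, whereas translate() always processes every occurrence.
```sql
-- Hedged sketch (mytable/name are placeholders): translate() does all three in one pass,
-- mapping é→e and è→e and deleting the characters with no counterpart in the "to" string
-- (the space and the apostrophe).
SELECT translate(name, 'éè ''', 'ee') AS cleaned
FROM   mytable;

-- The same with regexp_replace: the 'g' flag is what removes every space, not just the first.
SELECT translate(regexp_replace(name, '[ '']', '', 'g'), 'éè', 'ee') AS cleaned
FROM   mytable;
```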
fcka
(125 rep)
May 25, 2024, 07:51 PM
• Last activity: May 27, 2024, 05:44 PM
1 vote, 3 answers, 1449 views
How to speed up string cleanup function?
I need to cleanup a string, so that certain ASCII code characters are left out of the string, and others are replaced.
I am new to Postgres. My function ufn_cie_easy() performs way too slowly:
DECLARE
letter char = '';
str_result TEXT = '';
x integer;
y integer;
asc_code int;
BEGIN
y:=1;
x:=char_length(arg);
LOOP
letter=substring(arg from y for 1);
asc_code=ascii(letter);
IF (asc_code BETWEEN 47 and 58) or (asc_code BETWEEN 65 and 90) or (
asc_code BETWEEN 97 and 122) THEN
str_result := str_result || letter;
ELSIF (asc_code BETWEEN 192 and 197) THEN
str_result := str_result || 'A';
ELSIF (asc_code BETWEEN 200 and 203) THEN
str_result := str_result || 'E';
ELSIF (asc_code BETWEEN 204 and 207) THEN
str_result := str_result || 'I';
ELSIF (asc_code BETWEEN 210 and 214) OR (asc_code=216) THEN
str_result := str_result || 'O';
ELSIF (asc_code BETWEEN 217 and 220) THEN
str_result := str_result || 'U';
ELSIF (asc_code BETWEEN 224 and 229) THEN
str_result := str_result || 'a';
ELSIF (asc_code BETWEEN 232 and 235) THEN
str_result := str_result || 'e';
ELSIF (asc_code BETWEEN 236 and 239) THEN
str_result := str_result || 'i';
ELSIF (asc_code BETWEEN 242 and 246) OR (asc_code=248) THEN
str_result := str_result || 'o';
ELSIF (asc_code BETWEEN 249 and 252) THEN
str_result := str_result || 'u';
ELSE
CASE asc_code
WHEN 352 THEN str_result := str_result || 'S';
WHEN 338 THEN str_result := str_result || 'OE';
WHEN 381 THEN str_result := str_result || 'Z';
WHEN 353 THEN str_result := str_result || 's';
WHEN 339 THEN str_result := str_result || 'oe';
WHEN 382 THEN str_result := str_result || 'z';
WHEN 162 THEN str_result := str_result || 'c';
WHEN 198 THEN str_result := str_result || 'AE';
WHEN 199 THEN str_result := str_result || 'C';
WHEN 208 THEN str_result := str_result || 'D';
WHEN 209 THEN str_result := str_result || 'N';
WHEN 223 THEN str_result := str_result || 'ss';
WHEN 230 THEN str_result := str_result || 'ae';
WHEN 231 THEN str_result := str_result || 'c';
WHEN 241 THEN str_result := str_result || 'n';
WHEN 376 THEN str_result := str_result || 'Y';
WHEN 221 THEN str_result := str_result || 'Y';
WHEN 253 THEN str_result := str_result || 'y';
WHEN 255 THEN str_result := str_result || 'y';
ELSE str_result := str_result;
END CASE;
END IF;
y:=y+1;
exit when y=x+1;
END LOOP;
return str_result;
END;
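A set-based sketch rather than a tuned version of the loop, assuming the unaccent extension can be installed and that the goal is "keep ASCII letters and digits, fold common accented Latin letters". The function name ufn_cie_fast is made up, and the mappings from unaccent.rules differ in corner cases (e.g. '/', ':', ß, Æ, Œ) from the hand-written branches above.
```sql
-- Hedged sketch, not a drop-in replacement for ufn_cie_easy():
-- one unaccent() call plus a regexp that keeps only ASCII letters and digits.
CREATE EXTENSION IF NOT EXISTS unaccent;

CREATE OR REPLACE FUNCTION ufn_cie_fast(arg text)  -- hypothetical name
RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT regexp_replace(unaccent(arg), '[^0-9A-Za-z]', '', 'g');
$$;

-- Example: SELECT ufn_cie_fast('Crème brûlée #1');  -- 'Cremebrulee1'
```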
W. Smets
(11 rep)
Jul 28, 2016, 08:04 AM
• Last activity: May 26, 2024, 09:06 PM
3 votes, 3 answers, 3648 views
How to query efficiently from Postgres to select special words?
Let's say that I have a table called words with very many records. Columns are id and name.
In the words table I have, for example: 'systematic', 'سلام', 'gear', 'synthesis', 'mysterious', etc.
**NB: we have utf8 words, too.**
How to query efficiently to see which words include the letters 's', 'm' and 'e' (all of them)?
The output would be: systematic, mysterious
I have no idea how to do such a thing. It should be efficient because our server would suffer otherwise.
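A straightforward sketch using the table from the question. Single-character containment tests cannot use a btree index, and a pg_trgm index needs patterns of at least three characters to narrow anything, so on a large table this stays close to a sequential scan.
```sql
-- Hedged sketch: three independent containment tests, combined with AND.
SELECT name
FROM   words
WHERE  name LIKE '%s%'
  AND  name LIKE '%m%'
  AND  name LIKE '%e%';
```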
Alireza
(3676 rep)
Dec 17, 2013, 08:55 AM
• Last activity: Feb 19, 2024, 09:45 PM
0 votes, 2 answers, 594 views
PostgreSQL's unaccent function is unable to remove accents (diacritic signs) from Japanese characters like 'ド'
The PostgreSQL out-of-the-box unaccent function is unable to remove accents (diacritic signs) when a character has more than one diacritic. The character 'ド', which is failing normalisation, has 2 diacritic signs in it.
I tried to check the latest version of https://github.com/postgres/postgres/blob/master/contrib/unaccent/unaccent.rules, but it still does not have a mapping for Japanese characters like 'ド'. Can someone please guide us on how we can resolve this issue?
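A workaround sketch rather than a change to unaccent: for a known, small set of characters, translate() can strip the dakuten; the character list below is illustrative only. Alternatively, custom lines such as mapping 'ド' to 'ト' can go into a separate rules file loaded through a dictionary created with the unaccent template's RULES parameter.
```sql
-- Hedged sketch: handle a handful of specific precomposed katakana directly
-- (the character list is illustrative, not complete).
SELECT translate('ドキュメント', 'ドダヂヅデ', 'トタチツテ');  -- → 'トキュメント'
```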
shailesh Totale
(1 rep)
Nov 28, 2023, 01:11 PM
• Last activity: Nov 30, 2023, 10:47 AM
6 votes, 1 answer, 23508 views
'ERROR: text search dictionary "unaccent" does not exist' during CREATE INDEX?
I'm running PostgreSQL 9.3 on Mac OS X Yosemite.
I try to create an unaccent lowercase trigram index. To achieve it I did this:
mydb=# CREATE EXTENSION pg_trgm SCHEMA public VERSION "1.1";
CREATE EXTENSION unaccent SCHEMA public;
ALTER FUNCTION unaccent(text) IMMUTABLE;
CREATE EXTENSION
CREATE EXTENSION
ALTER FUNCTION
Then I tried to create the index:
mydb=# CREATE INDEX author_label_hun_gin_trgm ON address
USING gin (public.unaccent(lower(label_hun)) gin_trgm_ops);
ERROR: text search dictionary "unaccent" does not exist
... and got this error. If I try to list the available text search dictionaries, the unaccent dictionary seems to be there:
mydb=# \dFd
List of text search dictionaries
Schema | Name | Description
------------+-----------------+-----------------------------------------------------------
pg_catalog | danish_stem | snowball stemmer for danish language
pg_catalog | dutch_stem | snowball stemmer for dutch language
pg_catalog | english_stem | snowball stemmer for english language
pg_catalog | finnish_stem | snowball stemmer for finnish language
pg_catalog | french_stem | snowball stemmer for french language
pg_catalog | german_stem | snowball stemmer for german language
pg_catalog | hungarian_stem | snowball stemmer for hungarian language
pg_catalog | italian_stem | snowball stemmer for italian language
pg_catalog | norwegian_stem | snowball stemmer for norwegian language
pg_catalog | portuguese_stem | snowball stemmer for portuguese language
pg_catalog | romanian_stem | snowball stemmer for romanian language
pg_catalog | russian_stem | snowball stemmer for russian language
pg_catalog | simple | simple dictionary: just lower case and check for stopword
pg_catalog | spanish_stem | snowball stemmer for spanish language
pg_catalog | swedish_stem | snowball stemmer for swedish language
pg_catalog | turkish_stem | snowball stemmer for turkish language
public | unaccent |
Any idea what could be wrong here?
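A sketch of the commonly suggested workaround, with the usual caveats: the one-argument unaccent(text) resolves the text search dictionary named "unaccent" through search_path at call time, which is the likely reason it can fail here even though \dFd lists the dictionary. The two-argument form names the dictionary explicitly; the wrapper name f_unaccent is an assumption.
```sql
-- Hedged sketch: pin the dictionary by schema-qualified name and wrap it so the
-- expression is usable in an index (the standard caveat about declaring a STABLE
-- function IMMUTABLE applies).
CREATE OR REPLACE FUNCTION public.f_unaccent(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$ SELECT public.unaccent('public.unaccent'::regdictionary, $1) $$;

CREATE INDEX author_label_hun_gin_trgm ON address
    USING gin (public.f_unaccent(lower(label_hun)) gin_trgm_ops);
```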
Balázs E. Pataki
(95 rep)
Apr 8, 2015, 03:10 PM
• Last activity: Aug 1, 2023, 06:25 PM
3 votes, 1 answer, 1913 views
Trigram similarity (pg_trgm) with German umlauts
I try to figure out how to improve Postgres 10.6 pg_trgm queries with German umlauts (äöü). In German, 'ö' can be written as 'oe'. But beware: not every 'oe' can be written as 'ö'.
CREATE TABLE public.names
(name text COLLATE pg_catalog."default");
CREATE INDEX names_idx
ON public.names USING gin (name COLLATE pg_catalog."default" gin_trgm_ops);
SHOW LC_COLLATE; -- de_DE.UTF-8
When I use the [similarity()](https://www.postgresql.org/docs/10/pgtrgm.html#id-1.11.7.41.5) function to query the similarity for ***'Schoenstraße'***:
SELECT
name,
similarity (name, 'Schoenstraße') AS similarity,
show_trgm (name)
FROM
names
WHERE
name % 'Schoenstraße'
ORDER BY
similarity DESC;
I get the following result:
Name similarity show_trgm
Schyrenstraße 0.588235 {0x9a07c3,0xde3801,"" s"","" sc"",chy,ens,hyr,nst,ren,sch,str,tra,0x76a40e,yre}
Schönstraße 0.5625 {0x9a07c3,0xde3801,0xf00320,0x095f29,"" s"","" sc"",0x6deea5,nst,sch,str,tra,0x76a40e}
dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_10&fiddle=58f8436a5096a779f358572e615bf11f)
Is there anything I can do to improve that or do I need to replace all umlauts in the DB?
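One hedged approach is to expand umlauts the German way on both sides of the comparison and index the expanded expression; de_expand below is a made-up helper, not part of pg_trgm.
```sql
-- Hedged sketch: fold ä/ö/ü/ß to ae/oe/ue/ss before trigram comparison.
CREATE OR REPLACE FUNCTION de_expand(text)
RETURNS text
LANGUAGE sql IMMUTABLE AS
$$ SELECT replace(replace(replace(replace(replace(replace(replace(
       $1, 'ä','ae'), 'ö','oe'), 'ü','ue'), 'Ä','Ae'), 'Ö','Oe'), 'Ü','Ue'), 'ß','ss') $$;

CREATE INDEX names_de_idx ON public.names USING gin (de_expand(name) gin_trgm_ops);

SELECT name,
       similarity(de_expand(name), de_expand('Schoenstraße')) AS sim
FROM   names
WHERE  de_expand(name) % de_expand('Schoenstraße')
ORDER  BY sim DESC;
```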
Stephan
(143 rep)
Jul 22, 2020, 09:35 AM
• Last activity: Aug 1, 2023, 05:52 PM
2 votes, 0 answers, 503 views
Postgresql full text search without diacritics (effective unaccent)
I have a similar question to https://dba.stackexchange.com/questions/176471/postgres-full-text-search-with-unaccent-and-inflection-conjugation-etc .
I want to use FTS but in "unaccent" mode. I'll explain with an example. The "base" word in my dictionary is "príloha" only (no unaccented version whatsoever). Now I want to match the word "priloha" (notice the "i") with the base word in the dictionary.
I've created the text search configuration with unaccent filtering as below:
CREATE TEXT SEARCH DICTIONARY public.slovak_hunspell(
TEMPLATE = ispell,
DictFile = sk_SK,
AffFile = sk_SK,
StopWords = sk_SK
);
CREATE TEXT SEARCH CONFIGURATION public.slovak (
COPY = pg_catalog.english
);
ALTER TEXT SEARCH CONFIGURATION public.slovak
ALTER MAPPING
FOR
asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH
unaccent, slovak_hunspell, simple
;
But the problem with this is that the unaccented string is passed down to "slovak_hunspell", and because there is no unaccented word in this dictionary, no lexeme is returned. What I would really need is to apply unaccent to the words from the "slovak_hunspell" dictionary.
The TL;DR from the link above is that it can't be done without some drawbacks/restrictions.
The only solution that I could think of is to use 2 dictionaries, one with the diacritics etc. and another, unaccented one, i.e. plain ASCII, and then use the unaccented one for "asciiword" etc. and the other one for "word" etc. Would this work? Or are there some other alternatives?
I've searched a lot but haven't found anything useful, only the link at the beginning of this post.
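A sketch of the two-dictionary idea from the last paragraph, with a large assumption flagged up front: sk_SK_ascii stands for an ASCII-folded copy of the ispell files that would have to be generated separately, since no such files ship with PostgreSQL.
```sql
-- Hedged sketch of the two-dictionary idea; sk_SK_ascii is hypothetical.
CREATE TEXT SEARCH DICTIONARY public.slovak_hunspell_ascii (
    TEMPLATE  = ispell,
    DictFile  = sk_SK_ascii,
    AffFile   = sk_SK_ascii,
    StopWords = sk_SK
);

-- Tokens that are already plain ASCII go through the folded dictionary ...
ALTER TEXT SEARCH CONFIGURATION public.slovak
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH slovak_hunspell_ascii, simple;

-- ... while accented tokens are unaccented first and then looked up in the same folded dictionary.
ALTER TEXT SEARCH CONFIGURATION public.slovak
    ALTER MAPPING FOR word, hword, hword_part
    WITH unaccent, slovak_hunspell_ascii, simple;
```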
Trigve
(21 rep)
Oct 2, 2022, 12:21 PM
4 votes, 1 answer, 5142 views
"Function unaccent(text) does not exist" in update trigger
I've created a trigger for table `x1` to update a column `y` with the expression `to_tsvector(unaccent(x1.col1 || ' ' || x2.col1))`. This trigger function throws:
> function unaccent(text) does not exist
Why would this function not exist when a trigger is called, but exist when executed manually? I'm using Supabase to manage this database.
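A sketch of the usual remedy, with illustrative names rather than the asker's real schema: schema-qualify the call as public.unaccent(...) so function resolution no longer depends on search_path, which is frequently restricted inside trigger functions, SECURITY DEFINER contexts, and Supabase's API roles.
```sql
-- Hedged sketch (x1, y, col1 are illustrative stand-ins for the real schema):
-- the schema-qualified public.unaccent call is the point of the example.
CREATE OR REPLACE FUNCTION x1_tsv_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.y := to_tsvector('simple', public.unaccent(coalesce(NEW.col1, '')));
    RETURN NEW;
END
$$;

CREATE TRIGGER x1_tsv_trg
    BEFORE INSERT OR UPDATE ON x1
    FOR EACH ROW EXECUTE FUNCTION x1_tsv_update();
```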
Alb
(142 rep)
May 22, 2022, 04:12 PM
• Last activity: May 22, 2022, 10:02 PM
6 votes, 1 answer, 12544 views
Creating a case-insensitive and accent/diacritics insensitive search on a field
I would like to know if it makes sense to create an index combining two functions while using Full Text Search: lower(name) and f_unaccent(name), where f_unaccent is just my wrapper to make the unaccent function immutable.
I do have an index working on f_unaccent(name) using varchar_pattern_ops. My question is: would an index that combines the lower and unaccent functions be triggered by a full text search on lower(f_unaccent(name))?
I don't know if the lower function would be useful working with Full Text Search algorithms.
Rubén_ic
(63 rep)
Jun 22, 2017, 03:45 PM
• Last activity: Feb 28, 2020, 12:18 PM
3 votes, 1 answer, 4789 views
'text search dictionary "unaccent" does not exist' entries in postgres log, supposedly during automatic analyze
I have many such entries in the postgresql main log, ever since upgrading to Postgres 10:
2018-03-28 08:51:00.281 CEST ERROR: text search dictionary "unaccent" does not exist
2018-03-28 08:51:00.281 CEST CONTEXT: automatic analyze of table "dbname.public.periodical"
I do use unaccent in many indexes in all tables (by making unaccent immutable (I know it is not the recommended way, but it is so historically)).
If I manually run vacuumdb dbname --table periodical or VACUUM periodical or VACUUM ANALYZE periodical, it does not generate such an error (neither on the command line, nor in the log).
Does anyone have a tip how I could stop these messages and what their real cause may be?
PostgreSQL version: 10.3 (Debian 10.3-1.pgdg80+1)
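A diagnostic sketch, based on the assumption that this is the 10.3-era behaviour change in which autovacuum/autoanalyze workers run with a restricted search_path: index expressions calling the one-argument unaccent(text) then cannot resolve the "unaccent" dictionary, while a manual VACUUM ANALYZE in a normal session still can. The query only lists candidate index definitions; it changes nothing.
```sql
-- Hedged diagnostic: index definitions that call the one-argument unaccent(),
-- i.e. the ones that resolve the "unaccent" dictionary through search_path.
SELECT schemaname, tablename, indexname, indexdef
FROM   pg_indexes
WHERE  indexdef LIKE '%unaccent(%'
  AND  indexdef NOT LIKE '%regdictionary%';
```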
P.Péter
(911 rep)
Mar 28, 2018, 07:19 AM
• Last activity: Dec 10, 2018, 12:01 PM
2 votes, 1 answer, 578 views
Custom unaccent rules in heroku
We have an application running on Heroku and we want to add a search feature on a field that contains Greek characters, and we want to make it accent-agnostic.
So the idea is to use PostgreSQL's [unaccent](https://www.postgresql.org/docs/9.1/static/unaccent.html) functionality. The problem is that by default Greek rules are not included on Heroku and we somehow need to add them (we already have a rules file).
Has anyone managed to add custom rules files on Heroku, and if yes, could you please share how?
George Karanikas
(123 rep)
Sep 17, 2017, 01:42 PM
• Last activity: Jan 3, 2018, 05:03 PM
6 votes, 2 answers, 5478 views
Postgres full text search with unaccent and inflection (conjugation, etc.)
I want to be able to search unaccented phrases in an inflected (Polish) language in Postgres.
Say, if a document contains robiłem, the lexeme should be robić (the infinitive). Its forms are robię, robił, robiła and so on. I want to be able to find it, for example, with the phrase robie, which is unaccented robię.
**What I did** is I started out with a perfectly well working Polish text search config:
CREATE TEXT SEARCH DICTIONARY polish_ispell (
TEMPLATE = pg_catalog.ispell,
dictfile = 'polish', afffile = 'polish', stopwords = 'polish' );
Then I tried to extend it to include unaccent:
create extension unaccent;
create text search configuration polish_unaccented (copy = polish);
ALTER TEXT SEARCH CONFIGURATION polish_unaccented ALTER MAPPING FOR hword,
hword_part, word WITH unaccent, polish_ispell, simple;
Sadly, lexemes are not created correctly with this config:
select to_tsvector('polish_unaccented' ,'robił');
'robil':1
The lexeme should of course be:
'robić':1
So the query below can't return true (and that's what I need, I think):
select to_tsvector('polish_unaccented','robić') @@ to_tsquery('polish_unaccented','robie');
I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word.
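A diagnostic sketch: ts_debug shows which dictionary in the chain produced each lexeme, which makes it visible that robił is folded to robil by the unaccent filter before the ispell dictionary ever sees it.
```sql
-- Hedged diagnostic: inspect the dictionary chain token by token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM   ts_debug('polish_unaccented', 'robił');
```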
Cheers
Tomek
(61 rep)
Jun 16, 2017, 02:42 PM
• Last activity: Nov 15, 2017, 04:04 AM
3 votes, 1 answer, 2154 views
Error during pg_restore: text search dictionary "unaccent" does not exist
I'm trying to move data between servers. I recreated the whole database structure, checked that the same users exist and that the unaccent extension that we use is enabled for all schemas of the target database.
When I try to run:
pg_restore -h %server_host% -a -c --disable-triggers -d %db_name% -U %user_name% 2014-12-09.custom
I get the following error:
> pg_restore: [archiver (db)] COPY failed for table "contracts":
> ERROR: text search dictionary "unaccent" does not exist
What am I missing here?
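A first diagnostic step, sketched: confirm which schema the dictionary actually lives in on the target server, since an expression evaluated during the COPY (an index, trigger or default calling unaccent) may be looking it up by an unqualified name.
```sql
-- Hedged diagnostic: where does the "unaccent" dictionary live on the target?
-- (psql's \dFd shows the same listing.)
SELECT n.nspname AS schema, d.dictname
FROM   pg_ts_dict d
JOIN   pg_namespace n ON n.oid = d.dictnamespace
WHERE  d.dictname = 'unaccent';
```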
jezzarax
(181 rep)
Dec 9, 2014, 08:56 AM
• Last activity: Dec 29, 2015, 02:47 PM
2 votes, 1 answer, 3445 views
PostgreSQL translate special characters?
**Description:**
PostgreSQL 9.3
String: **'ì ằ ú ề'**
Desired Result: **'i a u e'**
My code:
select translate ('ì ằ ú ề', 'ìằúề', 'iaue') ; -- it works. Result: i a u e
**Question:**
If I use it this way, I have to define a manual translation between 'ìằúề' and 'iaue'. Is there a better solution?
Ref: PG Document
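A hedged alternative to a hand-maintained translate() list: the unaccent extension ships a rules file that already maps most Latin letters with diacritics, though coverage of Vietnamese precomposed characters such as 'ằ' depends on the PostgreSQL version.
```sql
-- Hedged sketch, assuming the unaccent extension may be installed:
CREATE EXTENSION IF NOT EXISTS unaccent;
SELECT unaccent('ì ằ ú ề');   -- expected: 'i a u e'
```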
Luan Huynh
(2010 rep)
Dec 28, 2015, 10:57 AM
• Last activity: Dec 29, 2015, 04:19 AM
Showing page 1 of 15 total questions