Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

2 votes

2 answers

2040 views

Identify the version of a collation from ICU in Postgres

postgresql collation data-versioning international-components-unicode

Postgres 10 and later incorporates the [International Components for Unicode (ICU)][1] library for text-handling and other internationalization issues. Changes happen to human languages, such as sorting rules evolving. Therefore collation definitions are versioned in ICU. ➥ How can I identify the ve...

                                  Postgres 10 and later incorporates the International Components for Unicode (ICU)  library for text-handling and other internationalization issues. 

Changes happen to human languages, such as sorting rules evolving. Therefore collation definitions are versioned in ICU. 

➥ How can I identify the version number of a collation? 

I know Postgres will report  version number as part of a collation version mismatch error message. Is there a way to proactively query for the version rather than wait for an error message?

Basil Bourque (11188 rep)

Aug 4, 2019, 03:16 AM • Last activity: Mar 11, 2025, 06:36 AM

0 votes

0 answers

247 views

Error Building PostgreSQL from Source: Undefined ICU References During make world-bin

postgresql installation international-components-unicode

I am trying to build PostgreSQL from source on my Linux machine but am encountering errors related to ICU during the make world-bin process. Below are the details of what I've done so far: ### Steps Taken: ``` sudo yum install libicu-devel git clone git://git.postgresql.org/git/postgresql.git ./conf...

sudo yum install libicu-devel

git clone git://git.postgresql.org/git/postgresql.git
./configure --enable-tap-tests --prefix=pwd/build
make world-bin

### Issues During the make world-bin process, I encountered the following error messages indicating undefined references to ICU functions:

/usr/bin/ld: pg_locale.c:(.text+0x1701): undefined reference to `ucol_getRules_73'
/usr/bin/ld: pg_locale.c:(.text+0x1724): undefined reference to `u_strlen_73'
/usr/bin/ld: pg_locale.c:(.text+0x1739): undefined reference to `u_strlen_73'
/usr/bin/ld: pg_locale.c:(.text+0x1757): undefined reference to `u_strcpy_73'
/usr/bin/ld: pg_locale.c:(.text+0x1764): undefined reference to `u_strcat_73'
... (more similar errors)
collect2: error: ld returned 1 exit status
make: *** [Makefile:67: postgres] Error 1

I have ICU version 73.1 installed and confirmed with:

icu-config --version

My ~/.zshrc file includes the following configurations:

export LDFLAGS="-L/usr/lib64"
export CPPFLAGS="-I/usr/include"
export PATH="/usr/lib64/icu:$PATH"
export PKG_CONFIG_PATH="/usr/lib64/pkgconfig/"
export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"

I have also sourced the .zshrc file after making these changes. What might be causing these ICU-related errors, and how can I resolve them to successfully build PostgreSQL?

Ayush Vatsa (55 rep)

Sep 21, 2024, 09:33 AM

1 votes

1 answers

214 views

Postgres sorting order not respecting default C collation?

postgresql collation international-components-unicode

I have a postgres DB running in an AL2023 docker image: ``` postgres=# select version(); select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605...

I have a postgres DB running in an AL2023 docker image:

postgres=# select version();
select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit
(1 row)

postgres=#

I created this using these args:

--locale-provider=icu --icu-locale=C.UTF8 --encoding=UTF8

Which produces the following:

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  C.UTF8
  LC_COLLATE:  C
  LC_CTYPE:    C
  LC_MESSAGES: C
  LC_MONETARY: C
  LC_NUMERIC:  C
  LC_TIME:     C
The default text search configuration will be set to "english".

If I ask for a sorting between caps and noncaps, I get lower case coming first which (a) isn't what I want and (b) isn't what I'd expect for ASCII / UTF-8 / C

postgres=# select 'C' < 'c';
select 'C' < 'c';
 ?column? 
----------
 f
(1 row)

So if I used the C collation, I'd get it the right way round, perhaps?

postgres=# select 'C' < 'c' collate "C";
select 'C' < 'c' collate "C";
 ?column? 
----------
 t
(1 row)

I thought I was using C, so I check what collation am I using and get ...

postgres=# select datname, datcollate from pg_database;
select datname, datcollate from pg_database;
  datname  | datcollate 
-----------+------------
 postgres  | C
 template1 | C
 template0 | C
(3 rows)

I'm using C - so why is the default not the same as if I use C explicitly?

Richard Wheeldon (133 rep)

Nov 29, 2023, 02:24 PM • Last activity: Nov 29, 2023, 05:43 PM

4 votes

2 answers

891 views

Numeric collation sorts by digits instead of value- postgres

postgresql collation sorting international-components-unicode

I have a table of users with ids in the following form `user123@domain.com`. When searching and sorting the users, I need `user1@domian.com` to be before `user14@domain.com` but since 4 is “smaller” than @ it sorts the other way around. After looking around I came across the following collation: CRE...

                                  I have a table of users with ids in the following form user123@domain.com. 
When searching and sorting the users, I need user1@domian.com to be before user14@domain.com but since 4 is “smaller” than @ it sorts the other way around. 

After looking around I came across the following collation:

    CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');

Which fixed the problem, but caused another one, users with numbers that started with 2 appeared after users that started with 1

For example, user19@domain.com came before user2@domain.com  

From the docs, it says that kn-true and looks at the number’s value but it looks like it only compares the digits. 

I'm using Postgres 11.6.  I have two servers: one on Windows (local) and a remote one on Linux.

Is anyone familiar with this issue?

Amit Keinan (41 rep)

Dec 28, 2022, 07:17 PM • Last activity: Jan 5, 2023, 04:05 AM

1 votes

1 answers

2493 views

Collation for accent-insensitive comparison on Postgres?

postgresql collation localization international-components-unicode

On PG 13 documentation, there are several examples of ICU collations for specialized purposes. It is also mentioned that ICU locales exist that allow creating collations to ignore accents, and that they can be found on https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml Howeve...

                                  On PG 13 documentation, there are several examples of ICU collations for specialized purposes.  It is also mentioned that ICU locales exist that allow creating collations to ignore accents, and that they can be found on https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml 

However, after reading that document, I am still unclear on what is the locale I should use to create an ICU collation for accent insensitive comparisons in Spanish.

What is the name of such an ICU locale?  Is there any list of ICU Spanish collations?

ARX (1509 rep)

Sep 13, 2021, 01:18 AM • Last activity: Sep 13, 2021, 03:05 AM

2 votes

1 answers

1193 views

Equivalent of utf8_general_ci in Postgres/ICU?

postgresql collation international-components-unicode

In MySQL there is a collation `utf8_general_ci` which provides case-insensitive comparisons in a variety of languages. For example, these are all `1` (true): ```sql SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci'; SELECT 'i' = 'I' COLLATE 'utf8_general_ci'; SELECT 'ä' = 'Ä' COLLATE 'utf8_genera...

In MySQL there is a collation utf8_general_ci which provides case-insensitive comparisons in a variety of languages. For example, these are all 1 (true):

SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci';
SELECT 'i' = 'I' COLLATE 'utf8_general_ci';
SELECT 'ä' = 'Ä' COLLATE 'utf8_general_ci';

Can I define a similar collation using PostgreSQL's ICU? I tried

CREATE COLLATION "undefined_ci_nondet_old" (
  PROVIDER = 'icu',
  LOCALE = "@colStrength=secondary",
  DETERMINISTIC = false
);

But that doesn't seem to include the Turkish I/ı conversion:

SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old; -- false

AndreKR (607 rep)

May 9, 2021, 10:00 PM • Last activity: May 10, 2021, 01:12 PM

15 votes

1 answers

15325 views

PostgreSQL nondeterministic collations are not supported for LIKE

collation unicode postgresql-12 case-sensitive international-components-unicode

I am using Postgresql v12. I created a collation like this: CREATE COLLATION ci (provider = icu, locale = 'tr_TR', deterministic = false); I used that collation in a table: create table testtable1 ( id serial primary key, name text COLLATE "ci" ); And I inserted sample data: insert into testtable1 v...

                                  I am using Postgresql v12.
I created a collation like this:

    CREATE COLLATION ci (provider = icu, locale = 'tr_TR', deterministic = false);

I used that collation in a table: 

    create table testtable1 (
        id serial  primary key,
        name text  COLLATE "ci"
    );

And I inserted sample data:

    insert into testtable1 values(3,'abc');

When I query that table using LIKE, it returns the following error:

    select name from testtable1 WHERE name LIKE '%a%'  

> ERROR:  nondeterministic collations are not supported for LIKE  
> SQL state: 0A000

But I need to use LIKE. Is there any way to allow this?

Banu Akkus (389 rep)

Apr 29, 2020, 02:40 PM • Last activity: May 5, 2020, 04:47 PM

6 votes

1 answers

4674 views

What is the meaning of "-x-icu" in PostgreSQL's "collates"?

postgresql collation unicode international-components-unicode

I had this query: SELECT * FROM table ORDER BY label ASC; Since the labels are not in English, they didn't get sorted in the right order (ones beginning with "ö" were not in the bottom/end). I therefore tried: SELECT * FROM table ORDER BY label COLLATE "sv-SE" ASC; SELECT * FROM table ORDER BY...

                                  I had this query:

	SELECT * FROM table ORDER BY label ASC;

Since the labels are not in English, they didn't get sorted in the right order (ones beginning with "ö" were not in the bottom/end).

I therefore tried:

	SELECT * FROM table ORDER BY label COLLATE "sv-SE" ASC;
	SELECT * FROM table ORDER BY label COLLATE "sv_SE" ASC;

These gave errors about those collations not existing, which baffled me.

After a bunch of searching, I figured out to do:

	SELECT * FROM pg_collation;

Which revealed that it's supposed to be:

	sv-SE-x-icu

It worked when I used that identifier, but what's with the "-x-icu" stuff? What's that all about? I hate that they always have to mess with the standard locale identifiers so you can never rely on just the standard "language_location" format.

Tadhg Mocniak (69 rep)

Feb 21, 2020, 04:30 PM • Last activity: Apr 30, 2020, 05:30 AM

3 votes

1 answers

3516 views

Does PostgreSQL support ICU collation's options and settings?

postgresql collation postgresql-10 international-components-unicode

ICU Specifies different [LDML Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options). Some of them seem pretty interesting, especially the ones on case and accent, > * “Ignore accents”: `strength=primary` > * “Ignore accents” but take case into account: `strengt...

                                  ICU Specifies different [LDML Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options) . Some of them seem pretty interesting, especially the ones on case and accent,

> * “Ignore accents”: strength=primary
> * “Ignore accents” but take case into account: strength=primary caseLevel=on
> * “Ignore case”: strength=secondary
> * “Ignore punctuation” (completely): strength=tertiary alternate=shifted
> * “Ignore punctuation” but distinguish among punctuation marks: strength=quaternary alternate=shifted potentially a better method of doing what 

You can also see these [documented here](http://userguide.icu-project.org/collation/customization) . Are these ICU options and settings possible with [PostgreSQL 10 ICU collation support](https://www.postgresql.org/docs/10/static/collation.html) ?

    CREATE COLLATION special (provider = icu, locale = 'en@strength=primary');
    SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false

I've also tried the [CLDR BCP47](http://cldr.unicode.org/index/bcp47-extension) 

> Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive. See the [LDML Collation spec, Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings) , and the [data file](http://unicode.org/repos/cldr/trunk/common/bcp47/collation.xml)  listing the valid collation keywords and their values. (The deprecated attributes kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For that, this looked right

    CREATE COLLATION special (provider = icu, locale = 'en-ks-level1');
    SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false

Also tried en-u-ks-level1 That method seems to be what the docs go for,

    CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');

Evan Carroll (65502 rep)

Nov 28, 2017, 04:18 PM • Last activity: Oct 4, 2019, 12:21 PM

2 votes

1 answers

248 views

International Components for Unicode (ICU) Date Formatter for MySQL?

mysql date-format unicode localization international-components-unicode

Regarding to [ICU - International Components for Unicode][1] definition: > ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C...

                                  Regarding to ICU - International Components for Unicode  definition:

> ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

one of important part is formatting:

> Formatting: Format numbers, dates, times and currency amounts according the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.

which implemented in vary languages; I would like to know if there is any implementation or work for MySQL to add this feature for Unicode date formatting?

Hamid (21 rep)

Apr 23, 2017, 07:04 AM • Last activity: Jul 13, 2018, 09:56 PM

8 votes

1 answers

4592 views

Specify ICU collations as `ENCODING`, `LC_COLLATE`, and `LC_CTYPE` in Postgres 10

postgresql collation postgresql-10 international-components-unicode

Postgres 10 gains the ability to use [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations rather than depending on host OS implementations. See [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/) by...

                                  Postgres 10 gains the ability to use [*International Components for Unicode (ICU)*](http://site.icu-project.org/)  collations rather than depending on host OS implementations. See [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/)  by Peter Eisentraut.

So how exactly does one specify an ICU collation when creating the database?

I did read a tip in the CREATE DATABASE to use template0 rather than template1 if template1 might have been created using encoding and collation other than what you want for your new database. And when using template0, you need to enable data connections to override the default.

And I read in Postgres 10 doc page for [*Collation Support*](https://www.postgresql.org/docs/10/static/collation.html)  that:

>und-x-icu (for “undefined”)
>
>ICU “root” collation. Use this to get a reasonable language-agnostic sort order.

So it seems und-x-icu might be a good value to use when establishing a new database that will include multi-lingual text. When a certain language is known to focus on a particular language, the SELECT command can specify another collation such as German de-x-icu. Otherwise, fallback to the default of und-x-icu. 

So I tried the following SQL to create a database in Postgres 10 Beta 2.



    CREATE DATABASE timepiece_
    TEMPLATE 'template0'
    ALLOW_CONNECTIONS TRUE
    CONNECTION LIMIT -1
    ENCODING 'UTF8'
    LC_COLLATE 'und-x-icu'
    LC_CTYPE 'und-x-icu'
    ;

Fails with an error:

> ERROR: invalid locale name: "und-x-icu"

So I stuck on an appended .utf8 as seen in the doc for the old-fashioned collation syntax. 



    CREATE DATABASE timepiece_
    TEMPLATE 'template0'
    ALLOW_CONNECTIONS TRUE
    CONNECTION LIMIT -1
    ENCODING 'UTF8'
    LC_COLLATE 'und-x-icu.utf8'
    LC_CTYPE 'und-x-icu.utf8'
    ;

Fails just the same:

> ERROR: invalid locale name: "und-x-icu.utf8"

Was my Postgres cluster built with ICU libraries? Here is the result of calling pg_config . The last line is 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'. I guess that means my build includes the ICU libraries but I am not sure.

>/Library/PostgreSQL/10Beta2/bin/pg_config --configure
>
>'--with-icu' '--prefix=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx' '--with-ldap' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-bonjour' '--with-pam' '--enable-thread-safety' '--with-libxml' '--with-uuid=e2fs' '--with-includes=/opt/local/Current/include/libxml2:/opt/local/Current/include:/opt/local/Current/include/security' '--docdir=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx/doc/postgresql' '--with-libxslt' '--with-libedit-preferred' '--with-gssapi' 'CFLAGS=-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2' 'LDFLAGS=-L/opt/local/Current/lib' 'ICU_CFLAGS=-I/opt/local/Current/include' 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'

➠ How does one specify the ENCODING, LC_COLLATE, and LC_TYPE to use the new ICU collations?

➠ What ICU collation names you would suggest for English-oriented databases? For Western European multi-lingual databases?

➠ What are the possible ICU collation names?

                                

Basil Bourque (11188 rep)

Aug 13, 2017, 05:15 AM • Last activity: Aug 23, 2017, 02:22 PM

3 votes

1 answers

2012 views

Was my Postgres cluster built with the ICU libraries available for Postgres 10 and later?

postgresql postgresql-10 international-components-unicode

I wonder if my installation of Postgres 10 Beta 2 has been built to include the new [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations. For background info, see [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-...

                                  I wonder if my installation of Postgres 10 Beta 2 has been built to include the new [*International Components for Unicode (ICU)*](http://site.icu-project.org/)  collations. For background info, see [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/)  by Peter Eisentraut.

I used the pg_config  utility to detect the build info. When running pg_config --configure I get output where the last line is 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'.

➠ Does this mean I have the ICU libraries installed, or not?

>/Library/PostgreSQL/10Beta2/bin/pg_config --configure
>
>'--with-icu' '--prefix=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx' '--with-ldap' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-bonjour' '--with-pam' '--enable-thread-safety' '--with-libxml' '--with-uuid=e2fs' '--with-includes=/opt/local/Current/include/libxml2:/opt/local/Current/include:/opt/local/Current/include/security' '--docdir=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx/doc/postgresql' '--with-libxslt' '--with-libedit-preferred' '--with-gssapi' 'CFLAGS=-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2' 'LDFLAGS=-L/opt/local/Current/lib' 'ICU_CFLAGS=-I/opt/local/Current/include' 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'

➠ Is there another way to detect if my Postgres has the ICU collations available? Some way via SQL?

When I execute SELECT * FROM pg_collation ;, of the 845 locale names listed, 575 end in -x-icu. Does that happen only if the ICU libraries are in place?

Basil Bourque (11188 rep)

Aug 13, 2017, 07:29 PM • Last activity: Aug 14, 2017, 12:45 PM

Showing page 1 of 12 total questions