Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
2
votes
2
answers
2040
views
Identify the version of a collation from ICU in Postgres
Postgres 10 and later incorporates the [International Components for Unicode (ICU)][1] library for text-handling and other internationalization issues. Changes happen to human languages, such as sorting rules evolving. Therefore collation definitions are versioned in ICU. ➥ How can I identify the ve...
Postgres 10 and later incorporates the International Components for Unicode (ICU) library for text-handling and other internationalization issues.
Changes happen to human languages, such as sorting rules evolving. Therefore collation definitions are versioned in ICU.
➥ How can I identify the version number of a collation?
I know Postgres will report version number as part of a collation version mismatch error message. Is there a way to proactively query for the version rather than wait for an error message?
Basil Bourque
(11188 rep)
Aug 4, 2019, 03:16 AM
• Last activity: Mar 11, 2025, 06:36 AM
0
votes
0
answers
247
views
Error Building PostgreSQL from Source: Undefined ICU References During make world-bin
I am trying to build PostgreSQL from source on my Linux machine but am encountering errors related to ICU during the make world-bin process. Below are the details of what I've done so far: ### Steps Taken: ``` sudo yum install libicu-devel git clone git://git.postgresql.org/git/postgresql.git ./conf...
I am trying to build PostgreSQL from source on my Linux machine but am encountering errors related to ICU during the make world-bin process. Below are the details of what I've done so far:
### Steps Taken:
sudo yum install libicu-devel
git clone git://git.postgresql.org/git/postgresql.git
./configure --enable-tap-tests --prefix=pwd
/build
make world-bin
### Issues
During the make world-bin process, I encountered the following error messages indicating undefined references to ICU functions:
/usr/bin/ld: pg_locale.c:(.text+0x1701): undefined reference to `ucol_getRules_73'
/usr/bin/ld: pg_locale.c:(.text+0x1724): undefined reference to `u_strlen_73'
/usr/bin/ld: pg_locale.c:(.text+0x1739): undefined reference to `u_strlen_73'
/usr/bin/ld: pg_locale.c:(.text+0x1757): undefined reference to `u_strcpy_73'
/usr/bin/ld: pg_locale.c:(.text+0x1764): undefined reference to `u_strcat_73'
... (more similar errors)
collect2: error: ld returned 1 exit status
make: *** [Makefile:67: postgres] Error 1
I have ICU version 73.1 installed and confirmed with:
icu-config --version
My ~/.zshrc file includes the following configurations:
export LDFLAGS="-L/usr/lib64"
export CPPFLAGS="-I/usr/include"
export PATH="/usr/lib64/icu:$PATH"
export PKG_CONFIG_PATH="/usr/lib64/pkgconfig/"
export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
I have also sourced the .zshrc file after making these changes.
What might be causing these ICU-related errors, and how can I resolve them to successfully build PostgreSQL?
Ayush Vatsa
(55 rep)
Sep 21, 2024, 09:33 AM
1
votes
1
answers
214
views
Postgres sorting order not respecting default C collation?
I have a postgres DB running in an AL2023 docker image: ``` postgres=# select version(); select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605...
I have a postgres DB running in an AL2023 docker image:
postgres=# select version();
select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit
(1 row)
postgres=#
I created this using these args:
--locale-provider=icu --icu-locale=C.UTF8 --encoding=UTF8
Which produces the following:
The database cluster will be initialized with this locale configuration:
provider: icu
ICU locale: C.UTF8
LC_COLLATE: C
LC_CTYPE: C
LC_MESSAGES: C
LC_MONETARY: C
LC_NUMERIC: C
LC_TIME: C
The default text search configuration will be set to "english".
If I ask for a sorting between caps and noncaps, I get lower case coming first which (a) isn't what I want and (b) isn't what I'd expect for ASCII / UTF-8 / C
postgres=# select 'C' < 'c';
select 'C' < 'c';
?column?
----------
f
(1 row)
So if I used the C collation, I'd get it the right way round, perhaps?
postgres=# select 'C' < 'c' collate "C";
select 'C' < 'c' collate "C";
?column?
----------
t
(1 row)
I thought I was using C, so I check what collation am I using and get ...
postgres=# select datname, datcollate from pg_database;
select datname, datcollate from pg_database;
datname | datcollate
-----------+------------
postgres | C
template1 | C
template0 | C
(3 rows)
I'm using C - so why is the default not the same as if I use C explicitly?
Richard Wheeldon
(133 rep)
Nov 29, 2023, 02:24 PM
• Last activity: Nov 29, 2023, 05:43 PM
4
votes
2
answers
891
views
Numeric collation sorts by digits instead of value- postgres
I have a table of users with ids in the following form `user123@domain.com`. When searching and sorting the users, I need `user1@domian.com` to be before `user14@domain.com` but since 4 is “smaller” than @ it sorts the other way around. After looking around I came across the following collation: CRE...
I have a table of users with ids in the following form
user123@domain.com
.
When searching and sorting the users, I need user1@domian.com
to be before user14@domain.com
but since 4 is “smaller” than @ it sorts the other way around.
After looking around I came across the following collation:
CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
Which fixed the problem, but caused another one, users with numbers that started with 2 appeared after users that started with 1
For example, user19@domain.com
came before user2@domain.com
From the docs, it says that kn-true
and looks at the number’s value but it looks like it only compares the digits.
I'm using Postgres 11.6. I have two servers: one on Windows (local) and a remote one on Linux.
Is anyone familiar with this issue?
Amit Keinan
(41 rep)
Dec 28, 2022, 07:17 PM
• Last activity: Jan 5, 2023, 04:05 AM
1
votes
1
answers
2493
views
Collation for accent-insensitive comparison on Postgres?
On PG 13 documentation, there are several examples of ICU collations for specialized purposes. It is also mentioned that ICU locales exist that allow creating collations to ignore accents, and that they can be found on https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml Howeve...
On PG 13 documentation, there are several examples of ICU collations for specialized purposes. It is also mentioned that ICU locales exist that allow creating collations to ignore accents, and that they can be found on https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml
However, after reading that document, I am still unclear on what is the locale I should use to create an ICU collation for accent insensitive comparisons in Spanish.
What is the name of such an ICU locale? Is there any list of ICU Spanish collations?
ARX
(1509 rep)
Sep 13, 2021, 01:18 AM
• Last activity: Sep 13, 2021, 03:05 AM
2
votes
1
answers
1193
views
Equivalent of utf8_general_ci in Postgres/ICU?
In MySQL there is a collation `utf8_general_ci` which provides case-insensitive comparisons in a variety of languages. For example, these are all `1` (true): ```sql SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci'; SELECT 'i' = 'I' COLLATE 'utf8_general_ci'; SELECT 'ä' = 'Ä' COLLATE 'utf8_genera...
In MySQL there is a collation
utf8_general_ci
which provides case-insensitive comparisons in a variety of languages. For example, these are all 1
(true):
SELECT 'ı' = 'I' SOLLATE 'utf8_general_ci';
SELECT 'i' = 'I' COLLATE 'utf8_general_ci';
SELECT 'ä' = 'Ä' COLLATE 'utf8_general_ci';
Can I define a similar collation using PostgreSQL's ICU?
I tried
CREATE COLLATION "undefined_ci_nondet_old" (
PROVIDER = 'icu',
LOCALE = "@colStrength=secondary",
DETERMINISTIC = false
);
But that doesn't seem to include the Turkish I/ı conversion:
SELECT 'ı' = 'I' COLLATE undefined_ci_nondet_old; -- false
AndreKR
(607 rep)
May 9, 2021, 10:00 PM
• Last activity: May 10, 2021, 01:12 PM
15
votes
1
answers
15325
views
PostgreSQL nondeterministic collations are not supported for LIKE
I am using Postgresql v12. I created a collation like this: CREATE COLLATION ci (provider = icu, locale = 'tr_TR', deterministic = false); I used that collation in a table: create table testtable1 ( id serial primary key, name text COLLATE "ci" ); And I inserted sample data: insert into testtable1 v...
I am using Postgresql v12.
I created a collation like this:
CREATE COLLATION ci (provider = icu, locale = 'tr_TR', deterministic = false);
I used that collation in a table:
create table testtable1 (
id serial primary key,
name text COLLATE "ci"
);
And I inserted sample data:
insert into testtable1 values(3,'abc');
When I query that table using
LIKE
, it returns the following error:
select name from testtable1 WHERE name LIKE '%a%'
> ERROR: nondeterministic collations are not supported for LIKE
> SQL state: 0A000
But I need to use LIKE
. Is there any way to allow this?
Banu Akkus
(389 rep)
Apr 29, 2020, 02:40 PM
• Last activity: May 5, 2020, 04:47 PM
6
votes
1
answers
4674
views
What is the meaning of "-x-icu" in PostgreSQL's "collates"?
I had this query: SELECT * FROM table ORDER BY label ASC; Since the labels are not in English, they didn't get sorted in the right order (ones beginning with "ö" were not in the bottom/end). I therefore tried: SELECT * FROM table ORDER BY label COLLATE "sv-SE" ASC; SELECT * FROM table ORDER BY...
I had this query:
SELECT * FROM table ORDER BY label ASC;
Since the labels are not in English, they didn't get sorted in the right order (ones beginning with "ö" were not in the bottom/end).
I therefore tried:
SELECT * FROM table ORDER BY label COLLATE "sv-SE" ASC;
SELECT * FROM table ORDER BY label COLLATE "sv_SE" ASC;
These gave errors about those collations not existing, which baffled me.
After a bunch of searching, I figured out to do:
SELECT * FROM pg_collation;
Which revealed that it's supposed to be:
sv-SE-x-icu
It worked when I used that identifier, but what's with the "-x-icu" stuff? What's that all about? I hate that they always have to mess with the standard locale identifiers so you can never rely on just the standard "language_location" format.
Tadhg Mocniak
(69 rep)
Feb 21, 2020, 04:30 PM
• Last activity: Apr 30, 2020, 05:30 AM
3
votes
1
answers
3516
views
Does PostgreSQL support ICU collation's options and settings?
ICU Specifies different [LDML Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options). Some of them seem pretty interesting, especially the ones on case and accent, > * “Ignore accents”: `strength=primary` > * “Ignore accents” but take case into account: `strengt...
ICU Specifies different [LDML Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options) . Some of them seem pretty interesting, especially the ones on case and accent,
> * “Ignore accents”:
strength=primary
> * “Ignore accents” but take case into account: strength=primary caseLevel=on
> * “Ignore case”: strength=secondary
> * “Ignore punctuation” (completely): strength=tertiary alternate=shifted
> * “Ignore punctuation” but distinguish among punctuation marks: strength=quaternary alternate=shifted
potentially a better method of doing what
You can also see these [documented here](http://userguide.icu-project.org/collation/customization) . Are these ICU options and settings possible with [PostgreSQL 10 ICU collation support](https://www.postgresql.org/docs/10/static/collation.html) ?
CREATE COLLATION special (provider = icu, locale = 'en@strength=primary');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false
I've also tried the [CLDR BCP47](http://cldr.unicode.org/index/bcp47-extension)
> Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper
") or in language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive. See the [LDML Collation spec, Collation Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings) , and the [data file](http://unicode.org/repos/cldr/trunk/common/bcp47/collation.xml) listing the valid collation keywords and their values. (The deprecated attributes kh/colHiraganaQuaternary and vt/variableTop are not supported.)
For that, this looked right
CREATE COLLATION special (provider = icu, locale = 'en-ks-level1');
SELECT 'Å' LIKE 'A' COLLATE "special"; # returns false
Also tried en-u-ks-level1
That method seems to be what the docs go for,
CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');
Evan Carroll
(65502 rep)
Nov 28, 2017, 04:18 PM
• Last activity: Oct 4, 2019, 12:21 PM
2
votes
1
answers
248
views
International Components for Unicode (ICU) Date Formatter for MySQL?
Regarding to [ICU - International Components for Unicode][1] definition: > ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C...
Regarding to ICU - International Components for Unicode definition:
> ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
one of important part is formatting:
> Formatting: Format numbers, dates, times and currency amounts according the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.
which implemented in vary languages; I would like to know if there is any implementation or work for MySQL to add this feature for Unicode date formatting?
Hamid
(21 rep)
Apr 23, 2017, 07:04 AM
• Last activity: Jul 13, 2018, 09:56 PM
8
votes
1
answers
4592
views
Specify ICU collations as `ENCODING`, `LC_COLLATE`, and `LC_CTYPE` in Postgres 10
Postgres 10 gains the ability to use [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations rather than depending on host OS implementations. See [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/) by...
Postgres 10 gains the ability to use [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations rather than depending on host OS implementations. See [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/) by Peter Eisentraut.
So how exactly does one specify an ICU collation when creating the database?
I did read a tip in the
CREATE DATABASE
to use template0
rather than template1
if template1
might have been created using encoding and collation other than what you want for your new database. And when using template0
, you need to enable data connections to override the default.
And I read in Postgres 10 doc page for [*Collation Support*](https://www.postgresql.org/docs/10/static/collation.html) that:
>und-x-icu (for “undefined”)
>
>ICU “root” collation. Use this to get a reasonable language-agnostic sort order.
So it seems und-x-icu
might be a good value to use when establishing a new database that will include multi-lingual text. When a certain language is known to focus on a particular language, the SELECT
command can specify another collation such as German de-x-icu
. Otherwise, fallback to the default of und-x-icu
.
So I tried the following SQL to create a database in Postgres 10 Beta 2.
CREATE DATABASE timepiece_
TEMPLATE 'template0'
ALLOW_CONNECTIONS TRUE
CONNECTION LIMIT -1
ENCODING 'UTF8'
LC_COLLATE 'und-x-icu'
LC_CTYPE 'und-x-icu'
;
Fails with an error:
> ERROR: invalid locale name: "und-x-icu"
So I stuck on an appended .utf8
as seen in the doc for the old-fashioned collation syntax.
CREATE DATABASE timepiece_
TEMPLATE 'template0'
ALLOW_CONNECTIONS TRUE
CONNECTION LIMIT -1
ENCODING 'UTF8'
LC_COLLATE 'und-x-icu.utf8'
LC_CTYPE 'und-x-icu.utf8'
;
Fails just the same:
> ERROR: invalid locale name: "und-x-icu.utf8"
Was my Postgres cluster built with ICU libraries? Here is the result of calling pg_config
. The last line is 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'
. I guess that means my build includes the ICU libraries but I am not sure.
>/Library/PostgreSQL/10Beta2/bin/pg_config --configure
>
>'--with-icu' '--prefix=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx' '--with-ldap' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-bonjour' '--with-pam' '--enable-thread-safety' '--with-libxml' '--with-uuid=e2fs' '--with-includes=/opt/local/Current/include/libxml2:/opt/local/Current/include:/opt/local/Current/include/security' '--docdir=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx/doc/postgresql' '--with-libxslt' '--with-libedit-preferred' '--with-gssapi' 'CFLAGS=-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2' 'LDFLAGS=-L/opt/local/Current/lib' 'ICU_CFLAGS=-I/opt/local/Current/include' 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'
➠ How does one specify the ENCODING
, LC_COLLATE
, and LC_TYPE
to use the new ICU collations?
➠ What ICU collation names you would suggest for English-oriented databases? For Western European multi-lingual databases?
➠ What are the possible ICU collation names?
Basil Bourque
(11188 rep)
Aug 13, 2017, 05:15 AM
• Last activity: Aug 23, 2017, 02:22 PM
3
votes
1
answers
2012
views
Was my Postgres cluster built with the ICU libraries available for Postgres 10 and later?
I wonder if my installation of Postgres 10 Beta 2 has been built to include the new [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations. For background info, see [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-...
I wonder if my installation of Postgres 10 Beta 2 has been built to include the new [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations. For background info, see [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/) by Peter Eisentraut.
I used the
pg_config
utility to detect the build info. When running pg_config --configure
I get output where the last line is 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'
.
➠ Does this mean I have the ICU libraries installed, or not?
>/Library/PostgreSQL/10Beta2/bin/pg_config --configure
>
>'--with-icu' '--prefix=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx' '--with-ldap' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-bonjour' '--with-pam' '--enable-thread-safety' '--with-libxml' '--with-uuid=e2fs' '--with-includes=/opt/local/Current/include/libxml2:/opt/local/Current/include:/opt/local/Current/include/security' '--docdir=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx/doc/postgresql' '--with-libxslt' '--with-libedit-preferred' '--with-gssapi' 'CFLAGS=-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2' 'LDFLAGS=-L/opt/local/Current/lib' 'ICU_CFLAGS=-I/opt/local/Current/include' 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'
➠ Is there another way to detect if my Postgres has the ICU collations available? Some way via SQL?
When I execute SELECT * FROM pg_collation ;
, of the 845 locale names listed, 575 end in -x-icu
. Does that happen only if the ICU libraries are in place?
Basil Bourque
(11188 rep)
Aug 13, 2017, 07:29 PM
• Last activity: Aug 14, 2017, 12:45 PM
Showing page 1 of 12 total questions