Sample Header Ad - 728x90

Specify ICU collations as `ENCODING`, `LC_COLLATE`, and `LC_CTYPE` in Postgres 10

8 votes
1 answer
4593 views
Postgres 10 gains the ability to use [*International Components for Unicode (ICU)*](http://site.icu-project.org/) collations rather than depending on host OS implementations. See [*More robust collations with ICU support in PostgreSQL 10*](https://blog.2ndquadrant.com/icu-support-postgresql-10/) by Peter Eisentraut. So how exactly does one specify an ICU collation when creating the database? I did read a tip in the CREATE DATABASE to use template0 rather than template1 if template1 might have been created using encoding and collation other than what you want for your new database. And when using template0, you need to enable data connections to override the default. And I read in Postgres 10 doc page for [*Collation Support*](https://www.postgresql.org/docs/10/static/collation.html) that: >und-x-icu (for “undefined”) > >ICU “root” collation. Use this to get a reasonable language-agnostic sort order. So it seems und-x-icu might be a good value to use when establishing a new database that will include multi-lingual text. When a certain language is known to focus on a particular language, the SELECT command can specify another collation such as German de-x-icu. Otherwise, fallback to the default of und-x-icu. So I tried the following SQL to create a database in Postgres 10 Beta 2. CREATE DATABASE timepiece_ TEMPLATE 'template0' ALLOW_CONNECTIONS TRUE CONNECTION LIMIT -1 ENCODING 'UTF8' LC_COLLATE 'und-x-icu' LC_CTYPE 'und-x-icu' ; Fails with an error: > ERROR: invalid locale name: "und-x-icu" So I stuck on an appended .utf8 as seen in the doc for the old-fashioned collation syntax. CREATE DATABASE timepiece_ TEMPLATE 'template0' ALLOW_CONNECTIONS TRUE CONNECTION LIMIT -1 ENCODING 'UTF8' LC_COLLATE 'und-x-icu.utf8' LC_CTYPE 'und-x-icu.utf8' ; Fails just the same: > ERROR: invalid locale name: "und-x-icu.utf8" Was my Postgres cluster built with ICU libraries? Here is the result of calling pg_config . The last line is 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n'. I guess that means my build includes the ICU libraries but I am not sure. >/Library/PostgreSQL/10Beta2/bin/pg_config --configure > >'--with-icu' '--prefix=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx' '--with-ldap' '--with-openssl' '--with-perl' '--with-python' '--with-tcl' '--with-bonjour' '--with-pam' '--enable-thread-safety' '--with-libxml' '--with-uuid=e2fs' '--with-includes=/opt/local/Current/include/libxml2:/opt/local/Current/include:/opt/local/Current/include/security' '--docdir=/mnt/hgfs/pginstaller.pune/server/staging_cache/osx/doc/postgresql' '--with-libxslt' '--with-libedit-preferred' '--with-gssapi' 'CFLAGS=-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk -mmacosx-version-min=10.8 -arch i386 -arch x86_64 -O2' 'LDFLAGS=-L/opt/local/Current/lib' 'ICU_CFLAGS=-I/opt/local/Current/include' 'ICU_LIBS=-L/opt/local/Current/lib -licuuc -licudata -licui18n' ➠ How does one specify the ENCODING, LC_COLLATE, and LC_TYPE to use the new ICU collations? ➠ What ICU collation names you would suggest for English-oriented databases? For Western European multi-lingual databases? ➠ What are the possible ICU collation names?
Asked by Basil Bourque (11188 rep)
Aug 13, 2017, 05:15 AM
Last activity: Aug 23, 2017, 02:22 PM