
Trigram similarity (pg_trgm) with German umlauts

3 votes
1 answer
1913 views
I am trying to figure out how to improve Postgres 10.6 pg_trgm queries with German umlauts (äöü). In German, 'ö' can also be written as 'oe'. But beware: not every 'oe' can be written as 'ö'.
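The one-way nature of that substitution can be illustrated with a small sketch. It is written in Python purely for illustration; the helper name `fold_umlauts`, the mapping table, and the sample word "Poesie" are assumptions and not part of the database setup.

```python
# Hypothetical helper: fold German umlauts and ß into their ASCII digraphs.
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def fold_umlauts(text: str) -> str:
    """Lowercase the text and replace ä, ö, ü, ß with ae, oe, ue, ss."""
    lowered = text.lower()
    for umlaut, digraph in UMLAUT_MAP.items():
        lowered = lowered.replace(umlaut, digraph)
    return lowered


# Folding is safe in this direction: both spellings collapse to one form.
print(fold_umlauts("Schönstraße"))   # schoenstrasse
print(fold_umlauts("Schoenstraße"))  # schoenstrasse

# The reverse is not safe: the 'oe' in "Poesie" is not an umlaut,
# so a blind 'oe' -> 'ö' replacement would corrupt the word.
print("poesie".replace("oe", "ö"))   # pösie (wrong)
```

Any normalization would therefore have to go from umlaut to digraph, never the other way round.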
CREATE TABLE public.names
  (name text COLLATE pg_catalog."default");

CREATE INDEX names_idx
    ON public.names USING gin (name COLLATE pg_catalog."default" gin_trgm_ops);
SHOW LC_COLLATE; -- de_DE.UTF-8
I use the [similarity()](https://www.postgresql.org/docs/10/pgtrgm.html#id-1.11.7.41.5) function to compute the similarity to ***'Schoenstraße'***:
SELECT
	name,
	similarity (name, 'Schoenstraße') AS similarity,
	show_trgm (name)
FROM
	names
WHERE
	name % 'Schoenstraße'
ORDER BY
	similarity DESC;
I get the following result:
| name | similarity | show_trgm |
|---|---|---|
| Schyrenstraße | 0.588235 | `{0x9a07c3,0xde3801,"  s"," sc",chy,ens,hyr,nst,ren,sch,str,tra,0x76a40e,yre}` |
| Schönstraße | 0.5625 | `{0x9a07c3,0xde3801,0xf00320,0x095f29,"  s"," sc",0x6deea5,nst,sch,str,tra,0x76a40e}` |
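The two scores can be reproduced by hand. The sketch below is a simplified Python model of what the pg_trgm documentation describes (lowercase, pad each word with two leading spaces and one trailing space, compare the trigram sets); it is not the extension's actual code, and the function names `trigrams` and `similarity` are chosen here for illustration. The hex entries in `show_trgm` appear to be hashed forms of trigrams containing multi-byte characters, which the sketch keeps as plain text.

```python
def trigrams(text: str) -> set:
    """Return the set of padded trigrams for every alphanumeric word in text."""
    result = set()
    word = ""
    # The trailing space flushes the last word through the same branch.
    for ch in text.lower() + " ":
        if ch.isalnum():
            word += ch
        elif word:
            padded = "  " + word + " "
            result.update(padded[i:i + 3] for i in range(len(padded) - 2))
            word = ""
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by the size of the union of both sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "Schoenstraße"
for name in ("Schyrenstraße", "Schönstraße"):
    shared = trigrams(name) & trigrams(query)
    print(name, len(shared), round(similarity(name, query), 6))
# Schyrenstraße 10 0.588235
# Schönstraße 9 0.5625
```

Under this model the query has 13 trigrams. 'Schönstraße' shares 9 of them (9/16), because `cho`, `hoe`, `oen` and `ens` have no counterpart among `chö`, `hön` and `öns`. 'Schyrenstraße' happens to share `ens` as well, giving 10/17, which is why it ranks first.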
A dbfiddle is available [here](https://dbfiddle.uk/?rdbms=postgres_10&fiddle=58f8436a5096a779f358572e615bf11f).

The intended match, 'Schönstraße', ranks below 'Schyrenstraße'. Is there anything I can do to improve this, or do I need to replace all umlauts in the database?
Asked by Stephan (143 rep)
Jul 22, 2020, 09:35 AM
Last activity: Aug 1, 2023, 05:52 PM