Trigram similarity (pg_trgm) with German umlauts
I am trying to figure out how to improve Postgres 10.6 pg_trgm queries with German umlauts (äöü). In German, 'ö' can also be written as 'oe'. But beware: not every 'oe' can be written as 'ö'.
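To illustrate why the mapping only works in one direction, here is a minimal sketch using the built-in `replace()` function. The word 'Poet' is just an example chosen for illustration; it is not in the table.

```sql
-- Umlaut -> digraph is always safe: both spellings end up identical.
SELECT replace(replace(replace('Schönstraße', 'ä', 'ae'), 'ö', 'oe'), 'ü', 'ue');
-- returns 'Schoenstraße'

-- Digraph -> umlaut is not safe: a genuine 'oe' gets corrupted.
SELECT replace('Poet', 'oe', 'ö');
-- returns 'Pöt', which is not a valid spelling
```

So any normalization would have to fold umlauts into their two-letter forms, never the other way round.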
CREATE TABLE public.names
(name text COLLATE pg_catalog."default");
CREATE INDEX names_idx
ON public.names USING gin (name COLLATE pg_catalog."default" gin_trgm_ops);
SHOW LC_COLLATE; -- de_DE.UTF-8
I use the [similarity()](https://www.postgresql.org/docs/10/pgtrgm.html#id-1.11.7.41.5) function to query the similarity for ***'Schoenstraße'***:
SELECT
name,
similarity (name, 'Schoenstraße') AS similarity,
show_trgm (name)
FROM
names
WHERE
name % 'Schoenstraße'
ORDER BY
similarity DESC;
I get the following result:
| name | similarity | show_trgm |
|---|---|---|
| Schyrenstraße | 0.588235 | `{0x9a07c3,0xde3801," s"," sc",chy,ens,hyr,nst,ren,sch,str,tra,0x76a40e,yre}` |
| Schönstraße | 0.5625 | `{0x9a07c3,0xde3801,0xf00320,0x095f29," s"," sc",0x6deea5,nst,sch,str,tra,0x76a40e}` |
dbfiddle [here](https://dbfiddle.uk/?rdbms=postgres_10&fiddle=58f8436a5096a779f358572e615bf11f)
Is there anything I can do to improve that or do I need to replace all umlauts in the DB?
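One option that might avoid rewriting the stored data is an expression index over a folding function. The following is only a sketch under assumptions: `fold_umlauts` and `names_folded_idx` are hypothetical names, not part of pg_trgm, and the function only handles ä, ö, ü in both cases.

```sql
-- Hypothetical helper: fold umlauts into their two-letter forms.
-- It must be IMMUTABLE to be usable in an index expression.
CREATE FUNCTION fold_umlauts(t text) RETURNS text
LANGUAGE sql IMMUTABLE STRICT AS
$$
  SELECT replace(replace(replace(replace(replace(replace(
           t, 'ä', 'ae'), 'ö', 'oe'), 'ü', 'ue'),
              'Ä', 'Ae'), 'Ö', 'Oe'), 'Ü', 'Ue')
$$;

-- Trigram index over the folded value; the name column itself stays unchanged.
CREATE INDEX names_folded_idx
    ON public.names USING gin (fold_umlauts(name) gin_trgm_ops);

-- Fold both sides so the query expression matches the indexed expression.
SELECT name,
       similarity(fold_umlauts(name), fold_umlauts('Schoenstraße')) AS similarity
FROM   names
WHERE  fold_umlauts(name) % fold_umlauts('Schoenstraße')
ORDER  BY similarity DESC;
```

With this folding, 'Schönstraße' and 'Schoenstraße' become the same string before trigrams are extracted, so they should compare as identical. Whether the planner actually uses the index would need to be checked with `EXPLAIN`.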
Asked by Stephan
(143 rep)
Jul 22, 2020, 09:35 AM
Last activity: Aug 1, 2023, 05:52 PM