Sample Header Ad - 728x90

Postgres full text search with unaccent and inflection (conjugation, etc.)

6 votes
2 answers
5478 views
I want to be able to search unaccented phrases in an inflected (Polish) language in Postgres. Say, if a document contains robiłem, the lexeme should be robić (the infinivite). Its forms are robię, robił, robiła and so on. I want to be able to find it, for example, with a phrase robie which is unaccented robię. **What I did** is I started out with a perfectly well working polish text search config CREATE TEXT SEARCH DICTIONARY polish_ispell ( TEMPLATE = pg_catalog.ispell, dictfile = 'polish', afffile = 'polish', stopwords = 'polish' ); Then I tried to extend it to include the unaccent. create extension unaccent; create text search configuration polish_unaccented (copy = polish); ALTER TEXT SEARCH CONFIGURATION polish_unaccented ALTER MAPPING FOR hword, hword_part, word WITH unaccen, polish_ispell, simple, ; Sadly, lexems are not created correctly with this config: select to_tsvector('polish_unaccented' ,'robił'); 'robil':1 The lexem should be of course: 'robić':1 So the below cant return true (and that's what I need I think): select to_tsvector('polish_unaccented','robić') @@ to_tsquery('polish_unaccented','robie'); I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word. Cheers
Asked by Tomek (61 rep)
Jun 16, 2017, 02:42 PM
Last activity: Nov 15, 2017, 04:04 AM