Postgres full text search with unaccent and inflection (conjugation, etc.)
6 votes, 2 answers, 5478 views
I want to be able to search unaccented phrases in an inflected (Polish) language in Postgres.
Say a document contains robiłem; the lexeme should be robić (the infinitive). Its forms are robię, robił, robiła and so on. I want to be able to find it with, for example, the phrase robie, which is the unaccented robię.
**What I did:** I started out with a perfectly well-working Polish text search config:
CREATE TEXT SEARCH DICTIONARY polish_ispell (
TEMPLATE = pg_catalog.ispell,
dictfile = 'polish', afffile = 'polish', stopwords = 'polish' );
Then I tried to extend it to include unaccent:
CREATE EXTENSION unaccent;
CREATE TEXT SEARCH CONFIGURATION polish_unaccented (COPY = polish);
ALTER TEXT SEARCH CONFIGURATION polish_unaccented ALTER MAPPING FOR hword,
hword_part, word WITH unaccent, polish_ispell, simple;
Sadly, lexemes are not created correctly with this config:
select to_tsvector('polish_unaccented' ,'robił');
'robil':1
The lexeme should of course be:
'robić':1
So the query below can't return true (and that's what I need, I think):
select to_tsvector('polish_unaccented','robić') @@ to_tsquery('polish_unaccented','robie');
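One way to see where the stem gets lost is to inspect the dictionary chain. The sketch below assumes the polish_unaccented configuration and the polish_ispell dictionary from this question are installed; it only uses the built-in ts_debug and ts_lexize functions. Since unaccent is a filtering dictionary, its output is handed on to the next dictionary in the mapping, so polish_ispell presumably receives the already unaccented token (robil) rather than robił. That is an assumption the queries are meant to check, not a confirmed diagnosis.

```sql
-- Show, for each token, which dictionaries are consulted and which one
-- finally produced the lexeme (assumes polish_unaccented exists).
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('polish_unaccented', 'robił');

-- Compare each dictionary on its own:
SELECT ts_lexize('unaccent', 'robił');       -- what unaccent passes on
SELECT ts_lexize('polish_ispell', 'robił');  -- accented form, as in the original config
SELECT ts_lexize('polish_ispell', 'robil');  -- unaccented form, as seen after unaccent
```

If the last query returns NULL while the second one returns the infinitive, the order of dictionaries in the mapping is the likely cause.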
I've googled but did not find any documents showing how to really configure Postgres for my case. The docs only show the lame 'Hôtels' example, which is not a 'lexemed' word.
Cheers
Asked by Tomek
(61 rep)
Jun 16, 2017, 02:42 PM
Last activity: Nov 15, 2017, 04:04 AM