How to filter extraneous Unicode values from a column?

0 votes
0 answers
37 views
I am cleaning up our PostgreSQL 12 database to prime it for future data-related activities (e.g. data encryption). I have tried the following methods to delete every character that is not basic Latin, basic accented Latin, or punctuation from one of our tables:

- regexp_replace(field, '[^[:graph:]]', '', 'g') and SIMILAR TO '%[^[:graph:]]%' (in our implementation, [:print:] does not work in regexp_replace)
- btrim(field, ''), which works only for Latin strings, not for strings that contain Unicode characters

However, the extraneous Unicode characters (e.g. \ud83d), even those embedded within otherwise acceptable Latin and punctuation values, are not filtered out or deleted, so I could not prime the data.

What can I do in regexp_replace to filter out and delete only the unacceptable Unicode characters while retaining the acceptable ones?
Asked by Bona Rae Villarta (1 rep)
Aug 31, 2023, 02:10 AM
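
One possible direction, offered as a sketch rather than a confirmed fix: instead of relying on [:graph:], which in a UTF8 database may well treat emoji and other non-Latin characters as printable, spell out the allowed ranges explicitly and delete everything outside them. The example below assumes a UTF8-encoded database with standard_conforming_strings = on; the table name my_table and the whitelist ranges (printable ASCII plus the Latin-1 accented letters U+00C0 to U+00FF) are illustrative assumptions, not details from the question. It also assumes that a value such as \ud83d may be stored as literal text (a backslash, a "u", and four hex digits), since a UTF8 database should reject a lone surrogate code point; that literal form consists entirely of ASCII characters, so a character-class filter would never remove it without a separate pass.

```sql
-- Hypothetical names: my_table stands in for the real table, field for the column.
-- Assumes standard_conforming_strings = on, so backslashes inside '...'
-- reach the regular-expression engine unchanged.

-- Step 1: preview the effect before changing any data.
SELECT field,
       regexp_replace(
           -- inner pass: strip literal escape text such as \ud83d
           regexp_replace(field, '\\u[0-9A-Fa-f]{4}', '', 'g'),
           -- outer pass: strip every character outside the assumed whitelist
           '[^\u0020-\u007E\u00C0-\u00FF]', '', 'g'
       ) AS cleaned
FROM my_table
WHERE field ~ '\\u[0-9A-Fa-f]{4}'
   OR field ~ '[^\u0020-\u007E\u00C0-\u00FF]';

-- Step 2: apply the same expression once the preview looks right.
UPDATE my_table
SET field = regexp_replace(
                regexp_replace(field, '\\u[0-9A-Fa-f]{4}', '', 'g'),
                '[^\u0020-\u007E\u00C0-\u00FF]', '', 'g'
            )
WHERE field ~ '\\u[0-9A-Fa-f]{4}'
   OR field ~ '[^\u0020-\u007E\u00C0-\u00FF]';
```

If the column never contains literal escape text, the inner regexp_replace can be dropped. The whitelist would need widening if other characters (for example, accented letters beyond Latin-1) count as acceptable.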