How to filter extraneous Unicode values from a column?
I am cleaning up our PostgreSQL 12 database to prepare it for future data-related activities (e.g. data encryption). I have tried the following methods to delete everything except basic Latin, basic accented Latin, and punctuation characters from one of our tables:
- regexp_replace(field, '[^[:graph:]]', '', 'g') and SIMILAR TO '%[^[:graph:]]%' (in our implementation, [:print:] is not working in regexp_replace)
- btrim(field, ''), which does not work for strings containing Unicode characters, only for Latin strings.
However, the extraneous Unicode characters (e.g. \ud83d), even those embedded within otherwise acceptable (i.e., Latin and punctuation) values, are not filtered out or deleted, so I could not prime the data.
What can I do in regexp_replace to filter out and delete only the unacceptable Unicode values, retaining only the acceptable characters?
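For illustration, the kind of allow-list filter being asked about might look like the sketch below. It is not a confirmed solution. It assumes a UTF8-encoded database, standard_conforming_strings = on (the PostgreSQL default), and that "acceptable" means printable Basic Latin (U+0020 to U+007E) plus Latin-1 Supplement (U+00A0 to U+00FF). The table name my_table is a placeholder, and the ranges would need adjusting to the real requirements. Explicit code point ranges are used instead of a POSIX class such as [:graph:] because class membership can depend on the database locale.

```sql
-- Sketch only: my_table is a hypothetical table name; field is the column from the question.
-- Assumed allow-list: U+0020-U+007E (printable Basic Latin) and U+00A0-U+00FF (Latin-1 Supplement).
-- \uXXXX is a character-entry escape of PostgreSQL advanced regular expressions.
SELECT field,
       regexp_replace(field, '[^\u0020-\u007E\u00A0-\u00FF]', '', 'g') AS cleaned
FROM my_table
-- Only rows containing at least one character outside the allow-list.
WHERE field ~ '[^\u0020-\u007E\u00A0-\u00FF]';
```

Running this as a SELECT first shows what would be removed before any UPDATE is attempted.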
Asked by Bona Rae Villarta
(1 rep)
Aug 31, 2023, 02:10 AM