How to filter extraneous Unicode values from a column?

0 votes

0 answers

37 views

I am cleaning up our PostgreSQL 12 database to prepare it for future data-related work (e.g. data encryption). I have tried the following methods to remove everything except basic Latin letters, common accented Latin letters, and punctuation from one of our tables:

- regexp_replace(field, '[^[:graph:]]', '', 'g') and SIMILAR TO '%[^[:graph:]]%' (in our setup, [:print:] does not work in regexp_replace)
- btrim(field, ''), which works only for Latin-only strings, not for strings that contain other Unicode characters.

However, the extraneous Unicode characters (e.g. \ud83d) are not removed, even when they are embedded within otherwise acceptable (i.e. Latin and punctuation) values, so I have not been able to clean the data.

How can I use regexp_replace to remove only the unacceptable Unicode characters and keep the acceptable ones?
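
For reference, here is a minimal sketch of the kind of allow-list filter I am after, not a confirmed solution. The table name my_table is hypothetical, and the code point ranges (printable ASCII plus U+00C0 to U+00FF) are only my assumption of what counts as "acceptable". It also assumes a UTF-8 database with standard_conforming_strings = on, so that the backslash escapes reach the regex engine unchanged. The second query covers the case where a value such as \ud83d is stored as literal text (a backslash, "u", and four hex digits) rather than as a real character, which I have not verified for our data.

```sql
-- Sketch only: my_table is a hypothetical name; adjust ranges to your own allow-list.
-- Keep printable ASCII (U+0020-U+007E) and Latin-1 letters (U+00C0-U+00FF);
-- remove every other character.
SELECT regexp_replace(field, '[^\u0020-\u007E\u00C0-\u00FF]', '', 'g') AS cleaned
FROM my_table;

-- Assumption: the unwanted values are stored as literal escape text such as
-- the six characters \ud83d. In that case a character class cannot catch them,
-- because each of the six characters is plain ASCII; match the escape text itself.
SELECT regexp_replace(field, '\\u[0-9a-fA-F]{4}', '', 'g') AS cleaned
FROM my_table;
```

Running these as SELECT first makes it possible to inspect the result before turning either one into an UPDATE.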

Asked by Bona Rae Villarta (1 rep)

Aug 31, 2023, 02:10 AM

