How do I properly convert a file to UTF-16LE without strange characters appearing in it?
4 votes, 1 answer, 4063 views
I'm running into some oddities with a dictionary file in .dsl format that I'm trying to convert. It's essentially a text file of dictionary entries. The dictionary software I use is GoldenDict, which requires UTF-16 dictionaries in order to render them properly.
All of my dictionaries are in UTF-16LE except one, which is encoded as ISO-8859-1. An entry looks like this when I open it in vim:
abandonarse
[m2][c crimson][b]Sinónimos[/b][/c][/m]
[m2][i][c green]verbo[/c][/i][/m]
[m1][trn][b]desanimarse:[/b] >, >, >, >, >, >[/trn][/m]
I have to convert it to UTF-16LE because GoldenDict otherwise renders Cyrillic characters in place of the Spanish accented ones.
So I tried:
iconv -f iso-8859-1 -t utf-16le dictionary.dsl -o test.dsl
GoldenDict renders the new test.dsl correctly, but I see two oddities I would like to get rid of. First, the encoding of the converted file is not recognized by file, as it is for my other dictionaries:
aleksandr@desktop:~/windoc/Dic/Es extra/dictionary.dsl> file dictionary.dsl
dictionary: data
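A possible cause worth checking (an assumption, not something confirmed in this post): iconv writes no byte order mark when the target is UTF-16LE, and tools such as file tend to rely on that mark to recognize UTF-16. The sketch below uses a hypothetical three-byte sample.dsl in a temporary directory to compare the plain conversion with one that has the UTF-16LE BOM (bytes FF FE) written in front of it.

```shell
# Hypothetical sample standing in for dictionary.dsl (three ASCII bytes).
workdir=$(mktemp -d)
printf 'abc' > "$workdir/sample.dsl"

# Helper: print the bytes read from stdin as one lowercase hex string.
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

# Plain conversion to UTF-16LE: no byte order mark is written.
no_bom=$(iconv -f ISO-8859-1 -t UTF-16LE "$workdir/sample.dsl" | to_hex)

# Same conversion, with the UTF-16LE BOM (octal 377 376 = FF FE) written first.
{ printf '\377\376'; iconv -f ISO-8859-1 -t UTF-16LE "$workdir/sample.dsl"; } > "$workdir/with_bom.dsl"
with_bom=$(to_hex < "$workdir/with_bom.dsl")

echo "without BOM: $no_bom"     # 610062006300
echo "with BOM:    $with_bom"   # fffe610062006300

rm -r "$workdir"
```

If the missing BOM is indeed the cause, running file on the second output should report UTF-16 text rather than data; that part is untested here.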
Second, when I open test.dsl in vim, every character is followed by ^@. Here is the same entry:
^@^@>^@,^@ ^@^@>^@[^@/^@t^@r^@n^@]^@[^@/^@m^@]^@
^@ ^@[^@m^@2^@]^@[^@c^@ ^@c^@r^@i^@m^@s^@o^@n^@]^@[^@b^@]^@A^@n^@t^@ó^@n^@i^@m^@o^@s^@[^@/^@b^@]^@[^@/^@c^@]^@[^@/^@m^@]^@
^@ ^@[^@m^@2^@]^@[^@i^@]^@[^@c^@ ^@g^@r^@e^@e^@n^@]^@v^@e^@r^@b^@o^@[^@/^@c^@]^@[^@/^@i^@]^@[^@/^@m^@]^@
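For illustration, the ^@ marks are consistent with how UTF-16LE stores these characters: every ISO-8859-1 character becomes two bytes, and the second one is 0x00, which vim displays as ^@ when it reads the file as a single-byte encoding. A minimal sketch, using a short made-up string rather than the real dictionary:

```shell
# Helper: print the bytes read from stdin as one lowercase hex string.
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

# '\363' is the single ISO-8859-1 byte for o-acute (0xF3).
latin1=$(printf '[b]\363' | to_hex)
utf16le=$(printf '[b]\363' | iconv -f ISO-8859-1 -t UTF-16LE | to_hex)

echo "ISO-8859-1: $latin1"    # 5b625df3
echo "UTF-16LE:   $utf16le"   # 5b0062005d00f300
```

The 00 after each byte is expected for UTF-16LE and is not corruption. In vim, reloading the buffer with :e ++enc=utf-16le may display the text without ^@, assuming vim did not detect the encoding on its own.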
I tried removing these characters in vim:
%s///g
However, when I then save the file, its encoding is ISO-8859-1 again.
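This outcome can be reproduced outside vim. As a rough sketch (the sample word is taken from the entry above, the pipeline itself is only an illustration): deleting every 0x00 byte from UTF-16LE text that contains only ISO-8859-1 characters leaves exactly the original single-byte data, so the conversion is undone.

```shell
# Helper: print the bytes read from stdin as one lowercase hex string.
to_hex() { od -An -v -tx1 | tr -d ' \n'; }

# "Sinónimos" as ISO-8859-1 bytes ('\363' is o-acute).
original=$(printf 'Sin\363nimos' | to_hex)

# Convert to UTF-16LE, then strip every NUL byte, as the vim substitution does.
stripped=$(printf 'Sin\363nimos' | iconv -f ISO-8859-1 -t UTF-16LE | tr -d '\000' | to_hex)

if [ "$original" = "$stripped" ]; then
    echo "stripping NULs restores the ISO-8859-1 bytes"
fi
```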
I would like this file to be displayed without the ^@ characters, because I may need to edit some headings in the dictionary manually.
Asked by user7748093 (43 rep) on Sep 8, 2020, 03:14 PM
Last activity: Sep 9, 2020, 04:46 PM