My input data is as follows (as generated by hexdump):
000000f0 69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72 |ian...s eyes abr|
When I open this HTML file in Firefox, it displays these characters as:
ian’s eyes abr
According to the link https://superuser.com/questions/1237545/characters-in-email-displayed-like-e2-80-99 , "E2 80 99 is the sequence of hex values that encode a right single quotation mark (’) in UTF-8".
This website concurs: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128
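The same claim can be checked locally rather than taken from a website. The following is a minimal sketch in Python (my choice of tool, not something used above); the bytes are copied from the hexdump line, and the names `raw`, `text` and `quote` are only illustrative.

```python
import unicodedata

# The 16 bytes shown on the hexdump line at offset 0xf0.
raw = bytes.fromhex("69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72")

# Strict decoding: raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")

# The three bytes e2 80 99 collapse into a single character at index 3.
quote = text[3]

print(text)
print(f"U+{ord(quote):04X}", unicodedata.name(quote))
```

If the decode succeeds and the name printed is RIGHT SINGLE QUOTATION MARK, the input side of the conversion is well-formed UTF-8, at least for this fragment.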
When I run this iconv command on the file containing these characters:
iconv -f UTF-8 -t ISO-8859-15 test_chapter.html > blah.html
I get the output:
iconv: illegal input sequence at position 243
and the content of "blah.html" is truncated exactly where the apostrophe would be.
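One way to narrow this down is to test the two halves of the conversion separately: whether the bytes decode as UTF-8, and whether the decoded text can be encoded in ISO-8859-15. The sketch below does this with Python's built-in codecs, which are only a stand-in for iconv's converters and may not behave identically. `base_offset` is taken from the hexdump line, and the other names are hypothetical.

```python
# The 16 bytes from the hexdump line, which starts at file offset 0xf0.
raw = bytes.fromhex("69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72")
base_offset = 0xF0

# Step 1: decode as UTF-8 (raises UnicodeDecodeError if the input is invalid).
text = raw.decode("utf-8")

# Step 2: try to encode the result in the target character set.
failure_offset = None
try:
    text.encode("iso-8859-15")
except UnicodeEncodeError as err:
    # err.start is a character index; convert it back to a byte offset
    # in the original UTF-8 data so it can be compared with iconv's message.
    failure_offset = base_offset + len(text[: err.start].encode("utf-8"))
    print(f"cannot encode U+{ord(text[err.start]):04X} at byte {failure_offset}")
```

In this sketch the decode step succeeds and the encode step is the one that fails, at the byte offset iconv reports. ISO-8859-15 has no code point for U+2019, which would suggest that the problem lies with the target character set rather than with the input bytes.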
So, to summarise: the internet says that E2 80 99 is a valid UTF-8 byte sequence, but iconv disagrees.
Can anyone please help me understand what is going on? Is this a bug in iconv?
As a side note, when I use this html file with kindlegen to generate an AZW file, the character is not displayed correctly. All the internet can tell me is that I need to convert the file to UTF-8, but as far as I can tell, it already is!
Asked by AlastairG
(213 rep)
Jan 6, 2025, 03:43 PM
Last activity: Jan 11, 2025, 12:48 AM