iconv fails to detect valid UTF-8 character as UTF-8

5 votes
2 answers
684 views
My input data is as follows (as generated by hexdump):

    000000f0 69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72 |ian...s eyes abr|

When I open this HTML file in Firefox, it displays these characters as:

    ian’s eyes abr

According to https://superuser.com/questions/1237545/characters-in-email-displayed-like-e2-80-99 , "E2 80 99 is the sequence of hex values that encode a right single quotation mark (’) in UTF-8". This website concurs: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128

When I run this iconv command on the file containing these characters:

    iconv -f UTF-8 -t ISO-8859-15 test_chapter.html > blah.html

I get the output:

    iconv: illegal input sequence at position 243

and the content of "blah.html" is truncated exactly where the apostrophe would be.

So, to summarise, the internet says that E2 80 99 is a valid sequence of bytes for UTF-8, but iconv disagrees. Can anyone please help me understand what is going on? Is this a bug in iconv?

As a side note, when I use this HTML file with kindlegen to generate an AZW file, the character is not displayed correctly. All the internet can tell me is that I need to convert the file to UTF-8, but as far as I can tell, it already is!
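The two claims in the question (that the bytes are valid UTF-8, and that iconv stops at position 243) can be cross-checked independently of iconv. The following is a minimal sketch in Python, assuming Python's built-in `utf-8` and `iso8859-15` codecs as stand-ins for iconv's conversion tables; the variable names are illustrative and not part of the original post. It decodes the hexdump row strictly, works out the file offset of the first non-ASCII byte, and then tests whether the decoded text can be expressed in the target character set.

```python
# Bytes copied from the hexdump row, which starts at file offset 0x000000f0.
row_offset = 0xF0
row = bytes.fromhex("69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72")

# Strict decoding: this raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = row.decode("utf-8")

# File offset of the first non-ASCII byte (the E2 that starts the 3-byte sequence).
first_non_ascii = next(i for i, b in enumerate(row) if b >= 0x80)
position = row_offset + first_non_ascii

# Separate question: can the decoded text be written in the target charset?
try:
    text.encode("iso8859-15")
    representable = True
    unencodable = ""
except UnicodeEncodeError as exc:
    representable = False
    unencodable = exc.object[exc.start:exc.end]

print("decoded text:", text)
print("offset of first non-ASCII byte:", position)
print("representable in ISO-8859-15:", representable)
if unencodable:
    print("character that cannot be encoded: U+%04X" % ord(unencodable))
```

In this sketch the decode step succeeds and the computed offset is 243, the same position iconv reports, while the encode step fails on U+2019. That points toward the right single quotation mark having no equivalent in ISO-8859-15, rather than the input being malformed UTF-8. Python's codecs are not iconv, so this is an indication rather than proof of what iconv does internally.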
Asked by AlastairG (213 rep)
Jan 6, 2025, 03:43 PM
Last activity: Jan 11, 2025, 12:48 AM