Why does awk's printf interpret character values greater than 127 as multi-byte characters?

The ASCII character range is from 0 to 127, and within that range, awk's printf with the %c format specifier outputs one byte of data:
$ awk 'BEGIN{printf "%c", 97}'
a

$ awk 'BEGIN{printf "%c", 127}' | xxd
00000000: 7f

$ awk 'BEGIN{printf "%c", 127}' | xxd -b
00000000: 01111111
For values greater than 127, however, it prints multiple bytes:
$ awk 'BEGIN{printf "%c", 128}' | xxd
00000000: c280

$ awk 'BEGIN{printf "%c", 128}' | xxd -b
00000000: 11000010 10000000
What is the significance of 0xc280, and why does awk output those two bytes instead of the single byte 0x80?
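
One way to probe the 0xc280 value is to check it against the two-byte UTF-8 layout (110xxxxx 10xxxxxx), which covers code points U+0080 through U+07FF. The sketch below is only an illustration: `utf8_two_bytes` is a hypothetical helper name, not part of awk, and it assumes a POSIX shell with arithmetic expansion.

```shell
# Hypothetical helper: print the two UTF-8 bytes (as hex) for a
# code point in the range 128..2047 (U+0080..U+07FF).
utf8_two_bytes() {
    cp=$1
    # Lead byte 110xxxxx: carries the top 5 bits of the code point.
    b1=$(( 0xC0 | (cp >> 6) ))
    # Continuation byte 10xxxxxx: carries the low 6 bits.
    b2=$(( 0x80 | (cp & 0x3F) ))
    printf '%02x%02x\n' "$b1" "$b2"
}

utf8_two_bytes 128   # prints c280
```

The result matches the bytes shown by xxd, which suggests that this awk treats the argument of %c as a character code and encodes it in the locale's multibyte encoding (presumably UTF-8 here). If the awk in use is gawk, running the same command with LC_ALL=C, or with gawk's -b (--characters-as-bytes) option, would be expected to emit the single byte 0x80 instead; other awk implementations may behave differently.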
Asked by Sam (132 rep)
Jun 18, 2019, 02:52 PM
Last activity: Jan 25, 2022, 12:24 PM