Specifying Character Sets With The Curl Command

0 votes
1 answer
2683 views
I am attempting to extract a list of Chinese characters from https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO for use in a bash script. However, when I ran
curl -o list.txt 'https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO'
I realised that curl seems to be using UTF-8 encoding instead of the GB2312 encoding which the website uses, so the Chinese characters come out as garbled text. So my question is this: how do I change the encoding that curl uses to download the HTML?
Output of curl --version:

curl 8.0.1 (x86_64-pc-linux-gnu) libcurl/8.0.1 OpenSSL/3.0.8 zlib/1.2.13 brotli/1.0.9 zstd/1.5.5 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.10.0 nghttp2/1.52.0
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd
(I've noticed that this is missing the CharConv feature mentioned in the manual page.)
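
For context, a minimal sketch of one possible workaround, not a confirmed fix: curl -o writes the response body to disk without any character-set conversion, so the file can be re-encoded after the download with iconv. This assumes the page really is GB2312, as described above; the helper names gb2312_to_utf8 and fetch_as_utf8 are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: download the raw bytes, then convert them to UTF-8.
# Assumption: the server sends GB2312-encoded HTML (taken from the question).

# Hypothetical helper: print a GB2312 file re-encoded as UTF-8 on stdout.
gb2312_to_utf8() {
    iconv -f GB2312 -t UTF-8 "$1"
}

# Hypothetical helper: download a URL and save the body as UTF-8.
fetch_as_utf8() {
    local url=$1 out=$2 raw status
    raw=$(mktemp) || return 1
    # -sS: no progress meter, but still report errors.
    curl -sS -o "$raw" "$url" && gb2312_to_utf8 "$raw" > "$out"
    status=$?
    rm -f "$raw"
    return "$status"
}

# Example call (needs network access). The URL is quoted so the shell
# does not treat '?' as a glob character:
# fetch_as_utf8 'https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO' list.txt
```

If iconv stops on a byte sequence it cannot convert, the page may use characters outside GB2312; trying -f GBK or -f GB18030 instead is a reasonable next step, though that is a guess about this particular site.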
Asked by James Norman (1 rep)
Apr 27, 2023, 07:16 AM
Last activity: Apr 23, 2025, 11:09 PM