Specifying Character Sets With The Curl Command
0 votes, 1 answer, 2683 views
I am trying to extract a list of Chinese characters from https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO for use in a bash script. However, when I ran
curl -o list.txt 'https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO'
I realised that curl appears to be using UTF-8 instead of the GB2312 encoding that the website uses, so the Chinese characters in list.txt show up as garbled text. So my question is: how do I change the encoding that curl uses when downloading the HTML?
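For reference, here is a minimal sketch of one common way to end up with a UTF-8 file, assuming the page really is served as GB2312 and that iconv is installed. As far as I know, curl writes the response bytes to disk unchanged, so the conversion is done as a separate step after the download. The function names to_utf8 and fetch_as_utf8 are hypothetical helpers made up for this example, not part of curl.

```shell
#!/usr/bin/env bash

# Convert a byte stream on stdin to UTF-8 on stdout.
# $1 is the source charset; GB2312 is an assumption taken from the question.
to_utf8() {
  iconv -f "${1:-GB2312}" -t UTF-8
}

# Hypothetical helper: download a URL and save it as UTF-8.
# $1 = URL, $2 = output file, $3 = source charset (defaults to GB2312).
fetch_as_utf8() {
  local url=$1 out=$2 charset=${3:-GB2312}
  local tmp
  tmp=$(mktemp) || return 1
  # -f: fail on HTTP errors, -sS: quiet but still print errors.
  if curl -fsS -o "$tmp" "$url" && to_utf8 "$charset" < "$tmp" > "$out"; then
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example call (quote the URL so the shell does not treat "?" as a glob):
# fetch_as_utf8 'https://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO' list.txt
```

If iconv rejects some bytes, the page may use a superset such as GBK or GB18030, in which case passing that name as the third argument may be worth a try.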
Output of curl --version:
curl 8.0.1 (x86_64-pc-linux-gnu) libcurl/8.0.1 OpenSSL/3.0.8 zlib/1.2.13 brotli/1.0.9 zstd/1.5.5 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.10.0 nghttp2/1.52.0
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd
(I've noticed that this is missing the CharConv feature mentioned in the manual page.)
Asked by James Norman
(1 rep)
Apr 27, 2023, 07:16 AM
Last activity: Apr 23, 2025, 11:09 PM