Working with filenames in a different encoding over ssh
I'm ssh'ing to a remote system where a different encoding is used for the filenames (and for the users' locales), and this causes some problems.
### Problems solved by matching the locale settings
Before I move to the problems with filenames, I want to say that some encoding problems in such an ssh session [are solved by setting the remote locale so that it matches the local locale](https://unix.stackexchange.com/q/16784/4319), namely:
* the problems with editing the command line (I typed `echo привет` and pressed Backspace three times, but since the encoding on my host is UTF-8 and on the remote end it is KOI8-R, or perhaps CP1251 (both 8-bit Cyrillic encodings), this didn't erase the Cyrillic characters correctly):

  ```
  [imz@localhost ~]$ locale
  LANG=ru_RU.UTF-8
  LC_CTYPE="ru_RU.UTF-8"
  LC_NUMERIC="ru_RU.UTF-8"
  LC_TIME="ru_RU.UTF-8"
  LC_COLLATE="ru_RU.UTF-8"
  LC_MONETARY="ru_RU.UTF-8"
  LC_MESSAGES="ru_RU.UTF-8"
  LC_PAPER="ru_RU.UTF-8"
  LC_NAME="ru_RU.UTF-8"
  LC_ADDRESS="ru_RU.UTF-8"
  LC_TELEPHONE="ru_RU.UTF-8"
  LC_MEASUREMENT="ru_RU.UTF-8"
  LC_IDENTIFICATION="ru_RU.UTF-8"
  LC_ALL=
  [imz@localhost ~]$ echo привет
  привет
  [imz@localhost ~]$ echo при
  при
  [imz@localhost ~]$ ssh -vv ivan@example.com
  Last login: Fri Nov 25 13:44:56 2011 from NN.NN.NN.NN
  [ivan@dell ~]$ locale
  LANG=ru_RU.KOI8-R
  LC_CTYPE="ru_RU.KOI8-R"
  LC_NUMERIC="ru_RU.KOI8-R"
  LC_TIME="ru_RU.KOI8-R"
  LC_COLLATE="ru_RU.KOI8-R"
  LC_MONETARY="ru_RU.KOI8-R"
  LC_MESSAGES=POSIX
  LC_PAPER="ru_RU.KOI8-R"
  LC_NAME="ru_RU.KOI8-R"
  LC_ADDRESS="ru_RU.KOI8-R"
  LC_TELEPHONE="ru_RU.KOI8-R"
  LC_MEASUREMENT="ru_RU.KOI8-R"
  LC_IDENTIFICATION="ru_RU.KOI8-R"
  LC_ALL=
  [ivan@dell ~]$ echo привет
  привет
  [ivan@dell ~]$ echo при
  привÐ
  [ivan@dell ~]$ export LANG=ru_RU.UTF-8
  [ivan@dell ~]$ echo привет
  привет
  [ivan@dell ~]$ echo при
  при
  [ivan@dell ~]$
  ```
* the problem with case-insensitive matching of Cyrillic strings; after I set the locale, it works:

  ```
  [ivan@dell ~]$ echo привет | fgrep -i ВЕТ
  привет
  [ivan@dell ~]$
  ```

  but it didn't work before.
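The Backspace problem comes down to byte counts: each Cyrillic letter takes two bytes in UTF-8 but one byte in an 8-bit encoding such as KOI8-R, and a shell running in a KOI8-R locale treats every byte as one character. So three Backspaces removed only three bytes, leaving `прив` plus the first byte of `е` in the buffer, which appears to be the stray byte displayed as `Ð` in the `привÐ` output. A minimal POSIX sh sketch of that arithmetic, assuming the script is saved as UTF-8 and that `iconv` (not used in the question itself) knows KOI8-R:

```shell
#!/bin/sh
# Sketch: why three Backspaces did not erase three Cyrillic letters.
# Assumptions: this file is saved as UTF-8 and iconv supports KOI8-R.
word='привет'

# In UTF-8 every Cyrillic letter is two bytes ...
utf8_len=$(( $(printf '%s' "$word" | wc -c) ))
# ... while in KOI8-R it is a single byte.
koi8_len=$(( $(printf '%s' "$word" | iconv -f UTF-8 -t KOI8-R | wc -c) ))
echo "UTF-8: $utf8_len bytes, KOI8-R: $koi8_len bytes"

# A shell in a KOI8-R locale counts bytes as characters, so three Backspaces
# drop three bytes: "прив" plus the first half of "е" remains.
left=$(printf '%s' "$word" | head -c $(( utf8_len - 3 )))
printf '%s\n' "$left"
```

The first line should report 12 bytes for UTF-8 and 6 for KOI8-R; the second prints `прив` followed by a lone, invalid byte.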
### Minor problems with filenames
The utilities that print filenames (which, as you remember, are stored on the remote host in a different encoding) don't print them verbatim; instead, they substitute question marks for the Cyrillic characters:
```
[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls'
/home/mama/Desktop/????????? ????????.xls
/home/mama/Desktop/???????? ??? ???????????? (1).xls
/home/mama/Desktop/???????? ??? ???????????? (2).xls
/home/mama/Desktop/???????? ??? ???????????? (3).xls
/home/mama/Desktop/???????? ??? ????????????.xls
[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls' -print
/home/mama/Desktop/????????? ????????.xls
/home/mama/Desktop/???????? ??? ???????????? (1).xls
/home/mama/Desktop/???????? ??? ???????????? (2).xls
/home/mama/Desktop/???????? ??? ???????????? (3).xls
/home/mama/Desktop/???????? ??? ????????????.xls
[ivan@dell ~]$
```
The same problem is exhibited by `ls` and other tools. But it can easily be overcome by passing the names as strings to printing commands (which are not aware of the mismatch between the filename encoding and the terminal encoding, or for whatever other reason; in any case, the raw bytes get through, although they show up as mojibake):
```
[ivan@dell ~]$ find ~mama/Desktop/ -iname '*.xls' -print0 | xargs -0 -n 1 echo
/home/mama/Desktop/Êðåäèòíûé ïîðòôåëü.xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (2).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls
/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé.xls
[ivan@dell ~]$
```
Also, the fact that they are unreadable wasn't very annoying, because I could always append `| recode -f cp1251..utf-8` at the end of the command.
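For display only, the same conversion can be done with `iconv`, swapped in here as an alternative to `recode` (it is often already installed alongside the C library). A minimal POSIX sh sketch, assuming the names are CP1251 (as the `recode` command implies) and that `iconv` supports CP1251; the temporary directory and the `АДРЕСАТЫ.xls` file are made up so the example is self-contained:

```shell
#!/bin/sh
# Sketch: show CP1251-encoded file names readably on a UTF-8 terminal.
# Assumptions: names are CP1251 and iconv supports CP1251.
# The demo directory and file below are made up for illustration.
dir=$(mktemp -d)
# "АДРЕСАТЫ.xls", created from its raw CP1251 bytes
touch "$dir/$(printf '\300\304\320\305\321\300\322\333').xls"

# find writes the raw bytes into the pipe (not to a terminal, so no '?');
# iconv re-encodes the whole listing to UTF-8 for reading.
listing=$(find "$dir" -name '*.xls' | iconv -f CP1251 -t UTF-8)
printf '%s\n' "$listing"

rm -rf "$dir"
```

The converted text is only good for reading: it no longer contains the bytes stored on disk, so it cannot be passed back to commands as a filename.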
## The annoying problem
The essential problem is that selecting the filenames in the terminal with the mouse and pasting them doesn't work:
```
[ivan@dell ~]$ diff '/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls' '/home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls'
diff: /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls: No such file or directory
diff: /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls: No such file or directory
[ivan@dell ~]$
```
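One plausible explanation, sketched below in POSIX sh: the mojibake suggests (though the transcript does not prove it) that the terminal shows each raw CP1251 byte as the Latin-1 character with the same code, e.g. byte `\300` as `À`. A mouse selection then contains that character encoded as UTF-8 (two bytes), which names a different, nonexistent file; re-encoding the pasted text from UTF-8 to Latin-1 with `iconv` (swapped in here) turns each character back into its single original byte. The demo directory and file are made up:

```shell
#!/bin/sh
# Sketch: why a pasted mojibake name does not match the real file name.
# Assumption (inferred from the mojibake, not verified): the terminal shows
# each raw byte as the Latin-1 character with the same code, \300 as "À".
dir=$(mktemp -d)
# Made-up demo file "АДРЕСАТЫ.xls", created from its raw CP1251 bytes.
touch "$dir/$(printf '\300\304\320\305\321\300\322\333').xls"

# What the mouse selection contains: UTF-8 text, two bytes per character.
pasted='ÀÄÐÅÑÀÒÛ.xls'
if [ -e "$dir/$pasted" ]; then pasted_found=yes; else pasted_found=no; fi

# UTF-8 -> Latin-1 maps each character back to one byte with the same code.
recovered=$(printf '%s' "$pasted" | iconv -f UTF-8 -t ISO-8859-1)
if [ -e "$dir/$recovered" ]; then recovered_found=yes; else recovered_found=no; fi

echo "pasted: $pasted_found, recovered: $recovered_found"
rm -rf "$dir"
```

Under that assumption the script reports `pasted: no, recovered: yes`; if the terminal renders undecodable bytes some other way, the Latin-1 step would not apply.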
I've noticed an escaped representation of the filenames in the output of `stat`, so I was able to select and paste it (inside `$''` in *bash*):
```
[ivan@dell ~]$ diff '/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls' '/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls'
diff: /home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls: No such file or directory
diff: /home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls: No such file or directory
[ivan@dell ~]$ diff $'/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (1).xls' $'/home/mama/Desktop/\300\304\320\305\321\300\322\333 \344\353\377 \357\356\347\344\360\340\342\353\345\355\350\351 (3).xls'
Files /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (1).xls and /home/mama/Desktop/ÀÄÐÅÑÀÒÛ äëÿ ïîçäðàâëåíèé (3).xls differ
[ivan@dell ~]$
```
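Instead of copying the escapes out of `stat` output, bash's `printf '%q'` (swapped in here; the question uses `stat`) can print each name directly in the same `$'...'` form. A minimal bash sketch, assuming a UTF-8 or C locale, where the CP1251 bytes are not printable characters, so bash falls back to `$'...'` quoting with octal escapes; the demo directory and file are made up:

```shell
#!/usr/bin/env bash
# Sketch: print file names as pasteable, ASCII-only $'...' strings.
# Assumption: run in a UTF-8 or C locale, where raw CP1251 bytes are not
# printable, so printf %q quotes them as octal escapes.
dir=$(mktemp -d)
name="$dir/$(printf '\300\304\320\305\321\300\322\333') (1).xls"  # made-up demo file
touch "$name"

quoted=()
# -print0 with read -d '' keeps arbitrary bytes (even newlines) intact.
while IFS= read -r -d '' f; do
  printf -v q '%q' "$f"   # bash builtin: shell-quote "$f" into variable q
  quoted+=("$q")
  printf '%s\n' "$q"
done < <(find "$dir" -name '*.xls' -print0)

rm -rf "$dir"
```

Each printed line is plain ASCII, so it survives selection and pasting, and it can be used unchanged as an argument, just like the `diff $'...'` command above.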
So, the question is:
> How to conveniently work with remote filenames (over *ssh*), which are
> in a different encoding?
It would be nice if they were readable, and selectable and pastable (and also typable by me from the keyboard and then completable by Tab in *bash*; to be typable by me conveniently, they must be readable, of course).
I'm working in *urxvt* in *X.org* on *Linux* on the local host, and it's *bash* on *Linux* at the remote end.
Asked by imz -- Ivan Zakharyaschev
(15862 rep)
Nov 25, 2011, 05:40 PM
Last activity: Nov 10, 2016, 05:08 PM