Sample Header Ad - 728x90

inverse Heisenbug: iputils clockdiff returns host down, except when straced

3 votes
1 answer
45 views
I own two systems, host-a and host-b, connected in a lan via a switch; they both have no firewall. When I run clockdiff in each of them I receive the error message that the other host is down. All commands are executed in a clean bash by running exec -c bash --norc (files redacted).
root@host-a:~# clockdiff host-b
...........................clockdiff: host-b is down
root@host-a:~# clockdiff -o host-b
...........................clockdiff: host-b is down
At the same time, a tcpdump on the same machine host-a reveals that the queried host responds with ICMP messages (ICMP time stamp in the former case, ECHO in the latter)
root@host-a:~# tcpdump icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
 ....
:18.078956 IP host-b > host-a: ICMP time stamp reply id 29228 seq 7424: org :18.078, recv :08.347, xmit :08.347, length 20
:18.078956 IP host-b > host-a: ICMP time stamp reply id 29228 seq 7680: org :18.078, recv :08.347, xmit :08.347, length 20
1
 ...
Curiously, when I run the exact same commands under strace, the replies are measured and a correct response is returned.
root@host-a:~# strace -o /tmp/log clockdiff host-b
....................................................
host=host-b rtt=0(0)ms/0ms delta=-9737ms/-9737ms 
root@host-a:~# strace -o /tmp/log clockdiff -o host-b
...................................................
root@host-a:~# strace -o /tmp/log clockdiff  host-b
....................................................
host=host-b rtt=0(0)ms/0ms delta=-9737ms/-9737ms 
root@host-a:~# uname -a
Linux host-a 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Some excerpt from the strace log:
...
connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 0
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=1, tv_nsec=0}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=999837227})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0xb), ...}, AT_EMPTY_PATH) = 0
write(1, ".", 1)                        = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=937000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=937000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=936829484})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=842000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=842000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=841996498})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=736000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=736000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=735845402})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=631000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=631000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=630791917})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=532000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=532000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=531768694})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=442000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=441949198})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1)                        = 1
 ...
The excutable is not setuid and the OS is debian.
root@host-a:~# ls -laF which clockdiff
-rwxr-xr-x 1 root root 23120 Sep 24  2024 /usr/bin/clockdiff*
root@host-a# getcap /usr/bin/clockdiff                                
/usr/bin/clockdiff cap_net_raw=ep
root@host-a:~# dpkg -S /usr/bin/clockdiff
iputils-clockdiff: /usr/bin/clockdiff
root@host-a:~# clockdiff -V
clockdiff from iputils 20221126
libcap: yes, IDN: yes, NLS: no, error.h: yes, getrandom(): yes, __fpending(): yes
I have encountered strace Heisenbugs before, but in this case, I cannot fathom why the normal invocation would fail to see any responses, when it is still run as root. Also, this behaviour is empirically always reproducible in around 10 repetitions of the test. Why on earth can this be?
Asked by Janis (139 rep)
Jun 27, 2025, 09:03 AM
Last activity: Jun 27, 2025, 07:18 PM