I seem to be having the same issue as described in https://unix.stackexchange.com/questions/562256/the-join-utility-reports-file-is-not-sorted-but-in-fact-it-is-sorted; however, I have piped BOTH files through sort before attempting to join them. I have also tried sort -d and sort -g.
This is running on Amazon Linux 2, using sort from coreutils-8.22-24.
The following illustrates the issue:
root@host:/home/user# cat /tmp/db_schema_size | sort
directory 0.000106811523
directory_1 1.059814453265
directory_123 0.564987182688
directory_123123 0.564987182688
directory_1234 0.564987182688
directory_12345 0.564987182688
directory_1234567 0.564987182688
directory_82473 0.934677124123
directory_82475 0.751586914161
directory_82477 0.881881713968
directory_82479 0.751571655373
directory_82481 0.750396728614
directory_82483 0.589370727610
root@host:/home/user# cat /tmp/db_dir_sizes | sort
directory 132
directory_1 1115936
directory_123123 613244
directory_12345 613248
directory_1234567 613248
directory_1234 613244
directory_123 613244
directory_82473 1015140
directory_82475 818764
directory_82477 958628
directory_82479 818756
directory_82481 817500
directory_82483 638820
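Both files have the same structure: no leading or trailing whitespace, and a single tab character between the two values.
Both files are processed by the same sort command, yet the directory names come out in a different order (for example, directory_123 sorts before directory_123123 in the first file but after it in the second).
On Ubuntu 22.04 LTS, by contrast, the order is consistent across both files (and matches the first listing above).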
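One way to double-check the structure described above is a small awk filter. This is a minimal sketch: check_tsv is a hypothetical helper name, and it only flags lines that do not split into exactly two tab-separated fields, or that begin or end with a space, a tab or a carriage return. It does not inspect the data in any other way.

```shell
#!/bin/sh
# Hypothetical helper: print every line that does not have exactly two
# tab-separated fields, or that begins or ends with a space, tab or CR.
# Exits with status 1 if any such line is found, 0 otherwise.
check_tsv() {
  awk -F'\t' '
    NF != 2 || /^[ \t]/ || /[ \t\r]$/ {
      print FILENAME ":" FNR ": " $0
      bad = 1
    }
    END { exit bad }
  ' "$@"
}

# Usage against the files from the question:
# check_tsv /tmp/db_schema_size /tmp/db_dir_sizes && echo "structure OK"
```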
What am I missing here?
**update**
For clarification...
On Amazon Linux 2 with LANG=en_US.UTF-8, I get the output shown above, i.e. the two files sort in different orders.
On Amazon Linux 2 with LANG=C.UTF-8, both files sort in the same order.
On Ubuntu, with either LANG=C.UTF-8 or LANG=en_US.UTF-8, both files sort in the same order.
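The en_US.UTF-8 listings above appear consistent with that locale's collation skipping the '_' and the tab at the first comparison level, so a whole-line sort effectively compares the digits of the directory name run together with the digits of the second column, which differ between the two files (0.564987182688 vs 613244). Assuming that is the cause, a minimal sketch of one workaround is to pin the same byte-order locale (LC_ALL=C) for both sort and join, and to sort on the first tab-separated field only (-t with a tab, -k1,1) so the second column stays out of the comparison. The two sample rows per file are taken from the listings above; the temporary directory is illustrative, and --check-order is used so that join exits with an error if its inputs are not in the order it expects.

```shell
#!/bin/sh
# Sketch: make sort and join agree by forcing the same byte-order (C) locale.
# Sample rows are taken from the two listings in the question.
set -eu

tab=$(printf '\t')
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'directory_123123\t0.564987182688\ndirectory_123\t0.564987182688\n' \
  > "$workdir/schema"
printf 'directory_123123\t613244\ndirectory_123\t613244\n' > "$workdir/sizes"

# Sort on the first tab-separated field only, using plain byte comparison.
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/schema" > "$workdir/schema.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/sizes" > "$workdir/sizes.sorted"

# join runs in the same locale; --check-order makes it exit with an error
# if either input is not sorted the way join expects.
joined=$(LC_ALL=C join --check-order -t "$tab" \
  "$workdir/schema.sorted" "$workdir/sizes.sorted")
printf '%s\n' "$joined"
```

On the real files, the same pattern would be LC_ALL=C sort -t "$tab" -k1,1 applied to /tmp/db_schema_size and /tmp/db_dir_sizes, followed by LC_ALL=C join on the sorted outputs.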
Asked by symcbean (6301 rep), Jun 19, 2024, 01:37 PM
Last activity: Aug 31, 2024, 04:32 AM