
Shuffling two parallel text files does not give the same lines, even with the same random source

2 votes
0 answers
270 views
This is similar to https://unix.stackexchange.com/questions/220390/shuffle-two-parallel-text-files
I have:
- two large CSV files with parallel lines (they represent 'before' and 'after' states for particular items). The fields are sometimes strings, sometimes numbers.
- a sufficiently long random data file to use with shuf when I want to get a matching random sample
I thought of:
shuf -n10 --random-source="random.csv" "file1" 
shuf -n10 --random-source="random.csv" "file2"
but the two samples no longer match line for line. However, if I put line numbers in front of each line, the problem goes away:
shuf -n10 --random-source="random.csv" <(cat -n "file1") 
shuf -n10 --random-source="random.csv" <(cat -n "file2")
Can someone explain why? Here is a sample of random.csv:
0.293076138
0.446732207
0.552989654
0.16141527
0.099383023
...
Here is a snippet from the two files:
VA,DEFAULT,72.8027,11.9534.....
VA,DEFAULT,61.8356,11.9342....
VA,DEFAULT,61.8356,....
Note that the first two fields are identical in most of the rows in both files. Maybe this is the issue? I don't know shuf well enough.
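For comparison, here is a minimal sketch of one way to get matching samples that does not depend on two separate shuf runs reading the random source identically: draw the line numbers once, then select those line numbers from both files. The function name sample_parallel and its argument order are hypothetical, and it assumes both files have the same number of lines. It does not explain the behaviour described above; it only sidesteps it.

```shell
# Hypothetical helper (not part of shuf): write the same n randomly chosen
# line positions of two parallel files to out1 and out2.
# Assumes in1 and in2 have the same number of lines.
sample_parallel() {
    n=$1; in1=$2; in2=$3; out1=$4; out2=$5
    # Count lines with awk so a final line without a newline is still counted.
    total=$(awk 'END { print NR }' "$in1") || return 1
    picked=$(mktemp) || return 1
    # Draw the line numbers once so both files share a single selection.
    shuf -n "$n" -i "1-$total" > "$picked"
    # First input records the wanted line numbers; second input prints matches.
    awk 'NR == FNR { want[$1]; next } FNR in want' "$picked" "$in1" > "$out1"
    awk 'NR == FNR { want[$1]; next } FNR in want' "$picked" "$in2" > "$out2"
    rm -f "$picked"
}
```

It could be called as sample_parallel 10 file1 file2 sample1.csv sample2.csv. The selected lines come out in their original file order rather than in shuffled order, and since shuf is invoked only once, adding --random-source to that single call would make the selection repeatable.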
Asked by Tim (237 rep)
Dec 4, 2019, 11:56 PM
Last activity: Dec 5, 2019, 09:43 PM