Shuffling two parallel text files with shuf does not give the same lines, even with the same random source
2
votes
0
answers
270
views
This is similar to https://unix.stackexchange.com/questions/220390/shuffle-two-parallel-text-files
I have:
- two large CSV files with parallel lines (they represent 'before' and 'after' states for particular items). The fields are sometimes strings, sometimes numbers.
- a sufficiently long random data file to use with shuf
When I want to get a matching random sample, I thought of:
shuf -n10 --random-source="random.csv" "file1"
shuf -n10 --random-source="random.csv" "file2"
but the two resulting samples no longer match line for line.
However, if I put line-numbers in front, it solves the problem:
shuf -n10 --random-source="random.csv" <(cat -n "file1")
shuf -n10 --random-source="random.csv" <(cat -n "file2")
Can someone explain why?
Here is a sample of random.csv:
0.293076138
0.446732207
0.552989654
0.16141527
0.099383023
...
Here is a snippet from the two files:
VA,DEFAULT,72.8027,11.9534.....
VA,DEFAULT,61.8356,11.9342....
VA,DEFAULT,61.8356,....
Note that the first two fields are identical in most of the rows in both files. Maybe this is the issue? I don't know shuf well enough.
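Whatever causes the mismatch, one way to sidestep it is to draw the random line numbers only once and then pull those same lines from both files. The following is a minimal sketch of that idea, not an explanation of shuf's behaviour. It assumes GNU shuf (for the -i option) and that both files have the same number of lines; the helper names pick_lines and paired_sample are made up for illustration.

```shell
#!/bin/sh
# pick_lines IDX FILE: print the lines of FILE whose numbers are listed in
# IDX (one number per line), in the order they appear in IDX.
# (Hypothetical helper; assumes IDX is not empty.)
pick_lines() {
    awk 'NR == FNR { pos[$1] = NR; count = NR; next }
         (FNR in pos) { picked[pos[FNR]] = $0 }
         END { for (i = 1; i <= count; i++) print picked[i] }' "$1" "$2"
}

# paired_sample FILE1 FILE2 N OUT1 OUT2: sample the same N line numbers
# from FILE1 and FILE2 and write them to OUT1 and OUT2.
# (Hypothetical helper; assumes FILE1 and FILE2 have equal line counts.)
paired_sample() {
    # Arithmetic expansion strips any padding some wc versions add.
    total=$(( $(wc -l < "$1") ))
    idx=$(mktemp)
    # A single random draw of distinct line numbers, reused for both files.
    shuf -i "1-$total" -n "$3" > "$idx"
    pick_lines "$idx" "$1" > "$4"
    pick_lines "$idx" "$2" > "$5"
    rm -f "$idx"
}
```

With these definitions, paired_sample file1 file2 10 sample1 sample2 would write ten corresponding lines to sample1 and sample2. Because shuf runs only once, the pairing does not depend on two runs consuming the random source identically; --random-source can still be added to that single call if a reproducible sample is wanted.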
Asked by Tim
(237 rep)
Dec 4, 2019, 11:56 PM
Last activity: Dec 5, 2019, 09:43 PM