How to sample without replacement from a script that randomly extracts 200characters using shuf?
0
votes
2
answers
416
views
I have this script that extracts 200 random characters from a set:
#!/usr/bin/bash
n=$(stat -c "%s" newfile.txt)
r=$(shuf -i1-"$((n-200+1))" -n1)
output.txt
I know shuf
is very powerful but I want to include a sampling without replacement. This means that each 200 character extraction has only one chance to be selected when sampling.
Output should look like this:
>1
GAACTCTACCAAAAGGTATGTTGCTTTCACAAAAAGCTGCATTCGATCATGTGTATAATCTAGCAAAACTAGTAGGAGGAGCAAAATACCCCGAAATTGTTGCTGCTCAGGCAATGCACGAATCAAACTACCTAGATCCTAGG
ACTAATAGTGTTTATAATGCCACAAATAGAACTAATGCTTTCGGTCAAACTGGTGAC
>2
GCCTACCGCATAAAACAGCATCACCGCCACGGCTTCAGGGTATTCTCCAATGGCAAAGGCTCCCATGGTCGCGATGGACATTAAGAGAAATTCAGTAAAGAAATCTCCATTTAGAATACTTTTGAATCCTTCTTTTATCACCG
GAAAACCAACTGGGAGATAGGCCACAATGTACCAACCTACTCGCACCCAATCTGTAA
>3
GCACGTGTCACCGTCAGCATCGCGGCAGCGGAACGGGTCACCCGGATTGCTGTCGGGACCATCGTTTACGCCGTCATTGTCGTTATCGGGATCGCCCGGATTACAAATGCCGTCGCCATCGACGTCGTTACCGTCGTTCGCGG
CATCGGGGAAGCCGGCACCGGCGGCACAGTCATCGCAACCGTCGCCATCGGCATCGA
>4
GCGTTCGAAGCAATTGCACGAGACCCAAACAACGAATTGCTGGTTGTTGAACTGGAAAACTCTCTAGTCGGAATGCTTCAAATTACTTATATTCCCTACCTGACACATATTGGCAGTTGGCGTTGTCTTATAGAAGGTGTTCG
AATCCATAGTGACTATCGTGGACGAGGTTTTGGTGAGCAAATGTTCGCACATGCGAT
>5
GTTTAAGACTAACAGCAATCTGTAAGGACATAGGTGCTGGAGTTGAGGTTAGTCTGGAAGATATGATCTGGGCAGAGAAATTGTCCAAAGCAAACACCGCAGCAAGAGGTATGCTAAACACAGCAAGAAGAATAAGTAATGAT
CCTACTGATTCTTTTCTGAATGAGTTGAATATAGGAGACCCCGACTCAACTCATCAT
The input file is a ~8G file that looks like this:
CCAAGATCGCTGGTTGGCGAATCAATTTCATAAACGCCTACGCTTTCAAGGAACGTGTTAAGAATGTTCT
GGCCGAGTTCCTTATGAGACGTTTCGCGTCCCTTAAATCGAATAACGACACGAACCTTGTCGCCGTCATT
AAGAAAACCCTTTGCCTTCTTGGCCTTAATCTGAATATCACGGGTGTCCGTTACAGGTCGCAACTGGATT
TCCTTGACTTCAGAAACAGACTTACGTGAATTCTTCTTGATTTCTTTCTGACGCTTTTCATTTTCATACT
GGAACTTGCCGTAATCAATGATCTTACAAACAGGAATATCACCCTTATCAGAGATCAATACCAAATCAAG
TTCGGCATCAAAAGCGCGATCAAGTGCGTCTTCAATGTCGAGGACCGTTGTTTCTTCACCGTCAACCAAA
CGAATTGTGGAGGACTTGATGTCGTCTCGGGTACTAATTTTATTCACGTATATGTTACTCCTTATGTTGT
Any help would be appreciated. Thanks in advance.
Asked by GSQ
(37 rep)
Dec 19, 2020, 03:23 PM
Last activity: Dec 25, 2020, 11:12 AM
Last activity: Dec 25, 2020, 11:12 AM