How to count words in one column when consecutive cells in other columns are equal, using a shell script?

1 vote
5 answers
98 views
I'm trying to count the number of C_R and S_R values in column 9 across consecutive rows in which columns 1, 2, and 3 are the same. The file is in BED format (tab-separated). The original file is huge, and column 1 gives the chromosome. The first few lines of the file look like this:

chr1 10200 10300 8 10000 10214 100 214 S_R
chr1 10200 10300 8 10009 10233 100 224 S_R
chr1 10200 10300 8 10014 10220 100 206 S_R
chr1 10200 10300 8 10045 10215 100 170 S_R
chr1 10200 10300 8 10068 10209 100 141 S_R
chr1 10200 10300 8 10074 10300 100 226 C_R
chr1 10200 10300 8 10182 10283 100 101 S_R
chr1 10200 10300 8 10182 10387 100 205 C_R
chr1 10300 10400 4 10182 10387 100 205 S_R
chr1 10300 10400 4 10331 10467 100 136 S_R
chr1 10300 10400 4 10346 10461 100 115 S_R
chr1 10300 10400 4 10352 10468 100 116 S_R
chr1 10400 10500 3 10331 10467 100 136 S_R
chr1 10400 10500 3 10346 10461 100 115 S_R
chr1 10400 10500 3 10352 10468 100 116 S_R
chr1 11000 11100 2 11024 11163 100 139 S_R
chr1 11000 11100 2 11024 11188 100 164 S_R
chr1 11100 11200 3 11024 11163 100 139 S_R
chr1 11100 11200 3 11024 11188 100 164 S_R
chr1 11100 11200 3 11127 11296 100 169 S_R
chr1 11200 11300 1 11127 11296 100 169 S_R
chr1 11400 11500 2 11412 11561 100 149 S_R
chr1 11400 11500 2 11457 11608 100 151 S_R
chr1 11500 11600 3 11412 11561 100 149 S_R
chr1 11500 11600 3 11457 11608 100 151 C_R
chr1 11500 11600 3 11574 11744 100 170 S_R
chr1 11600 11700 3 11457 11608 100 151 S_R
chr1 11600 11700 3 11574 11744 100 170 C_R
chr1 11600 11700 3 11640 11815 100 175 S_R
chr1 11700 11800 4 11574 11744 100 170 S_R
chr1 11700 11800 4 11640 11815 100 175 C_R
chr1 11700 11800 4 11784 11963 100 179 S_R
chr1 11700 11800 4 11791 11936 100 145 S_R

In the table above, columns 1, 2, and 3 are the same in the first 8 rows, so the expected output file would look like this:

chr1 10200 10300 2 6
chr1 10300 10400 0 4
chr1 10400 10500 0 3
chr1 11000 11100 0 2
chr1 11100 11200 0 3
chr1 11200 11300 0 1
chr1 11400 11500 0 2
chr1 11500 11600 1 2
chr1 11600 11700 1 2
chr1 11700 11800 1 3

In the output file, column 4 is the C_R count and column 5 is the S_R count.
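For illustration, the counting described above could be sketched with awk in a single pass, which avoids holding the huge file in memory. This is only a sketch under assumptions: the function name `count_runs` is hypothetical, the input is taken to be strictly tab-separated, and column 9 is assumed to contain exactly `C_R` or `S_R` (any other value is ignored).

```shell
#!/bin/sh
# count_runs (hypothetical name): for each run of consecutive rows sharing
# columns 1-3, print those three columns followed by the C_R and S_R counts.
# Assumes tab-separated input; reads the given file(s) or standard input.
count_runs() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3
        # Key changed: the previous run has ended, so emit its counts.
        if (NR > 1 && key != prev) {
            print prev, c + 0, s + 0
            c = s = 0
        }
        if ($9 == "C_R") c++
        else if ($9 == "S_R") s++
        prev = key
    }
    # Emit the final run, unless the input was empty.
    END { if (NR > 0) print prev, c + 0, s + 0 }' "$@"
}
```

It could be called as `count_runs input.bed > counts.tsv`, where both file names are placeholders. Because only adjacent rows are compared, an interval that reappears later in the file after a different interval would be reported as a separate line, which seems to match the "consecutive cells" requirement in the question.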
Asked by Debajyoti Kabiraj (251 rep)
Oct 8, 2023, 06:03 AM
Last activity: Feb 10, 2024, 02:44 AM