Count the longest stretch of consecutive patterns

5 votes
6 answers
570 views
I have a sequence file:
$ cat file
CACCGTTGCCAAACAATG
TTAGAAGCCTGTCAGCCT
CATTGCTCTCAGACCCAC
GATGTACGTCACATTAGA
ACACGGAATCTGCTTTTT
CAGAATTCCCAAAGATGG
For each line, I want to find the length of the longest uninterrupted stretch of C and T characters (in any mix). So far I can only count the total number of C and T characters per line:
$ awk '{ seq = $0; print seq, gsub(/[cCtT]/, "", $1) }' file
CACCGTTGCCAAACAATG 9
TTAGAAGCCTGTCAGCCT 10
CATTGCTCTCAGACCCAC 12
GATGTACGTCACATTAGA 8
ACACGGAATCTGCTTTTT 11
CAGAATTCCCAAAGATGG 7
The *expected result* would add a third column showing the length of the longest C+T stretch:
CACCGTTGCCAAACAATG 9 2
TTAGAAGCCTGTCAGCCT 10 3
CATTGCTCTCAGACCCAC 12 5
GATGTACGTCACATTAGA 8 2
ACACGGAATCTGCTTTTT 11 6
CAGAATTCCCAAAGATGG 7 5
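One possible way to get that third column, offered as a minimal sketch rather than a definitive answer: repeatedly call awk's match() on the remainder of the line and keep the largest RLENGTH seen. The function name longest_ct is made up for illustration, and the sketch assumes one sequence per line.

```shell
# Hypothetical helper (name is an assumption): prints each sequence,
# its total C+T count, and the length of its longest C/T run.
longest_ct() {
  awk '{
    seq = $0
    # Replacing each match with itself ("&") counts matches without changing seq.
    total = gsub(/[cCtT]/, "&", seq)
    longest = 0
    rest = seq
    # Visit every maximal run of C/T characters and remember the longest.
    while (match(rest, /[cCtT]+/)) {
      if (RLENGTH > longest) longest = RLENGTH
      rest = substr(rest, RSTART + RLENGTH)
    }
    print seq, total, longest
  }' "$@"
}
```

Calling it as longest_ct file should reproduce the three-column table above for the sample data; match(), RSTART and RLENGTH are standard awk, so no gawk extensions are assumed.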
Asked by CN_229133 (115 rep)
Jun 29, 2018, 09:32 AM
Last activity: Feb 7, 2024, 09:58 AM