sed command to replace a word within a line following a pattern

I'm working with a file of over 50,000 lines, each containing a gene ID followed by its sequence. It looks like this:

    gene_A:3342234 CTCTTTCTTTTACGCCT
    gene_A:1244-5205 CTCTTTCTTTTACGCCT
    gene_A:1838438 CTCTTTCTTTTACGCCT
    gene_B:1848584 CTCTTTCTTTTACGCCT
    gene_B:1029-4920 CTCTTTCTTTTACGCCT
    gene_C:3849029 CTCTTTCTTTTACGCCT

Each line has the gene ID, followed by a colon, and then a reference number of 7-9 digits (some include dashes). I want to replace the gene IDs with their actual names, for example geneA and geneB, whilst keeping the information that follows them.

Desired output:

    geneA CTCTTTCTTTTACGCCT
    geneA CTCTTTCTTTTACGCCT
    geneA CTCTTTCTTTTACGCCT
    geneB CTCTTTCTTTTACGCCT
    geneB CTCTTTCTTTTACGCCT
    geneB CTCTTTCTTTTACGCCT

This is my first time using sed, so I'm not sure where to start. I know how to replace all lines containing gene_A with 's/gene_A.*/geneA/', but I'm not sure how to preserve the information following the gene IDs.
Asked by bryophyta (11 rep)
Feb 3, 2024, 09:34 PM
Last activity: Apr 4, 2024, 03:49 PM
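
One possible approach, sketched here under a few assumptions: the attempt 's/gene_A.*/geneA/' loses the sequence because `.*` matches everything up to the end of the line. Matching only the ID and the reference number leaves the rest of the line alone. The sketch assumes that each record sits on one line, that the "actual name" is just the ID with the underscore removed (gene_A becomes geneA), and that the reference number consists only of digits and dashes. The variable names `input` and `result` are made up for illustration.

```shell
# Sample lines copied from the question, held in a variable so the sketch is self-contained.
input='gene_A:3342234 CTCTTTCTTTTACGCCT
gene_A:1244-5205 CTCTTTCTTTTACGCCT
gene_A:1838438 CTCTTTCTTTTACGCCT
gene_B:1848584 CTCTTTCTTTTACGCCT
gene_B:1029-4920 CTCTTTCTTTTACGCCT
gene_C:3849029 CTCTTTCTTTTACGCCT'

# Match only the start of each line:
#   ^gene_            the literal prefix
#   \([A-Za-z0-9]*\)  capture the gene suffix (A, B, C, ...)
#   :[0-9-]*          the colon plus the digits-and-dashes reference number
# The replacement rebuilds the name from the captured group (\1).
# Nothing after the reference number is matched, so the sequence is kept.
result=$(printf '%s\n' "$input" | sed 's/^gene_\([A-Za-z0-9]*\):[0-9-]*/gene\1/')

printf '%s\n' "$result"
```

On a real file the same expression would be run as `sed 's/^gene_\([A-Za-z0-9]*\):[0-9-]*/gene\1/' genes.txt > renamed.txt`, where both file names are placeholders. Under this rule a gene_C line comes out as geneC. If the real names do not follow the ID pattern, one explicit substitution per gene, such as 's/^gene_A:[0-9-]*/geneA/', may be a better fit.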