sed command to replace a word within a line following a pattern
I'm working with a file that looks like the following, containing over 50,000 lines of gene IDs, each followed by its sequence:
gene_A:3342234 CTCTTTCTTTTACGCCT
gene_A:1244-5205 CTCTTTCTTTTACGCCT
gene_A:1838438 CTCTTTCTTTTACGCCT
gene_B:1848584 CTCTTTCTTTTACGCCT
gene_B:1029-4920 CTCTTTCTTTTACGCCT
gene_C:3849029 CTCTTTCTTTTACGCCT
Every line has the gene ID, followed by a colon and then a reference number of 7 to 9 digits (some include a dash).
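A minimal sketch of how that ID prefix could be described as a POSIX extended regex, checked here with grep -cE on a few sample lines. The pattern is an assumption based on the examples above: "gene_" plus letters/digits, a colon, digits with at most one dash, then whitespace. It does not enforce the 7 to 9 digit count, and count_ids is a hypothetical helper name.

```shell
# Assumed shape of the ID prefix (an ERE):
#   ^gene_[A-Za-z0-9]+   "gene_" and the gene letter(s) at the start of the line
#   :[0-9]+(-[0-9]+)?    colon, digits, and an optional "-digits" part
#   [[:space:]]          the whitespace that separates the ID from the sequence
# Note: the 7-9 digit length is NOT checked by this pattern.
id_pattern='^gene_[A-Za-z0-9]+:[0-9]+(-[0-9]+)?[[:space:]]'

# Hypothetical helper: print how many stdin lines start with a well-formed ID.
count_ids() {
    grep -cE "$id_pattern"
}

printf '%s\n' \
  'gene_A:3342234 CTCTTTCTTTTACGCCT' \
  'gene_A:1244-5205 CTCTTTCTTTTACGCCT' \
  'gene_C:3849029 CTCTTTCTTTTACGCCT' | count_ids   # prints 3
```

Running `count_ids < genes.txt` (genes.txt being a placeholder for the real file) and comparing the result with `wc -l < genes.txt` would show whether any lines deviate from the assumed format before rewriting anything.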
I want to replace the gene IDs with their actual names, for example geneA and geneB, whilst keeping the information that follows them. Desired output:
geneA CTCTTTCTTTTACGCCT
geneA CTCTTTCTTTTACGCCT
geneA CTCTTTCTTTTACGCCT
geneB CTCTTTCTTTTACGCCT
geneB CTCTTTCTTTTACGCCT
geneC CTCTTTCTTTTACGCCT
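One way to get that output in a single substitution is a capture group, sketched below under the assumption that each actual name is simply the ID with the underscore removed (gene_A becomes geneA). The match covers only the ID and reference number, so the space and the sequence after it are left untouched. rename_genes is a hypothetical helper name; -E (extended regex) is accepted by both GNU sed and BSD/macOS sed.

```shell
# Assumption: actual name = "gene" + whatever sits between "gene_" and ":".
#   ^gene_      literal prefix at the start of the line
#   ([^:]+)     capture group 1: the gene letter(s), up to the colon
#   :[0-9-]+    the colon plus the reference number (digits, optional dash)
# The replacement "gene\1" rebuilds the name; the rest of the line is kept.
rename_genes() {
    sed -E 's/^gene_([^:]+):[0-9-]+/gene\1/'
}

printf '%s\n' \
  'gene_A:3342234 CTCTTTCTTTTACGCCT' \
  'gene_B:1029-4920 CTCTTTCTTTTACGCCT' | rename_genes
# prints:
# geneA CTCTTTCTTTTACGCCT
# geneB CTCTTTCTTTTACGCCT
```

On the real file this would be `sed -E 's/^gene_([^:]+):[0-9-]+/gene\1/' genes.txt > renamed.txt`, where both filenames are placeholders. Writing to a new file avoids the in-place flag `-i`, whose syntax differs between GNU sed and BSD/macOS sed.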
This is my first time using sed, so I'm not quite sure where to start. I know how to replace every line containing gene_A with geneA using 's/gene_A.*/geneA/', but I'm not sure how to preserve the information following the gene IDs.
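The attempted command loses the sequence because `.*` matches everything up to the end of the line. A minimal sketch of the smallest change, assuming the reference number contains only digits and dashes: narrow `.*` to `:[0-9-]*`, which stops at the space before the sequence. One expression per gene works when there are only a few IDs, and the right-hand sides can be any names. The names geneA, geneB and geneC come from the example; rename_explicit is a hypothetical helper.

```shell
# Each -e expression matches one ID plus its reference number only;
# [0-9-]* cannot match the space, so the sequence after it survives.
# Replace the right-hand sides with the real gene names.
rename_explicit() {
    sed -e 's/^gene_A:[0-9-]*/geneA/' \
        -e 's/^gene_B:[0-9-]*/geneB/' \
        -e 's/^gene_C:[0-9-]*/geneC/'
}

printf '%s\n' \
  'gene_A:1838438 CTCTTTCTTTTACGCCT' \
  'gene_B:1848584 CTCTTTCTTTTACGCCT' | rename_explicit
# prints:
# geneA CTCTTTCTTTTACGCCT
# geneB CTCTTTCTTTTACGCCT
```

Anchoring each pattern with `^` keeps it from matching "gene_A" somewhere later in a line.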
Asked by bryophyta (11 rep), Feb 3, 2024, 09:34 PM
Last activity: Apr 4, 2024, 03:49 PM