Clean Up FASTA Headers with Vim

The Raw FASTA file

>gi|937372680|gb|KT805061.1| Steiropteris leprieurii voucher A. Salino 14500 (BHCB) tRNA-Leu (trnL) gene and trnL-trnF intergenic spacer, partial sequence; chloroplast
AAATAAATTTCGGGCGATGAGTCGAGATAGGTACAGAGACTCGATGGGGGCCATTCCAACGAACAGTCTGTTAGTTACTAGTTCTCAAAAAAACTGAATATCTAACTGTTTTGCGTGGTTAACTTCATGGGTGGGGTAA

Cleaned-up FASTA file

>Steiropteris_leprieurii_KT805061
AAATAAATTTCGGGCGATGAGTCGAGATAGGTACAGAGACTCGATGGGGGCCATTCCAACGAACAGTCTGTTAGTTACTAGTTCTCAAAAAAACTGAATATCTAACTGTTTTGCGTGGTTAACTTCATGGGTGGGGTAA
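
The header change above keeps only two pieces of information: the accession without its version suffix, and the first two words of the description (genus and species). The following is a minimal Python sketch of that rule. It assumes every header follows the `gi|...|gb|ACCESSION.VERSION| Genus species ...` layout of the example. `clean_header` is a hypothetical name and is not part of the original tip.

```python
import re


def clean_header(header: str) -> str:
    """Rebuild a GenBank-style header as '>Genus_species_ACCESSION'.

    Assumes the layout of the example: '>gi|...|gb|ACC.VERSION| Genus species ...'.
    """
    # Accession between 'gb|' and the next '|', without its '.1'-style version suffix
    acc = re.search(r"gb\|([A-Z]+\d+)\.\d+\|", header)
    # First capitalized word followed by a lowercase word: genus and species
    name = re.search(r"([A-Z][a-z]+)[ \t]([a-z]+)", header)
    if acc is None or name is None:
        raise ValueError(f"unexpected header layout: {header!r}")
    genus, species = name.groups()
    return f">{genus}_{species}_{acc.group(1)}"


raw = (">gi|937372680|gb|KT805061.1| Steiropteris leprieurii voucher A. Salino "
       "14500 (BHCB) tRNA-Leu (trnL) gene and trnL-trnF intergenic spacer, "
       "partial sequence; chloroplast")
print(clean_header(raw))  # >Steiropteris_leprieurii_KT805061
```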

Step-by-Step Directions

1. Open a terminal and go to your home directory (cd ~/)
2. Check whether .vimrc exists (ls -l .vimrc). If it doesn’t, go on to step 3; otherwise, skip to step 4
3. Create .vimrc (touch .vimrc)
4. Open .vimrc (vim ~/.vimrc)
5. Press ‘i’ to enter insert mode and add the following code to the file. Copy-pasting it as-is from here is recommended!

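function! CleanFasta()
  " 1. Strip everything from '>' through 'gb|' (gi number and database tag)
  %s/>.*gb|/>/ge
  " 2. Keep genus and species (joined by '_') and drop the rest of the description
  %s/\([A-Z][a-z]\+\)\s\([a-z]\+\).*/\1_\2/ge
  " 3. Move the accession, minus its version suffix, after the species name
  %s/\(\u\+\d\+\)\.\d\+|\s\(\u\l\+_\l\+\)/\2_\1/ge
  " 4. Join wrapped sequence lines, but never glue a sequence onto the next header
  %s/\([ACGTN]\)\n\([ACGTN]\)/\1\2/ge
endfunction
" Run the cleanup with one key; <F2> is only an example, any unused key works
nnoremap <F2> :call CleanFasta()<CR>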

6. Press Esc to leave insert mode, then save and close .vimrc (:wq)
7. Open your FASTA file in Vim (vim my_fasta_file.fas)
8. Press Esc to make sure you are in normal (command) mode rather than insert mode
9. Press the key you mapped to CleanFasta() in step 5 and watch the magic happen
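
You may want to try the substitutions on a sample before putting them in .vimrc, or run them outside Vim. The following is a rough Python translation of the same four steps. It is a sketch, not the author's code. `clean_fasta` is a hypothetical name. Vim's `\s` is written as `[ \t]` because Python's `\s` also matches newlines. The line-joining step only joins a line when the next line also starts with a base (A, C, G, T or N), so the last sequence line of one record is not glued onto the following header.

```python
import re


def clean_fasta(text: str) -> str:
    """Rough Python mirror of the four CleanFasta() substitutions (a sketch)."""
    # 1. Strip everything from '>' through 'gb|' ('.' does not cross newlines, as in Vim)
    text = re.sub(r">.*gb\|", ">", text)
    # 2. Keep 'Genus species' as 'Genus_species' and drop the rest of that line
    text = re.sub(r"([A-Z][a-z]+)[ \t]([a-z]+).*", r"\1_\2", text)
    # 3. Move the accession (version suffix dropped) after the species name
    text = re.sub(r"([A-Z]+\d+)\.\d+\|[ \t]([A-Z][a-z]+_[a-z]+)", r"\2_\1", text)
    # 4. Remove a newline only when it sits between two bases (wrapped sequence lines)
    text = re.sub(r"(?<=[ACGTN])\n(?=[ACGTN])", "", text)
    return text


if __name__ == "__main__":
    # Made-up record, for illustration only
    sample = ">gi|1|gb|AB123456.2| Foo bar x\nACGT\nTTGA\n"
    print(clean_fasta(sample), end="")
    # >Foo_bar_AB123456
    # ACGTTTGA
```

To clean a whole file this way, read its text, pass it through `clean_fasta`, and write the result to a new file.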
