2023-04-04
sed
''
""
If you do not have it, download the file prots.fasta and put it in the folder with the files from the previous classes.
# delete lines 1 to 10 in ~/blastClass/prots.fasta
wc -l ~/blastClass/prots.fasta
sed '1,10d' ~/blastClass/prots.fasta | wc -l
# delete all lines appearing after the first line containing Lychi
sed '/Lychi/q' ~/blastClass/prots.fasta | less
# delete lines containing the word "partial"
sed '/partial/d' ~/blastClass/prots.fasta | less
In the last two cases, if the word is not found in the file, the whole file will be printed.
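The three deletion commands above can be tried on a small throwaway file first. This is only a sketch: the four header lines below are made up for illustration and are not taken from prots.fasta.

```shell
# Hypothetical 4-line sample standing in for prots.fasta (not the class file).
sample=$(mktemp)
printf '%s\n' \
  '>Agapanthus_africanus_P001 kinase' \
  '>Beta_vulgaris_P002 kinase, partial' \
  '>Lychi_example_P003 transporter' \
  '>Zea_mays_P004 transporter, partial' > "$sample"

# Delete lines 1 to 2: the line count drops from 4 to 2.
after_range=$(sed '1,2d' "$sample" | wc -l | tr -d ' ')

# Print up to and including the first line matching Lychi, then quit.
up_to_match=$(sed '/Lychi/q' "$sample" | wc -l | tr -d ' ')

# Delete every line containing the word "partial".
no_partial=$(sed '/partial/d' "$sample")

echo "lines left after 1,2d: $after_range"
echo "lines printed before quitting: $up_to_match"
echo "$no_partial"
```

Note that sed only changes what is printed; the sample file itself is left untouched.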
# In each line of the ~/blastClass/prots.fasta file, replace the first "_" (separates the species from the protein ID) with " protein "
sed 's/_/ protein /' ~/blastClass/prots.fasta | less
# In the ~/blastClass/prots.fasta file, replace all "_" (separates the species from the protein ID) with "-"
sed 's/_/-/g' ~/blastClass/prots.fasta | less
sed 's/WORD_SEARCHED/REPLACEMENT/g' input_file
s - substitute
g - global (everywhere)
Try the command with and without the number 1
# In line 1 of the ~/blastClass/prots.fasta file, replace the first "Agapanthus" in the line
sed '1 s/Agapanthus/Ag_/' ~/blastClass/prots.fasta | less
# In line 1 of the ~/blastClass/prots.fasta file, replace all "Agapanthus" in the line
sed '1 s/Agapanthus/Ag_/g' ~/blastClass/prots.fasta | less
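The difference between replacing the first match, every match, and every match on one line only can be checked on a tiny sample. A minimal sketch, assuming two made-up header lines rather than the real prots.fasta:

```shell
# Hypothetical 2-line sample (not the class file).
sample=$(mktemp)
printf '%s\n' \
  '>Agapanthus_africanus_P001' \
  '>Agapanthus_praecox_P002' > "$sample"

first_only=$(sed 's/_/-/' "$sample")     # first "_" on every line
every_match=$(sed 's/_/-/g' "$sample")   # every "_" on every line
line1_only=$(sed '1 s/_/-/g' "$sample")  # every "_", but only on line 1

echo "$first_only"
echo "$every_match"
echo "$line1_only"
```

With the leading 1, the second line comes out unchanged; without it, the substitution is applied to every line.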
sed 'LINE_NUMBER s/WORD_SEARCHED/REPLACEMENT/g' input_file
s - substitute
g - global (everywhere)
# Replace PF0 with PFAMID PF0, and see which proteins have a match in PFAM
sed -n 's/PF0/PFAMID PF0/p' ~/blastClass/prots.fasta | less
-n - do not print lines
/p - print lines where replacement occurred
Without -n, every line would be printed, with replaced lines printed twice (once automatically, then again because you asked to see the replacement).
Use w filename
# Replace the ">" in the ~/blastClass/prots.fasta file and save result to ~/blastClass/protsSED.fasta
sed "s/>/Fasta entry /w $HOME/blastClass/protsSED.fasta" ~/blastClass/prots.fasta
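How -n, the p flag and the w flag interact is easier to see on a file with only a few lines. The sketch below uses a made-up three-line sample and a temporary folder (both are assumptions for illustration, not the class files).

```shell
# Hypothetical sample: two headers and one sequence line, in a temporary folder.
workdir=$(mktemp -d)
sample="$workdir/sample.fasta"
printf '%s\n' \
  '>Agapanthus_P001 PF00069' \
  'MKTAYIAKQR' \
  '>Beta_P002 no_domain' > "$sample"

# With -n and the p flag, only the line that was changed is printed.
changed_only=$(sed -n 's/PF0/PFAMID PF0/p' "$sample")

# Without -n, all 3 lines are printed and the changed one is printed again: 4 lines.
without_n=$(sed 's/PF0/PFAMID PF0/p' "$sample" | wc -l | tr -d ' ')

# The w flag saves only the lines where a replacement happened (here the 2 headers).
sed "s/>/Fasta entry /w $workdir/out.fasta" "$sample" > /dev/null
written=$(wc -l < "$workdir/out.fasta" | tr -d ' ')

echo "$changed_only"
echo "lines printed without -n: $without_n"
echo "lines saved by w: $written"
```

Because w saves only the replaced lines, the sequence line is not in the output file. To save the whole edited file, redirect the normal output with > instead.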
w - write to file $HOME/blastClass/protsSED.fasta
Use the option -E: sed -E ''
$ - line end
^ - line begins
[] - when any character in a list is accepted in the search, put the list inside the []
sed -En '/>[ABC]/p' ~/blastClass/prots.fasta will print lines for proteins of species that start with A, B or C
[A-Z] - any upper case letter of the alphabet
[a-z] - any lower case letter of the alphabet
[0-9] - any digit (in grep -P, same as \d)
[A-Za-z0-9] - any letter or number
. - any character. Ex.: sed -En '/>A.a/p' ~/blastClass/prots.fasta
* - find the previous character or [] zero or more times. Ex.: remove anything after the space - sed -E 's/ .*$//' ~/blastClass/prots.fasta | less
+ - find the previous character or [] one or more times
{Ns,NE} - find the previous character at least Ns times and at most NE times
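The symbols listed above can be tried one at a time on a small sample. This is a sketch under stated assumptions: the three header lines are made up, and the patterns are only examples of each symbol, not the patterns used in class.

```shell
# Hypothetical 3-line sample (not the class file).
sample=$(mktemp)
printf '%s\n' \
  '>Agapanthus_P001 kinase domain' \
  '>Beta_P002 transporter' \
  '>Zea_P0003 kinase' > "$sample"

# ^ and []: headers whose species starts with A or B.
starts_ab=$(sed -En '/^>[AB]/p' "$sample")

# . with * and $: remove everything from the first space to the line end.
ids_only=$(sed -E 's/ .*$//' "$sample")

# +: replace the first run of one or more digits on each line with N.
digits=$(sed -E 's/[0-9]+/N/' "$sample")

# {4,5}: headers where P is followed by 4 to 5 digits.
long_ids=$(sed -En '/P[0-9]{4,5}/p' "$sample")

echo "$starts_ab"
echo "$ids_only"
echo "$digits"
echo "$long_ids"
```

Commands that use -n need the p flag (or the p command) to show anything; the two substitutions here leave -n out so that every line is printed.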