AWK command

Karen Cristine Goncalves, Ph.D.

2023-02-13

AWK - print specific columns

  • Always put the AWK program inside single quotes ''
  • Use {print} to indicate lines or columns you want to see
  • Indicate a column number using $N (replace N with the number)
  • If you want to see several columns, separate them with a ,:
    • {print $N,$X}
  • To indicate the whole line, use $0

AWK - print specific columns

# Print second column of the file blastp_results.txt
awk '{print $2}' blastp_results.txt
# Print second and third columns of the file blastp_results.txt
awk '{print $2,$3}' blastp_results.txt
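
The difference between a comma, no comma, and $0 is easier to see on a tiny file. The sketch below assumes a made-up, tab-separated sample (it is not real BLAST output) written to a temporary file.

```shell
# Hypothetical tab-separated sample; the values are invented for illustration.
sample=$(mktemp)
printf 'query1\tsubjectA\t98.5\nquery2\tsubjectB\t75.0\n' > "$sample"

# With a comma, the fields are joined by the output separator (a space by default).
with_comma=$(awk '{print $1,$3}' "$sample")

# Without a comma, the fields are glued together with nothing in between.
no_comma=$(awk '{print $1 $3}' "$sample")

# $0 is the whole line, exactly as it was read.
whole_line=$(awk '{print $0}' "$sample")

printf '%s\n' "$with_comma" "$no_comma" "$whole_line"
rm -f "$sample"
```

The first command prints "query1 98.5" and "query2 75.0"; the second prints "query198.5" and "query275.0", which is rarely what you want.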

AWK - print line with specific word

Inside the '', put the word you are searching for between slashes: /word/

# Print lines containing "#" in blastp_results.txt
awk '/#/ {print}' blastp_results.txt

# Print lines NOT containing "#" in blastp_results.txt
awk '!/#/ {print}' blastp_results.txt
  • Note that when you do not specify the column, or when you use $0, you print the whole line
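
As a small illustration of the search patterns, the sketch below uses a made-up file with one comment line and two data lines. The ^ anchor is an addition to the examples above: it restricts the match to lines that start with "#".

```shell
# Hypothetical sample: one comment line followed by two data lines.
sample=$(mktemp)
printf '# Fields: query, subject, identity\nquery1\tsubjectA\t98.5\nquery2\tsubjectB\t75.0\n' > "$sample"

# A pattern with no {action} prints the matching lines, same as {print}.
comments=$(awk '/#/' "$sample")

# ^ anchors the match to the start of the line, so a "#" elsewhere
# in a data line would not cause that line to be dropped.
data_rows=$(awk '!/^#/' "$sample")

printf '%s\n' "$comments" "$data_rows"
rm -f "$sample"
```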

AWK - indicate how to read the input and write the output

  • NR - number of records read so far (normally the current line number):
    • awk 'NR > 1 {print}' blastp_results.txt
      • will print all but the first line
    • awk 'NR == 1 {print $1} NR > 1 {print $2}' blastp_results.txt
      • will print the first column for line 1 and the second column for the other lines
  • NF - number of fields in the current record (normally columns):
    • awk '{print $NF}' blastp_results.txt
      • will print the last column of the file
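
To see what NR and NF hold on each line, the sketch below prints them for a made-up file whose last line has only two columns. The $(NF-1) form, meaning the second-to-last column, is an extra not shown above.

```shell
# Hypothetical sample: a header, a full row, and a row with a missing column.
sample=$(mktemp)
printf 'id\tsubject\tidentity\nquery1\tsubjectA\t98.5\nquery2\tsubjectB\n' > "$sample"

# NR is the current line number, NF the number of columns on that line.
counts=$(awk '{print NR, NF}' "$sample")

# $NF is the last column and $(NF-1) the one before it; skip the header.
last_two=$(awk 'NR > 1 {print $(NF-1), $NF}' "$sample")

printf '%s\n' "$counts" "$last_two"
rm -f "$sample"
```

Because NF is recomputed for every line, $NF on the short third line is "subjectB", not an empty third column.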

AWK - indicate how to read the input and write the output

Note that these can be specified by adding BEGIN {}:

  • FS - field separator - column delimiter
    • awk 'BEGIN {FS=","} {print $2}' blastp_results.csv
      • Indicates the columns in the file blastp_results.csv are separated by commas and asks for the second column
      • Outside the '', the options -F"," or --field-separator="," do the same thing (the long form is a GNU awk option)
  • OFS - output field separator - column delimiter for the output
    • awk 'BEGIN {OFS=","} {print $1,$2,$3}' blastp_results.txt
      • Changes the separation of columns from space (default for awk) to commas
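
A common surprise with OFS is that it only applies when awk builds the output line itself. The sketch below, on a made-up comma-separated file, compares printing selected columns, printing $0 untouched, and forcing a rebuild with $1=$1.

```shell
# Hypothetical comma-separated sample.
sample=$(mktemp)
printf 'query1,subjectA,98.5\nquery2,subjectB,75.0\n' > "$sample"

# Two equivalent ways to set the input separator.
via_begin=$(awk 'BEGIN {FS=","} {print $2}' "$sample")
via_flag=$(awk -F',' '{print $2}' "$sample")

# OFS is placed between the comma-separated items of print.
picked=$(awk 'BEGIN {FS=","; OFS="\t"} {print $1,$3}' "$sample")

# $0 is printed as it was read, so the commas stay...
unchanged=$(awk 'BEGIN {FS=","; OFS="\t"} {print $0}' "$sample")

# ...unless a field is assigned first, which makes awk rebuild $0 with OFS.
rebuilt=$(awk 'BEGIN {FS=","; OFS="\t"} {$1=$1; print $0}' "$sample")

printf '%s\n' "$picked" "$unchanged" "$rebuilt"
rm -f "$sample"
```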

AWK - indicate how to read the input and write the output

Note that these can be specified by adding BEGIN {}:

  • RS - record separator
    • awk 'BEGIN {RS=">"} {print $1}' myFasta.fasta
      • Tells awk that new entries are indicated with a “>” symbol
  • ORS - record separator for the output
    • awk 'BEGIN {ORS=">"} {print $1}' myFasta.txt
      • Tells awk to end each entry in the output with the “>” symbol instead of a new line (ORS is written after every record, not before it)
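
Two details of RS and ORS are easy to miss with FASTA files, and the sketch below shows both on a made-up two-sequence file. First, the text before the first ">" counts as an (empty) record. Second, ORS is added after each printed record.

```shell
# Hypothetical two-sequence FASTA file.
fasta=$(mktemp)
printf '>seq1 sample\nACGT\n>seq2\nGGNN\n' > "$fasta"

# Record 1 is the empty text before the first ">"; real entries start at 2.
numbered=$(awk 'BEGIN {RS=">"} {print NR ": " $1}' "$fasta")

# NR > 1 skips that empty record.
names=$(awk 'BEGIN {RS=">"} NR > 1 {print $1}' "$fasta")

# ORS replaces the newline that print normally adds after each record.
joined=$(awk 'BEGIN {RS=">"; ORS=">"} NR > 1 {print $1}' "$fasta")

printf '%s\n' "$numbered" "$names" "$joined"
rm -f "$fasta"
```

The last command prints "seq1>seq2>" on a single line, with the ">" following each name.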

AWK example

  • Sequence length
awk 'BEGIN {FS = "\n"; RS=">"}\
 {print $1, length($2)}' ~/blastClass/myFasta.fasta
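
The command above measures only the first sequence line ($2) of each entry. If the sequences are wrapped over several lines, one possible variation is to add up the lengths of all fields after the header. The sketch below assumes a made-up FASTA file and uses a for loop, which is not covered in these slides.

```shell
# Hypothetical FASTA file; seq1 is wrapped over two lines.
fasta=$(mktemp)
printf '>seq1 sample\nACGT\nAC\n>seq2\nGGNN\n' > "$fasta"

lengths=$(awk 'BEGIN {FS="\n"; RS=">"}
NR > 1 {                      # skip the empty record before the first ">"
    total = 0
    for (i = 2; i <= NF; i++) # fields 2..NF are the sequence lines
        total += length($i)
    print $1, total
}' "$fasta")

printf '%s\n' "$lengths"
rm -f "$fasta"
```

This prints "seq1 sample 6" and "seq2 4".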

AWK example

  • Change fasta to a table format
awk 'BEGIN {FS = "\n"; RS=">";\
 OFS="\t"; ORS="\n"}\
 {print $1,$2}' ~/blastClass/myFasta.fasta

AWK - replace word

  • Replace every letter N in the sequence line with '*' (gensub is a GNU awk function; its third argument, "g", means replace all matches)
awk 'BEGIN {FS = "\n"; RS=">"}\
 {newSeq=gensub(/N/, "*", "g", $2); print $1, newSeq}' ~/blastClass/myFasta.fasta
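
gensub is only available in GNU awk. If another awk is installed, the standard gsub function is one alternative. The sketch below assumes a made-up FASTA file; note that gsub changes the variable in place instead of returning the new text.

```shell
# Hypothetical FASTA file with some N characters.
fasta=$(mktemp)
printf '>seq1\nACNNGT\n>seq2\nGGNN\n' > "$fasta"

masked=$(awk 'BEGIN {FS="\n"; RS=">"}
NR > 1 {
    newSeq = $2               # work on a copy of the sequence line
    gsub(/N/, "*", newSeq)    # replaces every N inside newSeq
    print $1, newSeq
}' "$fasta")

printf '%s\n' "$masked"
rm -f "$fasta"
```

This prints "seq1 AC**GT" and "seq2 GG**".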

AWK - print with one condition

  • Conditions inside {} come after if and go inside ()
# Print line if column 3 has a value greater than 80
awk '{if ($3 > 80) print}' blastp_results.txt
  • Or they go outside the {}
# Print line if column 3 has a value greater than 80
awk '$3 > 80 {print}' blastp_results.txt
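
To check that both forms select the same lines, the sketch below runs them on a made-up four-column table. Columns 3 and 4 are assumed here to stand for a percent identity and an alignment length. The >= example is an addition showing how to keep the boundary value.

```shell
# Hypothetical hit table; column 3 plays the role of percent identity.
hits=$(mktemp)
printf 'query1\tsubjectA\t98.5\t200\nquery2\tsubjectB\t75.0\t310\nquery3\tsubjectC\t80.0\t90\n' > "$hits"

# Condition inside the braces, with if (...)
inside=$(awk '{if ($3 > 80) print}' "$hits")

# Same condition outside the braces
outside=$(awk '$3 > 80 {print}' "$hits")

# > excludes exactly 80; >= keeps it
at_least=$(awk '$3 >= 80 {print $1}' "$hits")

printf '%s\n' "$inside" "$outside" "$at_least"
rm -f "$hits"
```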

AWK - print with multiple conditions

  • When the conditions must all be met, separate them with &&

# Print line if column 3 has a value greater than 80 AND column 4 has a value greater than 150
awk '{if ($3 > 80 && $4 > 150) print}' blastp_results.txt
    
# This also gives the same result
awk '$3 > 80 && $4 > 150 {print}' blastp_results.txt

AWK - print with multiple conditions

  • When only one condition must be met, separate them with ||
# Print line if column 3 has a value greater than 80 OR column 4 has a value greater than 150
awk '{if ($3 > 80 || $4 > 150) print}' blastp_results.txt

# This also gives the same result
awk '$3 > 80 || $4 > 150 {print}' blastp_results.txt
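
The effect of && versus || is clearest side by side. The sketch below uses a made-up four-column table, with columns 3 and 4 assumed to stand for percent identity and alignment length. The last command is an extra: it negates the combined condition with !( ).

```shell
# Hypothetical hit table with three rows.
hits=$(mktemp)
printf 'query1\tsubjectA\t98.5\t200\nquery2\tsubjectB\t75.0\t310\nquery3\tsubjectC\t80.0\t90\n' > "$hits"

# Both conditions must be true
both=$(awk '$3 > 80 && $4 > 150 {print $1}' "$hits")

# At least one condition must be true
either=$(awk '$3 > 80 || $4 > 150 {print $1}' "$hits")

# Neither condition is true
neither=$(awk '!($3 > 80 || $4 > 150) {print $1}' "$hits")

printf 'both: %s\n' "$both"
printf 'either: %s\n' $either
printf 'neither: %s\n' "$neither"
rm -f "$hits"
```

Only query1 passes both tests, query1 and query2 pass at least one, and query3 passes neither.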

Resources