2023-03-06
Click on Session -> SSH
Remote Host: cedar.computecanada.ca
Specify username, then add yours. You can save your user name and password.
Alternatively, from a terminal:
ssh YOUR_USERNAME@cedar.computecanada.ca
Replace YOUR_USERNAME with your username, in my case: karencgs
When you log in, you are one of hundreds or thousands of people sharing the server, so you should not simply run commands that you know or think could take a long time.
Place the files you want to use and your scripts in your Home folder:
mv myFile ~
To run a command, you need to ask for resources, like time and memory, and wait for them to be available.
You need to be in $SCRATCH or in ~/projects/username to be able to submit a job.
Go to $SCRATCH and write a simple script in nano:
cd $SCRATCH; nano $HOME/myTry.sh
#!/bin/sh
echo ">seq1
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
>seq2
GCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCAT" > $HOME/myFasta.fa
grep ">" $HOME/myFasta.fa
Give the script the right permissions and submit your script to the queue of jobs
chmod u=rwx $HOME/myTry.sh
srun --time=00:01:00 sh $HOME/myTry.sh
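To see what myTry.sh should print before waiting in the queue, the same steps can be tried on a throwaway copy. This is only a sketch: the temporary directory from mktemp and the variable names workdir, fasta and n_seqs are my own choices, not part of the original script.

```shell
#!/bin/sh
# Work in a throwaway directory (assumption: mktemp is available) instead of $HOME.
workdir=$(mktemp -d)
fasta="$workdir/myFasta.fa"

# printf writes one line per argument, which avoids a multi-line quoted string.
printf '%s\n' \
    '>seq1' \
    'ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC' \
    '>seq2' \
    'GCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCAT' > "$fasta"

# Header lines start with ">", so the pattern is anchored with ^.
headers=$(grep '^>' "$fasta")
n_seqs=$(grep -c '^>' "$fasta")

echo "$headers"
echo "Number of sequences: $n_seqs"
```

It should print the two header lines, >seq1 and >seq2, followed by a count of 2.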
srun is one of the commands to submit a script
--time is the option to tell how much time you need to execute the commands in the script
--account=def-projectName - if you are associated with different accounts (def-desgagne, def-laboidp, def-germain1), you need to specify the one that will be billed for the current job. Replace def-projectName with the correct account
--mem=XXG or --mem-per-cpu=XXG - specify how much memory the script needs. For a small blast (less than 100 query sequences), 2G is enough
--cpus-per-task=XX - how many processors you need (big blast jobs run faster if using many cpus)
module load StdEnv/2020 gcc/9.3.0 blast+
#!/bin/sh
blastp -query $HOME/myProtein.fa -db $HOME/database -outfmt 7 -out $HOME/myBlastpResult.txt
-query - use this option to specify the file which you want to blast (here, I want to blast myProtein.fa)
-db - use this option to specify the database you want to search (it needs to be the full path, but omit the extension). For example, for a database in $HOME called “proteins”, several files will be created; I just put $HOME/proteins in the script
-outfmt 7 - specifies the type of output. Here I want a table; if you don’t specify it, you get something similar to the webpage result
-out - use this option to specify where to save the output (similar to > myBlastpResult)
A more general script could take arguments from the command line every time.
This is called $HOME/myGeneralBlast.sh
#!/bin/sh
blastp -query $1 -db $2 -outfmt 7 -out $3
To run it, we need to do:
chmod u=rwx $HOME/myGeneralBlast.sh
sh $HOME/myGeneralBlast.sh $HOME/myProtein.fa $HOME/database $HOME/myBlastpResult.txt
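Because $1, $2 and $3 are filled in by position, a forgotten argument only shows up as a blastp error. A minimal sketch of a guard, written as a hypothetical function run_blast (the name and the usage message are mine, not part of the original script), that checks the argument count first:

```shell
#!/bin/sh
# run_blast QUERY DATABASE OUTPUT
# Hypothetical wrapper: refuse to start unless exactly three arguments are given.
run_blast() {
    if [ "$#" -ne 3 ]; then
        echo "Usage: run_blast QUERY DATABASE OUTPUT" >&2
        return 1
    fi
    # Quote each argument so that paths containing spaces stay in one piece.
    blastp -query "$1" -db "$2" -outfmt 7 -out "$3"
}
```

Called with fewer than three arguments, it prints the usage line and returns 1 without starting blastp.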
You can (MUST) write a script to run blast by submitting a job. Ex. $HOME/srun_blast.sh:
#!/bin/sh
module load StdEnv/2020 gcc/9.3.0 blast+
blastp -query $1 -db $2 -outfmt 7 -out $3
Again, give the permissions and submit the job:
chmod u=rwx $HOME/srun_blast.sh
cd $SCRATCH
srun --time=00:10:00 --mem=2G \
sh $HOME/srun_blast.sh $HOME/myProtein.fa $HOME/database $HOME/myBlastpResult.txt
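Asking for several processors with --cpus-per-task only helps if blastp is told to use them, which is done with its -num_threads option. The sketch below is a hypothetical variation of the blast line, not the original script: it reads SLURM_CPUS_PER_TASK (set by SLURM when --cpus-per-task is given), falls back to 1 otherwise, and only prints the command so it can be inspected first. The names threads and blast_command are my own.

```shell
#!/bin/sh
# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested; default to 1 otherwise.
threads=${SLURM_CPUS_PER_TASK:-1}

# blast_command QUERY DATABASE OUTPUT
# Prints the blastp command instead of running it; remove "echo" to really run it.
blast_command() {
    echo blastp -query "$1" -db "$2" -outfmt 7 -num_threads "$threads" -out "$3"
}

blast_command myProtein.fa database myBlastpResult.txt
```

Outside of a job, this prints the command with -num_threads 1. Inside a job submitted with, for example, --cpus-per-task=4, the value would be 4.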
With srun, you cannot close your session until the job is finished, so if your script takes too long to run or stays too long in the queue, you must either wait for it to finish or cancel it and submit it again.
With sbatch, once your job is in the queue, you can close your session or continue doing other things in it.
You can use the same script you used for srun, $HOME/srun_blast.sh:
#!/bin/sh
module load StdEnv/2020 gcc/9.3.0 blast+
blastp -query $1 -db $2 -outfmt 7 -out $3
Again, give the permissions and submit the job:
chmod u=rwx $HOME/srun_blast.sh
cd $SCRATCH
sbatch --time=00:10:00 --mem=2G \
$HOME/srun_blast.sh $HOME/myProtein.fa $HOME/database $HOME/myBlastpResult.txt
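Since sbatch returns immediately, it helps to keep the job number it prints. A small sketch, assuming the usual confirmation message "Submitted batch job <ID>"; the helper name job_id_from is hypothetical, and the commented lines only make sense on the cluster:

```shell
#!/bin/sh
# job_id_from: read sbatch's confirmation line on stdin and print the job ID.
# Assumes the message has the form "Submitted batch job <ID>" (the ID is the 4th word).
job_id_from() {
    awk '/^Submitted batch job/ { print $4 }'
}

# On the cluster, this could be used as:
#   jobid=$(sbatch --time=00:10:00 --mem=2G $HOME/srun_blast.sh \
#       $HOME/myProtein.fa $HOME/database $HOME/myBlastpResult.txt | job_id_from)
#   squeue -j "$jobid"    # is the job still pending or already running?
#   scancel "$jobid"      # cancel it if needed
```

By default, whatever the job prints to the screen is saved in a file called slurm-<ID>.out in the folder from which sbatch was run.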
Another option is to write the resources request in the script itself, for example in $HOME/sbatch_blast.sh:
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH --mem=2G
module load StdEnv/2020 gcc/9.3.0 blast+
blastp -query $HOME/myProtein.fa \
-db $HOME/database -outfmt 7 \
-out $HOME/myBlastResult.txt
#SBATCH --account=def-project - same as the account information for srun
Unlike with srun, with sbatch you need to either give the full path to each file in their names OR add the line cd myFolder (where myFolder is the full path to the folder where you have your files)
When the job runs with sbatch, it is like you are logging in again
You can use echo to write a script that takes the values from the command line and outputs a blast script ready to submit with sbatch.
Ex. $HOME/create_master_blast.sh:
#!/bin/sh
### Variables ###
myQuery=$1
database=$2
output=$3
scriptName=$4
## Create script ##
echo '#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH --mem=2G
module load StdEnv/2020 gcc/9.3.0 blast+
blastp -query' "$myQuery" '\
-db' "$database" '-outfmt 7 \
-out' "$output" > "$scriptName"
## Give permissions ##
chmod u=rwx $scriptName
The script has 3 blocks:
Variables - we save the values from the command line into variables that echo will use. We need to give 4 values: the input sequence path, the path to the database, the path to where we want the output and the name we want to give to the script.
Create script - the body of the script, in which we use echo to write the file.
The permissions - we give the right permissions to the new script directly inside the one that created it.
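The same three blocks can also be written with a here-document instead of echo, which avoids opening and closing quotes around each variable. This is an alternative sketch, not the script above: the function name make_blast_script and the argument check are my own additions.

```shell
#!/bin/sh
# make_blast_script QUERY DATABASE OUTPUT SCRIPT_NAME
# Hypothetical here-document version of the echo-based generator.
make_blast_script() {
    if [ "$#" -ne 4 ]; then
        echo "Usage: make_blast_script QUERY DATABASE OUTPUT SCRIPT_NAME" >&2
        return 1
    fi
    # EOF is not quoted, so $1, $2 and $3 are expanded while the file is written.
    # A doubled backslash becomes a single backslash in the generated script.
    cat > "$4" <<EOF
#!/bin/sh
#SBATCH --time=00:10:00
#SBATCH --mem=2G
module load StdEnv/2020 gcc/9.3.0 blast+
blastp -query "$1" \\
    -db "$2" -outfmt 7 \\
    -out "$3"
EOF
    # Give permissions to the new script, as in the third block above.
    chmod u=rwx "$4"
}
```

It takes the same four values, in the same order, as create_master_blast.sh.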
Now, all we need to do is:
sh $HOME/create_master_blast.sh \
$HOME/myProtein.fa \
$HOME/database \
$HOME/myBlastResult.txt \
$HOME/master_blast.sh
sbatch $HOME/master_blast.sh # srun sh $HOME/master_blast.sh also works
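Once the job has finished, the result file is the table requested with -outfmt 7: tab-separated columns, plus comment lines starting with "#". A short sketch for a first look at it, assuming the default column order (query, subject and percent identity come first); the function names count_hits and hit_summary are hypothetical.

```shell
#!/bin/sh
# count_hits FILE: number of hits in a BLAST -outfmt 7 table.
# Lines starting with "#" are comments; every other non-empty line is one hit.
count_hits() {
    grep -v '^#' "$1" | grep -c .
}

# hit_summary FILE: query id, subject id and percent identity (columns 1 to 3).
hit_summary() {
    grep -v '^#' "$1" | cut -f 1-3
}

# Example use on the cluster:
#   count_hits $HOME/myBlastResult.txt
#   hit_summary $HOME/myBlastResult.txt | head
```

Both commands are light enough to run directly after logging in, without submitting a job.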