
Genomics Nodes on Quest

The Feinberg School of Medicine, in an initiative spearheaded by its Office of the Dean, the Center for Genetic Medicine, and the Department of Biochemistry and Molecular Genetics, provides 100 nodes on Quest, Northwestern’s high-performance computing cluster, for genomics research. This University resource is available to the greater genomics community to foster genomics research and empower the computational genomics community at Northwestern University.

How can I apply to use the genomics nodes?

Principal Investigators (PIs) doing genomics research and their graduate students may apply to use the genomics nodes by completing the Genomics Node User Registration form. In addition to providing their name and NetID, prospective users will be prompted to enter a brief statement of the research to be performed on the genomics nodes. Once a request has been approved, the user’s NetID will be added to the Quest buy-in group, b1042, and the researcher may begin using the nodes.

What is included with access to the genomics nodes?

Access to Genomics Nodes

The 100 Genomics Nodes each have 24 cores and 128 GB of memory, for a combined resource of 2,400 cores. Jobs are submitted to these nodes through one of two genomics queues on Quest: genomics and genomics-blast.

155 TB scratch in /projects/b1042

This shared storage space is intended for temporary files needed and created when genomics jobs are run. Please create a directory for your PI group under /projects/b1042 to keep your scratch files from intermingling with other users’ work. Because this is a shared resource, clean up after yourself and use the scratch space efficiently: end batch scripts by deleting temporary files that are no longer needed, and move finished files to your /projects or /home directory.
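The cleanup practice described above can be sketched as follows. This is a minimal illustration, not an official Quest template: the directory names my_PI and run_001 and the variables SCRATCH_ROOT and KEEP_DIR are hypothetical, and both roots default to throwaway temporary directories so the sketch can run anywhere.

```shell
# Sketch only. On Quest, SCRATCH_ROOT would be /projects/b1042 and KEEP_DIR a
# directory in your own /projects or /home space (both names are assumptions).
SCRATCH_ROOT="${SCRATCH_ROOT:-$(mktemp -d)}"
KEEP_DIR="${KEEP_DIR:-$(mktemp -d)}"

# "my_PI" and "run_001" are hypothetical names for the PI directory and one job.
PI_DIR="$SCRATCH_ROOT/my_PI"
WORK_DIR="$PI_DIR/run_001"
mkdir -p "$WORK_DIR"

# Stand-in for the real analysis: one temporary file and one result file.
echo "intermediate data" > "$WORK_DIR/intermediate.tmp"
echo "final result" > "$WORK_DIR/result.txt"

# End of job: move what you want to keep, then delete the scratch files.
mv "$WORK_DIR/result.txt" "$KEEP_DIR/"
rm -rf "$WORK_DIR"
```

Keeping each job in its own subdirectory of the PI directory means the final rm -rf only touches files that job created.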

Files in the scratch space will be deleted after 30 days. You will receive an email reminder before your scratch files are deleted, but it is your responsibility to move files you want to keep into long-term storage such as your home directory on Quest. To learn about other research storage options at Northwestern or to increase the space available in your home directory, contact quest-help@northwestern.edu.
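To see which files may be approaching the 30-day limit, something like the following could be used. It is a sketch under assumptions: the helper name list_old_files is hypothetical, and it compares modification times, whereas this page does not say which timestamp the cleanup actually uses.

```shell
# List files under a directory whose modification time is more than a given
# number of days in the past. Using modification time is an assumption; the
# page does not say which timestamp the 30-day cleanup uses.
list_old_files() {
    dir="$1"
    days="$2"
    find "$dir" -type f -mtime +"$days" -print
}

# Example (hypothetical PI directory): files untouched for more than 20 days.
# list_old_files /projects/b1042/my_PI 20
```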

10 TB shared library space in /projects/genomicsshare

This storage space is read-only and intended to store shared reference files for the use of the genomics research community.  To see a list of files available in genomicsshare, look in /projects/genomicsshare/README.  To request files be added to the shared genomics reference library, email quest-help@northwestern.edu and include genomicsshare in the subject line.

How do I start using the nodes?

You will first need to log in to Quest. If you have not done so before, see the help page on logging into Quest for the first time.

Moving files onto Quest

From Box: If you have sequencing done by NUSeq Core, they will deliver your FASTQ files via Northwestern Box.

To get files from Box into Quest:

  1. Make sure you have the necessary tunneling software; Northwestern IT recommends FastX2. Download and install FastX2. You will need to log out of your local machine and log back in for the tunneling software to take effect.
  2. ssh to Quest from your terminal on your local machine.
  3. Run Firefox by typing the following commands in the terminal window: “module load firefox”, followed by “firefox” to launch the Firefox browser.
  4. In the Firefox browser, type “northwestern.box.com” and log in.
  5. Select the file you wish to move onto Quest and click Download in the upper right corner. By default, the file will be placed into the Downloads directory in your home directory.

Note: If you want the file to go directly into /projects/b1042, click the menu icon in the upper right of the browser before downloading, then click Preferences. From there you can select which directory on Quest to download into, including the /projects directories.

For information about transferring files from your local machine, RDSS, Globus, or FSMRESFILES (FSM secure data storage) to Quest please visit the Transferring Files on Quest web page.

Bioinformatics software available on Quest

To see a complete list of software installed on Quest by the Quest administrators, type “module avail”. Modules are scripts that make it possible to run software in your shell; to run a particular software package, type “module load <module_name>” and then run the software.

New software is continually being added to Quest. Please use the command “module avail” for an up-to-date listing of all software packages including genomics software on Quest.
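Inside a script it can help to fail early when the module command is unavailable, for example when the script is run somewhere other than Quest. The helper below is a hedged sketch; load_module is a hypothetical name and is not part of Quest or of the module system.

```shell
# Load a module only if the "module" command exists in this shell, and report
# a clear error otherwise. "load_module" is a hypothetical helper name.
load_module() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; run this on Quest" >&2
        return 1
    fi
    module load "$1"
}

# Example, using a version from the software list on this page:
# load_module samtools/1.2
```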

Software related to genomics on Quest includes:

GeneTorrent/3.8.6 bcl2fastq/2.17.1.14 bedtools/2.17.0 blast/2.4.0 bowtie/1.1.2 bowtie2/2.2.6 bsmap/2.90
bwa/0.7.12 fastqc/0.11.5 gatk/3.4.0 gtool/0.7.5 plink/1.07 plink/1.9 qctool/1.0
samtools/1.2 shapeit/v2.r837 snptest/2.5 snptest/2.5.2 tophat/2.1.0 subread/1.5.1
picard/1.131 picard/2.6.0



If you would like additional software packages installed on Quest, please contact quest-help@northwestern.edu. Users can also install software in their own directories.

Running jobs on b1042 nodes: Submission Scripts

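The program that schedules jobs on Quest is Moab. To submit, monitor, modify, and delete jobs on Quest you must use Moab commands such as msub, showq, checkjob, and canceljob. Inside a submission script, lines addressed to the scheduler begin with #MSUB.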

Useful commands to put into your job submission script:

Required Commands

Description

#!/bin/bash

The first line of your script, specifying the shell to use

#MSUB -A b1042

Tells the scheduler you are using the b1042 genomics account

#MSUB -q genomics

Puts your job into the genomics queue to run on the b1042 nodes. The genomics queue is appropriate for jobs running on fewer than 10 nodes with a walltime of less than 48 hours. To run on more than 10 nodes or for longer than 48 hours, please contact quest-help@northwestern.edu to schedule your job to run on the genomics-blast queue.

#MSUB -l walltime=hh:mm:ss

Sets the maximum run time (walltime) of the job. The scheduler needs this value to allocate resources.

#MSUB -e errlog

Writes the error file for the job into a file named errlog - the error file is very important for diagnosing jobs that fail to run properly.

Optional Commands

Description

#MSUB -N name_of_job

Gives the job a descriptive name, useful for reporting (qstat)

#MSUB -m abe

Sends an email if your job (a)borts, (b)egins, or (e)nds

#MSUB -M <your email>

Specifies the email address for notifications; can be a comma-separated list of addresses.

#MSUB -l nodes=N:ppn=p

Specifies how many nodes and how many ppn (processors per node, also called cores) to use. If this command is left out, one core on one node will be allocated. If your code is not parallelized, one core on one node will most likely be appropriate for your job.

#MSUB -o outlog

Writes the output log for the job into a file named outlog

#MSUB -j oe

Joins the (o)utput and (e)rror files into a single file. By default the name of this file will be <Name_of_Job>.o<JOBID>

cd $PBS_O_WORKDIR

Before starting the job, the current working directory of the script ($HOME by default) should be changed to the intended location, most likely /projects/b1042/<your_PI_dir>. If you haven’t yet created a directory for your PI group in /projects/b1042, please do so before running your job to keep your files separate from those of other users.
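The genomics queue limits quoted above (fewer than 10 nodes, less than 48 hours) can be checked before submitting. The following is a sketch only: walltime_to_seconds and fits_genomics_queue are hypothetical helper names, and treating exactly 10 nodes or exactly 48 hours as out of range is an assumption, since the page does not spell out the boundary cases.

```shell
# Convert a walltime string in hh:mm:ss form to seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed (exit status 0) if the request fits the genomics queue limits quoted
# above: fewer than 10 nodes and less than 48 hours. Treating exactly 10 nodes
# or exactly 48 hours as out of range is an assumption.
fits_genomics_queue() {
    nodes="$1"
    seconds=$(walltime_to_seconds "$2")
    [ "$nodes" -lt 10 ] && [ "$seconds" -lt $((48 * 3600)) ]
}

# Example: one node for 24 hours.
if fits_genomics_queue 1 24:00:00; then
    echo "ok for the genomics queue"
else
    echo "contact quest-help about the genomics-blast queue"
fi
```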

An example submission script

Note: Line numbers are included for reference only; do not put them in your script.


1 #!/bin/bash
2 #MSUB -A b1042
3 #MSUB -q genomics
4 #MSUB -l walltime=24:00:00
5 #MSUB -M myemailaddress
6 #MSUB -j oe
7 #MSUB -N somename_tophat
8 #MSUB -l nodes=1:ppn=6
9 export PATH=$PATH:/projects/pxxxxx/tools/
10 module load bowtie2/2.2.6
11 module load tophat/2.1.0
12 module load samtools
13 module load boost
14 module load gcc/4.8.3
15 module load java
16 cd $PBS_O_WORKDIR
17 # Make Directory for FastQC reports in your PI folder in b1042
18 mkdir -p /projects/b1042/my_PI/fastqc/reports
19
20 # Trim poor quality sequence
21 java -jar <trimming_tool.jar> <someinput> <someoutput>
22 # Running FastQC
23 fastqc -o /projects/b1042/my_PI/fastqc/reports <other_input>

If the script is named myscript.sh, the job is submitted with: msub myscript.sh

Line 1 specifies bash as the shell that runs the script.
Lines 2-8 are interpreted by Moab. Until Moab acquires the resources, no other line in this script is executed.
Line 9 sets the user's path to include their own tools.
Lines 10-15 load modules from the centrally managed software.
Line 16 returns the user to the directory from which the job was submitted.
Lines 17, 20, and 22 are comments.
Line 18 is a shell command.
Lines 21 and 23 launch application executables.
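If you submit many similar jobs, the scheduler lines explained above can be generated rather than typed by hand. The function below is a minimal sketch; write_msub_header is a hypothetical helper, not a Quest or Moab command, and it fixes the account and queue to the b1042 and genomics values used in the example.

```shell
# Print the scheduler header of a submission script to standard output.
# Arguments: job name, walltime (hh:mm:ss), nodes, processors per node.
# "write_msub_header" is a hypothetical helper, not a Quest or Moab command.
write_msub_header() {
    cat <<EOF
#!/bin/bash
#MSUB -A b1042
#MSUB -q genomics
#MSUB -l walltime=$2
#MSUB -j oe
#MSUB -N $1
#MSUB -l nodes=$3:ppn=$4
cd \$PBS_O_WORKDIR
EOF
}

# Example: print a header matching the script above. Redirect the output to a
# file and append your own commands before submitting.
write_msub_header somename_tophat 24:00:00 1 6
```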

For more examples, see Examples of Jobs on Quest

Submitting your batch job

At the command line type “msub <name_of_script>”. Upon submission the scheduler will return your job number.
If you receive a “permission denied” error, check to make sure your script has the correct permission to execute by typing “ls -l <name_of_script>”; the fourth character in the permissions string indicates if you can execute your file. If it is not an “x”, type “chmod u+x <name_of_script>” to enable execution and resubmit.
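The permission check described above can also be scripted. This is a small sketch; ensure_executable is a hypothetical helper name.

```shell
# Add execute permission for the owner if the file is not already executable.
# "ensure_executable" is a hypothetical helper name.
ensure_executable() {
    if [ ! -x "$1" ]; then
        chmod u+x "$1"
    fi
}

# Example with the script name used on this page:
# ensure_executable myscript.sh && msub myscript.sh
```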

Check the status of your jobs on Quest

showq -u <your_netID> Shows your active jobs and their job numbers
checkjob <job_number> Shows information about your job, including job status
checkjob -v -v <job_number> Very verbose report from checkjob; useful for debugging

Canceling your job

From the command line, type “canceljob <job_number>”

Join the Genomics Conversation

The genomics nodes on Quest have a Slack community open to anyone with a  Northwestern email address.  Join the Genomics Slack channel.

Getting help

Learn more about using Quest. If you need more personalized assistance, email a description of your issue to quest-help@northwestern.edu so the Research Computing Services team can assist you.

Acknowledgment of Use – Genomics Computing Cluster

This research was supported in part through the computational resources and staff contributions provided by the Genomics Computing Cluster, which is jointly supported by the Feinberg School of Medicine, the Center for Genetic Medicine, Feinberg’s Department of Biochemistry and Molecular Genetics, the Office of the Provost, the Office for Research, and Northwestern Information Technology. The Genomics Computing Cluster is part of Quest, Northwestern University’s high-performance computing facility, with the purpose of advancing research in genomics.

Last Updated: 5 September 2017
