
SciNet is being decommissioned. It is no longer possible to submit compute jobs, and disk access will be available only until May 9th, 2018. To run MICe tools on Compute Canada systems, see the Pydpiper on Graham page.

Most of the information here that is not specific to Pydpiper is condensed from the detailed SciNet wiki.

Initial setup

  1. Apply for a Compute Canada account.
  2. Apply for a SciNet account.  (You need to be logged in using your Compute Canada account to see this page.)  This may take several days to be approved.
  3. SSH into SciNet (you may wish to generate SSH keys and run ssh-add to streamline the login process; see these instructions):

    ssh -X yourname@login.scinet.utoronto.ca

    (The -X flag is for X windows forwarding, and may be omitted.)

Don't set paths or load modules in your ~/.bashrc.  It's sourced by the scripts submitted to the remote machines, so doing so will eventually cause your pipelines to fail in mysterious ways.
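If you do want convenience settings for interactive logins, one common pattern (a sketch, not an official MICe recommendation) is to guard them so that non-interactive shells, such as those spawned by submitted job scripts, skip them entirely:

```shell
# Sketch only: guard interactive-only setup in ~/.bashrc so that
# non-interactive shells (e.g. submitted job scripts) are unaffected.
# is_interactive prints "yes" in an interactive shell, "no" otherwise,
# by checking for "i" in the shell's option flags ($-).
is_interactive() {
  case "$-" in
    *i*) echo yes ;;
    *)   echo no ;;
  esac
}

if [ "$(is_interactive)" = yes ]; then
  : # put "module load ..." and PATH tweaks here, never outside this guard
fi
```

A job script sourcing this ~/.bashrc sees a non-interactive shell and skips the guarded block, so module state on the compute nodes stays predictable.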

Running a job

Once logged in, SSH from your current login node to one of eight devel nodes:

/scinet/gpc/bin/gpcdev

This is effectively ssh gpc0N, where N is from 1 to 8, depending on node availability.

We use "modules" to set the paths to various versions of Pydpiper and its dependencies.  To use our custom module files, execute the following commands:

# make our own modules available for loading (look in this directory to see what MICe modules are available and what they do):
module use -a /project/j/jlerch/matthijs/privatemodules
 
# load the modules that allow you to run image registration and analysis:
module load gcc/5.2.0 intel/15.0.2 gnuplot/4.6.1 hdf5/187-v18-serial-intel openblas/1.13-multithreaded  gotoblas/1.13-singlethreaded octave gcclib/5.2.0 jpeg/v9b cmake/3.5.2 Xlibraries/X11-64 extras/64_6.4 ImageMagick/6.6.7 python/3.5.1 curl/7.49.1 R/3.3.0 minc-toolkit minc-stuffs pydpiper RMINC
 
# to see which modules you specifically loaded, run
module list
 
# As of April 2017, you should see:
Currently Loaded Modulefiles:
  1) nano/2.2.4                     4) cmake/3.5.2                    7) extras/64_6.4                 10) python/3.5.1                  13) minc-toolkit/1.9.15           16) RMINC/1.4.3.4
  2) gcclib/5.2.0                   5) intel/15.0.2                   8) Xlibraries/X11-64             11) curl/7.49.1                   14) minc-stuffs/0.1.20
  3) jpeg/v9b                       6) openblas/0.2.13-intel-serial   9) ImageMagick/6.6.7             12) R/3.3.0                       15) pydpiper/master-v2.0.7
 
# you can also load specific pydpiper modules. To see which ones are available run:
module avail pydpiper

Modules are a bit finicky.  If you get weird errors here, type `module purge` and try again.  Also try `module help` for some other commands.

In addition to loading prerequisite modules and putting Pydpiper scripts on your path, this will also set $PYDPIPER_CONFIG_FILE to point to a file specifying some SciNet-specific options.  For instance:

gpc-f101n084-ib0-$ module load pydpiper/master-v1.13.1
gpc-f101n084-ib0-$ echo $PYDPIPER_CONFIG_FILE
/project/j/jlerch/matthijs/pydpiper_modules/pydpiper-master-v1.13.1/python/pydpiper-1.13.1-py2.7.egg/config/SciNet.cfg
gpc-f101n084-ib0-$ cat $PYDPIPER_CONFIG_FILE 
[SciNet (batch queue)]
mem 14 ; amount of memory available on a compute node
proc 8 
ppn 8
queue-name batch
queue-type pbs
min-walltime 900 ; sec
max-walltime 57600 ; sec (16 hours)
time-to-seppuku 30 ; minutes

You can use command-line flags to override these defaults.

Run your Pydpiper command as usual.  This must be done in /scratch, since /home is not writeable from the cluster.  Initial models are kept in:

/project/j/jlerch/matthijs/init-models/

You don't need to specify any queue options yourself (so you do not have to specify --proc, --mem, --ppn, --queue-name, --queue-type, etc.; see above regarding configuration).  However, you should specify --time=HH:MM:SS, giving the total (sequential) time you expect your job to take, as well as --num-executors=N, where N is the number of brains in your study divided by 4 (for 56-micron data) or by 2 (for 40-micron data).  The explanation is as follows: we launch one executor to manage each 8-CPU compute node, so you might expect one executor for every 8 brains, but memory constraints force us to register fewer brains per node.  The current situation is a bit of a hack; in the future we might be able to guess an appropriate number of nodes to request automatically.
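The divide-by-4/divide-by-2 rule above can be sketched as a small shell helper (the function name and the round-up are our own, not part of Pydpiper):

```shell
# Hypothetical helper implementing the brains-per-node rule described above:
# 4 brains fit on a node for 56-micron data, 2 per node for 40-micron data.
# Usage: executors_needed <num_brains> <resolution_in_microns>
executors_needed() {
  local brains=$1 res=$2 per_node
  if [ "$res" -le 40 ]; then per_node=2; else per_node=4; fi
  # round up: (brains + per_node - 1) / per_node
  echo $(( (brains + per_node - 1) / per_node ))
}

executors_needed 20 56   # 20 brains at 56 microns -> 5 executors
executors_needed 20 40   # 20 brains at 40 microns -> 10 executors
```

These numbers match Examples 1 and 2 below.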

The information below about time settings is somewhat dated and turned out not to work well for very large pipelines, revealing limitations of SciNet's suitability for use with our pipelines.

Note that Pydpiper executes pipelines on SciNet by running batches of 1 server and a number of executors (which you specify with --num-executors=N). Because jobs with shorter requested walltimes get onto the SciNet compute machines sooner, it's better not to request 48 hours for each job. For an MBM.py pipeline, for instance, the time requested should be a bit longer than the longest job in the pipeline (the last nonlinear stage). For a 56-micron mouse brain pipeline, that mincANTS stage takes about 6-7 hours; for 40-micron data it can take up to 9-10 hours. Currently the Pydpiper configuration file requests 16 hours for each server/executor. This is a bit longer than necessary, but far better than asking for 48 hours.

Some --time examples

Example 1: you have 20 56-micron mouse brains. The final mincANTS stages take about 3G of memory, so 4 processes can run on a compute node. The current config file runs the executors for 16 hours, and the final nonlinear stage takes about 8 hours, which should leave enough time to finish the rest of the pipeline. So we can submit 5 executors with the 16 hours:

Brains: 20
Size: 56 micron
--num-executors=5
--time=16:00:00

What will be submitted:

job1: server/executor (16 hours)
job2, job3, job4, job5: executors tied to job1 (16 hours each, hopefully starting at approximately the same time)

Example 2:  you have 20 40-micron mouse brains. The final mincANTS stages take about 7G of memory, so 2 processes can run on a compute node (we need 10 executors in this case). The current config file runs the executors for 16 hours, and the final nonlinear stage takes about 10 hours, which means we probably need 2 rounds of executors to finish all the other stages as well. Here is what we'll do:

Brains: 20
Size: 40 micron
--num-executors=10
--time=32:00:00

What will be submitted:

job1: server/executor (16 hours)
job2, job3, ..., job10: executors tied to job1 (16 hours each)

job11: server/executor (16 hours)
job12, job13, ..., job20: executors tied to job11 (16 hours each)

Make sure to transfer your results to another filesystem - /scratch is not backed up and unused files there are deleted after 3 months, though you'll get an email before this happens.

Transferring data

For transferring small volumes of data (less than 10G), such as your input .mnc files, you can use scp or rsync as usual:

scp -r /hpf/largeprojects/MICe/yourname/inputfiles/ yourname@login.scinet.utoronto.ca:/scratch/j/jlerch/yourname

For larger volumes, SciNet asks that you transfer via the datamover nodes.  However, the two datamover nodes are behind a firewall and inaccessible from the outside: you must log into one of them from elsewhere on SciNet, so you won't be able to transfer data from behind the Sick Kids firewall unless we expose a node to the outside.  Instead, transfer data via the login nodes, relying on rsync's ability to resume interrupted transfers in case your transfer process runs afoul of the login nodes' 5-minute CPU time limit (follow the instructions linked above to generate and use SSH keys so you aren't repeatedly prompted for your password):

for i in {1..100}; do        # try up to 100 times
  rsync ...                  # see below for rsync commands
  [ "$?" -eq 0 ] && break    # stop once a transfer completes successfully
done

On SciNet, you have access to the directories $HOME, $SCRATCH, and $PROJECT (which you may need to create via mkdir).  The compute nodes have read-only access to $HOME and $PROJECT but can only write to $SCRATCH, so you should run pipelines in this directory.  Here's an example of transferring some data there:

# on SciNet:
cd $SCRATCH
mkdir -p example_project/inputfiles
cd example_project/inputfiles

# at MICe:
# now use rsync to transfer data into the directory created above
# port 10248 is bianca; alternatively, you can use the following ports: 10249 (geronimo), 10250 (topolina)
# the destination is the scratch directory created above, given as an absolute path
rsync -cav --progress  /hpf/largeprojects/MICe/user/files_for_registration/*mnc username@login.scinet.utoronto.ca:/scratch/j/jlerch/username/example_project/inputfiles/

When transferring results back to MICe, you can use rsync flags to avoid transferring unnecessary files:

despereaux$ rsync -uvr --exclude '*/tmp/' --exclude '*/transforms/*' --exclude '*/log/'  bcdarwin@login.scinet.utoronto.ca:/scratch/j/jlerch/bcdarwin/example_project/ /hpf/largeprojects/MICe/bdarwin

You can check your disk quotas via the command diskUsage (available after module load extras or in /scinet/gpc/bin6/).  Note that we also have access to a large volume of tape storage on SciNet.

Also see https://support.scinet.utoronto.ca/wiki/index.php/Data_Management#Data_Transfer.

Monitoring your jobs

First and foremost, while a server is actually running at SciNet you can run the check_pipeline_status.py command, much as you would at MICe (first load the modules as discussed under Running a job above):

> check_pipeline_status.py uri_file

If your pipeline is not running, however (e.g. jobs are idle or blocked in the queue), you can check whether any stages have failed by running:

grep RETRYING pipeline.log

# a sample output indicating issues could be:

[2015...INFO] RETRYING: ERROR in Stage 7: mincblur ...
[2015...INFO] RETRYING: adding this stage back to the runnable queue.
[2015...INFO] RETRYING: Logfile for Stage /scratch/j/jlerch/matthijs/some_log_file_to_examine.log

# in which case you want to look at that log file and see what's going wrong

When your pipeline is not currently running, you can get a rough estimate of its progress by comparing the total number of stages in the pipeline with the number of finished stages:

cat *_pipeline_stages.txt | wc -l
cat *_finished_stages | wc -l

# if you have more than one pipeline in your directory, you should specify the pipeline name, i.e.:

cat {pipeline_name}_pipeline_stages.txt | wc -l
cat {pipeline_name}_finished_stages | wc -l
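Those two counts can be turned into a percentage with a small helper (a sketch; the function name is ours, and it assumes the file-naming convention shown above with one stage per line):

```shell
# Rough progress estimate for a named pipeline.
# Assumes <name>_pipeline_stages.txt and <name>_finished_stages exist,
# one stage per line, as described above.
pipeline_progress() {
  local total finished
  total=$(( $(wc -l < "${1}_pipeline_stages.txt") ))
  finished=$(( $(wc -l < "${1}_finished_stages") ))
  echo "${finished}/${total} stages finished ($(( 100 * finished / total ))%)"
}

# pipeline_progress pipeline_name
```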

You can use the following commands to look at your jobs in the queue:

# by using your username (matthijs in this example)
showq -w user=matthijs

# or using qstat
qstat -u matthijs

# you can also look at all jobs in the queue within your group (jlerch here)
showq -w group=jlerch

Using the showstart command you can get an estimate of when your job might start running:

# you can find the job number either through the showq or qstat commands. 26762477 is the example job number here
showstart -e all 26762477

The output of this might be as follows:

gpc-f102n084-ib0-$ showstart -e all 26762477
job 26762477 requires 8 procs for 1:00:00:00

# the following bases the start on a queue without anyone else around 
# (i.e. very unrealistic)
Estimated Rsv based start in               00:01:14 on Tue Jan 13 15:40:40
Estimated Rsv based completion in        1:00:01:14 on Wed Jan 14 15:40:40

# this bases the start of your job on your current priority and other 
# people in the queue (good estimate)
Estimated Priority based start in           8:08:10 on Tue Jan 13 23:47:36
Estimated Priority based completion in   1:08:08:10 on Wed Jan 14 23:47:36

# this factors in some historical data
Estimated Historical based start in         8:09:26 on Tue Jan 13 23:48:52
Estimated Historical based completion in 1:08:09:26 on Wed Jan 14 23:48:52

Best Partition: DDR

To get detailed information on a job:

qstat -f 26762477  # if job ID is omitted, show info on all jobs

You can cancel jobs with qdel as usual, though SciNet recommends using canceljob.

The March 2015 tech talk slides have many details on scheduling/allocation, monitoring jobs, disk quota, and various usage summaries, while the slightly older job monitoring tech talk has further details on monitoring running jobs, finding which nodes your jobs are on (also see the executor log files), SSHing into the compute nodes, finding the stderr and stdout of the jobs (although we now redirect stdout into a file), etc.

Acknowledging SciNet

https://support.scinet.utoronto.ca/wiki/index.php/Acknowledging_SciNet

For developers

SciNet time estimates