...

Next, from a login node:

[bdarwin@hpclogin3 ~]$ # if your pipeline is very large, you can bump this up to a maximum of mem=16G,walltime=120:00:00:
[bdarwin@hpclogin3 ~]$ qlogin -l vmem=8G,mem=8G,walltime=72:00:00
qsub: waiting for job 1904303 to start
qsub: job 1904303 ready
[bdarwin@qlogin1 ~]$
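
To avoid asking for more than the limits quoted in the comment above (16G of memory, 120 hours of walltime), the resource string can be built and checked before calling qlogin. This is only a sketch: `qlogin_resources` is a hypothetical helper name, it assumes a resource string of the form vmem=...,mem=...,walltime=..., and setting vmem equal to mem is an assumption rather than an HPF requirement.

```shell
# Hypothetical helper: build a qlogin resource string from a memory size
# (in GB) and a walltime (in whole hours), refusing values above the
# stated maximum of 16G and 120 hours.
qlogin_resources() {
    mem_gb=$1
    hours=$2
    if [ "$mem_gb" -gt 16 ] || [ "$hours" -gt 120 ]; then
        echo "requested resources exceed the 16G / 120 h maximum" >&2
        return 1
    fi
    # Assumption: vmem and mem are set to the same value.
    echo "vmem=${mem_gb}G,mem=${mem_gb}G,walltime=${hours}:00:00"
}

# usage, from a login node:
#   qlogin -l "$(qlogin_resources 8 72)"
```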

...

For instance, you could achieve this by starting a screen session on the login nodes - but apparently not the qlogin nodes - as follows:

[bdarwin@hpclogin3 ~]$ screen
[bdarwin@hpclogin3 ~]$ qlogin ...
[bdarwin@qlogin1 ~]$ MBM.py ...

...

# now a screen session is running independently on hpclogin3 ... to reattach:

[bdarwin@hpclogin3 ~]$ screen -r   # reattach

...

# if you're satisfied with your pipeline's progress, you can detach again ...

HPF has several different login machines, currently hpclogin1 through hpclogin4. When you log back into HPF you might end up on a different login node from the one running your screen session. You can check by running the following command:

[matthijs@hpclogin4 ~]$ screen -ls
No Sockets found in /var/run/screen/S-matthijs.

Now what do you do? The answer is that you can simply ssh into the login node you want:

[matthijs@hpclogin4 ~]$ ssh hpclogin3
Last login: Tue Sep 26 13:31:55 2017 from hpf24.cm.cluster
[matthijs@hpclogin3 ~]$

# now you will see your screen again:

[matthijs@hpclogin3 ~]$ screen -ls
There is a screen on:
    13900.pts-6.hpf23    (Detached)
1 Socket in /var/run/screen/S-matthijs.

# and you can re-attach to your screen like so:

[matthijs@hpclogin3 ~]$ screen -r 13900.pts-6.hpf23
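
If you do not remember which login node holds your session, one option is to ask each node in turn. The following is a minimal sketch, not an official HPF tool: `find_screens` is a hypothetical helper name, and it assumes the login nodes are named hpclogin1 through hpclogin4 and that you can ssh between them as in the transcript above. Each ssh call may prompt for your password.

```shell
# Hypothetical helper: list screen sessions on every login node.
# Assumption: the login nodes are hpclogin1 through hpclogin4.
find_screens() {
    for node in hpclogin1 hpclogin2 hpclogin3 hpclogin4; do
        echo "== ${node} =="
        # -n keeps ssh from reading stdin inside the loop; screen -ls may
        # exit non-zero when no sessions exist, so the status is ignored.
        ssh -n "${node}" screen -ls || true
    done
}

# usage, from any login node:
#   find_screens
```

Once you know which node lists your session, ssh to it and reattach with screen -r as shown above.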

...

# starting empty:
screen -ls
No Sockets found in /var/run/screen/S-matthijs.

# now start a screen
[matthijs@hpclogin3 ~]$ screen

#.... do things in your screen and detach by using Ctrl+a and then Ctrl+d
# you can see that screen:

[matthijs@hpclogin3 ~]$ screen -ls
There is a screen on:
    13976.pts-6.hpf23    (Detached)
1 Socket in /var/run/screen/S-matthijs.

# you don't have to reattach to the same screen, but can start a new one just by typing screen again:
[matthijs@hpclogin3 ~]$ screen

# now after doing some work and detaching again, you'll see the following:

[matthijs@hpclogin3 ~]$ screen -ls
There are screens on:
    13998.pts-6.hpf23    (Detached)
    13976.pts-6.hpf23    (Detached)
2 Sockets in /var/run/screen/S-matthijs.

# you can reattach to the screen you want by giving its name explicitly:
[matthijs@hpclogin3 ~]$ screen -r 13998.pts-6.hpf23
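
When several sessions pile up, the detached ones can be picked out of the `screen -ls` listing with a short filter. This is a sketch that assumes the listing format shown above; `detached_sessions` is a hypothetical name, not a screen feature.

```shell
# Hypothetical helper: read `screen -ls` output on stdin and print the
# name of every session marked (Detached), one per line.
detached_sessions() {
    awk '/\(Detached\)/ { print $1 }'
}

# usage:
#   screen -ls | detached_sessions
```

Any name it prints can be passed to screen -r to reattach.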

...

How long will your qlogin session still run for?

Warning

This section is out of date following the CentOS 6 → CentOS 7 upgrade on March 1, 2021. You can use `/opt/qlogin_torque/bin/qstat` in a similar way as below, but it doesn't seem to show the wall time elapsed, making this endeavour somewhat futile.


In the previous command you might have specified mem=8G,walltime=72:00:00. A day or two later you can find out how much time is left in that session using:

matthijs@mrjingles:~$ ssh hpf.ccm.sickkids.ca
Password:
[matthijs@hpclogin6 ~]$

# on this node (so prior to logging into a qlogin node), you can run the following:
[matthijs@hpclogin6 ~]$ /opt/qlogin/bin/qstat -u $USER
# In my case this shows:

qtorquemaster.hpf.cluster:
                                                                                  Req'd    Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory   Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
101958.qtorquemaster.h  matthijs    qloginQ  STDIN            229398     1      1    2gb  02:00:00 R  00:54:45
 
# so I have roughly an hour left in the 2 hour qlogin session.
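
The remaining time is simply the Req'd Time column minus the Elap Time column. A small sketch of that subtraction follows; `time_left` is a hypothetical helper name, and it only helps when qstat actually reports both columns in HH:MM:SS form.

```shell
# Hypothetical helper: subtract elapsed time from requested walltime.
# $1 = Req'd Time, $2 = Elap Time, both as HH:MM:SS.
time_left() {
    awk -v req="$1" -v elap="$2" 'BEGIN {
        split(req, r, ":"); split(elap, e, ":")
        left = (r[1]*3600 + r[2]*60 + r[3]) - (e[1]*3600 + e[2]*60 + e[3])
        if (left < 0) left = 0   # session already past its walltime
        printf "%02d:%02d:%02d\n", int(left/3600), int((left%3600)/60), left%60
    }'
}

# values from the qstat output above
time_left 02:00:00 00:54:45   # prints 01:05:15
```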

...

You generally don't need to specify --mem, --proc, --time, --queue-type, &c. since this is done in the configuration file.

...