nog functions
Functions for the neuronal oscillations group (nog). They are command-line functions that streamline interaction with the ARC cluster. Once installed correctly (see initialise), they can be accessed from the ARC terminal. For the nog functions to work properly, you MUST follow the setup procedure documented in initialise.
Available Functions:
Prerequisite
- you are on a login node (not gateway, use ssh htc-login)
- System Environment set up (see initialise)
- MNE set up for nog_jupyter_server (see setup-mne)
- some mamba environment to be used with vscode set up for nog_vscode_server (see setup-mamba-environment)
nog_jupyter_server
This function submits a batch job to the cluster that sets up a Jupyter server you can connect to from your laptop:
nog_jupyter_server
Optional arguments and usage:
nog_jupyter_server [-h] [-b] [-m MEM] [-c CPUs] [-t TIME] [-p PARTITION] [-C CLUSTER]
-h help
-m memory (default: 16G)
-c CPUs (default: 8)
-t walltime (default: 08:00:00)
-p partition (default: medium)
-C cluster (default: htc)
Example:
nog_jupyter_server -m 16G -t 08:00:00
A server will start in the background and, shortly after, instructions on how to connect will be printed to the terminal. Be patient when connecting to the server: it can take several minutes until everything is set up.
nog_vscode_server
This function submits an interactive job to the cluster that sets up a VSCode server you can connect to from your laptop:
nog_vscode_server
Optional arguments and usage:
nog_vscode_server [-h] [-g] [-m MEM] [-c CPUs] [-t TIME] [-p PARTITION] [-C CLUSTER]
-h help
-g request GPU
-m memory (default: 16G)
-c CPUs (default: 4)
-t walltime (default: 08:00:00)
-p partition (default: medium)
-C cluster (default: htc)
Example:
nog_vscode_server -m 16G -t 08:00:00
A server will start in the background and, shortly after, instructions on how to connect will be printed to the terminal. Be patient when connecting to the server: it can take several minutes until everything is set up.
nog_rfs
This function lets you interact with the RFS file storage. It requires that the RFS credentials file is set up (see initialise).
The main function call is followed by one of these actions. See the help and examples for every action below:
nog_rfs (ls|mkdir|push|pull|rm) REMOTE_FILE [LOCAL_FILE]...
'ls' can take one argument - remote subdirectory to list (e.g. nog_rfs ls "RFS_DIR/RFS_SUBDIR")
'mkdir' can take two arguments - a directory tree to create, and a remote directory to create it in
(e.g. nog_rfs mkdir "NEW_DIR/NEW_SUBDIR" "RFS_DIR/RFS_SUBDIR")
if only one is given, remote directory is assumed to be the RFS root directory
'push' can take two arguments - local file or directory to push, and remote directory to put it into
(e.g. nog_rfs push "FILE_OR_DIR" "RFS_DIR/RFS_SUBDIR")
if only one is given, remote directory is assumed to be the RFS root directory
'pull' [-d] can take two arguments - remote directory to fetch, and local directory to put it into
-d for pulling directories. Omit -d for single files
(e.g. nog_rfs pull -d "RFS_DIR/RFS_SUBDIR" "//data/psyc-neuosc/psyc1908/")
if only one is given, local directory is assumed to be the current working directory
'rm' [-d] takes one argument - remote file or folder to delete
-d for deleting directories. Omit -d for single files
(e.g. nog_rfs rm -d "RFS_DIR/RFS_SUBDIR")
List RFS files and folders
# list files in RFS root
nog_rfs ls
# list files in RFS subdir
nog_rfs ls SOME_DIR/SOME_SUBDIR/
Create new directory on RFS
# make new directory in root
nog_rfs mkdir NEW_DIR
# make new directory in other directory
nog_rfs mkdir NEW_SUBDIR NEW_DIR/
# make new directory tree in root
nog_rfs mkdir NEW_DIR/NEW_SUBDIR
# make new directory tree in other directory
nog_rfs mkdir NEW_SUBSUBDIR/NEW_SUBSUBSUBDIR NEW_DIR/NEW_SUBDIR
Push data to RFS
# Push files or directories to root
nog_rfs push file_or_dir
# Push files or directories to other directory
nog_rfs push file_or_dir OTHER_DIR/
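The mkdir and push actions can be combined to upload several items into a fresh remote directory. The following is a minimal sketch, not part of the nog functions: `rfs_push_into` and the `NOG_RFS_CMD` variable are hypothetical names introduced here, and whether `nog_rfs mkdir` tolerates an already existing directory is not stated in this document.

```shell
# Hypothetical helper (not part of nog functions): create a remote directory
# tree in the RFS root, then push every given file or directory into it.
# Set NOG_RFS_CMD="echo nog_rfs" to print the commands instead of running them.
rfs_push_into() (
    remote_dir=${1:?usage: rfs_push_into REMOTE_DIR FILE_OR_DIR...}
    shift
    # With a single argument, mkdir creates the tree in the RFS root directory
    ${NOG_RFS_CMD:-nog_rfs} mkdir "$remote_dir" || exit 1
    for item in "$@"; do
        # Second argument of push is the remote directory to put the item into
        ${NOG_RFS_CMD:-nog_rfs} push "$item" "$remote_dir" || exit 1
    done
)
```

For example, `rfs_push_into RESULTS/run1 a.fif b.fif` would create RESULTS/run1 and then push both files into it, stopping at the first command that fails.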
Pull data from RFS
Note that here you have to indicate whether you want to pull a file or a directory. Use the -d flag to pull directories (no flag = file). When pulling a directory, its contents are pulled, not the directory itself.
# pull file to current directory
nog_rfs pull rfs_file
# pull file to other directory
nog_rfs pull rfs_file $DATA/
# pull directory to current directory
mkdir rfs_data
nog_rfs pull -d rfs_data rfs_data/
# pull directory to other directory
mkdir $DATA/rfs_data
nog_rfs pull -d rfs_data $DATA/rfs_data/
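Because pulling with -d fetches a directory's contents rather than the directory itself, the local target directory has to exist first. The two steps above can be wrapped in a small helper. This is only a sketch: `rfs_pull_dir` and `NOG_RFS_CMD` are hypothetical names, not part of the nog functions.

```shell
# Hypothetical helper (not part of nog functions): pull the contents of an RFS
# directory into a local directory of the same name, creating it first.
# Set NOG_RFS_CMD="echo nog_rfs" to print the command instead of running it.
rfs_pull_dir() (
    remote_dir=${1:?usage: rfs_pull_dir REMOTE_DIR [LOCAL_PARENT]}
    local_parent=${2:-.}   # defaults to the current working directory
    target="$local_parent/$(basename "$remote_dir")"
    mkdir -p "$target" || exit 1
    # -d tells nog_rfs that a directory, not a single file, is being pulled
    ${NOG_RFS_CMD:-nog_rfs} pull -d "$remote_dir" "$target/"
)
```

With this, `rfs_pull_dir rfs_data "$DATA"` would correspond to the last example above.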
Remove files or directories from RFS
Note that here you have to indicate whether you want to remove a file or directory. Use the -d flag to remove directories (no flag = file).
# Remove file
nog_rfs rm RFS_DIR/RFS_SUBDIR/rfs_file
# Remove directory
nog_rfs rm -d RFS_DIR/RFS_SUBDIR
nog_slurm
Useful for starting, monitoring or canceling jobs.
nog_slurm <action> [OPTIONS]
The main function call is followed by one of these actions. See the help and examples for every action below:
help
Usage: nog_slurm <action> [ACTION OPTIONS]
Actions:
help Show this help message
submit Submit a script as a SLURM batch job (must be .sh or .py)
list List your jobs (optional live view)
usage Show current and maximum memory (optional live view)
log View log file of running job (optional live view)
cancel Cancel jobs
exec Execute a command when a job starts or finishes
Options for submit:
-e <env> Path to conda / mamba environment to activate before running (default: none)
-c <cpus> Number of CPUs requested (default: $CPUs)
-g <gpu> Set GPU specification. Uses --gres=<gpu> (default: no GPU)
-m <mem> Memory requested (default: $MEM)
-p <partition> Partition to submit to (default: $PARTITION)
-t <time> Wall time limit (default: $TIME)
-w [id] Wait for a job to finish before starting. Without argument waits for
last submitted (default). Otherwise specify [id].
Uses --dependency=afterany:[id]
-C <cluster> Start job on specified cluster (default: $CLUSTER)
Options for list:
(none) List all current jobs
-l Live update every $UPDATE_TIME seconds
-C <cluster> Show result for specified cluster (default: $CLUSTER)
Options for usage:
(none) Show output for last submitted job
-j <id> Show output for a specific job ID
-l Live update every $UPDATE_TIME seconds
-s Shows list of jobs. Select job via index.
-C <cluster> Show result for specified cluster (default: $CLUSTER)
Options for log:
(none) Show output for last submitted job
-j <id> Show output for a specific job ID
-l Live update every $UPDATE_TIME seconds
-s Shows list of jobs. Select job via index.
-C <cluster> Show result for specified cluster (default: $CLUSTER)
Options for cancel:
(none) Cancel last submitted job
-a Cancel ALL jobs on $CLUSTER
-j <id[,id]> Cancel specified comma-separated job IDs
-s Shows list of jobs. Select via comma separated indices.
-C <cluster> Operate on jobs for specified cluster (default: $CLUSTER)
Options for exec:
(none) Watch last submitted job, execute when finished
-j <id> Watch a specific job ID
-s Shows list of jobs. Select job to watch via index.
-r Execute when job starts running (default: wait for finish)
-C <cluster> Operate on jobs for specified cluster (default: $CLUSTER)
Examples:
nog_slurm submit MyScript.sh # submit bash script with defaults
nog_slurm submit -e MyEnv -g 1 -m 32G -c 16 -t 24:00:00 -p long MyScript.py arg1 arg2
nog_slurm submit -w MyScript.py # wait for last submitted job to finish before running
nog_slurm submit -w 12345 MyScript.py # wait for job 12345 to finish before running
nog_slurm list # list current jobs on $CLUSTER
nog_slurm list -C arc -l # list current jobs on arc, live update
nog_slurm usage -j 12345 # report usage for job 12345
nog_slurm usage -j 12345 -l # live update for job 12345
nog_slurm log -l # Show log file for last job, live update
nog_slurm log -j 12345 # Show log file for job 12345
nog_slurm cancel # cancel last job on $CLUSTER
nog_slurm cancel -s # cancel selected jobs on $CLUSTER
nog_slurm cancel -a -C arc # cancel all jobs on arc
nog_slurm cancel -j 12345,12346 # cancel specific jobs on $CLUSTER
nog_slurm exec "echo done" # Print "done" after last submitted job finished
nog_slurm exec -r "echo started" # Print "started" after last submitted job started
nog_slurm exec -s -r "bash notify.sh" # select job, run script on start
submit
nog_slurm submit -e $DATA/environments/mne_env -m 32G -c 16 -t 12:00:00 -w 123456 $DATA/scripts/MyScript.py S42
Submits a batch job that looks like this:
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=MyScript
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=12:00:00
#SBATCH --clusters=htc
#SBATCH --output=$DATA/logs/MyScript-%j.log
#SBATCH --error=$DATA/logs/MyScript-%j.log
#SBATCH --dependency=afterany:123456
source $HOME/.bashrc
source activate $DATA/environments/mne_env
python $DATA/scripts/MyScript.py S42
This requests 32 GB of memory and 16 CPUs for 12 hours, and submits a job that activates the mamba environment mne_env and runs the script MyScript.py with S42 as an argument. The job will only start after job 123456 has ended (-w 123456); because the dependency type is afterany, it starts regardless of whether job 123456 succeeded or failed.
Note that either mamba or conda is expected to be available via ~/.bashrc in order to load environments. See initialise on how to setup your system environment.
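Since script arguments such as S42 are passed through to the script, the same analysis can be submitted once per subject with a simple loop. The sketch below assumes only the submit options documented above; `submit_subjects` and `NOG_SLURM_CMD` are hypothetical names, and the resource values are copied from the example rather than being recommendations.

```shell
# Hypothetical helper (not part of nog functions): submit one batch job per
# subject ID, passing the ID to the script as its only argument.
# Set NOG_SLURM_CMD="echo nog_slurm" to print the commands instead of running them.
submit_subjects() (
    env_path=${1:?usage: submit_subjects ENV SCRIPT SUBJECT...}
    script=${2:?usage: submit_subjects ENV SCRIPT SUBJECT...}
    shift 2
    for subject in "$@"; do
        # Resource values mirror the example above; adjust them to your analysis
        ${NOG_SLURM_CMD:-nog_slurm} submit -e "$env_path" -m 32G -c 16 -t 12:00:00 \
            "$script" "$subject" || exit 1
    done
)
```

For example, `submit_subjects "$DATA/environments/mne_env" "$DATA/scripts/MyScript.py" S41 S42` would submit two independent jobs.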
list
nog_slurm list -l
This shows all currently running jobs on htc and updates the view every 10s (-l for "live"). Omit -l for a one-time lookup.
usage
nog_slurm usage -j 123456 -l
Shows a live view (-l) of memory usage of job 123456 (-j 123456), which updates every 10s.
log
nog_slurm log -s -l
Displays all jobs, from which you can select one running job (-s). The log file corresponding to this job will be displayed in the terminal and updated live (-l).
cancel
nog_slurm cancel -s -C arc
Displays all jobs, from which you can select jobs to be canceled (-s) in a comma-separated format (e.g. 1,3,4). Selected jobs are canceled on the "arc" cluster (-C arc). To cancel all jobs on "htc", for example, use nog_slurm cancel -a -C htc.
exec
nog_slurm exec -j 123456 -r "nog_slurm log -l -j 123456"
After job 123456 (-j 123456) is listed as running (-r), show a live view of its log file ("nog_slurm log -l -j 123456").
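Conceptually, the -r behaviour amounts to polling the job state and running the command once the job is reported as running. The sketch below illustrates that idea with plain Slurm tooling; it is not the actual implementation of nog_slurm exec, which is not shown in this document. `job_state`, `run_when_running` and `POLL_SECONDS` are hypothetical names, and cluster selection is left out for brevity.

```shell
# Conceptual sketch only, not the implementation of nog_slurm exec.
# job_state prints the Slurm state of a job (e.g. PENDING, RUNNING), or
# nothing once the job is no longer in the queue.
job_state() {
    squeue -h -j "$1" -o %T 2>/dev/null
}

# run_when_running JOB_ID COMMAND [ARGS...]
# Polls every POLL_SECONDS seconds (default: 10) and runs COMMAND as soon as
# the job is RUNNING. Fails if the job leaves the queue before that.
run_when_running() (
    job_id=${1:?usage: run_when_running JOB_ID COMMAND [ARGS...]}
    shift
    while :; do
        state=$(job_state "$job_id")
        case $state in
            RUNNING) break ;;
            "") echo "job $job_id is no longer queued" >&2; exit 1 ;;
        esac
        sleep "${POLL_SECONDS:-10}"
    done
    "$@"
)
```

In this sketch, `run_when_running 123456 nog_slurm log -l -j 123456` would play the role of the exec call above.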