diff --git a/team/drp.rst b/team/drp.rst index b5b9d533..67dcd98d 100644 --- a/team/drp.rst +++ b/team/drp.rst @@ -51,13 +51,15 @@ Obtaining Accounts Accounts are issued on demand at the request of an appropriate PI. For our group, that means you should speak to either Robert or Yusra, and they will arrange one for you. -When your account has been created, you should check that you are a member of the groups ``astro``, ``hsc``, and ``lsst`` (use the :command:`groups` command). +When your account has been created, you should check that you are a member of the groups ``astro``, ``hsc``, ``lsst`` and ``rubin`` (use the :command:`groups` command to check). .. note:: - A new user account may not have the ``lsst`` group added by default. - This group is not being used for anything at present, so it shouldn't be a problem if you are not a member of it. - If you find that you do need to be a member of this group, please contact Robert or Yusra. + The ``lsst`` group is a shared group which allows all Tiger3 users to access the shared stack. + Being a member of this group does not provide access to shared Rubin data. + Instead, the ``rubin`` group is used to control access to Rubin data repositories. + In the prior Tiger2 cluster, the ``hsc`` group was used for both purposes. + If you find that you need to be made a member of any of these groups, please contact Robert, Yusra or Lee. .. _drp-princeton-available-systems: @@ -75,7 +77,7 @@ You can use this node for building software and running small and/or short-lived The ``/project`` filesystems are NFS-mounted on the Princeton clusters. As a consequence, the performance of these filesystems will be limited by the network speed between our head node and the filesystem. 
- For anything more than even the most basic testing, it is therefore strongly recommended that batch processing in your ``/scratch/gpfs/$USER`` space be utilized where possible instead of working directly on the head node (see :ref:`drp-princeton-cluster-usage`). + For anything more than even the most basic testing, it is therefore strongly recommended that batch processing takes place in your ``/scratch/gpfs/RUBIN/user/${USER}`` space (see :ref:`drp-princeton-cluster-usage`). .. _drp-princeton-shared-stack: @@ -88,19 +90,19 @@ To initialize the stack in your shell, run: .. code-block:: shell - source /scratch/gpfs/HSC/LSST/stack/loadLSST.sh + source /scratch/gpfs/LSST/stack/loadLSST.sh setup lsst_distrib By default, the most recent Rubin Environment will be used, as provided by the ``LSST_CONDA_ENV_NAME`` variable within the ``loadLSST.sh`` script. -If you wish to use a different version of the stack, you can do so by first setting the ``LSST_CONDA_ENV_NAME`` variable to the desired version before setting up the Science Pipelines: +If you wish to use a different Rubin Environment, you can do so by first setting the ``LSST_CONDA_ENV_NAME`` variable to the desired version before setting up the Science Pipelines: .. code-block:: shell - export LSST_CONDA_ENV_NAME="lsst-scipipe-4.0.1" - source /scratch/gpfs/HSC/LSST/stack/loadLSST.sh + export LSST_CONDA_ENV_NAME="lsst-scipipe-9.0.0" + source /scratch/gpfs/LSST/stack/loadLSST.sh setup lsst_distrib -t - # To reset to the default, uncomment this line before setting up again: + # To reset to the default, unset the variable before sourcing the script: # unset LSST_CONDA_ENV_NAME A list of all currently installed Rubin Environments can be found by running: ``mamba env list``. @@ -108,26 +110,23 @@ A list of all currently installed Rubin Environments can be found by running: `` .. 
note:: The current default shared stack, described above, is a symbolic link to the latest build using the post-:jira:`RFC-584` Conda environment. - Older builds, if any, are available in ``/scratch/gpfs/HSC/LSST/`` with the syntax ``stack_YYYYMMDD``. + Older builds, if any, are available in ``/scratch/gpfs/LSST/stacks`` with the syntax ``stack_YYYYMMDD``. .. _drp-princeton-repositories: Repositories ------------ -We currently maintain two data repositories for general use on the Princeton clusters: +We currently maintain a single data repository for general use on the Princeton clusters: -- ``/scratch/gpfs/HSC/LSST/repo/main``: The primary HSC/LSST butler data repository, containing all raw HSC data on-disk and a selection of non-embargoed LATISS data. -- ``/scratch/gpfs/HSC/LSST/repo/dc2``: The primary DC2 butler data repository, containing a selection of simulated DC2 data. +- ``/scratch/gpfs/RUBIN/repo/main``: The primary HSC/LSST butler data repository, containing raw HSC RC2 data. -For information on accessing these repositories, including setting up required permissions, see the top-level ``/scratch/gpfs/HSC/LSST/repo/README.md`` file. +For information on accessing repositories, including setting up required permissions, see the top-level ``/scratch/gpfs/RUBIN/repo/README.md`` file. .. note:: You will not be able to access the data within these repositories without first following the **Database Authentication** instructions in the above ``README.md`` file. -Information more specific to each repository is stored within a secondary ``README.md`` file in each repository's root directory. - .. _drp-princeton-storage: Storage @@ -138,11 +137,14 @@ This space may also be used to store your results. Note however that space is at a premium; please clean up any data you are not actively using. Also, be sure to set :command:`umask 002` so that your colleagues can reorganize the shared space. 
-For temporary data processing storage, shared space is available in :file:`/scratch/gpfs/` (you may need to make this directory yourself). +For long-term storage of user data, shared space is available in :file:`/projects/HSC/users/` (you may need to make this directory yourself). +This space is backed up, but it is **not** visible to the compute nodes. + +For temporary data processing storage, shared space is available in :file:`/scratch/gpfs/RUBIN/user/` (you may need to make this directory yourself). This General Parallel File System (GPFS) space is large and visible from all Princeton clusters, however, it is **not** backed up. More information on `Princeton cluster data storage `_ can be found online. -Space is also available in :file:`/scratch/` and in your home directory, but note that they are not shared across clusters (and, in the case of ``/scratch``, not backed up). +Space is also available in your home directory, but note that it is not shared across clusters. Use the :command:`checkquota` command to check your current storage and your storage limits. More information on storage limits, including on how to request a quota increase, can be found at `this link `_. @@ -157,14 +159,8 @@ Jobs are managed on cluster systems using `SLURM `_; Batch processing functionality with the Science Pipelines is provided by the `LSST Batch Processing Service (BPS) `_ module. BPS on the Princeton clusters is configured to work with the `ctrl_bps_parsl plugin `_, which uses the `Parsl `_ workflow engine to submit jobs to SLURM. -.. note:: - - Due to changes that occurred in Q1 2023 relating to how disks are mounted on the Tiger cluster, use of the ``ctrl_bps_parsl`` plugin will return an ``OSError`` when used in conjunction with any weeklies older than ``w_2023_09``. - To make use of BPS with older weeklies, you will need to build and set up the ``ctrl_bps_parsl`` plugin yourself. 
- Refer to the `ctrl_bps_parsl plugin documentation `_ and links therein for further details. - To submit a job to the cluster, you will first need to create a YAML configuration file for BPS. -For convenience, two generic configuration files have been constructed on disk at ``/projects/HSC/LSST/bps/bps_tiger.yaml`` and ``/projects/HSC/LSST/bps/bps_tiger_clustering.yaml``. +For convenience, two generic configuration files have been constructed on disk at ``/scratch/gpfs/RUBIN/bps/bps_tiger.yaml`` and ``/scratch/gpfs/RUBIN/bps/bps_tiger_clustering.yaml``. The former is intended for general use, while the latter is intended for use with quantum clustering enabled. These files may either be used directly when submitting a job or copied to your working directory and modified as needed. The following example shows how to submit a job using the generic configuration file: @@ -178,20 +174,20 @@ The following example shows how to submit a job using the generic configuration export NUMEXPR_MAX_THREADS=1 # All submissions must be made from your /scratch/gpfs directory. - cd /scratch/gpfs/$USER + cd /scratch/gpfs/RUBIN/user/${USER} # Save the output of the BPS submit command to a log file # (optional, but recommended). - LOGFILE=/path/to/my/log/file.txt + LOGFILE=$(realpath bps_log.txt) # Submit a job to the cluster. 
date | tee $LOGFILE; \ $(which time) -f "Total runtime: %E" \ - bps submit /projects/HSC/LSST/bps/bps_tiger.yaml \ - --compute-site tiger_1h_1n_40c \ - -b /projects/HSC/repo/main \ + bps submit /scratch/gpfs/RUBIN/bps/bps_tiger.yaml \ + --compute-site tiger_1n_112c_1h \ + -b /scratch/gpfs/RUBIN/repo/main \ -i HSC/RC2/defaults \ - -o u/$USER/test \ + -o u/${USER}/scratch/bps_test \ -p $DRP_PIPE_DIR/pipelines/HSC/DRP-RC2.yaml#step1 \ -d "instrument='HSC' AND visit=1228" \ 2>&1 | tee -a $LOGFILE; \ @@ -202,8 +198,8 @@ The following example shows how to submit a job using the generic configuration # --extra-qgraph-options "-c isr:doOverscan=False" A number of different compute sites are available for use with BPS as defined in the generic configuration file. -Select a compute site using the syntax ``tiger_Xh_Xn_Xc``, where ``X`` is replaced by the appropriate number of hours, nodes, and cores. -You may check the available compute sites defined in the generic configuration file using: ``grep "tiger" /projects/HSC/LSST/bps/bps_tiger.yaml``. +Select a compute site using the syntax ``tiger_${NODES}n_${CORES}c_${TIME}h``, replacing the variables by the appropriate number of nodes, cores and hours. +You can check the available compute sites defined in the generic configuration file using: ``grep "tiger_" /scratch/gpfs/RUBIN/bps/bps_tiger.yaml``. The following table lists the available compute site dimensions and their associated options: .. list-table:: @@ -211,12 +207,12 @@ The following table lists the available compute site dimensions and their associ * - Dimension - Options - * - Walltime (Hours) - - 1, 5, 24, 72 * - Nodes - - 1, 4, 10 + - 1, 10 * - Cores per Node - - 1, 5, 10, 20, 40 + - 1, 28, 112 + * - Walltime (Hours) + - 1, 5, 24, 72 A list of all available nodes is given using the :command:`snodes` command, or alternatively using :command:`sinfo`: @@ -228,7 +224,7 @@ To get an estimate of the start time for any submitted jobs, the :command:`squeu .. 
code-block:: shell - squeue -u $USER --start + squeue -u ${USER} --start To show detailed information about a given node, the :command:`scontrol` may be used: @@ -236,7 +232,7 @@ To show detailed information about a given node, the :command:`scontrol` may be scontrol show node -It is occasionally useful to be able to bring up an interactive shell on a compute node. +It is occasionally useful to be able to directly log in to an interactive shell on a compute node. The following should work: .. code-block:: shell @@ -254,16 +250,16 @@ Access to all of the Princeton clusters is only available from within the Prince If you are connecting from the outside, you will need to bounce through another host on campus first. Options include: +- Jumping through the Research Computing ``tigressgateway`` host; - Bouncing your connection through a `host on the Peyton network `_ (this is usually the easiest way to go); - Making use of the `University's VPN service `_. -- Using the Research Computing gateway. -If you choose the first option, you may find the ``ProxyCommand`` option to SSH helpful. +If you choose the first or second option, you may find the ``ProxyCommand`` or ``ProxyJump`` options to SSH helpful. 
For example, adding the following to :file:`~/.ssh/config` will automatically route your connection to the right place when you run :command:`ssh tiger`:: - Host tiger - HostName tiger2-sumire.princeton.edu - ProxyCommand ssh coma.astro.princeton.edu -W %h:%p + Host tiger + HostName tiger2-sumire.princeton.edu + ProxyCommand ssh coma.astro.princeton.edu -W %h:%p The following SSH configuration allows access via the Research Computing gateway:: @@ -274,14 +270,24 @@ The following SSH configuration allows access via the Research Computing gateway Host tiger Hostname tiger2-sumire.princeton.edu +or alternatively:: + + Host tigressgateway + HostName tigressgateway.princeton.edu + Host tiger + Hostname tiger2-sumire.princeton.edu + ProxyJump tigressgateway + (It may also be necessary to add a ``User`` line under ``Host tigressgateway`` if there is a mismatch between your local and Princeton usernames.) Entry to ``tigressgateway`` requires `2FA `_; we recommend using the ``ControlMaster`` feature of SSH to persist connections, e.g.:: ControlMaster auto - ControlPath ~/.ssh/controlmaster-%r@%h:%p + ControlPath ~/.ssh/cm/%r@%h:%p ControlPersist 5m +(It may be necessary to create the directory ``~/.ssh/cm``.) + See also the `Peyton Hall tips on using SSH `_. .. _drp-princeton-help-support: