Merge remote-tracking branch 'origin/main' into ci-link-checking
jbusecke committed Dec 20, 2023
2 parents e58fa32 + f95169b commit 06c6f09
Showing 10 changed files with 140 additions and 131 deletions.
1 change: 1 addition & 0 deletions .github/workflows/trigger-book-build.yaml
@@ -9,4 +9,5 @@ jobs:
environment_name: leap-docs-env
artifact_name: book-zip-${{ github.event.number }}
path_to_notebooks: "book"
build_command: "jupyter-book build -W --keep-going ."
# Other input options are possible, see ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml
1 change: 1 addition & 0 deletions .gitignore
@@ -52,3 +52,4 @@ coverage.xml

book/_build/

.vscode/settings.json
2 changes: 1 addition & 1 deletion book/_toc.yml
@@ -6,8 +6,8 @@ root: intro
parts:
- caption: LEAP-Pangeo
chapters:
- file: leap-pangeo/jupyterhub.md
- file: leap-pangeo/tutorial.md
- file: leap-pangeo/jupyterhub.md
- file: leap-pangeo/architecture
- file: leap-pangeo/implementation
- caption: Guides
14 changes: 7 additions & 7 deletions book/guides/education.md
@@ -4,7 +4,7 @@

🚧 Full Guide coming soon ... If you are a LEAP educator and want to run your class on the hub, please reach out to the [](support.data_compute_team).

(education:sing_up)=
(education.sing_up)=
### How to sign up students

Students should be signed up to the appropriate user [categories](users.categories) ahead of the class. Please direct your students to this documentation and try to ensure that everyone has [access to the Hub](hub:server:login) before the class starts.
@@ -13,17 +13,17 @@ Students should be signed up to the appropriate user [categories](users.categori

**Students cannot sign on**

Check if the students are part of the [appropriate github teams](users:categories).
Check if the students are part of the [appropriate github teams](users.categories).

If they **are not** follow these steps:
- [ ] Did the student [sign up for LEAP membership]()?
- [ ] Did the student receive a github invite? [Here](users.invite) is how to check for that.
- [ ] Check again if they are part of the [appropriate github teams](users:categories).
- If these steps do not work, please reach out to [](contact.data_compute_manager).
- [ ] Did the student [sign up for LEAP membership](users.membership.apply)?
- [ ] Did the student receive a github invite? [Here](users.membership.invite) is how to check for that.
- [ ] Check again if they are part of the [appropriate github teams](users.categories).
- If these steps do not work, please reach out to the [](support.data_compute_team).

If they **are**, ask them to try the following steps:
- [ ] Refresh the browser cache
- [ ] Try a different browser
- [ ] Restart the computer
- If these steps do not work, please reach out to [](contact.data_compute_manager).
- If these steps do not work, please reach out to the [](support.data_compute_team).

13 changes: 7 additions & 6 deletions book/guides/hub_guides.md
@@ -18,11 +18,12 @@ We distinguish between two primary *types* of data to upload: "Original" and "Pu
- **Published Data** has been published and archived in a publicly accessible location (e.g. a data repository like [zenodo](https://zenodo.org) or [figshare](https://figshare.com)). We do not recommend uploading this data to the cloud directly, but instead use [Pangeo Forge](https://pangeo-forge.readthedocs.io/en/latest/) to transform and upload it to the cloud. This ensures that the data is stored in an ARCO format and can be easily accessed by other LEAP members.
- **Original Data** is any dataset that is produced by researchers at LEAP and has not been published yet. The main use case for this data is to share it with other LEAP members and collaborate on it. For original data we support direct upload to the cloud. *Be aware that original data could change rapidly as the data producer is iterating on their code*. We encourage all datasets to be archived and published before using them in scientific publications.

##### Transform and Upload published data to an ARCO format (with Pangeo Forge)
#### Transform and Upload published data to an ARCO format (with Pangeo Forge)

Coming Soon

##### Upload medium sized original data from your local machine
(hub.guide.data.upload_manual)=
#### Upload medium sized original data from your local machine

For medium-sized datasets that can be uploaded within an hour, you can use a temporary access token generated on the JupyterHub to upload data to the cloud.

@@ -40,7 +41,7 @@ conda activate leap_pange_transfer

and set up a jupyter notebook (or a pure python script) that loads your data in as few xarray datasets as possible. For instance, if you have one dataset that consists of many files split in time, you should set your notebook up to read all the files using xarray into a single dataset, and then try to write out a small part of the dataset to a zarr store.

- Now start up a [LEAP-Pangeo server](leap.2i2c.cloud) and open a terminal. Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) using mamba
- Now start up a [LEAP-Pangeo server](https://leap.2i2c.cloud) and open a terminal. Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) using mamba

```shell
mamba install google-cloud-sdk
@@ -82,15 +83,15 @@ ds.to_zarr('gs://leap-scratch/<your_username>/test_offsite_upload.zarr') #adding

> Replace `<your_username>` with your actual username on the hub.
- Make sure that you can read the test dataset from within the hub (go back to [Basic writing to and reading from cloud buckets](hub:data:read_write)).
- Make sure that you can read the test dataset from within the hub (go back to [Basic writing to and reading from cloud buckets](hub.data.read_write)).

- Now the last step is to paste the code to load your actual dataset into the notebook and use `.to_zarr` to upload it.

> Make sure to give the store a meaningful name, and raise an issue in the [data-management repo](https://github.com/leap-stc/data-management/issues) to get the dataset added to the LEAP Data Library.
> Make sure to use a different bucket than `leap-scratch`, since that will be deleted every 7 days! For more info refer to the available [storage buckets](hub:data:buckets).
> Make sure to use a different bucket than `leap-scratch`, since that will be deleted every 7 days! For more info refer to the available [storage buckets](hub.data.buckets).
(hub:data:upload_hpc)=
(hub.data.upload_hpc)=
##### Uploading large original data from an HPC system (no browser access on the system available)

A common scenario is the following: A researcher/student has run a simulation on a High Performance Computer (HPC) at their institution, but now wants to collaboratively work on the analysis or train a machine learning model with this data. For this they need to upload it to the cloud storage.
2 changes: 1 addition & 1 deletion book/how_to_cite.md
@@ -2,7 +2,7 @@

If you use any of the LEAP resources, please follow these guidelines to recognize our work.

## Add your publication to our [LEAP publication tracker]()
## Add your publication to our [LEAP publication tracker](https://docs.google.com/spreadsheets/d/1zVfivXK-GKLEma_uc-SAIRs7OP_qfTUmsypAQeVqstI/edit#gid=645657151)

## Cite LEAP-Pangeo Platform
If you used the JupyterHub platform to perform analysis, please add a statement similar to this to your acknowledgment section of the paper:
41 changes: 21 additions & 20 deletions book/leap-pangeo/jupyterhub.md
@@ -14,14 +14,14 @@ For information who can access the hub with which privileges, please refer to
This document goes over the primary technical details of the JupyterHub.
- For a quick tutorial on basic usage, please see [Getting Started](tutorial.md).
- To get an in-depth overview of the LEAP Pangeo Architecture and how the JupyterHub fits into it, please see the [Architecture](architecture.md) page.
### The Software Environment
## The Software Environment
The software environment you encounter on the Hub is based upon [docker images](https://www.digitalocean.com/community/tutorials/the-docker-ecosystem-an-introduction-to-common-components) which you can run on other machines (like your laptop or an HPC cluster) for better reproducibility.

Upon start up you can choose between
- A list of preselected images
- The option of passing a custom docker image via the `"Other..."` option.

#### Preselected Images
### Preselected Images
LEAP-Pangeo uses several full-featured, up-to-date Python environments maintained by Pangeo. You can read all about them at the following URL:

- https://github.com/pangeo-data/pangeo-docker-images/
Expand All @@ -37,24 +37,24 @@ A complete list of all packages installed in this environment is located at:
:::{attention}
We regularly update the version of the images provided in the drop-down menu.

To ensure full reproducibility you should save the full info of the image you worked with (this is stored in the environment variable `JUPYTER_IMAGE_SPEC`) with your work. You can then use that string with the [custom images](hub:image:custom) to reproduce your work with exactly the same environment.
To ensure full reproducibility you should save the full info of the image you worked with (this is stored in the environment variable `JUPYTER_IMAGE_SPEC`) with your work. You can then use that string with the [custom images](hub.image.custom) to reproduce your work with exactly the same environment.
:::
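A minimal sketch of saving the image string alongside your results, assuming the hub sets `JUPYTER_IMAGE_SPEC` as described in the note. The function names and the output path are hypothetical choices for illustration, not part of the LEAP documentation.

```python
import json
import os


def image_record(env=os.environ):
    """Return a small dict describing the image this session runs on."""
    # JUPYTER_IMAGE_SPEC is expected on the hub; elsewhere it may be missing,
    # so fall back to a placeholder instead of raising.
    return {"jupyter_image_spec": env.get("JUPYTER_IMAGE_SPEC", "unknown")}


def save_image_record(path, env=os.environ):
    """Write the record as JSON next to your results (path is your choice)."""
    with open(path, "w") as f:
        json.dump(image_record(env), f, indent=2)
```

Calling `save_image_record("image_record.json")` at the end of an analysis would leave a file whose string can later be pasted into the `Image > Other...` field.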

(hub:image:custom)=
#### Custom Images
(hub.image.custom)=
### Custom Images

If you select the `Image > Other...` Option during [server login](hub:server:login) you can paste an arbitrary reference in the form of `docker_registry/organization/image_name:image_version`. As an example we can get the `2023.05.08` version of the pangeo tensorflow notebook by pasting `quay.io/pangeo/ml-notebook:2023.05.08`.
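The reference format above can be checked before pasting it into the hub. The following is a rough sketch that assumes exactly the four-part form `docker_registry/organization/image_name:image_version` given in the text; real image references can be more flexible (digests, nested paths, registry ports), and `ImageRef` and `parse_image_ref` are hypothetical names.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    organization: str
    name: str
    version: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/organization/name:version' into its four parts."""
    # Split on the last colon so the tag is separated from the repository.
    repository, sep, version = ref.rpartition(":")
    if not sep or not version or "/" in version:
        raise ValueError(f"missing image version in {ref!r}")
    parts = repository.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected registry/organization/name in {ref!r}")
    return ImageRef(parts[0], parts[1], parts[2], version)
```

For the example from the text, `parse_image_ref("quay.io/pangeo/ml-notebook:2023.05.08")` yields the registry `quay.io`, the organization `pangeo`, the image name `ml-notebook` and the version `2023.05.08`.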

If you want to build your own docker image for your project, take a look at [this template](https://github.com/2i2c-org/hub-user-image-template) and the instructions to learn how to use [repo2docker](https://github.com/jupyterhub/repo2docker) to set up CI workflows to automatically build docker images from your repository.

#### Installing additional packages
### Installing additional packages

You can install additional packages using `pip` and `conda`.
However, these will disappear when your server shuts down.

For a more permanent solution we recommend building project specific dockerfiles and using those as [custom images](hub:image:custom).
For a more permanent solution we recommend building project specific dockerfiles and using those as [custom images](hub.image.custom).

### Files and Data
## Files and Data

Data and files work differently in the cloud.
To help onboard you to this new way of working, we have written a guide to Files and Data in the Cloud:
Expand All @@ -63,8 +63,8 @@ To help onboard you to this new way of working, we have written a guide to Files

We recommend you read this thoroughly, especially the part about Git and GitHub.

(hub:data:user_dir)=
#### Your User Directory
(hub.guide.data.user_dir)=
### Your User Directory

When you open your hub, you can navigate to the "File Browser" and see all the files in your User Directory
<img width="442" alt="image" src="https://github.com/leap-stc/leap-stc.github.io/assets/14314623/3ba6b45a-a077-4824-b0ec-9c111af50c33">
@@ -85,22 +85,22 @@ Please do not store large files in your user directory `/home/jovyan`. Your home
To check how much space you are using in your home directory open a terminal window on the hub and run `du -h --max-depth=1 ~/ | sort -h`.
:::

(hub:data:buckets)=
#### LEAP-Pangeo Buckets
(hub.data.buckets)=
### LEAP-Pangeo Buckets

LEAP-Pangeo provides users with three cloud buckets to store data:

- `gs://leap-scratch/` - Temporary Storage deleted after 7 days. Use this bucket for testing and storing large intermediate results. [More info](https://docs.2i2c.org/user/topics/data/cloud/#scratch-bucket)
- `gs://leap-persistent` - Persistent Storage. Use this bucket for storing results you want to share with other members.
- `gs://leap-persistent-ro` - Persistent Storage with read-only access for most users. To upload data to this bucket you need to use [this](hub:data:upload_hpc) method below.
- `gs://leap-persistent-ro` - Persistent Storage with read-only access for most users. To upload data to this bucket you need to use [this](hub.data.upload_hpc) method below.

Files stored on each of those buckets can be accessed by any LEAP member, so be conscious of how you use them.

- **Do not put sensitive information (passwords, keys, personal data) into these buckets!**
- **When writing to buckets only ever write to your personal folder!** Your personal folder is a combination of the bucketname and your github username (e.g. `gs://leap-persistent/funky-user/`).
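
The personal-folder rule in the last bullet can be written down as a small helper. This is only a sketch under the assumption that the folder is always `gs://<bucket>/<github-username>/`, as in the example; `personal_prefix` is a hypothetical name, not an existing LEAP utility.

```python
def personal_prefix(bucket: str, github_username: str) -> str:
    """Build the personal folder URL from a bucket name and a GitHub username."""
    # Accept the bucket with or without the gs:// scheme and stray slashes.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    if not bucket or not github_username or "/" in github_username:
        raise ValueError("bucket and username must be non-empty, username without '/'")
    return f"gs://{bucket}/{github_username}/"
```

Building every output path from this one prefix makes it harder to write into another member's folder by accident.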

(hub:data:list)=
#### Inspecting contents of the bucket
(hub.data.list)=
### Inspecting contents of the bucket

We recommend using [gcsfs](https://gcsfs.readthedocs.io/en/latest/) or [fsspec](https://filesystem-spec.readthedocs.io/en/latest/), which provide a filesystem-like interface for Python.

@@ -111,8 +111,8 @@ fs = gcsfs.GCSFileSystem() # equivalent to fsspec.filesystem('gs')
fs.ls('leap-persistent/funky-user')
```

(hub:data:read_write)=
#### Basic writing to and reading from cloud buckets
(hub.data.read_write)=
### Basic writing to and reading from cloud buckets

We do not recommend uploading large files (e.g. netcdf) directly to the bucket. Instead we recommend writing data in ARCO (Analysis-Ready Cloud-Optimized) formats like [zarr](https://zarr.dev) (for n-dimensional arrays) and [parquet](https://parquet.apache.org) (for tabular data) (read more [here](https://ieeexplore.ieee.org/document/9354557) about why we recommend ARCO formats).

@@ -138,7 +138,7 @@ ds = xr.open_dataset('gs://leap-scratch/funky-user/processed_store.zarr', engine
... and you can give this to any other registered LEAP user and they can load it exactly like you can!

:::{note}
Note that providing a URL starting with `gs://...` assumes that you have appropriate credentials set up in your environment to read/write to that bucket. On the hub these are already set up for you to work with the [](hub:data:buckets), but if you are trying to interact with non-public buckets you need to authenticate yourself. Check out the sections [below](hub:data:upload_manual) to see an example of how to do that.
Note that providing a URL starting with `gs://...` assumes that you have appropriate credentials set up in your environment to read/write to that bucket. On the hub these are already set up for you to work with the [](hub.data.buckets), but if you are trying to interact with non-public buckets you need to authenticate yourself. Check out the sections [below](hub.guide.data.upload_manual) to see an example of how to do that.
:::


@@ -148,9 +148,10 @@ with fsspec.open('gs://leap-scratch/funky-user/test.txt', mode='w') as f:
    f.write('hello world')
```

#### Deleting from cloud buckets
### Deleting from cloud buckets

:::{warning}
Depending on which cloud bucket you are working in, make sure to double check which files you are deleting by [inspecting the contents](hub:data:list) and only working in a subdirectory with your username (e.g. `gs://<leap-bucket>/<your-username>/some/project/structure`).
Depending on which cloud bucket you are working in, make sure to double check which files you are deleting by [inspecting the contents](hub.data.list) and only working in a subdirectory with your username (e.g. `gs://<leap-bucket>/<your-username>/some/project/structure`).
:::
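
One way to act on this warning is to check a path before deleting anything. The sketch below is an illustrative guard, not an existing LEAP or gcsfs feature: `is_inside_personal_folder` is a hypothetical helper that assumes the `<bucket>/<username>/...` layout shown in the warning and deliberately rejects the personal folder root itself.

```python
def is_inside_personal_folder(path: str, bucket: str, username: str) -> bool:
    """Return True only for paths strictly below <bucket>/<username>/."""
    # gcsfs accepts paths with or without the gs:// scheme, so normalize first.
    normalized = path[len("gs://"):] if path.startswith("gs://") else path
    # Refuse parent-directory segments rather than trying to resolve them.
    if ".." in normalized.split("/"):
        return False
    prefix = f"{bucket}/{username}/"
    # Require something after the prefix so the whole folder is never matched.
    return normalized.startswith(prefix) and len(normalized) > len(prefix)
```

A deletion helper could call this first and refuse to proceed when it returns `False`.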

You can remove single files by using a gcsfs/fsspec filesystem as above
8 changes: 6 additions & 2 deletions book/leap-pangeo/tutorial.md
@@ -4,6 +4,10 @@ To get started using the hub, check out this video by [James Munroe](https://git

<iframe width="560" height="315" src="https://www.youtube.com/embed/RKXWxtNqWKw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>


## How can I get access to the Hub?
Only LEAP members will be able to access the hub. Please [become a member](users.membership.apply) and make sure you accept the [invitation on github](users.membership.invite) before proceeding.

## Hub Usage

This is a rough and ready guide to using the Hub.
@@ -21,7 +25,7 @@ Feel free to [edit it yourself](https://github.com/leap-stc/leap-stc.github.io/b

<img width="410" alt="image" src="https://github.com/leap-stc/leap-stc.github.io/assets/14314623/088946a1-896f-4ff8-af91-8107c9f14cfd">

> Note: Depending on your [membership]() you might see additional options (e.g. for GPU machines)
> Note: Depending on your [membership](users.membership) you might see additional options (e.g. for GPU machines)
You have to make 3 choices here:
- The machine type (Choose between "CPU only" or "GPU" if available)
@@ -70,4 +74,4 @@ You can also navigate to this page from JupyterLab by clicking the `File` menu a

(hub:image)=

For more information on specific use cases or workflows that might arise while using the Hub, please refer to our [Guides](../guides/hub_guides.md).
For more information on specific use cases or workflows that might arise while using the Hub, please refer to our [Guides](../guides/hub_guides.md).