Merge pull request #89 from MPI-Dortmund/rapids-23-12
update to rapids 23.12
thorstenwagner authored Apr 15, 2024
2 parents 3c9cb5e + 4225f5b commit 8ab4a77
Showing 19 changed files with 184 additions and 88 deletions.
14 changes: 0 additions & 14 deletions conda_env_napari.yml

This file was deleted.

7 changes: 5 additions & 2 deletions conda_env_tomotwin.yml
@@ -13,11 +13,14 @@ dependencies:
- numpy
- matplotlib
- pytables
- cuml=23.10
- cuml=23.12
- cuda-version=11.8
- protobuf[version='>3.20']
- tensorboard
- optuna
- mysql-connector-python
- pytorch-metric-learning
- pip
- pip:
- mysql-connector-python # conda did not provide version 8.3. Version 8.0.3 failed with cuml 23.10
# - tomotwin-cryoet

Binary file added docs/img/tutorial_2/cluster_manager.png
Binary file added docs/img/tutorial_2/cluster_refine_05.png
Binary file not shown.
Binary file added docs/img/tutorial_2/figure_anchor.png
Binary file not shown.
Binary file added docs/img/tutorial_2/fine_tune_01.png
Binary file added docs/img/tutorial_2/fine_tune_02.png
Binary file added docs/img/tutorial_2/fine_tune_03.png
Binary file added docs/img/tutorial_2/fine_tune_04.png
7 changes: 5 additions & 2 deletions docs/installation.rst
@@ -15,7 +15,9 @@ In case you have an old TomoTwin version installed, please remove the old one fi

.. prompt:: bash $

mamba env create -n tomotwin
mamba env remove -n tomotwin

Next you can create the TomoTwin environment:

.. prompt:: bash $

@@ -29,7 +31,8 @@ Here we assume that you don't have napari installed. Please do:

.. prompt:: bash $

mamba env create -n napari-tomotwin -f https://raw.githubusercontent.com/MPI-Dortmund/tomotwin-cryoet/main/conda_env_napari.yml
mamba env create -n napari-tomotwin -f https://raw.githubusercontent.com/MPI-Dortmund/napari-tomotwin/main/conda_env.yml
pip install napari-tomotwin

3. Link Napari
"""""""""""""""""""
4 changes: 2 additions & 2 deletions docs/strategies/strategy_01.rst
@@ -1,10 +1,10 @@
Strategy 1: Refinement of references/targets using umaps
Strategy 1: Refinement of references using umaps
========================================================

When to use it
--------------

You have selected references or cluster targets, but you are not satisfied with the picking results. The embedding computed from a cluster or reference is not always an ideal representation. Some references just don't work well, and sometimes umap doesn't show all the structure that is actually in the umap embedding.
You have selected references, but you are not satisfied with the picking results. The embedding computed from a cluster or reference is not always an ideal representation. Some references just don't work well, and sometimes umap doesn't show all the structure that is actually in the umap embedding.

What it does
------------
2 changes: 1 addition & 1 deletion docs/tutorials/text_modules/downscale/first_paragraph.rst
@@ -1,4 +1,4 @@
TomoTwin was trained on tomograms with a pixelsize of 10Å. While in practice we've used it with pixel sizes ranging from 9.2Å to 25.0Å, it is probably ideal to run it at a pixel size close to 10Å. For that you may need to downscale your tomogram. You can do that by fourier shrink your tomogram with EMAN2. Lets say you have a Tomogram with a pixelsize of 5.9359Å. The fouriershrink factor is then 10Å/5.9359Å = 1.684
TomoTwin has been trained on tomograms with a pixel size of 10Å. While in practice we've used it with pixel sizes ranging from 9.2Å to 25.0Å, it is probably best in most cases to run it with a pixel size close to 10Å. However, for proteins equal to or larger than the ribosome, we have found that a larger pixel size (e.g. 15Å) works better. For this you may need to rescale your tomogram. You can do this by Fourier shrinking your tomogram with EMAN2. Suppose you have a tomogram with a pixel size of 5.9359Å. The Fourier shrink factor is then 10Å/5.9359Å = 1.684
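
The shrink factor described above is simply the target pixel size divided by the current pixel size. A minimal sketch of that arithmetic follows; the function name `fourier_shrink_factor` and its arguments are hypothetical and not part of TomoTwin or EMAN2.

```python
def fourier_shrink_factor(current_apix: float, target_apix: float = 10.0) -> float:
    """Return the factor needed to rescale a tomogram to target_apix.

    Both pixel sizes are in Angstrom per pixel. The helper is illustrative
    only; it is not a TomoTwin or EMAN2 function.
    """
    if current_apix <= 0 or target_apix <= 0:
        raise ValueError("pixel sizes must be positive")
    return target_apix / current_apix


# Example from the text: a tomogram with a pixel size of 5.9359 A, target 10 A.
factor_10 = fourier_shrink_factor(5.9359)

# For large proteins the text suggests a larger target pixel size, e.g. 15 A.
factor_15 = fourier_shrink_factor(5.9359, target_apix=15.0)

print(f"factor for 10 A: {factor_10:.4f}")
print(f"factor for 15 A: {factor_15:.4f}")
```

For the 10Å case the quotient is approximately 1.6847, which the tutorial text gives as 1.684.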



20 changes: 10 additions & 10 deletions docs/tutorials/tut01_reference.rst
@@ -1,7 +1,7 @@
.. _tutorial-reference:

Tutorial 1: Reference based particle picking
============================================
=============================================

In this tutorial we describe how to use TomoTwin for picking in tomograms using references.

@@ -14,13 +14,13 @@ In this tutorial we describe how to use TomoTwin for picking in tomograms using
Download: `https <https://ftp.gwdg.de/pub/misc/sphire/TomoTwin/data/reference_picking/example_reference_picking.tar.gz>`_


1. Downscale your Tomogram to 10 Å
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Rescale your Tomogram
----------------------------------

.. include:: text_modules/downscale_reference.rst

2. Pick and extract your reference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
----------------------------------

For the reference based approach you need, of course, references. To pick them follow the next steps:

@@ -63,13 +63,13 @@ You will find your extracted references in `reference/protein_a_X.mrc` where X i


3. Embed your Tomogram
^^^^^^^^^^^^^^^^^^^^^^
----------------------

.. include:: text_modules/embed.rst


4. Embed your reference
^^^^^^^^^^^^^^^^^^^^^^^
-----------------------

Now you can embed your reference:

@@ -85,7 +85,7 @@ Now you can embed your reference:


5. Map your tomogram
^^^^^^^^^^^^^^^^^^^^
---------------------

The map command will calculate the pairwise distances/similarity between the references and the subvolumes and generate a localization map:

@@ -94,12 +94,12 @@ The map command will calculate the pairwise distances/similarity between the ref
tomotwin_map.py distance -r out/embed/reference/embeddings.temb -v out/embed/tomo/your_tomo_a10_embeddings.temb -o out/map/

6. Localize potential particles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------

.. include:: text_modules/locate.rst

7. Inspect your particles with the boxmanager
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
---------------------------------------------


Open your particles with the following command or drag the files into an open napari window:
@@ -134,6 +134,6 @@ You will find coordinate file for each reference in :file:`.coords` format in th
Check out the :ref:`corresponding strategy <strategy-01>`!

8. Scale your coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------

.. include:: text_modules/scale.rst
128 changes: 89 additions & 39 deletions docs/tutorials/tut02_cluster.rst
@@ -1,16 +1,22 @@

Tutorial 2: Clustering based particle picking
============================================

1. Downscale your Tomogram to 10 Å
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


1. Rescale your Tomogram
---------------------------------

.. include:: text_modules/downscale_clustering.rst

2. Embed your Tomogram
^^^^^^^^^^^^^^^^^^^^^^^
----------------------

.. include:: text_modules/embed.rst

3. Estimate UMAP manifold and Generate Embedding Mask
^^^^^^^^^^^^^^^^^^^^^^^^^^
-----------------------------------------------------


Now we will approximate the tomogram embeddings to 2D to allow for efficient visualization. To calculate a UMAP:

@@ -24,83 +30,126 @@ Now we will approximate the tomogram embeddings to 2D to allow for efficient vis


4. Load data for clustering in Napari
^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------------


Now that we have all the input files for the clustering workflow we can get started in Napari. First open your tomogram and the embedding mask by:

.. prompt:: bash $

napari your_tomo_a10.mrc

Next open the napari-tomotwin clustering tool via :guilabel:`Plugins` -> :guilabel:`napari-tomotwin` -> :guilabel:`Cluster UMAP embeddings`. Then choose the :guilabel:`Path to UMAP` by clicking on :guilabel:`Select file` and provide the path to your :file:`your_tomo_a10_embeddings.tumap`.
Click :guilabel:`Load` and a 2D plot of the umap embeddings should appear in the plugin window.
Next open the napari-tomotwin clustering tool via :guilabel:`Plugins` -> :guilabel:`TomoTwin clustering workflow`. Then choose the :guilabel:`Path to UMAP` by clicking on :guilabel:`Select file` and provide the path to your :file:`your_tomo_a10_embeddings.tumap`.
Click :guilabel:`Load` and a 2D plot of the umap embeddings should appear in the plugin window. The plugin performs some calculations in the background, so this may take a few seconds.

5. Find target clusters
^^^^^^^^^^^^^^^^^^^^^^^^
5. Find target cluster
----------------------

The next step is to generate potential targets from the 2D umap using the interactive lasso (freehand) tool from the napari-clusters-plotter.
Once you have loaded a UMAP in the previous step, a set of tools will open.

.. admonition:: **Check out the video demo of selecting clusters**
.. figure:: ../img/tutorial_2/find_cluster_targets_overview.png
:align: left
:width: 400

.. youtube:: PaJlaPAfqtI
:align: center
GUI for the clustering workflow.

Outline a set of points in the 2D plot and these points will become highlighted in your tomogram. To select multiple targets at once hold :kbd:`Shift` when outlining points.
* **Clustering area:** Here you can select clusters within the umap using the lasso (freehand) tool.
* **Plotting parameters:** Only two options are relevant for TomoTwin. The :guilabel:`Layer` combo box allows you to select which UMAP you want to visualize. At the beginning only one UMAP is available. Later in the workflow, more may appear. If you change it, you need to press the :guilabel:`Plot` button to update the UMAP. The second relevant option is the :guilabel:`Log scale` plot. For this you need to expand the :guilabel:`advanced options` and check the :guilabel:`log scale` checkbox.
* **Tools**: Here you will find some helpful tools. First you need to select a cluster from the dropdown box. :guilabel:`Show target` will help you evaluate if a cluster might be a good target. :guilabel:`Recompute UMAP` allows you to refine a selected cluster. Once you have found a good cluster, you can add it to the candidate list with :guilabel:`Add candidate`.
* **Candidates**: Each row represents a candidate target. The labels can be changed. Left clicking on the table allows you to :guilabel:`Show` or :guilabel:`Delete` a candidate. Save the candidate targets to disk by pressing :guilabel:`Save candidates`.

.. figure:: ../img/tutorial_2/img1.png
.. admonition:: **Use log scale to see weak clusters**

When the abundance of the protein is low, the clusters are often difficult to detect. Using a log scale for the plot may show clusters that are otherwise difficult to spot. To activate the log scale click on :guilabel:`Advanced settings` :guilabel:`Log scale`.

Locate potential targets
~~~~~~~~~~~~~~~~~~~~~~~~

The next step is to generate potential targets from the 2D umap. We will use a tomogram that shows two distinct particle populations (yellow: Tc toxin, blue: ribosome) as an example:

.. figure:: ../img/tutorial_2/fine_tune_01.png
:width: 650
:align: center

.. admonition:: **Use log scale to see weak clusters**

When the abundance of the protein is low, the clusters are often difficult to detect. Using a log scale for the plot may show clusters that are otherwise difficult to spot. To activate the log scale click on :guilabel:`Advanced settings` :guilabel:`Log scale`.
Tomogram with UMAP inset. Two quite distinct particle populations can be identified. The yellow circle highlights a toxin particle, the blue circle a ribosome particle.

Alternatively you can click in the tomogram and a small red circle appears around the embedding for this position in the tomogram.
You can use the interactive lasso (freehand) tool from the "napari cluster plotter" to select clusters in the UMAP. When you outline an area in the UMAP, the corresponding area in the tomogram is highlighted.

.. figure:: ../img/tutorial_2/img3.png
.. figure:: ../img/tutorial_2/fine_tune_02.png
:width: 650
:align: center

.. |mag| image:: ../img/tutorial_2/mag.png
:width: 20
Tomogram with UMAP as inset. The selected cluster contains both particle populations.

.. admonition:: **The Anchor tool helps to locate clusters in the UMAP**

You can use the |mag| icon to change the displayed area/zoom and the :guilabel:`Home` icon to reset it.
Clicking on the tomogram creates an “anchor” (a little circle) in the UMAP. The anchor can help you to locate a cluster in the UMAP. By holding :kbd:`Shift` you can add multiple anchors.

.. image:: ../img/tutorial_2/img2.png
.. image:: ../img/tutorial_2/figure_anchor.png
:width: 450
:align: center

Refine cluster targets
~~~~~~~~~~~~~~~~~~~~~~

The selection we made is not satisfactory as both the toxins and the ribosomes are selected. TomoTwin uses UMAPs to reduce the 32-dimensional embedding space to a 2-dimensional space that can be visualized. However, this reduction is not perfect and sometimes a cluster can actually contain several sub-clusters. Pressing :guilabel:`Recompute UMAP` will compute a new UMAP for the embeddings contained in the selected cluster.

.. figure:: ../img/tutorial_2/fine_tune_03.png
:width: 300
:align: center

Recalculated UMAP for the embeddings contained in the previously selected cluster.

The new UMAP reveals additional structure. If we select the rather densely populated area on the left, we have identified the cluster that exclusively represents the toxin. To select the ribosome cluster as well, we hold :kbd:`Shift` and lasso the tip of the larger and fuzzier area.

.. figure:: ../img/tutorial_2/cluster_refine_05.png
:width: 650
:align: center

In the recalculated UMAP we can now separate the toxin from the ribosome cluster.

.. admonition:: **Improved centering**
For the ribosome, we could get a more "complete" highlighting if we had selected the entire area. However, the way we did it is preferable because we only get the center of the ribosome, which results in better centered picks.

When generating targets to pick large proteins, it is best to outline points that only lay in the center of your protein rather than covering the entire protein. Note that due to the way embeddings are generated from the tomogram, this likely won't be in the center of the cluster. This will help ensure that your resulting picks are centered.
As a sanity check, we can press :guilabel:`Show target` for each cluster in the dropdown list. In TomoTwin, a cluster is reduced to and represented by a single embedding point (the cluster center), so it is worth visualizing which point in your cluster represents it. Clicking :guilabel:`Show target` calculates the center (medoid) and marks it in the tomogram with a circle in the cluster color. If the circle is approximately centered on your protein of interest, it is probably a good target. If it is centered on background, other structures or contamination instead, you should continue to refine your cluster target. Here, the two targets are centered on the toxin and the ribosome, respectively.
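
To illustrate what "medoid" means here, the following is a minimal sketch that picks the cluster member with the smallest summed distance to all other members. It assumes Euclidean distance and uses toy 2D points; the function name `medoid_index` is hypothetical, and the metric and implementation used by napari-tomotwin may differ.

```python
import math


def medoid_index(points):
    """Return the index of the medoid of a list of points.

    The medoid is the member whose summed distance to all other members
    is smallest. Euclidean distance is an assumption of this sketch.
    """
    if not points:
        raise ValueError("cluster is empty")
    best_index, best_cost = 0, math.inf
    for i, p in enumerate(points):
        cost = sum(math.dist(p, q) for q in points)
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Toy cluster: four nearby points and one outlier.
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (5.0, 5.0)]
center = medoid_index(cluster)
print("medoid:", cluster[center])
```

Unlike a mean, the medoid is always an actual member of the cluster, which is why it can be shown as a concrete position in the tomogram.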

.. image:: ../img/tutorial_2/img4.png
:width: 650
:align: center
Add and save candidates
~~~~~~~~~~~~~~~~~~~~~~

Now that we are satisfied with our selection, we can add both clusters to the candidate list by selecting each cluster in the drop-down list and pressing :guilabel:`Add candidate`.


We recommend that you change the label of each candidate by double-clicking with the left mouse button in one of the label cells.

6. Save target clusters
^^^^^^^^^^^^^^^^^^^^^^^^
.. figure:: ../img/tutorial_2/cluster_manager.png
:width: 400
:align: center

All potential targets are listed and labeled in the candidates table. Right-clicking on a row allows you to :guilabel:`Delete` the candidate or, by pressing :guilabel:`Show`, to restore the UMAP selection.

Once you have outlined a target cluster for each protein of interest, it is time to save these targets to be used as picking references in this and additional tomograms.
Finally, we can save the candidates and their labels to disk by pressing :guilabel:`Save candidates`. Select a folder and write the candidates to disk. The folder will contain several files:

This can be done with :guilabel:`Plugins` -> :guilabel:`napari-tomotwin` -> :guilabel:`Save cluster targets` and providing an output directory :file:`cluster_targets.temb` will be written.
- :file:`cluster_targets.temb`: This is the file you will use in the next steps. It contains the medoid embedding for each cluster.
- :file:`embeddings_CLUSTER_LABEL.temb`: One file per cluster. It contains all the embeddings that are part of that cluster.
- :file:`medoid_CLUSTER_LABEL.coords`: The coordinates of the cluster centre (medoid). This is the same as what you get when you click on :guilabel:`Show target`.

.. admonition:: **Check out the video demo of selecting clusters**

.. youtube:: PaJlaPAfqtI
:align: center


6. Map your tomogram
--------------------

7. Map your tomogram
^^^^^^^^^^^^^^^^^^^^

The map command will calculate the pairwise distances/similarity between the targets and the tomogram subvolumes and generate a localization map:

.. prompt:: bash $

tomotwin_map.py distance -r out/clustering/cluster_targets.temb -v out/embed/tomo/your_tomo_a10_embeddings.temb -o out/map/
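
Conceptually, the map step compares every target embedding with every subvolume embedding. The sketch below shows that idea with cosine similarity on toy 2D vectors. The choice of cosine similarity, the function names and the data are assumptions made for illustration; they do not describe the internals of `tomotwin_map.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


def similarity_matrix(targets, subvolumes):
    """One row per target, one column per subvolume embedding."""
    return [[cosine_similarity(t, s) for s in subvolumes] for t in targets]


# Toy data: two target embeddings and three subvolume embeddings.
targets = [(1.0, 0.0), (0.0, 1.0)]
subvolumes = [(2.0, 0.0), (0.0, 3.0), (1.0, 2.0)]

sim = similarity_matrix(targets, subvolumes)

# For each subvolume, the index of the most similar target.
best_target = [
    max(range(len(targets)), key=lambda t: sim[t][col])
    for col in range(len(subvolumes))
]
print(best_target)
```

In a real tomogram each column corresponds to a subvolume position, so each row of such a matrix can be rearranged into a 3D localization map for one target.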

8. Localize potential particles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
7. Localize potential particles
-------------------------------

.. include:: text_modules/locate.rst

@@ -136,8 +185,9 @@ To convert the :file:`.tloc` file into :file:`.coords` you need to run

You will find a coordinate file for each reference in :file:`.coords` format in the :file:`coords/` folder.

9. Scale your coordinates
^^^^^^^^^^^^^^^^^^^^^^^^^
8. Scale your coordinates
-------------------------


.. include:: text_modules/scale.rst

14 changes: 3 additions & 11 deletions tomotwin/modules/common/findmax/findmax.py
@@ -103,7 +103,6 @@ def get_avg_pos(classes: List[int], regions: np.array, region_max_value: List, i
return maxima_coords

def find_maxima(volume: np.array, tolerance: float, global_min: float = 0.5, **kwargs) -> tuple[list, np.array]:

"""
:param volume: 3D volume
:param tolerance: Tolerance for detection
@@ -164,7 +163,7 @@ def find_maxima(volume: np.array, tolerance: float, global_min: float = 0.5, **k
if global_min == None:
global_min = np.min(image) + tolerance

print("effective global min:", global_min)
# print("effective global min:", global_min)



@@ -190,14 +189,8 @@ def find_maxima(volume: np.array, tolerance: float, global_min: float = 0.5, **k
k = 0
region_max_value = []
working_image_raveled = working_image.ravel(order)
import tqdm
desc="Locate"
pos=None
if 'tqdm_pos' in kwargs:
desc = f"Locate class {kwargs['tqdm_pos']}"
pos = kwargs["tqdm_pos"]

for seed_point in tqdm.tqdm(coords_sorted,position=pos, desc=desc):

for seed_point in coords_sorted:
try:
iter(seed_point)
except TypeError:
@@ -242,7 +235,6 @@ def find_maxima(volume: np.array, tolerance: float, global_min: float = 0.5, **k
chunked_arrays = np.array_split(region_list, num_cores)
from concurrent.futures import ProcessPoolExecutor as Pool
with Pool(multiprocessing.cpu_count()//2) as pool:
print("Call get_avg_pos")
maxima_coords = pool.map(partial(get_avg_pos, regions=regions, region_max_value=region_max_value, image=image),
chunked_arrays)
#maxima_coords = pool.map(get_avg_pos, repeat(regions), repeat(region_max_value), repeat(image), chunked_arrays)