-
Notifications
You must be signed in to change notification settings - Fork 0
Filtering Template Matching Results
OPUS-TOMO can be used to filter template matching (TM) results.
In the following tutorial, you will learn how to filter TM results from the S. pombe tomogram.
The S. pombe tilt series and their alignment parameters to be used in this tutorial are deposited in https://www.ebi.ac.uk/empiar/EMPIAR-10988/ .
We should first acknowledge Judith's group for curating such a great dataset. We are using the Tilt series stacks and metadata of cryo-electron tomograms acquired with defocus-only (DEF) on S. pombe cryo-FIB lamellae
in this tutorial.
The dataset contains the following files for each tilt series,
TS_0xx.rawtlt (The tilt angles for tilts in st)
TS_0xx.st (The tilt series stack)
TS_0xx.xf (The deposited alignment parameters for tilts)
In this tutorial, the tilt series in st
files are sorted according to their tilt angles in ascending order from negative to positive.
The first step is tomogram reconstruction according to the deposited alignment parameters.
The tilt series alignment parameters for each tilt series are stored in TS_0xx_st.xf
.
The aligned tilt series can be created using newstack
from IMOD with the deposited alignment parameters.
For example, to align TS_028, create a directory named TS_028_Imod
, copy the newst.com
in the folder https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing to this directory, change the input file in
the script to TS_028 and
executing,
cd TS_028_Imod
newstack -pa newst.com
This command will create the TS_028_st.ali
file in the folder.
If you uncomment the first and last lines in newst.com
, you can execute this script using subm
from IMOD by
subm newst.com
The tomogram reconstruction can be performed by Aretomo with alignment disabled using the aligned stack. We recommend using Aretomo with commit hash f7d1d44c6cb68756be4776b4c3a79a02f491e278. You can checkout this commit using
git checkout f7d1d44c6cb68756be4776b4c3a79a02f491e278
and compile it.
The reconstruction is performed by Aretomo using the recon.sh
script in https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing . Note that the scripts used in this tutorial are in this folder!
sh recon.sh TS_028
The options used in the reconstruction script control the behavior of Aretomo during reconstruction. By using -Align 0
, Aretomo will backproject tilt series according to their tilt angles without performing alignment since the tilt series have been aligned by newstack
. By using -FlipInt 1
and -FlipVol 1
, aretomo will flip the intensity of particles in the tomogram to white and stored the tomogram in xyz
order.
By using -OutBin 1
, Aretomo will reconstruct the tomogram in full size. The reconstructed tomogram is named as TS_028_are.mrc
. Aretomo will update files related to IMOD in the folder TS_0xx_Imod
as well.
This folder contains the following contents,
ls -1 TS_028_Imod/
newst.com
tilt.com
TS_028_st.aln
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt
Link the tomogram TS_0xx_are.mrc into this folder using
cd TS_028_Imod
ln -s ../TS_028_are.mrc TS_028_st.mrc
Next, we can extract subtomograms from tomograms according to the template matching results. To extract subtomograms according to the template matching results from PyTOM, you should convert TM results to STAR and coords files.
Follow https://github.com/SBC-Utrecht/PyTom/wiki/Particle-Picking to perform template matching for 80S ribosomes in the tomograms. The template matching results from PyTOM is stored in xml file.
The conversion from xml file to STAR can be performed by conv.sh
in the shared google drive folder.
If TM is performed using 4x binned tomogram, setting the pixelsize
and binpytom
to 3.37 and 4 in this case. Then execute,
sh conv.sh TS_0xx
The resulting STAR file will have the coordinates for subtomograms in unbinned tomogram by specifying the correct binpytom
, which
can then be used to extract subtomograms in full size from the unbinned tomogram.
We have provided the template matching results from pyTOM in STAR format directly in the folder https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing
Next, you can extract the coordinates from the STAR file for each tilt series. This operation is
performed by star2coords.py
in the shared google drive. Execute the command,
python star2coords.py TS_0xx
This script simply reads STAR file into pandas dataframe using starfile
package, which can be installed via pip, then write out the coordinates to coords
file.
The variable factor
controls the scales of coordinates in coords
.
This command will output TS_0xx_st.coords
. Move this file to the corresponding ```TS_0xx_Imod`` folder.
You now have the required files for subtomogram extraction. We have provided the coords file for each tilt series in
the shared folder as well.
At this step, the folder contains the following contents,
ls -1 TS_028_Imod/
newst.com
tilt.com
TS_028_st.aln
TS_028_st.ali
TS_028_st.coords
TS_028_st.mrc -> ../TS_028_are.mrc
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt
Create a file named all_tomo.star
which points to the tomograms where subtomograms to be extracted.
This file has the following lines:
data_
loop_
_rlnMicrographName
metadata/TS_026_Imod/TS_026_st.mrc
metadata/TS_027_Imod/TS_027_st.mrc
metadata/TS_028_Imod/TS_028_st.mrc
metadata/TS_029_Imod/TS_029_st.mrc
metadata/TS_030_Imod/TS_030_st.mrc
metadata/TS_034_Imod/TS_034_st.mrc
metadata/TS_037_Imod/TS_037_st.mrc
metadata/TS_041_Imod/TS_041_st.mrc
metadata/TS_043_Imod/TS_043_st.mrc
metadata/TS_045_Imod/TS_045_st.mrc
Next, execute the command using relion_preprocess
from Relion 3.0.8,
relion_preprocess --coord_suffix .coords --coord_dir ./ --part_dir Extract/extract_tomo/ --extract --bg_radius 48 --norm --extract_size 128 --i all_tomo.star
We then have all subtomograms in the folder Extract/extract_tomo
.
Since OPUS-TOMO models 3DCTF during training, we also need to estimate the CTF before training.
The CTF estimation is done by ctfplotter
from IMOD. The ctf is estimated using
the aligned tilt series (though not necessary since current implementation neglects astigmatism). The aligned tilt series can be created by using the tilt parameter TS_0xx_st.xf
and newst.com
.
These two files are generated automatically by AreTOMO during reconstruction. The aligned tilt series can be generated by issuing
subm newst.com
For this tutorial, we have already got the TS_0xx_st.ali
stack in the first step.
Next, copy ctfplotter.com
to this folder, and change the locations of related files in this configuration file accordingly.
Execute
ctfplotter -pa ctfplloter.com
Then the ctfplotter GUI will pop up. You can then estimate defoucs for each tilt separately. The defocus parameters
are save in ctfplotter.defocus
in this folder, which is of the form,
1 0 0.0 0.0 0.0 3
1 1 -44.00 -44.00 2273.6 2247.7 -49.26
2 2 -42.00 -42.00 2269.1 2246.2 -51.05
3 3 -40.00 -40.00 2167.3 2143.0 -55.22
4 4 -38.00 -38.00 2176.1 2150.6 -43.42
5 5 -36.00 -36.00 2152.1 2126.5 -34.48
6 6 -34.00 -34.00 2115.0 2084.0 -32.59
7 7 -32.00 -32.00 2108.1 2071.1 -36.48
8 8 -30.00 -30.00 2128.7 2074.8 -31.51
9 9 -28.00 -28.00 2084.5 2050.2 -28.93
10 10 -26.00 -26.00 2074.1 2041.0 -30.26
11 11 -24.00 -24.00 2050.3 2014.6 -29.52
12 12 -22.00 -22.00 2040.1 2008.3 -26.12
13 13 -20.00 -20.00 2000.6 1964.4 -24.34
14 14 -18.00 -18.00 2003.3 1967.3 -32.28
15 15 -16.00 -16.00 1973.6 1939.3 -26.57
16 16 -14.00 -14.00 1950.4 1920.0 -26.15
17 17 -12.00 -12.00 1934.3 1894.9 -31.14
18 18 -10.00 -10.00 1897.1 1871.1 -30.25
19 19 -8.00 -8.00 1891.8 1872.1 -36.68
20 20 -6.00 -6.00 1837.8 1809.6 -38.37
21 21 -4.00 -4.00 1836.7 1810.7 -31.81
......
Additionally, you should check the handness of defocus gradient. Conventionally, the left hand of x axis will swing to higher z under
negative tilt angle. If higher z corresponds to larger defocus, the left hand side of tilt image should have larger defocus. Therefore, you can use split.sh
in shared google drive folder to split the aligned stack into two halves. Left hand half is named as TS_0xx_0.ali
, while right hand half is named as TS_0xx_1.ali
. You can then estimate the defocus values for both halves and compare the defoci of both halves of tilt image with negative tilt angles. If the defocus of TS_0xx_0.ali
is larger than the defocus of TS_0xx_1.ali
, then higher z corresponds to larger defocus, and you should set the line for ptcldefocus
as ptcldefocus = avgdefocus + deltaD
in relion_ctf_prepare.py
, or ptcldefocus = avgdefocus - deltaD
in relion_ctf_prepare.py
vice versa.
When defoci of all tilt series are estimated, you can have a look at the relion_ctf_prepare.py
and set the parameters according to your experimental settings (There is a version compatible with python3 in the google drive).
Important parameters are TomogramStarFileName
, ExtractJobAlias
, DPixSize
, apix
, RelionPartName
and subtomostarname
.
Note that this script depends on IMOD, make sure to load IMOD before executing.
Create the TS_0xx_st.order
file for each tilt series before proceeding. TS_0xx_st.order
records the tilt angle and its corresponding exposure dose (We only need a fake one with exposure dose being the index of tilt frame. No actual exposure does is required). For example, for TS_028
,
-30 15.00000000000000000000
-28 14.00000000000000000000
-26 13.00000000000000000000
-24 12.00000000000000000000
-22 11.00000000000000000000
-20 10.00000000000000000000
-18 9.00000000000000000000
-16 8.00000000000000000000
-14 7.00000000000000000000
-12 6.00000000000000000000
-10 5.00000000000000000000
-8 4.00000000000000000000
-6 3.00000000000000000000
-4 2.00000000000000000000
-2 1.00000000000000000000
0 0
2 1.00000000000000000000
4 2.00000000000000000000
6 3.00000000000000000000
8 4.00000000000000000000
10 5.00000000000000000000
12 6.00000000000000000000
14 7.00000000000000000000
16 8.00000000000000000000
18 9.00000000000000000000
20 10.00000000000000000000
22 11.00000000000000000000
24 12.00000000000000000000
26 13.00000000000000000000
28 14.00000000000000000000
30 15.00000000000000000000
This file can be generated by using gen_tilt_order.sh
script in google drive. For example, to generate the order file
for TS_028, executing
gen_tilt_order.sh TS_028.rawtlt > TS_028_Imod/TS_028_st.order
At this step, the folder contains:
ls -1 TS_028_Imod/
ctfplotter.com
ctfplotter.defocus
newst.com
tilt.com
TS_028_st.aln
TS_028_st.ali
TS_028_st.coords
TS_028_st.mrc -> ../TS_028_are.mrc
TS_028_st.order
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt
When all these inputs are ready and the parameters are properly set in relion_ctf_prepare.py
, you can generate the per-particle corrected CTF using
python relion_ctf_prepare.py
You then get the starfiles for the CTFs of subtomograms, and the starfile with all subtomograms' parameters.
The subtomgram STAR generated by relion_ctf_prepare
doesn't contain orientation information for subtomgrams. Hence,
we need to merge this STAR file with the Template Matching Results using starpy
from https://github.com/fuzikt/starpy. Replace
assign_column_star.py
by the file in the shared google drive, https://drive.google.com/file/d/1G06fWAFjMXSl3s56EvN8Esi810xU1Z0t/view?usp=sharing. (We made some modifications to make the following assignment work.)
Execute
python ~/soft/starpy/assign_column_star.py --i1 relion_subtomo.star --i2 TS_down.star --o added.star --col_lb rlnAngleRot
python ~/soft/starpy/assign_column_star.py --i1 added.star --i2 TS_down.star --o added.star --col_lb rlnAngleTilt
python ~/soft/starpy/assign_column_star.py --i1 added.star --i2 TS_down.star --o added.star --col_lb rlnAnglePsi
The added.star
contains all required information for training OPUS-TOMO! We have also provided a STAR file
added.star
for reference in the shared google drive folder.
Another option at this step is performing a subtomogram averaging to pre-align the subtomograms (though this
set is extremely noisy so far). If you used subtomograms from non-TM methods, this step is compulsory since the orientations of
subtomograms is unknown.
To perform subtomogram averaging using Relion, you should first reconstruct the CTFs using do_all_reconstruct_ctfs.sh
scripts.
Next, execute:
mpirun -n 7 --oversubscribe --bind-to none --mca btl '^openib' relion_refine_mpi --o Refine3D/job001/run --auto_refine --split_random_halves --i all_subtomo.star --ref Refine3D/job002/run_class001.mrc --ini_high 40 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 337 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 6 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale --j 4 --firstiter_cc --free_gpu_memory 256 --gpu 0,1,2,3
You can then using the run_data.star
file with subtomogram averaging result for heterogeneity analysis.
Performing subtomogram averaging using PyTOM also works! You then need to convert the xml file generated by PyTOM to STAR file.
OPUS-TOMO needs to read pose information from a pickle file. Therefore,
before training, extract pose information from added.star
using
dsdsh prepare added.star 128 3.37
The pose information then is stored into added_pose_euler.pkl
.
Create a folder for training. Change to this directory.
You can then create a mask from either template using relion, or a sphere mask using pytom! The detailed procedure for mask creation can be found in https://relion.readthedocs.io/en/release-3.1/SPA_tutorial/Mask.html. You can also invoke relion_mask_create
and set parameters according to the explanation.
You can now train OPUS-TOMO using the command:
dsd train_tomo added.star --poses added_pose_euler.pkl -n 40 -b 8 --zdim 12 --lr 4.e-5 --num-gpus 4 --multigpu --beta-control 0.8 --beta cos -o . -r ./mask.mrc --downfrac 0.75 --valfrac 0.1 --lamb 0.8 --split ribo-split.pkl --bfactor 3.5 --templateres 160 --angpix 3.37 --estpose --tmp-prefix tmp --datadir /work/
The meaning of each arguments can be checked in the readme file.
Check the Readme file for detailed analysis process. Change to the directory storing training results. We can first perform a KMean and PC analysis for the learned latent space at training epoch 29 using
dsdsh analyze . 29 10 20
The above command clustered the latent space into 20 classes and 12 principal components. The reconstructions at the KMeans cluster centers can be obtained by
dsdsh eval_vol . 29 kmeans 20 4.
To reconstruct volumes along a principal components, you can execute
dsdsh eval_vol . 29 pc 9 4.
The above command create 10 volumes along PC9. You can visualize them as movie using vseries
command in ChimeraX.
To get the subtomograms in each cluster, you can extract the STAR file for each cluster using
dsdsh parse_pose added.star 128 3.37 . 29 20
The expected filtering result is shown below:
The starfile for each cluster can be easily visualized in ArtiaX https://github.com/FrangakisLab/ArtiaX/ ! You can join the starfiles from several clusters corresponding to ribosomes and select rows from the tomogram that you would like to visualize, which can be done by modifying the star2csv.py
script in shared google drive. Save the starfile, you can directly visualize the subtomograms picked by OPUS-TOMO in ArtiaX!