Skip to content

Filtering Template Matching Results

alncat edited this page Dec 13, 2024 · 40 revisions

OPUS-TOMO can be used to filter template matching (TM) results. In the following tutorial, you will learn how to filter TM results from the S. pombe tomogram. The S. pombe tilt series and their alignment parameters to be used in this tutorial are deposited in https://www.ebi.ac.uk/empiar/EMPIAR-10988/ . We should first acknowledge Judith's group for curating such a great dataset. We are using the Tilt series stacks and metadata of cryo-electron tomograms acquired with defocus-only (DEF) on S. pombe cryo-FIB lamellae in this tutorial. The dataset contains the following files for each tilt series,

TS_0xx.rawtlt (The tilt angles for tilts in st)
TS_0xx.st (The tilt series stack)
TS_0xx.xf (The deposited alignment parameters for tilts)

In this tutorial, the tilt series in st files are sorted according to their tilt angles in ascending order from negative to positive. The first step is tomogram reconstruction according to the deposited alignment parameters.

Tomogram Reconstruction Using Deposited Alignment Parameters

The tilt series alignment parameters for each tilt series are stored in TS_0xx_st.xf. The aligned tilt series can be created using newstack from IMOD with the deposited alignment parameters. For example, to align TS_028, create a directory named TS_028_Imod, copy the newst.com in the folder https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing to this directory, change the input file in the script to TS_028 and executing,

cd TS_028_Imod
newstack -pa newst.com

This command will create the TS_028_st.ali file in the folder. If you uncomment the first and last lines in newst.com, you can execute this script using subm from IMOD by

subm newst.com

The tomogram reconstruction can be performed by Aretomo with alignment disabled using the aligned stack. We recommend using Aretomo with commit hash f7d1d44c6cb68756be4776b4c3a79a02f491e278. You can checkout this commit using git checkout f7d1d44c6cb68756be4776b4c3a79a02f491e278 and compile it. The reconstruction is performed by Aretomo using the recon.sh script in https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing . Note that the scripts used in this tutorial are in this folder!

sh recon.sh TS_028

The options used in the reconstruction script control the behavior of Aretomo during reconstruction. By using -Align 0, Aretomo will backproject tilt series according to their tilt angles without performing alignment since the tilt series have been aligned by newstack. By using -FlipInt 1 and -FlipVol 1, aretomo will flip the intensity of particles in the tomogram to white and stored the tomogram in xyz order. By using -OutBin 1, Aretomo will reconstruct the tomogram in full size. The reconstructed tomogram is named as TS_028_are.mrc. Aretomo will update files related to IMOD in the folder TS_0xx_Imod as well. This folder contains the following contents,

ls -1 TS_028_Imod/
newst.com
tilt.com
TS_028_st.aln
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt

Link the tomogram TS_0xx_are.mrc into this folder using

cd TS_028_Imod
ln -s ../TS_028_are.mrc TS_028_st.mrc

Next, we can extract subtomograms from tomograms according to the template matching results. To extract subtomograms according to the template matching results from PyTOM, you should convert TM results to STAR and coords files.

Convert Template Matching Result to STAR and coords

Follow https://github.com/SBC-Utrecht/PyTom/wiki/Particle-Picking to perform template matching for 80S ribosomes in the tomograms. The template matching results from PyTOM is stored in xml file. The conversion from xml file to STAR can be performed by conv.sh in the shared google drive folder. If TM is performed using 4x binned tomogram, setting the pixelsize and binpytom to 3.37 and 4 in this case. Then execute,

sh conv.sh TS_0xx

The resulting STAR file will have the coordinates for subtomograms in unbinned tomogram by specifying the correct binpytom, which can then be used to extract subtomograms in full size from the unbinned tomogram. We have provided the template matching results from pyTOM in STAR format directly in the folder https://drive.google.com/drive/folders/1FcF1PC-0lY2C6DP0zP7K7qhcz_ltn5t-?usp=sharing

Next, you can extract the coordinates from the STAR file for each tilt series. This operation is performed by star2coords.py in the shared google drive. Execute the command,

python star2coords.py TS_0xx

This script simply reads STAR file into pandas dataframe using starfile package, which can be installed via pip, then write out the coordinates to coords file. The variable factor controls the scales of coordinates in coords. This command will output TS_0xx_st.coords. Move this file to the corresponding ```TS_0xx_Imod`` folder. You now have the required files for subtomogram extraction. We have provided the coords file for each tilt series in the shared folder as well. At this step, the folder contains the following contents,

ls -1 TS_028_Imod/
newst.com
tilt.com
TS_028_st.aln
TS_028_st.ali
TS_028_st.coords
TS_028_st.mrc -> ../TS_028_are.mrc
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt

Subtomogram Extraction

Create a file named all_tomo.star which points to the tomograms where subtomograms to be extracted. This file has the following lines:

data_
loop_
_rlnMicrographName
metadata/TS_026_Imod/TS_026_st.mrc
metadata/TS_027_Imod/TS_027_st.mrc
metadata/TS_028_Imod/TS_028_st.mrc
metadata/TS_029_Imod/TS_029_st.mrc
metadata/TS_030_Imod/TS_030_st.mrc
metadata/TS_034_Imod/TS_034_st.mrc
metadata/TS_037_Imod/TS_037_st.mrc
metadata/TS_041_Imod/TS_041_st.mrc
metadata/TS_043_Imod/TS_043_st.mrc
metadata/TS_045_Imod/TS_045_st.mrc

Next, execute the command using relion_preprocess from Relion 3.0.8,

relion_preprocess --coord_suffix .coords --coord_dir ./ --part_dir Extract/extract_tomo/ --extract --bg_radius 48 --norm --extract_size 128 --i all_tomo.star

We then have all subtomograms in the folder Extract/extract_tomo.

CTF Estimation

Since OPUS-TOMO models 3DCTF during training, we also need to estimate the CTF before training. The CTF estimation is done by ctfplotter from IMOD. The ctf is estimated using the aligned tilt series (though not necessary since current implementation neglects astigmatism). The aligned tilt series can be created by using the tilt parameter TS_0xx_st.xf and newst.com. These two files are generated automatically by AreTOMO during reconstruction. The aligned tilt series can be generated by issuing

subm newst.com

For this tutorial, we have already got the TS_0xx_st.ali stack in the first step. Next, copy ctfplotter.com to this folder, and change the locations of related files in this configuration file accordingly. Execute

ctfplotter -pa ctfplloter.com

Then the ctfplotter GUI will pop up. You can then estimate defoucs for each tilt separately. The defocus parameters are save in ctfplotter.defocus in this folder, which is of the form,

1       0       0.0     0.0     0.0     3
1       1       -44.00  -44.00   2273.6  2247.7  -49.26
2       2       -42.00  -42.00   2269.1  2246.2  -51.05
3       3       -40.00  -40.00   2167.3  2143.0  -55.22
4       4       -38.00  -38.00   2176.1  2150.6  -43.42
5       5       -36.00  -36.00   2152.1  2126.5  -34.48
6       6       -34.00  -34.00   2115.0  2084.0  -32.59
7       7       -32.00  -32.00   2108.1  2071.1  -36.48
8       8       -30.00  -30.00   2128.7  2074.8  -31.51
9       9       -28.00  -28.00   2084.5  2050.2  -28.93
10      10      -26.00  -26.00   2074.1  2041.0  -30.26
11      11      -24.00  -24.00   2050.3  2014.6  -29.52
12      12      -22.00  -22.00   2040.1  2008.3  -26.12
13      13      -20.00  -20.00   2000.6  1964.4  -24.34
14      14      -18.00  -18.00   2003.3  1967.3  -32.28
15      15      -16.00  -16.00   1973.6  1939.3  -26.57
16      16      -14.00  -14.00   1950.4  1920.0  -26.15
17      17      -12.00  -12.00   1934.3  1894.9  -31.14
18      18      -10.00  -10.00   1897.1  1871.1  -30.25
19      19      -8.00   -8.00    1891.8  1872.1  -36.68
20      20      -6.00   -6.00    1837.8  1809.6  -38.37
21      21      -4.00   -4.00    1836.7  1810.7  -31.81
......

Additionally, you should check the handness of defocus gradient. Conventionally, the left hand of x axis will swing to higher z under negative tilt angle. If higher z corresponds to larger defocus, the left hand side of tilt image should have larger defocus. Therefore, you can use split.sh in shared google drive folder to split the aligned stack into two halves. Left hand half is named as TS_0xx_0.ali, while right hand half is named as TS_0xx_1.ali. You can then estimate the defocus values for both halves and compare the defoci of both halves of tilt image with negative tilt angles. If the defocus of TS_0xx_0.ali is larger than the defocus of TS_0xx_1.ali, then higher z corresponds to larger defocus, and you should set the line for ptcldefocus as ptcldefocus = avgdefocus + deltaD in relion_ctf_prepare.py, or ptcldefocus = avgdefocus - deltaD in relion_ctf_prepare.py vice versa.

When defoci of all tilt series are estimated, you can have a look at the relion_ctf_prepare.py and set the parameters according to your experimental settings (There is a version compatible with python3 in the google drive). Important parameters are TomogramStarFileName, ExtractJobAlias, DPixSize, apix, RelionPartName and subtomostarname. Note that this script depends on IMOD, make sure to load IMOD before executing. Create the TS_0xx_st.order file for each tilt series before proceeding. TS_0xx_st.order records the tilt angle and its corresponding exposure dose (We only need a fake one with exposure dose being the index of tilt frame. No actual exposure does is required). For example, for TS_028,

-30 15.00000000000000000000
-28 14.00000000000000000000
-26 13.00000000000000000000
-24 12.00000000000000000000
-22 11.00000000000000000000
-20 10.00000000000000000000
-18 9.00000000000000000000
-16 8.00000000000000000000
-14 7.00000000000000000000
-12 6.00000000000000000000
-10 5.00000000000000000000
-8 4.00000000000000000000
-6 3.00000000000000000000
-4 2.00000000000000000000
-2 1.00000000000000000000
0 0
2 1.00000000000000000000
4 2.00000000000000000000
6 3.00000000000000000000
8 4.00000000000000000000
10 5.00000000000000000000
12 6.00000000000000000000
14 7.00000000000000000000
16 8.00000000000000000000
18 9.00000000000000000000
20 10.00000000000000000000
22 11.00000000000000000000
24 12.00000000000000000000
26 13.00000000000000000000
28 14.00000000000000000000
30 15.00000000000000000000

This file can be generated by using gen_tilt_order.sh script in google drive. For example, to generate the order file for TS_028, executing

gen_tilt_order.sh TS_028.rawtlt > TS_028_Imod/TS_028_st.order

At this step, the folder contains:

ls -1 TS_028_Imod/
ctfplotter.com
ctfplotter.defocus
newst.com
tilt.com
TS_028_st.aln
TS_028_st.ali
TS_028_st.coords
TS_028_st.mrc -> ../TS_028_are.mrc
TS_028_st.order
TS_028_st.tlt
TS_028_st.xf
TS_028_st.xtilt

When all these inputs are ready and the parameters are properly set in relion_ctf_prepare.py , you can generate the per-particle corrected CTF using

python relion_ctf_prepare.py

You then get the starfiles for the CTFs of subtomograms, and the starfile with all subtomograms' parameters.

Combine Subtomogram STAR with Template Matching Results

The subtomgram STAR generated by relion_ctf_prepare doesn't contain orientation information for subtomgrams. Hence, we need to merge this STAR file with the Template Matching Results using starpy from https://github.com/fuzikt/starpy. Replace assign_column_star.py by the file in the shared google drive, https://drive.google.com/file/d/1G06fWAFjMXSl3s56EvN8Esi810xU1Z0t/view?usp=sharing. (We made some modifications to make the following assignment work.) Execute

python ~/soft/starpy/assign_column_star.py --i1 relion_subtomo.star --i2 TS_down.star --o added.star --col_lb rlnAngleRot
python ~/soft/starpy/assign_column_star.py --i1 added.star --i2 TS_down.star --o added.star --col_lb rlnAngleTilt
python ~/soft/starpy/assign_column_star.py --i1 added.star --i2 TS_down.star --o added.star --col_lb rlnAnglePsi

The added.star contains all required information for training OPUS-TOMO! We have also provided a STAR file added.star for reference in the shared google drive folder.

Perform Subtomogram Averaging (Optional)

Another option at this step is performing a subtomogram averaging to pre-align the subtomograms (though this set is extremely noisy so far). If you used subtomograms from non-TM methods, this step is compulsory since the orientations of subtomograms is unknown. To perform subtomogram averaging using Relion, you should first reconstruct the CTFs using do_all_reconstruct_ctfs.sh scripts. Next, execute:

mpirun -n 7  --oversubscribe  --bind-to none --mca btl '^openib' relion_refine_mpi --o Refine3D/job001/run --auto_refine --split_random_halves --i all_subtomo.star --ref Refine3D/job002/run_class001.mrc --ini_high 40 --dont_combine_weights_via_disc --pool 3 --pad 2  --ctf --particle_diameter 337 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --auto_local_healpix_order 4 --offset_range 6 --offset_step 2 --sym C1 --low_resol_join_halves 40 --norm --scale  --j 4 --firstiter_cc --free_gpu_memory 256 --gpu 0,1,2,3

You can then using the run_data.star file with subtomogram averaging result for heterogeneity analysis. Performing subtomogram averaging using PyTOM also works! You then need to convert the xml file generated by PyTOM to STAR file.

Prepare Pose Pickle File

OPUS-TOMO needs to read pose information from a pickle file. Therefore, before training, extract pose information from added.star using

dsdsh prepare added.star 128 3.37

The pose information then is stored into added_pose_euler.pkl.

Train OPUS-TOMO

Create a folder for training. Change to this directory. You can then create a mask from either template using relion, or a sphere mask using pytom! The detailed procedure for mask creation can be found in https://relion.readthedocs.io/en/release-3.1/SPA_tutorial/Mask.html. You can also invoke relion_mask_create and set parameters according to the explanation.

You can now train OPUS-TOMO using the command:

dsd train_tomo added.star --poses added_pose_euler.pkl -n 40 -b 8 --zdim 12 --lr 4.e-5 --num-gpus 4 --multigpu --beta-control 0.8 --beta cos -o . -r ./mask.mrc --downfrac 0.75 --valfrac 0.1 --lamb 0.8 --split ribo-split.pkl --bfactor 3.5 --templateres 160 --angpix 3.37 --estpose --tmp-prefix tmp --datadir /work/

The meaning of each arguments can be checked in the readme file.

Analyze Training Result

Check the Readme file for detailed analysis process. Change to the directory storing training results. We can first perform a KMean and PC analysis for the learned latent space at training epoch 29 using

dsdsh analyze . 29 10 20

The above command clustered the latent space into 20 classes and 12 principal components. The reconstructions at the KMeans cluster centers can be obtained by

dsdsh eval_vol . 29 kmeans 20 4.

To reconstruct volumes along a principal components, you can execute

dsdsh eval_vol . 29 pc 9 4.

The above command create 10 volumes along PC9. You can visualize them as movie using vseries command in ChimeraX.

To get the subtomograms in each cluster, you can extract the STAR file for each cluster using

dsdsh parse_pose added.star 128 3.37 . 29 20

The expected filtering result is shown below: image

The starfile for each cluster can be easily visualized in ArtiaX https://github.com/FrangakisLab/ArtiaX/ ! You can join the starfiles from several clusters corresponding to ribosomes and select rows from the tomogram that you would like to visualize, which can be done by modifying the star2csv.py script in shared google drive. Save the starfile, you can directly visualize the subtomograms picked by OPUS-TOMO in ArtiaX!