Code accompanying the publication at https://doi.org/10.1021/acs.jctc.3c00201
The scripts were written specifically for this project: they expect a particular directory layout and contain some hard-coded filenames and other settings.
- Python version >= 3.9
- package list: requirements.txt
isomorphism_find_unique.py
- A small number of structures make up the entire set; the small molecules differ only in their combinations of coarse-grained bead types.
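
The idea of reducing the set to a few recurring structures can be illustrated with a minimal sketch. It assumes each molecule is given as a bead count plus a bond list and compares topologies only, ignoring bead types. The function names and the example molecules are hypothetical and are not taken from isomorphism_find_unique.py; the brute-force check is only practical for very small graphs.

```python
from itertools import permutations


def are_isomorphic(n, edges_a, edges_b):
    """Brute-force isomorphism test for two graphs on nodes 0..n-1."""
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if len(set_a) != len(set_b):
        return False
    # Try every relabelling of the nodes of graph A.
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        if mapped == set_b:
            return True
    return False


def unique_structures(molecules):
    """Group molecule names by topology.

    molecules: dict mapping name -> (number of beads, list of bonds).
    Returns a list of groups, each a list of names sharing one topology.
    """
    groups = []  # entries: (n_beads, representative bonds, member names)
    for name, (n, edges) in molecules.items():
        for rep_n, rep_edges, members in groups:
            if rep_n == n and are_isomorphic(n, rep_edges, edges):
                members.append(name)
                break
        else:
            groups.append((n, edges, [name]))
    return [members for _, _, members in groups]


# Hypothetical example: two linear three-bead molecules and one ring.
molecules = {
    "linear_a": (3, [(0, 1), (1, 2)]),
    "linear_b": (3, [(0, 2), (2, 1)]),
    "ring": (3, [(0, 1), (1, 2), (0, 2)]),
}
groups = unique_structures(molecules)
```

Here the two linear molecules end up in one group and the ring in another, so only two templates would be needed for this toy set.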
make_template_file.py
- Suitable angles and constraints are manually optimized for one example of each of the recurring structures and subsequently used as a template.
isomorphs_insert_constraints.py
- Iterates over the entire set of small-molecule graphs, identifies the correct template, and inserts the settings for angles and constraints.
The Spectrum of London and Axilrod-Teller-Muto (SLATM) potential was defined by Huang, Symonds and von Lilienfeld, https://arxiv.org/abs/1807.04259.
analyze_structures.py
- Handles loading of trajectory files and the required steps to translate the coarse-grained MD trajectories into SLATM representations.
clean_trajectories.py
- Corrects for periodic boundary conditions, centers systems around the solutes, selects frames by solute position.
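
As a rough illustration of the centering step, the sketch below assumes an orthorhombic box and plain coordinate tuples. It uses the minimum-image convention to find the solute centroid even when the solute is split across a box boundary, then shifts and wraps all particles so the centroid sits at the box center. The function names are hypothetical, and the frame selection by solute position is not covered; clean_trajectories.py may well rely on a trajectory library instead.

```python
def minimum_image(delta, length):
    """Map a coordinate difference onto the nearest periodic image."""
    return delta - length * round(delta / length)


def center_on_solute(positions, solute_indices, box):
    """Shift all particles so the solute centroid is at the box center.

    positions: list of (x, y, z) tuples; box: (Lx, Ly, Lz), orthorhombic.
    Coordinates are wrapped back into [0, L) along each axis.
    """
    ref = positions[solute_indices[0]]
    centroid = []
    for d in range(3):
        # Unwrap solute beads relative to the first solute bead.
        offsets = [
            minimum_image(positions[i][d] - ref[d], box[d])
            for i in solute_indices
        ]
        centroid.append(ref[d] + sum(offsets) / len(offsets))
    return [
        tuple((pos[d] - centroid[d] + 0.5 * box[d]) % box[d] for d in range(3))
        for pos in positions
    ]


# Hypothetical frame: a two-bead solute split across the x boundary
# plus one environment particle.
box = (10.0, 10.0, 10.0)
positions = [(9.5, 5.0, 5.0), (0.5, 5.0, 5.0), (2.5, 5.0, 5.0)]
centered = center_on_solute(positions, [0, 1], box)
```

After centering, the two solute beads sit symmetrically around x = 5 and the environment particle keeps its distance to them.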
preprocessing.py
- Selects solutes and environment particles within the long-range interaction cutoff distance around the solutes' center of mass.
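
The cutoff selection can be sketched as follows, assuming the frame has already been centered so that no periodic images need to be considered. The function names, masses and coordinates are illustrative assumptions and not taken from preprocessing.py.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean position of a set of particles."""
    total = sum(masses)
    return tuple(
        sum(m * p[d] for p, m in zip(positions, masses)) / total
        for d in range(3)
    )


def select_within_cutoff(solute_pos, solute_masses, env_pos, cutoff):
    """Return indices of environment particles within cutoff of the solute COM."""
    com = center_of_mass(solute_pos, solute_masses)
    return [i for i, p in enumerate(env_pos) if math.dist(p, com) <= cutoff]


# Hypothetical two-bead solute with equal masses and three environment particles.
solute_pos = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
solute_masses = [72.0, 72.0]
env_pos = [(1.0, 1.0, 0.0), (5.0, 0.0, 0.0), (1.0, 0.0, -1.1)]
selected = select_within_cutoff(solute_pos, solute_masses, env_pos, cutoff=1.2)
```

With a cutoff of 1.2 (in the same length unit as the coordinates), only the first and third environment particles are kept.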
generate_representations.py
- Handles the generation of the list of possible many-body interactions and unique particle identifiers required by the QML SLATM method.
- Generates the SLATM representations.
- Saves the results as pickled pandas dataframes.
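
To make the notion of a many-body interaction list more concrete, the sketch below maps bead type names to integer identifiers and enumerates all one-, two- and three-body type combinations. This is a simplified stand-in written for illustration: the function names and bead labels are hypothetical, and the list produced by the QML package may differ, for example by dropping combinations that cannot occur given the particle counts.

```python
from itertools import combinations_with_replacement


def assign_identifiers(bead_types):
    """Map each distinct bead type name to a positive integer identifier."""
    return {name: idx for idx, name in enumerate(sorted(set(bead_types)), start=1)}


def many_body_types(identifiers):
    """Enumerate one-, two- and three-body type combinations.

    Three-body entries are written as [a, center, c]; swapping the two
    outer particles gives the same interaction, so only a <= c is listed.
    """
    ids = sorted(set(identifiers))
    one_body = [[i] for i in ids]
    two_body = [list(pair) for pair in combinations_with_replacement(ids, 2)]
    three_body = [
        [a, center, c]
        for center in ids
        for a, c in combinations_with_replacement(ids, 2)
    ]
    return one_body + two_body + three_body


# Hypothetical system with two bead types, "A" and "B".
id_map = assign_identifiers(["A", "B", "A", "B", "B"])
mbtypes = many_body_types(id_map.values())
```

For two bead types this yields 2 one-body, 3 two-body and 6 three-body entries.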
analyze_SLATMs.py
- Loads the SLATM representations and required additional files.
- Handles preprocessing of the SLATM representations, including normalization for PCA.
- Performs the PCA embedding of the SLATM representations.
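
One common way to normalize feature vectors before PCA is to drop constant features and scale the remaining ones to zero mean and unit variance. The sketch below shows that approach on plain lists; whether analyze_SLATMs.py uses exactly this normalization is an assumption, and the function name is hypothetical.

```python
import math


def standardize_columns(rows):
    """Standardize each feature column; drop columns with zero variance.

    rows: list of equally long feature vectors (one per sample).
    Returns (indices of kept columns, standardized rows).
    """
    n = len(rows)
    kept, scaled_columns = [], []
    for j, col in enumerate(zip(*rows)):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if std == 0.0:
            continue  # constant feature carries no variance for PCA
        kept.append(j)
        scaled_columns.append([(x - mean) / std for x in col])
    return kept, [list(r) for r in zip(*scaled_columns)]


# Hypothetical representations: three samples, three features,
# the middle feature being constant.
rows = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [5.0, 0.0, 8.0]]
kept, scaled = standardize_columns(rows)
```

The kept column indices allow the principal-component coefficients to be mapped back to the original SLATM entries later on.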
generate_labels.py
- Loads principal components and additional information, generates descriptors for further analysis.
plot_cross_correlations.py
- Cross-correlates a descriptor to principal components, performs linear regression.
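
A minimal sketch of correlating one descriptor with several principal components is given below. It computes an ordinary least-squares fit and the Pearson correlation coefficient for each component. The function names and the data are made up for illustration and do not come from plot_cross_correlations.py.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def correlate_descriptor(descriptor, components):
    """Fit every principal component against the descriptor.

    components: dict mapping a component label to its scores.
    """
    return {label: linear_fit(descriptor, scores) for label, scores in components.items()}


# Hypothetical descriptor values and scores of two principal components.
descriptor = [0.0, 1.0, 2.0, 3.0]
components = {
    "PC1": [1.0, 3.0, 5.0, 7.0],   # perfectly linear in the descriptor
    "PC2": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with the descriptor
}
fits = correlate_descriptor(descriptor, components)
```

In this toy example PC1 follows the descriptor exactly, while PC2 shows no linear relationship with it.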
correlate_loadings_interactions.py
- Visualizes the most relevant interactions selected by their loading values (eigenvector * sqrt(eigenvalue)) to provide an idea about 3D structural aspects.
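
The loading definition used above, eigenvector * sqrt(eigenvalue), and the selection of the most relevant entries can be sketched as follows. The function names and the numbers are hypothetical; in practice each index would refer to one entry of the SLATM representation.

```python
import math


def loadings(eigenvector, eigenvalue):
    """Loading values of one principal component."""
    scale = math.sqrt(eigenvalue)
    return [coefficient * scale for coefficient in eigenvector]


def top_features(eigenvector, eigenvalue, k):
    """Return the k (index, loading) pairs with the largest absolute loading."""
    values = loadings(eigenvector, eigenvalue)
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return [(i, values[i]) for i in order[:k]]


# Hypothetical eigenvector with three coefficients and its eigenvalue.
eigenvector = [0.6, -0.8, 0.0]
eigenvalue = 4.0
most_relevant = top_features(eigenvector, eigenvalue, k=2)
```

Sorting by absolute value keeps strongly negative loadings, which are just as informative as strongly positive ones.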
biplot_scores_weights.py
- Plots most pairs of principal components and their most relevant eigenvector coefficients, colored by a previously generated descriptor.