Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly sort gmx xvg files for analysis #384

Closed
wants to merge 2 commits into from

Conversation

fjclark
Copy link

@fjclark fjclark commented Jan 24, 2025

This pull request fixes issue #383

  • I confirm that I have merged the latest version of devel into this branch before issuing this pull request (e.g. by running git pull origin devel): y
  • I confirm that I have permission to release this code under the GPL3 license: y

Suggested reviewers:

@lohedges

@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:26 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:26 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:26 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:29 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:29 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:29 — with GitHub Actions Inactive
@fjclark fjclark temporarily deployed to biosimspace-build January 24, 2025 16:29 — with GitHub Actions Inactive
@fjclark
Copy link
Author

fjclark commented Jan 24, 2025

Sorry, I forgot to blacken again.

@lohedges
Copy link
Contributor

Thanks, @fjclark. Is this only an issue for GROMACS, or are any of the other engines affected?

@fjclark
Copy link
Author

fjclark commented Jan 24, 2025

Looking at

function_glob_dict = {
            "SOMD": (Relative._analyse_somd, "**/simfile.dat"),
            "SOMD2": (Relative._analyse_somd2, "**/*.parquet"),
            "GROMACS": (Relative._analyse_gromacs, "**/[!bar]*.xvg"),
            "AMBER": (Relative._analyse_amber, "**/*.out"),
        }

I think the code:

  • Should currently be fine for SOMD as directories are named by lambda values
  • Is fine for SOMD2 as the the parquet file is named by the lambda value
  • I'm not sure about amber as I don't know how the output is named

An alternative fix, which is probably simpler, would be to ensure a consistent number of digits so that the string-based sorting works, e.g. the lower example:

In [26]: sorted(["lambda_19", "lambda_2"])
Out[26]: ['lambda_19', 'lambda_2']

In [27]: sorted(["lambda_19", "lambda_02"])
Out[27]: ['lambda_02', 'lambda_19']

Would you prefer this? Thanks.

@lohedges
Copy link
Contributor

I guess I'm confused. By default, when setting things up via BioSimSpace.FreeEnergy.Relative, then output directories are always namged by the lambda value not index for all of the engines. (I know that Exs changed this in their sandpit.) For example:

In [1]: import BioSimSpace as BSS

In [2]: import sire as sr

In [3]: mols = sr.load_test_files("merged_molecule.s3")

In [4]: system = BSS._SireWrappers.System(mols._system)

In [5]: freenrg_gmx = BSS.FreeEnergy.Relative(system, BSS.Protocol.FreeEnergy(), engine="gromacs", work_dir="fep_gmx")

In [6]: ls -l fep_gmx/
total 44
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.0000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.1000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.2000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.3000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.4000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.5000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.6000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.7000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.8000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_0.9000/
drwxr-xr-x 2 lester lester 4096 Jan 24 18:13 lambda_1.0000/

So, if each directory contains a gromacs.xvg file, the sorted glob gives:

import pathlib

glob_path = pathlib.Path("fep_gmx")
files = sorted(glob_path.glob("**/[!bar]*.xvg"))

for file in file:
    print(file)

gives:

fep_gmx/lambda_0.0000/gromacs.xvg
fep_gmx/lambda_0.1000/gromacs.xvg
fep_gmx/lambda_0.2000/gromacs.xvg
fep_gmx/lambda_0.3000/gromacs.xvg
fep_gmx/lambda_0.4000/gromacs.xvg
fep_gmx/lambda_0.5000/gromacs.xvg
fep_gmx/lambda_0.6000/gromacs.xvg
fep_gmx/lambda_0.7000/gromacs.xvg
fep_gmx/lambda_0.8000/gromacs.xvg
fep_gmx/lambda_0.9000/gromacs.xvg
fep_gmx/lambda_1.0000/gromacs.xvg

which is the correct order. Are you setting things up in a different way so that the directory naming structure is different for GROMACS? If so, then I agree that you would need to use a consistent number of digits in your naming scheme.

@fjclark
Copy link
Author

fjclark commented Jan 27, 2025

I know that Exs changed this in their sandpit.

Sorry, this was exactly the issue - I was running gmx ABFE with the sandpit, but analysing with standard BSS for consistency with SOMD2. The sandpit analysis works fine for the gmx output.

@fjclark fjclark closed this Jan 27, 2025
@lohedges
Copy link
Contributor

Ah, that makes sense. No problem. I think we'll eventually move over to naming things by index, so I'll make sure to use an appropriate number of digits when I do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants