OpenFF and AmberTools #384
Comments
Dear BioSimSpace developers, is it possible to reduce the dependencies that BioSimSpace needs? I recently built Docker images and found that a fresh installation of BioSimSpace takes about 6 GB of disk space, which is very large for me. Many thanks!
It's very challenging to do this because of how the conda build system works and how dependencies are handled. A lot of the bloat comes from the switch to using the conda-forge version of OpenMM, which pulls in Ideally package maintainers would create minimal base packages, with additional sub-packages for extra functionality. However, this quickly becomes a maintenance nightmare and I don't begrudge small development teams choosing not to do this, since the bandwidth simply isn't there. I also understand the need for developers to build and test against a known set of dependencies, i.e. bundling dependencies that could be provided externally, e.g. With BioSimSpace, we have provided a ModuleStub class in our internal Additionally, the whole of BioSimSpace depends on Sire, which is used everywhere, and it is Sire that pulls in the majority of the dependencies. (Other than RDKit, most of BioSimSpace's dependencies are pure Python.)
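The stub approach mentioned above can be sketched roughly as follows. This is a minimal illustration of the general pattern, not BioSimSpace's actual implementation: the `try_import` helper, the `install_hint` parameter, and the error wording are all hypothetical. The idea is that an optional package is imported lazily, and a placeholder object is returned if it is missing, so the failure only surfaces when the functionality is actually used.

```python
import importlib


class ModuleStub:
    """Placeholder returned when an optional module cannot be imported."""

    def __init__(self, name, install_hint=None):
        self._name = name
        self._install_hint = install_hint

    def __bool__(self):
        # A stub is falsy, so callers can write `if module: ...`.
        return False

    def __getattr__(self, attr):
        # Only reached for attributes that are not set in __init__,
        # i.e. when code tries to use the missing module.
        message = f"Optional module '{self._name}' is not installed"
        if self._install_hint:
            message += f"; {self._install_hint}"
        raise ModuleNotFoundError(message)


def try_import(name, install_hint=None):
    """Return the named module, or a ModuleStub if it is unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return ModuleStub(name, install_hint)
```

With something like this, a heavy dependency only needs to be present in environments where the corresponding feature is used, e.g. `rdkit = try_import("rdkit", "install it from conda-forge")`.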
Just to add that, as you can see from Lester’s reply, we are aware of this issue and are thinking about how to mitigate problems, but I expect it will take some time to come up with a robust way to streamline distribution.
From a quick search, it looks like it's the new version of
Yes, as @jmichel80 says, this is something that we are aware of and would like to develop a good long-term solution for. Ultimately I think this will require collaboration between various projects, perhaps using a conda sub-channel, building against a minimal and fixed (or slowly evolving) set of dependencies.
For BioSimSpace, we may even be able to remove the A lot of work has gone into minimising the size of the Sire binaries (currently less than 40MB), yet you can depend on one package that pulls in a GB of stuff 🤷♂️
Many thanks for your efforts! It's good to know that we will have a lighter version in the future. 😄
Just to note that I have now discovered that we can avoid the
Closing this for now since we have figured out how to remove the
Okay, it turns out that we can't drop the
will give:
This results in the latest version of If you specify the latest version of
you'll end up with:
This results in the latest version of As a solution we could either depend on With re-adding, we'll also need to check for issues with
Having looked at the two
Here's the diff between the two recipes:
The major change is that ambertools 22.0 py38hf8a91bc_3 conda-forge
it is evident that The problem would be resolved if the
It looks like I really don't want to keep adding/removing or changing dependencies as a temporary workaround as this will cause environment resolution issues going forward, e.g. when the one with fewer downloads/changes may be preferred.
I've now resolved this by adding This approach seems to work well and could be expanded for any other problematic packages, e.g. common packages that users might want to have in their environment, but which aren't needed by BioSimSpace. We could even provide a method for users to build their own Sire conda package with a pre-specified environment, so that they can guarantee that it will work correctly.
Feel free to reach out to myself or @j-wags if you encounter deployment issues with our stack in the future. I can't guarantee we'll be able to change things, but we're also generally interested in lowering the friction with installing our tools. |
Recent changes mean that we can no longer provide support for certain OpenFF functionality without also depending on the `openff-interchange` package, which pulls in the `conda-forge` `ambertools` package as a dependency. Previously, we only depended on `openff-toolkit-base`, which does not require `ambertools`. This allows the user to provide `ambertools` via an external installation, which will likely be the case. We added our own check of the version number, since OpenFF requires a recent enough `antechamber` to work properly.

I can understand the reasons for this change, i.e. it makes sense for OpenFF to depend on a known and tested version of `ambertools`. Going forward, we will need to be careful to ensure that we can correctly handle dual installation of `ambertools`, i.e. letting `AMBERHOME` take precedence, but correctly finding the binaries in the conda environment as a fallback. I'll need to check if/how the conda `ambertools` sets `AMBERHOME`, since it might be necessary for the user to directly set this in BioSimSpace scripts, or when running them. Thankfully recent conda `ambertools` packages seem to be compatible with Sire/BioSimSpace, which wasn't the case a few years ago, e.g. limited Python variant support, breaking library compatibility, etc.

(Note that it's probably easiest if we just change our dependency to `openff-toolkit`, rather than `openff-toolkit-base` and `openff-interchange`.)
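As a rough sketch of the lookup order described above (prefer `AMBERHOME`, then fall back to the conda environment), something like the following could work. It is not the BioSimSpace implementation: the function name `find_amber_executable` is hypothetical, and it assumes a POSIX-style layout in which executables live in a `bin` directory under each prefix and the active conda environment is exposed via `CONDA_PREFIX`.

```python
import os
import shutil
import sys


def find_amber_executable(name="antechamber", environ=None):
    """Locate an AmberTools executable (hypothetical helper).

    Search order:
      1. $AMBERHOME/bin, so an external installation takes precedence.
      2. $CONDA_PREFIX/bin, then sys.prefix/bin (the conda environment).
      3. Whatever is on PATH.
    Returns the full path, or None if nothing is found.
    """
    environ = os.environ if environ is None else environ

    candidates = []
    amberhome = environ.get("AMBERHOME")
    if amberhome:
        candidates.append(os.path.join(amberhome, "bin"))

    # Assumption: conda packages install binaries under <prefix>/bin.
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix:
        candidates.append(os.path.join(conda_prefix, "bin"))
    candidates.append(os.path.join(sys.prefix, "bin"))

    for directory in candidates:
        exe = shutil.which(name, path=directory)
        if exe:
            return exe

    # Last resort: a plain PATH search.
    return shutil.which(name, path=environ.get("PATH", os.defpath))
```

Passing `environ` explicitly makes the precedence easy to test, and a user could still force a particular installation by setting `AMBERHOME` in the script or when running it.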