raise error when atom name encoding is violating the expected dimension #143

Open: wckdouglas wants to merge 1 commit into main

@wckdouglas wckdouglas commented Jan 8, 2025

This PR raises an error when an invalid atom name is generated from the given SMILES string.

When a larger molecule (this example) is fed into boltz, we encountered the atom name SI101, which violates the 4-character string that convert_atom_name expects. Test code:

from rdkit import Chem
from rdkit.Chem import AllChem

seq = "CC(C)[C@@H](C(=O)C1=NC2=C(O1)C=CC(=C2)CO[Si](C)(C)C(C)(C)C)NC(=O)[C@@H]3CCCN3C(=O)[C@H](C(C)C)NC(=O)OCC4=CC=CC=C4"
mol = AllChem.MolFromSmiles(seq)
mol = AllChem.AddHs(mol)

# Set atom names
canonical_order = AllChem.CanonicalRankAtoms(mol)
for atom, can_idx in zip(mol.GetAtoms(), canonical_order):
    name = atom.GetSymbol().upper() + str(can_idx + 1)
    if len(name) > 4:
        print(name)
    atom.SetProp("name", name)

The above molecule produces the atom name SI101, for which convert_atom_name returns a 5-element tuple, (51, 41, 17, 16, 17), instead of the expected 4 elements. NumPy later complains that the value does not have the correct dimension.
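To illustrate the kind of check this PR is after, here is a minimal sketch. It assumes the encoding maps each character to `ord(c) - 32` and zero-pads to four entries, which is consistent with the tuple reported above but is not copied from the boltz source. The function name `encode_atom_name_checked` and the choice of `ValueError` are hypothetical.

```python
MAX_ATOM_NAME_LEN = 4  # assumed fixed width of the encoded atom name


def encode_atom_name_checked(name: str) -> tuple[int, ...]:
    """Encode an atom name as four integers, failing early on long names.

    Hypothetical helper: the `ord(c) - 32` mapping is an assumption
    inferred from the (51, 41, 17, 16, 17) tuple reported for "SI101".
    """
    name = name.strip()
    if len(name) > MAX_ATOM_NAME_LEN:
        raise ValueError(
            f"Atom name {name!r} has {len(name)} characters; "
            f"at most {MAX_ATOM_NAME_LEN} are supported."
        )
    codes = [ord(c) - 32 for c in name]
    # Pad short names with zeros so the result always has four entries.
    codes += [0] * (MAX_ATOM_NAME_LEN - len(codes))
    return tuple(codes)


print(encode_atom_name_checked("C1"))  # (35, 17, 0, 0)
try:
    encode_atom_name_checked("SI101")
except ValueError as err:
    print(err)
```

Raising at encoding time surfaces the offending atom name directly, rather than leaving the user with a downstream NumPy shape error.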
