raise error when atom name encoding is violating the expected dimension #143

wckdouglas · 2025-01-08T16:29:13Z

This PR raises an error when an invalid atom name, one that does not fit the expected 4-character encoding, is generated from the given SMILES.
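Here is a minimal sketch of the kind of guard described. It assumes that `convert_atom_name` encodes each character as `ord(c) - 32` and zero-pads to four slots. That assumption matches the `(51, 41, 17, 16, 17)` tuple reported for SI101. `convert_atom_name_checked` is a hypothetical stand-in, not the code in this PR:

```python
def convert_atom_name_checked(name: str) -> tuple[int, int, int, int]:
    """Encode an atom name as four ints, one per character (ord(c) - 32),
    zero-padded on the right.

    Hypothetical sketch: the ord(c) - 32 scheme is inferred from the
    SI101 -> (51, 41, 17, 16, 17) output, not copied from boltz.
    """
    name = name.strip()
    if len(name) > 4:
        # Fail early with a message naming the atom, instead of letting
        # a 5-element tuple reach numpy later on.
        raise ValueError(
            f"Atom name {name!r} has {len(name)} characters; at most 4 are supported."
        )
    codes = [ord(c) - 32 for c in name]
    return tuple(codes + [0] * (4 - len(codes)))
```

For example, `convert_atom_name_checked("C12")` returns `(35, 17, 18, 0)`, while `convert_atom_name_checked("SI101")` raises `ValueError`. Raising at encoding time points directly at the offending atom, rather than surfacing later as a generic shape error.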

When a larger molecule (the example below) is fed into boltz, it produces the atom name SI101, which violates what convert_atom_name expects (a name of at most 4 characters). Test code:

```python
from rdkit import Chem

# SMILES of the larger molecule that triggers the problem
smiles = "CC(C)[C@@H](C(=O)C1=NC2=C(O1)C=CC(=C2)CO[Si](C)(C)C(C)(C)C)NC(=O)[C@@H]3CCCN3C(=O)[C@H](C(C)C)NC(=O)OCC4=CC=CC=C4"
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)  # hydrogens are named too, so ranks can exceed 99

# Set atom names: upper-cased element symbol + 1-based canonical rank
canonical_order = Chem.CanonicalRankAtoms(mol)
for atom, can_idx in zip(mol.GetAtoms(), canonical_order):
    name = atom.GetSymbol().upper() + str(can_idx + 1)
    if len(name) > 4:
        print(name)  # names that do not fit the 4-character encoding
    atom.SetProp("name", name)
```

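The molecule above produces the atom name SI101 (a two-letter element symbol followed by a three-digit rank). For this name, convert_atom_name returns the 5-tuple (51, 41, 17, 16, 17), which numpy later rejects because it does not match the expected 4-element dimension.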
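To show where the fifth element comes from, the sketch below reproduces the naming rule from the test code. It applies the same assumed `ord(c) - 32` encoding without any length check, then writes the result into a fixed-width `(N, 4)` array. The array layout and the helper names (`make_atom_name`, `unchecked_encode`, `store_names`) are hypothetical and only illustrate the kind of shape mismatch numpy reports:

```python
import numpy as np


def make_atom_name(symbol: str, canonical_rank: int) -> str:
    # Naming rule from the test code: upper-cased symbol + 1-based canonical rank.
    return symbol.upper() + str(canonical_rank + 1)


def unchecked_encode(name: str) -> tuple[int, ...]:
    # Assumed encoding: ord(c) - 32 per character, zero-padded to 4 slots.
    # With no length check, a 5-character name yields a 5-tuple
    # ([0] * negative is an empty list, so nothing is padded or cut).
    codes = [ord(c) - 32 for c in name.strip()]
    return tuple(codes + [0] * (4 - len(codes)))


def store_names(names: list[str]) -> np.ndarray:
    # Hypothetical fixed-width storage: one row of 4 codes per atom.
    out = np.zeros((len(names), 4), dtype=np.int8)
    for i, name in enumerate(names):
        out[i] = unchecked_encode(name)  # raises ValueError for a 5-tuple
    return out


if __name__ == "__main__":
    name = make_atom_name("Si", 100)     # 0-based rank 100 -> "SI101"
    print(name, unchecked_encode(name))  # SI101 (51, 41, 17, 16, 17)
    try:
        store_names(["C1", name])
    except ValueError as err:
        print("numpy:", err)
```

Under this naming rule, a two-letter symbol such as SI overflows once the 1-based rank reaches 100. A one-letter symbol such as C only overflows at rank 1000 (C999 still fits).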

Commit 95dacb6: raise error when atom name encoding is violating the expected dimension
