This document introduces a script that collects SMILES strings from MolPort by web scraping with Selenium. The script writes a .csv file containing the IDs and SMILES strings of the requested molecules. This .csv file can, for example, be used to build a library of reactants for reaction-based enumeration in lead optimization.
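As an illustration of how the output file could feed a reactant library, here is a minimal sketch using only the standard library `csv` module (the script itself depends on pandas). The function names `write_smiles_csv` and `load_reactant_library`, the column names `ID` and `SMILES`, and the identifier `example-1` are hypothetical and are not taken from the script.

```python
import csv
import os
import tempfile


def write_smiles_csv(records, path):
    """Write (molecule ID, SMILES) pairs to a .csv file with a header row.

    Assumption: the columns are named "ID" and "SMILES"; the real script's
    header may differ.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ID", "SMILES"])
        writer.writerows(records)


def load_reactant_library(path):
    """Read the .csv back into a dict mapping molecule ID -> SMILES string."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["ID"]: row["SMILES"] for row in csv.DictReader(handle)}


if __name__ == "__main__":
    # Hypothetical record: spiro[4.5]decane under a placeholder ID.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "reactants.csv")
        write_smiles_csv([("example-1", "C1CCC2(CC1)CCCC2")], out)
        print(load_reactant_library(out))
```

A dict keyed by ID keeps lookups cheap when an enumeration tool needs to map each reactant back to its MolPort entry.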
To use this script:
- Download an .sdf file containing the molecules you want to collect.
- Convert this file to a .csv file (you can use tools like the DataWarrior suite for this step).
- Launch the script and follow the provided instructions.
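If DataWarrior is not at hand, the conversion step can be approximated in plain Python, since an SD file is a sequence of records terminated by `$$$$` lines, with data items written as a `> <FIELD>` header followed by the value. This is a minimal sketch, not the workflow the script was tested with: the helper names `read_sd_field` and `sdf_to_csv`, the `field_name` argument, and the output column `ID` are hypothetical, and you would need to check which data field in your MolPort download holds the ID the script expects.

```python
import csv


def read_sd_field(lines, field_name):
    """Yield the value of `field_name` from each record of an SD file.

    Records end with a "$$$$" line. A data item is a header line such as
    "> <FIELD>" followed by its value; only the first value line is kept.
    """
    wanted = f"<{field_name}>"
    value = None
    expect_value = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if expect_value:
            value = line.strip()
            expect_value = False
        elif line.startswith(">") and wanted in line:
            expect_value = True
        elif line.strip() == "$$$$":
            if value is not None:
                yield value
            value = None
    # Handle a final record that lacks the "$$$$" terminator.
    if value is not None:
        yield value


def sdf_to_csv(sdf_path, csv_path, field_name, column="ID"):
    """Write one column of field values from an SD file to a .csv file."""
    with open(sdf_path, encoding="utf-8") as src, open(
        csv_path, "w", newline="", encoding="utf-8"
    ) as dst:
        writer = csv.writer(dst)
        writer.writerow([column])
        for value in read_sd_field(src, field_name):
            writer.writerow([value])
```

Records that lack the requested field are skipped rather than written as empty rows.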
Files in this repository:
- molport_webscraper - the web-scraping script.
- spiro_all.sdf - spirocyclic compounds downloaded from MolPort.
- sprio_all.csv - spirocyclic compounds in .csv format, used as the input file for the script.
To run this script, ensure the following packages are installed in your virtual environment:
- pandas
- selenium
- webdriver_manager
You can install them by running the following command in a terminal:
pip install pandas selenium webdriver_manager
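To confirm that the installation succeeded in the active virtual environment, a small check such as the following can be run before launching the script. It is a sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes the import names match the pip names listed above, which holds for pandas, selenium and webdriver_manager.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["pandas", "selenium", "webdriver_manager"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check is fast and has no side effects.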