Molecular Generation
Generative models for molecules. Inputs are most commonly text-based (SMILES/SELFIES) or graph representations (parallel models on atom and bond matrices). Most models offer some form of property optimization (latent space search/interpolation, reinforcement learning, guided genetic exploration). These methods are usually autoregressive, though non-autoregressive molecular generation methods have recently started to appear.
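As a rough illustration of the text-based, autoregressive setup described above, the sketch below tokenizes a SMILES string and builds the (prefix, next token) pairs a sequence model would be trained on. The regular expression is a commonly used SMILES tokenization pattern and is an assumption here, as are the function names and the `<bos>`/`<eos>` markers; none of this is taken from a specific paper in this list.

```python
import re

# Assumed tokenization pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, ring-closure labels, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-\+\\/\(\)\.:~@\?\*\$>])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly on unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Build (prefix, next token) training pairs for an autoregressive model."""
    seq = ["<bos>"] + tokens + ["<eos>"]  # hypothetical start/end markers
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


aspirin_tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")
pairs = next_token_pairs(aspirin_tokens)
```

A non-autoregressive model would instead predict all positions (or the whole graph) jointly, so it has no equivalent of the prefix list built here.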
Reviews
-
Machine learning-aided generative molecular design
Yuanqi Du, Arian R. Jamasb, Jeff Guo, Tianfan Fu, Charles Harris, Yingheng Wang, Chenru Duan, Pietro Liò, Philippe Schwaller, and Tom L. Blundell
Nat. Mach. Intell. 2024, 6, 589 -
Deep Generative Models in De Novo Drug Molecule Generation
Chao Pang, Jianbo Qiao, Xiangxiang Zeng, Quan Zou, and Leyi Wei
J. Chem. Inf. Model. 2024, 64 (7), 2174
Diffusion / Flow Matching Models
-
Exploring Discrete Flow Matching for 3D De Novo Molecule Generation + GitHub Repo
Ian Dunn, and David R. Koes
NeurIPS 2024
Benchmarks discrete flow matching techniques for 3D molecular design. Paper introduces FlowMol-CTMC, a state-of-the-art model with fewer parameters than existing methods, demonstrating superior performance on small molecule generation. Other models often generate chemically valid but atypical functional groups outside the training distribution. Paper highlights gaps in structural motif prediction while advancing generative approaches for drug discovery. -
ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design + GitHub Repo (model) + GitHub Repo (scoring functions)
Keir Adams, Kento Abeywardane, Jenna Fromer, and Connor W. Coley
ArXiv 2024 -
A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets + GitHub Repo
Lei Huang, Tingyang Xu, Yang Yu, Peilin Zhao, Xingjian Chen, Jing Han, Zhi Xie, Hailong Li, Wenge Zhong, Ka-Chun Wong & Hengtong Zhang
Nat. Commun. 2024, 15, 2657 -
Mixed Continuous and Categorical Flow Matching for 3D De Novo Molecule Generation + GitHub Repo
Ian Dunn and David Ryan Koes
ArXiv 2024
Extends the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. Finds that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. Presents FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods. -
GeoLDM: Geometric Latent Diffusion Models for 3D Molecule Generation + GitHub Repo
Minkai Xu, Alexander Powers, Ron Dror, Stefano Ermon, and Jure Leskovec
ICML 2023
Stable (latent) diffusion model for 3D point clouds and 2D graphs. Capable of free and property conditioned generation (split-train-condition). -
Equivariant Diffusion for Molecule Generation in 3D + GitHub Repo
Emiel Hoogeboom, Vı́ctor Garcia Satorras, Clément Vignac, and Max Welling
in Proceedings of the 39th International Conference on Machine Learning, PMLR 162:8867-8887, 2022
Non-autoregressive diffusion model (rotation invariant). Reps: \(x = (x_1, \dots, x_M) \in \mathbb{R}^{M \times 3}\) (atom position matrix) with corresponding feature vectors \(h = (h_1, \dots, h_M) \in \mathbb{R}^{M \times n_{\text{feat}}}\).
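To make the (x, h) representation from the last entry concrete, here is a minimal pure-Python sketch: a position matrix with one row per atom, a one-hot feature row per atom, and a check that pairwise distances do not change when the centered coordinates are rotated. The water geometry is approximate, and the atom vocabulary and helper names are illustrative assumptions, not the paper's code.

```python
import math

ATOM_TYPES = ["H", "C", "N", "O"]  # illustrative vocabulary, not from the paper


def one_hot(symbol):
    """One feature row of h for a single atom."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def center(x):
    """Subtract the mean position so the point cloud has zero center of mass."""
    m = len(x)
    mean = [sum(p[k] for p in x) / m for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in x]


def rotate_z(x, angle):
    """Rotate every atom position about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * px - s * py, s * px + c * py, pz] for px, py, pz in x]


def pairwise_distances(x):
    return [[math.dist(a, b) for b in x] for a in x]


# Approximate water geometry in angstroms (O, H, H).
symbols = ["O", "H", "H"]
x = [[0.000, 0.000, 0.117], [0.000, 0.757, -0.469], [0.000, -0.757, -0.469]]  # M x 3
h = [one_hot(s) for s in symbols]                                            # M x n_feat

x_rot = rotate_z(center(x), math.pi / 3)
```

Interatomic distances are identical for `x` and `x_rot`, which is the kind of symmetry an equivariant model is built to respect.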
Normalizing Flows
-
MolGrow: A Graph Normalizing Flow for Hierarchical Molecular Generation (No implementation available)
Maksim Kuznetsov and Daniil Polykovskiy
Proceedings of the AAAI Conference on Artificial Intelligence 2021, 35 (9), 8226-8234
Hierarchical normalizing flow for molecular graphs, autoregressive. Builds either BFS or fragment-based (better). Model is composed of “plug-and-play” modules. Trained on MOSES, QM9, Zinc250k. Property-constrained optimization is based on genetic algorithm. -
FastFlows: Flow-Based Models for Molecular Graph Generation
Nathan C. Frey, Vijay Gadepally, and Bharath Ramsundar
ELLIS Machine Learning for Molecule Discovery Workshop 2021
Framework for normalizing flows from SELFIES. Uses substructure filtering to speed up training and work from small training sets. Built in MPO functionality.
TDS article -
MoFlow: An Invertible Flow Model for Generating Molecular Graphs + GitHub Repo
Chengxi Zang and Fei Wang
in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
Non-autoregressive normalizing flow for molecular graphs; two-stage flow (bonds (based on GLOW network from Nvidia) > bond-conditioned flow for atoms). Similar to GraphNVP. Trained (NLL) on QM9 and ZINC250k. Developed new architecture. Excellent results. -
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation + GitHub repo
Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang
ICLR 2020
How to explain this better than reviewer #1…
"This paper proposes a generative model architecture for molecular graph generation based on autoregressive flows. The main contribution of this paper is to combine existing techniques (auto-regressive BFS-ordered generation of graphs, normalizing flows, dequantization by Gaussian noise, fine-tuning based on reinforcement learning for molecular property optimization, and validity constrained sampling). Most of these techniques are well-established either for data generation with normalizing flows or for molecular graph generation and the novelty lies in the combination of these building blocks into a framework."
GANs
-
druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico - No official implementation available
Artur Kadurin, Sergey Nikolenko, Kuzma Khrabrov, Alex Aliper, and Alex Zhavoronkov
Mol. Pharmaceutics 2017, 14 (9), 3098–3104 -
The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology + GitHub Repo
Artur Kadurin, Alexander Aliper, Andrey Kazennov, Polina Mamoshina, Quentin Vanhaelen, Kuzma Khrabrov, and Alex Zhavoronkov
Oncotarget. 2017, 8, 10883-10890 -
MolGAN: An implicit generative model for small molecular graphs + GitHub Repo
Nicola De Cao and Thomas Kipf
ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models
GAN for molecular graphs (tandem atom identity and bond matrices). Trained as W-GAN on QM9 with basic “RL” input. Implemented with R-GCNs.
Other
-
TamGen: drug design with target-aware molecule generation through a chemical language model + GitHub Repo
Kehan Wu, Yingce Xia, Pan Deng, Renhe Liu, Yuan Zhang, Han Guo, Yumeng Cui, Qizhi Pei, Lijun Wu, Shufang Xie, Si Chen, Xi Lu, Song Hu, Jinzhi Wu, Chi-Kin Chan, Shawn Chen, Liangliang Zhou, Nenghai Yu, Enhong Chen, Haiguang Liu, Jinjiang Guo, Tao Qin & Tie-Yan Liu
Nat. Commun. 2024, 15, 9360 -
TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models
Kiwoong Yoo, Owen Oertell, Junhyun Lee, Sanghoon Lee & Jaewoo Kang
NeurIPS 2024 -
DrugSynthMC: An Atom-Based Generation of Drug-like Molecules with Monte Carlo Search
Milo Roucairol, Alexios Georgiou, Tristan Cazenave, Filippo Prischi & Olivier E. Pardo
J. Chem. Inf. Model. 2024, 64, 18, 7097 -
Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS + GitHub Repo
Yaodong Yang, Guangyong Chen, Jinpeng Li, Junyou Li, Odin Zhang, Xujun Zhang, Lanqing Li, Jianye Hao, Ercheng Wang & Pheng-Ann Heng
Commun. Biol. 2024, 7, 1074 -
Llamol: a dynamic multi-conditional generative transformer for de novo molecular design
Niklas Dobberstein, Astrid Maass & Jan Hamaekers
J. of Cheminf., 2024, 16, 73
Transformer based on Llama2, tweaked for molgen. Not the most impressive paper, but some interesting tidbits scattered throughout (e.g., SCL, etc…) -
REINVENT4: Modern AI–driven generative molecule design + GitHub Repo
Hannes H. Loeffler, Jiazhen He, Alessandro Tibo, Jon Paul Janet, Alexey Voronov, Lewis H. Mervin & Ola Engkvist
J. of Cheminf., 2024, 16, 20
AstraZeneca’s molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization. -
Masked graph modeling for molecule generation + GitHub Repo
Omar Mahmood, Elman Mansimov, Richard Bonneau, and Kyunghyun Cho
Nat. Commun. 2021, 12, 3156
MPNN for molecular graphs. Generation by iterative sampling of subsets of graph components; further generation steps are conditioned on the rest of the graph. Trained on QM9 and ChEMBL. Paper provides analysis of GuacaMol benchmark metrics, particularly their independence. Conclusions:
- Validity, KL-divergence and Fréchet Distance scores correlate highly with each other
- These three metrics correlate negatively with the novelty score
- Uniqueness does not correlate strongly with any other metric
Reaction Informatics
These models predict mechanisms for chemical reactions, ideally similar to how we teach 2nd years to push arrows. There are relatively few examples of this task, but they fall into 3 major categories: electron flows, graph edits, and reaction networks. At inference these models are used for forward synthesis prediction, potentially for prediction of chemo-/regioselectivity. Largely trained on pattern recognition from atom-mapped inputs (USPTO), though there are exceptions (e.g., Baldi papers below).
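One way to picture the electron-flow and graph-edit views on atom-mapped inputs is as a set of bond-order changes applied to the reactant graph. The sketch below is a toy under stated assumptions: the atom map numbers, the `(atom_i, atom_j, delta_order)` edit format, and the function name are hypothetical, and no model from the papers below is involved.

```python
def apply_bond_edits(bonds, edits):
    """Return the product bond table after applying (atom_i, atom_j, delta_order) edits."""
    product = dict(bonds)  # copy so the reactant table is left untouched
    for i, j, delta in edits:
        key = frozenset((i, j))
        order = product.get(key, 0) + delta
        if order < 0:
            raise ValueError(f"negative bond order between atoms {i} and {j}")
        if order == 0:
            product.pop(key, None)  # bond fully broken
        else:
            product[key] = order
    return product


# Hypothetical atom map numbers: 1 = O of hydroxide, 2 = H on O, 3 = C, 4 = Cl.
# The three methyl hydrogens are omitted for brevity.
reactants = {frozenset((1, 2)): 1, frozenset((3, 4)): 1}

# SN2 step written as two arrows: the O lone pair forms the O-C bond (+1),
# and the C-Cl bonding pair leaves with Cl (-1).
sn2_edits = [(1, 3, +1), (3, 4, -1)]

products = apply_bond_edits(reactants, sn2_edits)
```

A learned model would predict the edit list (or an equivalent electron-redistribution matrix) from the reactants; here it is written by hand.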
Electron Flow Prediction
-
Non-Autoregressive Electron Redistribution Modeling for Reaction Prediction (NERF) + GitHub Repo
Hangrui Bi, Hengyi Wang, Chence Shi, Connor Coley, Jian Tang, and Hongyu Guo
in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
Models reactions as electron flow. Predicts flow non-autoregressively. Trained on USPTO-MIT; SOTA results. Graph rep from SMILES input. Fast (27x). -
Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels–Alder Reaction Outcomes + GitHub Repo
Angus Keto, Taicheng Guo, Morgan Underdue, Thijs Stuyver, Connor W. Coley, Xiangliang Zhang, Elizabeth H. Krenske, and Olaf Wiest
J. Am. Chem. Soc., 2024, 146, 23, 16052–16061
NERF applied to Diels-Alder cycloadditions. Typical training set reaches 90% accuracy (in line with general predictive accuracy…). No special improvements to NERF code released. -
A Generative Model For Electron Paths (ELECTRO) + GitHub Repo
John Bradshaw, Matt J. Kusner, Brooks Paige, Marwin H. S. Segler, and José Miguel Hernández-Lobato
ICLR 2019
Learns to generate a probability distribution over electron paths. Trained on USPTO (w/ and w/o reaction condition data), 2e- chemistry only. Graph rep from SMILES input. -
Molecule Edit Graph Attention Network (MEGAN): Modeling Chemical Reactions as Sequences of Graph Edits + GitHub Repo
Mikołaj Sacha, Mikołaj Błaż, Piotr Byrski, Paweł Dąbrowski-Tumański, Mikołaj Chromiński, Rafał Loska, Paweł Włodarczyk-Pruszyński, and Stanisław Jastrzębski
J. Chem. Inf. Model. 2021, 61, (7), 3273–3284
Not technically an electron flow model. Models chemical reactions as a series of graph edits, most similar to existing environment. Learns to predict sequences autoregressively.
Sources and Sinks
The Baldi papers map e- sources and sinks and combinatorially generate a probability distribution of electron flows. The described classifiers are used to filter source-sink pairs before eval. Trained on in-house (unavailable) data. The papers don’t have available source code, but ready-to-use programs are available on ChemDB.
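The source/sink idea can be sketched as: enumerate every source-sink pair, drop the pairs a filter rejects, then turn scores for the survivors into a probability distribution. Everything below is a toy stand-in: `keep` plays the role of the filtering classifiers, `score` the role of a learned ranking model, and the orbital labels and numbers are made up for illustration.

```python
import math
from itertools import product


def rank_source_sink_pairs(sources, sinks, score, keep=lambda src, snk: True):
    """Enumerate source-sink pairs, filter them, and softmax-normalize their scores."""
    pairs = [(a, b) for a, b in product(sources, sinks) if a != b and keep(a, b)]
    if not pairs:
        return []
    logits = [score(a, b) for a, b in pairs]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    ranked = [(pair, w / total) for pair, w in zip(pairs, weights)]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy inputs: labels and scores are invented, not taken from the papers.
sources = ["hydroxide O lone pair"]
sinks = ["C-Cl sigma*", "C-H sigma*"]
toy_scores = {
    ("hydroxide O lone pair", "C-Cl sigma*"): 2.0,
    ("hydroxide O lone pair", "C-H sigma*"): -1.0,
}

ranked = rank_source_sink_pairs(sources, sinks, score=lambda a, b: toy_scores[(a, b)])
```

The filter matters because the number of candidate pairs grows as the product of the source and sink counts.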
-
Deep learning for chemical reaction prediction
David Fooshee, Aaron Mood, Eugene Gutman, Mohammadamin Tavakoli, Gregor Urban, Frances Liu, Nancy Huynh, David Van Vranken, and Pierre Baldi
Mol. Syst. Des. Eng. 2018, 3, 442-452 -
A Machine Learning Approach to Predict Chemical Reactions
Matthew Kayala and Pierre Baldi
NeurIPS 2011 -
Learning to Predict Chemical Reactions
Matthew A. Kayala, Chloé-Agathe Azencott, Jonathan H. Chen, and Pierre Baldi
J. Chem. Inf. Model. 2011, 51 (9), 2209–2222 -
PMechDB: A Public Database of Elementary Polar Reaction Steps
Mohammadamin Tavakoli, Ryan J. Miller, Mirana Claire Angel, Michael A. Pfeiffer, Eugene S. Gutman, Aaron D. Mood, David Van Vranken, and Pierre Baldi
J. Chem. Inf. Model. 2024, 64, 6, 1975–1983
Dataset paper for the Baldi papers. 100k polar elementary steps. PMechDB platform
Reaction Network Graphs
-
Discovery of novel chemical reactions by deep generative recurrent neural network
William Bort, Igor I. Baskin, Timur Gimadiev, Artem Mukanov, Ramil Nugmanov, Pavel Sidorov, Gilles Marcou, Dragos Horvath, Olga Klimchuk, Timur Madzhidov, and Alexandre Varnek
Sci. Rep. 2021, 11, 3178 -
Efficient prediction of reaction paths through molecular graph and reaction network analysis
Yeonjoon Kim, Jin Woo Kim, Zeehyo Kim, and Woo Youn Kim
Chem. Sci. 2018, 9, 825-835
Search method for reaction intermediate networks. Uses DFT energies as heuristic.
Other
-
Reaction rebalancing: a novel approach to curating reaction databases + GitHub Repo
Tieu-Long Phan, Klaus Weinbauer, Thomas Gärtner, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, and Peter F. Stadler
J. Cheminf. 2024, 16, 82 -
Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates + GitHub Repo
Giorgio Pesciullesi, Philippe Schwaller, Teodoro Laino, and Jean-Louis Reymond
Nat. Commun. 2020, 11, 4874
Seq2Seq model for SMILES strings. Transfer learning allows for success on few-instance reactions. Blog post
Atom Mapping
-
Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning + GitHub Repo
Shuan Chen, Sunggi An, Ramil Babazade, and Yousung Jung
Nat. Commun. 2024, 15, 2250 -
Bidirectional Graphormer for Reactivity Understanding: Neural Network Trained to Reaction Atom-to-Atom Mapping Task + GitHub Repo
Ramil Nugmanov, Natalia Dyubankova, Andrey Gedich, and Joerg Kurt Wegner
J. Chem. Inf. Model. 2022, 62, 14, 3307–3315
Chython RxnMapper/Graphormer Mapper that’s heavily based on RXNMapper and CGR Tools. Semisupervised graph attention based model trained on USPTO and Pistachio datasets. -
Atom-to-atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies
Arkadii Lin, Natalia Dyubankova, Timur I. Madzhidov, Ramil I. Nugmanov, Jonas Verhoeven, Timur R. Gimadiev, Valentina A. Afonina, Zarina Ibragimova, Assima Rakhimbekova, Pavel Sidorov, Andrei Gedich, Rail Suleymanov, Ravil Mukhametgaleev, Joerg Wegner, Hugo Ceulemans, and Alexandre Varnek
Mol. Inf. 2022, 41, 2100138 -
Extraction of organic chemistry grammar from unsupervised learning of chemical reactions + GitHub Repo + RxnMapper app (webpage looks like it’s down)
Philippe Schwaller, Benjamin Hoover, Jean-Louis Reymond, Hendrik Strobelt, and Teodoro Laino
Sci. Adv. 2021, 7 (15), eabe4166 -
Reaction Data Curation I: Chemical Structures and Transformations Standardization + GitHub Repo
Timur R. Gimadiev, Arkadii Lin, Valentina A. Afonina, Dinar Batyrshin, Ramil I. Nugmanov, Tagir Akhmetshin, Pavel Sidorov, Natalia Dyubankova, Jonas Verhoeven, Joerg Wegner, Hugo Ceulemans, Andrey Gedich, Timur I. Madzhidov, and Alexandre Varnek
Mol. Inf. 2021, 40, 2100119
Computer-Aided Retrosynthesis Planning
-
Chimera: Accurate retrosynthesis prediction by ensembling models with diverse inductive biases
Krzysztof Maziarz, Guoqing Liu, Hubert Misztela, Aleksei Kornev, Piotr Gaiński, Holger Hoefling, Mike Fortunato, Rishi Gupta, Marwin Segler
arXiv 2024 -
RLSynC: Offline–Online Reinforcement Learning for Synthon Completion
Frazier N. Baker, Ziqi Chen, Daniel Adu-Ampratwum, and Xia Ning
J. Chem. Inf. Model. 2024, 64 (17), 6723 -
Constrained synthesis planning with disconnection-aware transformer and multi-objective search
Annie M. Westerlund, Lakshidaa Saigiridharan, Samuel Genheden
ChemRxiv preprint 2024
Retrosynthesis planning with the ability to constrain bonds, i.e., built in divergent synthesis constraints leading to shorter routes.
AiZynthFinder with MO-MCTS and broken bonds score was used to run multistep experiments and can be found at: https://github.com/MolecularAI/aizynthfinder. Chemformer can be found at: https://github.com/MolecularAI/Chemformer. AiZynthTrain was used to tag disconnection-sites in the Chemformer training data and can be found at: https://github.com/MolecularAI/aizynthtrain. -
Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing + GitHub Repo
Weihe Zhong, Ziduo Yang, and Calvin Yu-Chian Chen
Nat. Commun. 2023, 14, 3009
Publication Parsing
-
OpenChemIE: An Information Extraction Toolkit for Chemistry Literature + Web App
Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, and Regina Barzilay
J. Chem. Inf. Model. 2024, 64 (14), 5521 -
Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture
Kohulan Rajan, Henning Otto Brinkhaus, Achim Zielesny & Christoph Steinbeck
J. Cheminf. 2024, 16, 76
ML Driven Drug Design
-
MolE: a foundation model for molecular graphs using disentangled attention + GitHub Repo
Oscar Méndez-Lucio, Christos A. Nicolaou & Berton Earnshaw
Nat. Commun. 2024, 15, 9431
Non-commercial license -
MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models + GitHub Repo
Sergey Sosnin
J. Cheminf. 2024, 16, 98 -
HydraScreen: A Generalizable Structure-Based Deep Learning Approach to Drug Discovery + GitHub Repo
Alvaro Prat, Hisham Abdel Aty, Orestis Bastas, Gintautas Kamuntavičius, Tanya Paquet, Povilas Norvaišas, Piero Gasparotto, and Roy Tal
J. Chem. Inf. Model. 2024, 64 (15), 5817 -
Exposing the Limitations of Molecular Machine Learning with Activity Cliffs
Derek van Tilborg, Alisa Alenicheva, and Francesca Grisoni
J. Chem. Inf. Model. 2022, 62 (23), 5938
Overview of SAR cliffs and challenges for ML
Property/Activity Prediction
-
Reusability report: exploring the utility of variational graph encoders for predicting molecular toxicity in drug design + GitHub Repo (NYAN) + GitHub Repo (Reuse) + GitHub Repo (Acute Tox MTL)
Ruijiang Li, Jiang Lu, Ziyi Liu, Duoyun Yi, Mengxuan Wan, Yixin Zhang, Peng Zan, Song He & Xiaochen Bo
Nat. Mach. Intell. 2024 -
ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction
Yuzhi Xu, Xinxin Liu, Wei Xia, Jiankai Ge, Cheng-Wei Ju, Haiping Zhang, John Z.H. Zhang
J. Chem. Inf. Model. 2024, ASAP -
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity + GitHub Repo
Domenico Gadaleta, Marina Garcia de Lomana, Eva Serrano-Candelas, Rita Ortega-Vallbona, Rafael Gozalbes, Alessandra Roncaglioni & Emilio Benfenati
J. Cheminformatics 2024, 16, 122 -
A bioactivity foundation model using pairwise meta-learning + GitHub Repo
Bin Feng, Zequn Liu, Nanlan Huang, Zhiping Xiao, Haomiao Zhang, Srbuhi Mirzoyan, Hanwen Xu, Jiaran Hao, Yinghui Xu, Ming Zhang & Sheng Wang
Nat. Mach. Intell. 2024, 6, 962 -
Ligand-Based Compound Activity Prediction via Few-Shot Learning + GitHub Repo
Peter Eckmann, Jake Anderson, Rose Yu, and Michael K. Gilson
J. Chem. Inf. Model. 2024, 64, (14), 5492 -
QSARtuna: An Automated QSAR Modeling Platform for Molecular Property Prediction in Drug Design + GitHub Repo + Docs
Lewis Mervin, Alexey Voronov, Mikhail Kabeshov, and Ola Engkvist
J. Chem. Inf. Model. 2024, 64, (14), 5365 -
Quantum-Informed Molecular Representation Learning Enhancing ADMET Property Prediction
Jungwoo Kim, Woojae Chang, Hyunjun Ji, and InSuk Joung
J. Chem. Inf. Model. 2024, 64, (13), 5028
Supplementing a Graph Transformer with pretraining on DFT features. SoTA performance on 7 of 22 ADME-Tox tasks in TDC. Of particular interest are the data methods and architecture.
Active Learning Methods
-
Human-in-the-loop active learning for goal-oriented molecule generation + GitHub Repo
Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist & Samuel Kaski
J. Cheminformatics 2024, 16, 138 -
Finding the most potent compounds using active learning on molecular pairs + GitHub Repo
Zachary Fralish, and Daniel Reker
Beilstein J. Org. Chem. 2024, 20, 2152
Extension of the matched pair methodology from the Reker group. Applied to Chemprop and XGB. -
DeepDelta: predicting ADMET improvements of molecular derivatives with deep learning + GitHub Repo
Zachary Fralish, Ashley Chen, Paul Skaluba, and Daniel Reker
J. Cheminformatics 2023, 15, 101
Uses matched molecular pairs to predict property differences. D-MPNN architecture is based on ChemProp, modified to take in 2 molecules.
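The pairwise setup described for the last two entries can be sketched as a data transformation: single-molecule records become ordered pairs labeled with the property difference. This is a minimal sketch, assuming all ordered pairs within a set are used and that the label is "second minus first"; the property values are invented for illustration and the function name is hypothetical.

```python
from itertools import permutations


def make_delta_pairs(records):
    """Turn (smiles, value) records into ((smiles_a, smiles_b), value_b - value_a) pairs."""
    return [
        ((a_smiles, b_smiles), b_value - a_value)
        for (a_smiles, a_value), (b_smiles, b_value) in permutations(records, 2)
    ]


# Made-up property values, for illustration only.
records = [("CCO", 1.0), ("CCN", 1.5), ("CCC", 3.0)]
delta_pairs = make_delta_pairs(records)
```

Note that n molecules give n(n-1) ordered training pairs, which is part of the appeal in low-data regimes; a two-molecule model then learns to predict the second element of each tuple from the first.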
Synthetic Accessibility
-
Analog Accessibility Score (AAscore) for Rational Compound Selection + GitHub Repo
Takato Ue, Akinori Sato, and Tomoyuki Miyao
J. Chem. Inf. Model. 2024, ASAP -
Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore + GitHub Repo
Shuan Chen and Yousung Jung
J. Cheminf. 2024, 16, 83
Molecular Optimization
-
Projecting Molecules into Synthesizable Chemical Spaces
Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, and Jianzhu Ma
ArXiv Preprint, 2024
Interesting new approach to making molecules more synthesizable from generated virtual hits. Cleaning the chaff energy. Describes a new postfix notation (A B +) for synthetic transformations. Transformer-based model that translates graphs to postfix notation. Model capable of synthesis planning, generating similar and more synthesizable analogues, and exploring chemical space in the synthesizability dimension. -
Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space + GitHub Repo
Xin Xia, Yiping Liu, Chunhou Zheng, Xingyi Zhang, Qingwen Wu, Xin Gao, Xiangxiang Zeng, and Yansen Su
J. Chem. Inf. Model. 2024, 64, (13), 5161
Multiobjective molecule optimization framework (MOMO) is a Pareto-based MPO tool that evolves molecules into better molecules. Genetic/evolutionary algorithm in the latent (implicit) space encoded by a VAE.
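The postfix notation (A B +) mentioned for the synthesizable-projection paper above reads naturally as a stack program: building blocks are pushed, and a reaction token pops its reactants and pushes the product. The sketch below only builds a nested tuple describing the synthesis tree; the token names and the `arity` table are hypothetical, and no real reaction chemistry is applied.

```python
def evaluate_postfix(tokens, arity):
    """Evaluate a postfix synthesis string into a nested (reaction, *reactants) tree."""
    stack = []
    for tok in tokens:
        if tok in arity:  # reaction token: consume its reactants
            n = arity[tok]
            if n < 1 or len(stack) < n:
                raise ValueError(f"reaction {tok!r} needs {n} reactant(s)")
            reactants = stack[-n:]
            del stack[-n:]
            stack.append((tok, *reactants))
        else:  # building-block token
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("postfix string does not reduce to a single product")
    return stack[0]


# "+" is a hypothetical two-component coupling.
route = evaluate_postfix("A B + C +".split(), arity={"+": 2})
```

One convenient property of postfix strings is that they need no parentheses, which makes them a natural output format for a token-by-token decoder.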
Virtual Screening
-
Introducing SpaceGA: A Search Tool to Accelerate Large Virtual Screenings of Combinatorial Libraries + GitHub Repo
Laust Moesgaard and Jacob Kongsted
J. Chem. Inf. Model. 2024, 64 (21), 8123 -
molli: A General Purpose Python Toolkit for Combinatorial Small Molecule Library Generation, Manipulation, and Feature Extraction + GitHub Repo
Alexander S. Shved, Blake E. Ocampo, Elena S. Burlova, Casey L. Olen, N. Ian Rinehart & Scott E. Denmark
J. Chem. Inf. Model. 2024, 64 (21), 8083 -
PheSA: An Open-Source Tool for Pharmacophore-Enhanced Shape Alignment + GitHub Repo
Joel Wahl
J. Chem. Inf. Model. 2024, 64 (15), 5944-5953 -
ROSHAMBO: Open-Source Molecular Alignment and 3D Similarity Scoring
Rasha Atwi, Ye Wang, Simone Sciabola, and Adam Antoszewski
ChemRxiv Preprint 2024 -
Pareto Optimization to Accelerate Multi-Objective Virtual Screening + GitHub Repo
Jenna C. Fromer, David E. Graff, and Connor W. Coley
arXiv preprint 2023
Application of MolPAL (Molecular Pool-based Active Learning) to multi-objective virtual screening. Implements multiobjective Bayesian optimization to reduce the computational cost and applies it to the identification of ligands predicted to be selective based on docking scores to on- and off-targets. Identifies all Pareto front molecules after 8% library exploration. Demonstrates superiority of Pareto optimization over scalarization. -
Accelerating high-throughput virtual screening through molecular pool-based active learning + GitHub Repo
David E. Graff, Eugene I. Shakhnovich, and Connor W. Coley
Chem. Sci., 2021, 12, 7866
Active learning tool for acceleration of virtual screening campaigns.
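For the Pareto-versus-scalarization point raised in the multi-objective screening entry above, a small sketch may help. It computes the non-dominated set for objectives that are all minimized and shows one well-known limitation of weighted-sum scalarization: a Pareto-optimal point on a concave part of the front is never the minimizer for any weight. The points are toy numbers, not docking scores from the paper, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_best(points, w):
    """Pick the minimizer of w * f1 + (1 - w) * f2 for a two-objective problem."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


# Toy two-objective values (lower is better for both).
points = [(0, 10), (10, 0), (6, 6), (7, 9)]
front = pareto_front(points)

# Sweep the weight: the concave point (6, 6) is on the front but never selected.
scalarized = {weighted_sum_best(points, w / 20) for w in range(21)}
```

This brute-force front is quadratic in the number of points, which is fine for a sketch but not for a library-scale screen.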
Cheminformatics
Reviews
-
Modern chemical graph theory
Leonardo S. G. Leite, Swarup Banerjee, Yihui Wei, Jackson Elowitt, Aurora E. Clark
WIREs Comput. Mol. Sci. 2024, 14, (5), e1729 -
Research Progresses and Applications of Knowledge Graph Embedding Technique in Chemistry
Chuanghui Wang, Yunqing Yang, Jinshuai Song & Xiaofei Nan
J. Chem. Inf. Model. 2024, 64, (19), 7213
General
-
UNIQUE: A Framework for Uncertainty Quantification Benchmarking + GitHub Repo
Jessica Lanini, Minh Tam Davide Huynh, Gaetano Scebba, Nadine Schneider, and Raquel Rodríguez-Pérez
J. Chem. Inf. Model. 2024 ASAP
Combo platform for model-agnostic uncertainty quantification. Quasi-review paper. -
Topological Similarity Search in Large Combinatorial Fragment Spaces
Louis Bellmann, Patrick Penner, Matthias Rarey
J. Chem. Inf. Model. 2021, 61, (1), 238 -
Open-Source Approach to GPU-Accelerated Substructure Search + GitHub Repo
Andrew J. Whitehouse, Melchor Sanchez-Martinez, Seyedeh Maryam Salehi, Natalja Kurbatova, and Euan Dean
J. Chem. Inf. Model. 2024, 64, (18), 6993 -
When Do Quantum Mechanical Descriptors Help Graph Neural Networks to Predict Chemical Properties?
Shih-Cheng Li, Haoyang Wu, Angiras Menon, Kevin A. Spiekermann, Yi-Pei Li & William H. Green
J. Am. Chem. Soc. 2024, 146, (33), 23103 -
Hilbert-curve assisted structure embedding method + GitHub Repo
Gergely Zahoránszky-Kőhalmi, Kanny K. Wan & Alexander G. Godfrey
J. Cheminf. 2024, 16, 87
Δ-machine learning
-
Combining Hammett σ constants for Δ-machine learning and catalyst discovery + GitHub Repo
V. Diana Rakotonirina, Marco Bragato, Stefan Heinen & O. Anatole von Lilienfeld
Digital Discovery 2024, ASAP -
Big Data meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
J. Chem. Theory Comput. 2015, 11, (5), 2087
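The core idea behind Δ-machine learning is to learn the correction between a cheap baseline and an expensive target rather than the target itself, so that the prediction is baseline plus learned Δ. Below is a minimal sketch, assuming a single scalar descriptor and a linear correction fitted by ordinary least squares; the data are synthetic and the function names are hypothetical, not taken from either paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_delta_model(descriptors, low_level, high_level):
    """Fit the correction high - low, then predict as low + learned correction."""
    deltas = [hi - lo for hi, lo in zip(high_level, low_level)]
    slope, intercept = fit_line(descriptors, deltas)
    return lambda descriptor, e_low: e_low + slope * descriptor + intercept


# Synthetic data: the "high-level" values are the baseline plus 0.5 * x + 1.
descriptors = [1.0, 2.0, 3.0, 4.0]
low_level = [10.0, 20.0, 30.0, 40.0]
high_level = [11.5, 22.0, 32.5, 43.0]

predict = fit_delta_model(descriptors, low_level, high_level)
estimate = predict(5.0, 50.0)  # needs only a new cheap baseline value
```

The practical bet is that the correction is smoother and easier to learn than the full target property, so fewer expensive reference calculations are needed.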
Protein Structure Prediction
-
Accurate structure prediction of biomolecular interactions with AlphaFold 3 - No code released
Josh Abramson, Jonas Adler, Jack Dunger, … & John M. Jumper
Nature 2024, 630, 493
AlphaFold 3, a diffusion-based architecture that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues. Blog at Isomorphic -
State-specific protein-ligand complex structure prediction with a multiscale deep generative model + GitHub Repo
Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas F. Miller III & Animashree Anandkumar
Nat. Mach. Intell. 2024, 6, 195
NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures solely using protein sequence and ligand molecular graph inputs. Owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes and recently determined ligand-binding proteins. Some interesting takes in the OpenReview for their ICLR 2023 submission.
Very interesting follow up study/comment from Pat Walters about test set infiltration and data purity (see LinkedIn for some spicy debate about this preprint). -
DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking + GitHub Repo
Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola
ICLR 2023
Cool paper that treats docking as a generative task instead of a search/regression task. DiffDock is a diffusion model over the non-Euclidean manifold of ligand poses. Really interesting way of thinking of things. -
Structure-based Drug Design with Equivariant Diffusion Models (DiffSBDD) + GitHub Repo
Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Liò, Carla Gomes, Max Welling, Michael Bronstein, and Bruno Correia
ArXiv Preprint 2022
Diffusion model for SBDD. Serious issues with results in this paper; see OpenReview. -
Accelerating Inference in Molecular Diffusion Models with Latent Representations of Protein Structure + GitHub Repo
Ian Dunn and David Ryan Koes
NeurIPS 2023
GNN-based architecture for learning latent representations of molecular structure. Encodes protein representation into a reduced set of key points. When trained end-to-end with a diffusion model (DiffSBDD) for de novo ligand design, achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time. Unclear whether or not the original issues with DiffSBDD were addressed in this implementation…
Deep Learning
-
Mixture of A Million Experts
Xu Owen He
ArXiv Preprint 2024 -
Graph Neural Networks with Learnable Structural and Positional Representations + GitHub Repo
Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson
ICLR 2022 -
Do Transformers Really Perform Badly for Graph Representation? + GitHub Repo
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
NeurIPS 2021
Microsoft’s Graphormer paper -
Denoising Diffusion Probabilistic Models + GitHub Repo + Website
Jonathan Ho, Ajay Jain, Pieter Abbeel
NeurIPS 2020 -
Attention is All You Need + GitHub Repo (archived)
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin
NeurIPS 2017
The one, the only… Original transformer paper. This code eventually becomes the 🤗 Transformers library -
Adversarial Autoencoders - No official implementation available
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey
ArXiv Preprint 2016 -
PointerNets
Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly
NeurIPS 2015
LLMs and Agents
-
MemGPT: Towards LLMs as Operating Systems + GitHub Repo
Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez
arXiv 2024
Infinite context for language models. Now packaged as part of Letta. -
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning + GitHub Repo
Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, and James Zou
NeurIPS 2024 -
ReAct: Synergizing Reasoning and Acting in Language Models + Project Site
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao
ICLR 2023
Integrates REasoning and ACTing in large language models (LLMs) to enhance their performance and versatility. By generating reasoning traces and task-specific actions in an interleaved manner, ReAct allows LLMs to synergize these processes. Key contributions include: 1) Reasoning and Action Synergy: Reasoning helps update action plans and handle exceptions, while actions enable interaction with external sources, such as APIs, to gather information. 2) Improved Accuracy and Interpretability: On tasks like question answering (HotpotQA) and fact verification (Fever), ReAct reduces hallucination and error propagation while providing interpretable problem-solving steps. 3) Superior Decision-Making Performance: In interactive decision-making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods with significant success rate improvements. ReAct demonstrates improved effectiveness over state-of-the-art baselines, better human interpretability, and increased trustworthiness by combining reasoning and action. -
LoRA: Low-Rank Adaptation of Large Language Models + GitHub Repo
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen
ICLR 2022
LoRA is now wrapped into the 🤗 PEFT library -
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela
NeurIPS 2020
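The interleaving of reasoning traces and actions described in the ReAct entry above can be sketched as a simple loop: the model emits a thought and an action, the action is run against a tool, and the observation is appended to the trace for the next step. Here the language model is replaced by a hard-coded `scripted_policy` and the tool by a dictionary lookup, so every name below is a hypothetical stand-in rather than the paper's implementation.

```python
def react_loop(policy, tools, question, max_steps=5):
    """Alternate thought -> action -> observation until the policy finishes."""
    trace = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, trace)
        if action == "finish":
            trace.append((thought, action, argument, None))
            return argument, trace
        observation = tools[action](argument)  # interact with an external source
        trace.append((thought, action, argument, observation))
    return None, trace  # step budget exhausted without an answer


def scripted_policy(question, trace):
    """Stand-in for an LLM call: look something up once, then answer with it."""
    if not trace:
        return ("I should look up the molar mass.", "lookup", "water")
    last_observation = trace[-1][3]
    return ("The lookup answers the question.", "finish", last_observation)


tools = {"lookup": lambda key: {"water": "18.015 g/mol"}.get(key, "unknown")}

answer, trace = react_loop(scripted_policy, tools, "What is the molar mass of water?")
```

Keeping the full trace is what gives the interpretable problem-solving steps the annotation mentions; in a real agent the trace would be serialized back into the prompt at each step.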
Neural Reasoning & Decision Making
-
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Zhenni Bi, Kai Han, Chuanjian Liu, Yehui Tang, Yunhe Wang
arXiv 2024
Unlike existing approaches like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), which rely on a single pass of reasoning, FoT uses multiple interconnected reasoning trees to enable collaborative decision-making. Key features include: 1) Sparse Activation: Focuses on the most relevant reasoning paths for enhanced efficiency and accuracy. 2) Dynamic Self-Correction: Allows real-time error detection and learning from mistakes. 3) Consensus-Guided Decision-Making: Balances correctness and computational resource usage. The framework achieves significant improvements in reasoning accuracy and efficiency, making LLMs more effective at tackling complex tasks. -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, and Noah D. Goodman
COLM 2024 -
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step + GitHub Repo
Guowei Xu, Peng Jin, Hao Li, Yibing Song, Lichao Sun, and Li Yuan
NeurIPS 2024 -
Tree of Thoughts: Deliberate Problem Solving with Large Language Models + GitHub Repo
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan
NeurIPS 2023
ReasoningAgent (Tree of Thought with Beam Search) -
STaR: Bootstrapping Reasoning With Reasoning
Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D. Goodman
NeurIPS 2022 -
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou
NeurIPS 2022
Contrastive Learning
-
A Simple Framework for Contrastive Learning of Visual Representations + GitHub Repo
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton
ICML 2020, PMLR 119, 1597-1607 -
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, and James Philbin
CVPR 2015, 815
Chemistry
Med Chem
-
Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success
Frank Lovering, Jack Bikker, and Christine Humblet
J. Med. Chem. 2009, 52 (21), 6752-6756 -
Escape from Flatland 2: complexity and promiscuity
Frank Lovering
Med. Chem. Commun. 2013, 4, 515-519
My papers
-
Sodium Alkyl(trimethylsilyl)amides: Substituent- and Solvent-dependent Solution Structures and Reactivities
Qiulin You, Yun Ma, Ryan A. Woltornist, Nathan M. Lui, Jesse A. Spivey, Ivan Keresztes, and David B. Collum
Journal of the American Chemical Society 2024, 146 (44), 30397 -
Natural Product Isolation of the Extract of Cleome rupicola Fruits Exhibiting Antioxidant Activity
Yumi Gambrill, Patrick Commins, Stefan Schramm, Nathan M. Lui, Shaikha S. AlNeyadi, and Panče Naumov
Chemistry & Biodiversity 2024, 21 (4), e20230138 -
Structure-Selectivity Principles Underlying Alkylations of Oppolzer’s Camphorsultam Enolates
Nathan M. Lui
Cornell University ProQuest Dissertations & Theses 2023, 30570998 -
Sodiated Oppolzer enolates: solution structures, mechanism of alkylation, and origin of stereoselectivity
Nathan M. Lui and David B. Collum
Organic Chemistry Frontiers 2023, 10 (19), 4750 -
MoFlowGAN: Combining adversarial and likelihood learning for targeted molecular generation + GitHub Repo
Nathan M. Lui, Max D. Li, and Matt Ford
ChemRxiv Preprint 2023 -
Lithiated Oppolzer Enolates: Solution Structures, Mechanism of Alkylation, and the Origin of Stereoselectivity
Nathan M. Lui, Samantha N. MacMillan, and David B. Collum
Journal of the American Chemical Society 2022, 144 (51), 23379 -
Sodium Isopropyl(trimethylsilyl)amide (NaPTA): A Stable and Highly Soluble Lithium Diisopropylamide Mimic
Yun Ma, Nathan M. Lui, Ivan Keresztes, Ryan A. Woltornist, and David B. Collum
The Journal of Organic Chemistry 2022, 87 (21), 14223 -
Spectrochemistry of firefly bioluminescence
Marieh B. Al-Handawi, Srujana Polavaram, Anastasiya Kurlevskaya, Patrick Commins, Stefan Schramm, César Carrasco-López, Nathan M. Lui, Kyril M. Solntsev, Sergey P. Laptenok, Isabelle Navizet, and Panče Naumov
Chemical Reviews 2022, 122 (16), 13207 -
The elusive relationship between structure and colour emission in beetle luciferases
César Carrasco-López, Nathan M. Lui, Stefan Schramm, and Panče Naumov
Nature Reviews Chemistry 2021, 5 (1), 4 -
Thermochemiluminescent peroxide crystals
Stefan Schramm, Durga Prasad Karothu, Nathan M. Lui, Patrick Commins, Ejaz Ahmed, Luca Catalano, Liang Li, James Weston, Taro Moriwaki, Kyril M. Solntsev, and Panče Naumov
Nature Communications 2019, 10 (1), 997 -
pH-Dependent fluorescence from firefly oxyluciferin in agarose thin films
Nathan M. Lui, Stefan Schramm, and Panče Naumov
New Journal of Chemistry 2019, 43 (3), 1122 -
Beetle luciferases with naturally red- and blue-shifted emission
César Carrasco-López, Juliana C. Ferreira, Nathan M. Lui, Stefan Schramm, Romain Berraud-Pache, Isabelle Navizet, Santosh Panjikar, Panče Naumov, and Wael M. Rabeh
Life Science Alliance 2018, 1 (4), e201800072