Library

Molecular Generation

Generative models for molecules. Inputs are most typically text-based (SMILES/SELFIES) or graph representations (parallel models over atom and bond matrices). Most include some property-optimization capability (latent-space search/interpolation, reinforcement learning, guided genetic exploration). These methods have historically been autoregressive, but non-autoregressive molecular generation methods have recently begun to appear.
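As a minimal illustration of the autoregressive idea behind most of these models, the sketch below samples SMILES-like strings one token at a time. The hand-coded bigram transition table is invented for illustration; a real generator learns these conditionals from data with a neural network.

```python
import random

# Toy bigram "model": P(next token | current token), hand-coded for
# illustration only. "^" is the start token, "$" the stop token.
TRANSITIONS = {
    "^": {"C": 0.7, "O": 0.2, "N": 0.1},
    "C": {"C": 0.5, "O": 0.2, "N": 0.1, "$": 0.2},
    "O": {"C": 0.6, "$": 0.4},
    "N": {"C": 0.6, "$": 0.4},
}

def sample_smiles(rng, max_len=10):
    """Autoregressively sample one token at a time until the stop token."""
    token, out = "^", []
    while len(out) < max_len:
        probs = TRANSITIONS[token]
        token = rng.choices(list(probs), weights=probs.values())[0]
        if token == "$":
            break
        out.append(token)
    return "".join(out)

rng = random.Random(0)
print([sample_smiles(rng) for _ in range(3)])
```

Non-autoregressive models instead emit the whole atom/bond representation in one (or a few) parallel steps rather than token by token.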

Reviews

Diffusion / Flow Matching Models

Normalizing Flows

  • MolGrow: A Graph Normalizing Flow for Hierarchical Molecular Generation (No implementation available)
    Maksim Kuznetsov and Daniil Polykovskiy
    Proceedings of the AAAI Conference on Artificial Intelligence 2021, 35 (9), 8226-8234
  Hierarchical, autoregressive normalizing flow for molecular graphs. Builds graphs either BFS- or fragment-based (the latter performs better). The model is composed of “plug-and-play” modules. Trained on MOSES, QM9, and ZINC250k. Property-constrained optimization is based on a genetic algorithm.

  • FastFlows: Flow-Based Models for Molecular Graph Generation
    Nathan C. Frey, Vijay Gadepally, and Bharath Ramsundar
    ELLIS Machine Learning for Molecule Discovery Workshop 2021
  Framework for normalizing flows from SELFIES. Uses substructure filtering to speed up training and enable working from small training sets. Built-in MPO functionality.
    TDS article

  • MoFlow: An Invertible Flow Model for Generating Molecular Graphs + GitHub Repo
    Chengxi Zang and Fei Wang
    in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
  Non-autoregressive normalizing flow for molecular graphs; two-stage flow (a Glow-based flow for bonds, then a bond-conditioned flow for atoms; Glow is OpenAI's invertible-convolution architecture). Similar to GraphNVP. Trained (NLL) on QM9 and ZINC250k. Develops a new architecture; excellent results.
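The exact-NLL training that flow models like MoFlow use can be shown in miniature with a one-dimensional affine flow: an invertible transform maps data to a standard normal, and the log-likelihood is the base-density term plus the log-determinant of the Jacobian. The parameters below are made up for illustration.

```python
import math

# 1-D affine flow: z = f(x) = (x - mu) / sigma maps data to N(0, 1),
# and log p(x) = log N(z; 0, 1) + log |df/dx|.
mu, sigma = 2.0, 0.5

def log_prob(x):
    z = (x - mu) / sigma                      # invertible transform
    log_pz = -0.5 * (z * z + math.log(2 * math.pi))
    log_det = -math.log(sigma)                # log |df/dx| = -log sigma
    return log_pz + log_det

nll = -log_prob(2.0)                          # NLL of a point at the mode
print(round(nll, 4))
```

Graph flows apply the same recipe, with coupling layers over atom and bond tensors playing the role of the affine transform.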

  • GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation + GitHub repo
    Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and Jian Tang
    ICLR 2020
      How to explain this better than reviewer #1…

"This paper proposes a generative model architecture for molecular graph generation based on autoregressive flows. The main contribution of this paper is to combine existing techniques (auto-regressive BFS-ordered generation of graphs, normalizing flows, dequantization by Gaussian noise, fine-tuning based on reinforcement learning for molecular property optimization, and validity constrained sampling). Most of these techniques are well-established either for data generation with normalizing flows or for molecular graph generation and the novelty lies in the combination of these building blocks into a framework."

GANs

Other

Reaction Informatics

These models predict mechanisms for chemical reactions, ideally similar to how we teach second-years to push arrows. There are relatively few examples of this task, but they fall into three major categories: electron flows, graph edits, and reaction networks. At inference these models are used for forward synthesis prediction, and potentially for prediction of chemo-/regioselectivity. Largely trained via pattern recognition on atom-mapped inputs (USPTO), though there are exceptions (e.g., the Baldi papers below).

Electron Flow Prediction

Sources and Sinks

The Baldi papers map electron sources and sinks, then combinatorially generate a probability distribution over electron flows. The described classifiers are used to filter source–sink pairs before evaluation. Trained on in-house (unavailable) data. The papers don’t have available source code, but ready-to-use programs are available on ChemDB.
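The combinatorial pairing step can be sketched as follows. The per-atom source/sink scores and the threshold are invented for illustration; in the papers they come from trained classifiers.

```python
from itertools import product

# Hypothetical per-atom scores: probability an atom acts as an electron
# source (nucleophile) or sink (electrophile).
sources = {"N1": 0.9, "O2": 0.6, "C3": 0.1}
sinks   = {"C4": 0.8, "C5": 0.3}

def rank_electron_flows(sources, sinks, threshold=0.05):
    """Enumerate source -> sink pairs, score each flow as the product of
    the two classifier scores, filter weak pairs, and rank the rest."""
    flows = [
        ((src, snk), p_src * p_snk)
        for (src, p_src), (snk, p_snk) in product(sources.items(), sinks.items())
        if p_src * p_snk >= threshold
    ]
    return sorted(flows, key=lambda f: f[1], reverse=True)

for (src, snk), score in rank_electron_flows(sources, sinks):
    print(f"{src} -> {snk}: {score:.2f}")
```

Filtering before enumeration is what keeps the combinatorics tractable: most source–sink pairs are chemically implausible and never reach the ranking step.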

Reaction Network Graphs

Other

Atom Mapping

Computer-Aided Retrosynthesis Planning

Publication Parsing

ML Driven Drug Design

Property/Activity Prediction

Active Learning Methods

Synthetic Accessibility

Molecular Optimization

  • Projecting Molecules into Synthesizable Chemical Spaces
    Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, and Jianzhu Ma
    arXiv preprint, 2024
      Interesting new approach to making generated virtual hits more synthesizable (i.e., cleaning out the chaff). Describes a new postfix notation (A B +) for synthetic transformations. Transformer-based model that translates graphs to postfix notation. The model is capable of synthesis planning, generating similar but more synthesizable analogues, and exploring chemical space along the synthesizability dimension.
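The postfix ("A B +") encoding of a synthesis route can be evaluated with a plain stack machine, as sketched below. The `react` function here just concatenates names to show the bookkeeping; in the paper the operators are reaction templates applied to real molecules.

```python
# Building blocks are operands; "+" stands in for a binary reaction
# operator, mirroring the paper's postfix notation.
def react(a, b):
    """Placeholder 'reaction': combine two intermediates by name."""
    return f"({a}.{b})"

def eval_postfix(tokens):
    """Stack-based evaluation of a postfix synthesis program."""
    stack = []
    for tok in tokens:
        if tok == "+":              # binary reaction operator
            b, a = stack.pop(), stack.pop()
            stack.append(react(a, b))
        else:                       # building block
            stack.append(tok)
    assert len(stack) == 1, "malformed program"
    return stack[0]

# "A B + C +" means: react A with B, then react the product with C.
print(eval_postfix(["A", "B", "+", "C", "+"]))  # ((A.B).C)
```

Postfix notation is attractive here because any prefix of a valid program is itself a valid partial synthesis state, which suits left-to-right autoregressive decoding.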

  • Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space + GitHub Repo
    Xin Xia, Yiping Liu, Chunhou Zheng, Xingyi Zhang, Qingwen Wu, Xin Gao, Xiangxiang Zeng, and Yansen Su
    J. Chem. Inf. Model. 2024, 64, (13), 5161
      Multiobjective molecule optimization framework (MOMO): a Pareto-based MPO tool that evolves molecules into better molecules. Genetic/evolutionary algorithm in a latent (implicit) space encoded by a VAE.
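The Pareto selection at the heart of this kind of multiobjective evolutionary loop can be sketched in a few lines. The (potency, synthesizability) scores below are invented; in MOMO the candidates would be decoded from the VAE latent space.

```python
def dominates(a, b):
    """a dominates b if it is at least as good on every objective
    (maximization) and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population):
    """Return the non-dominated members: the candidates a Pareto-based
    MPO loop keeps as parents for the next generation."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

# Hypothetical (potency, synthesizability) scores for five candidates.
scores = [(0.9, 0.2), (0.7, 0.7), (0.2, 0.9), (0.5, 0.5), (0.6, 0.1)]
print(pareto_front(scores))
```

Keeping the whole front (rather than a single scalarized best) is what lets the algorithm trade off objectives without committing to fixed weights.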

Virtual Screening

Cheminformatics

Reviews

General

Δ-machine learning

Protein Structure Prediction

Deep Learning

LLMs and Agents

  • MemGPT: Towards LLMs as Operating Systems + GitHub Repo
    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez
    arXiv 2024
      Infinite context for language models. Now packaged as part of Letta.

  • AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning + GitHub Repo
    Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, and James Zou
    NeurIPS 2024

  • ReAct: Synergizing Reasoning and Acting in Language Models + Project Site
    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao
    ICLR 2023
      Integrates REasoning and ACTing in large language models (LLMs) to enhance their performance and versatility. By generating reasoning traces and task-specific actions in an interleaved manner, ReAct allows LLMs to synergize these processes. Key contributions: 1) Reasoning/action synergy: reasoning helps update action plans and handle exceptions, while actions enable interaction with external sources, such as APIs, to gather information. 2) Improved accuracy and interpretability: on question answering (HotpotQA) and fact verification (FEVER), ReAct reduces hallucination and error propagation while providing interpretable problem-solving steps. 3) Superior decision-making: in interactive benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods with significant success-rate improvements. Overall, ReAct beats state-of-the-art baselines while improving human interpretability and trustworthiness.
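The interleaved Thought/Action/Observation loop can be shown with a scripted stand-in for the LLM and a single lookup tool. Everything here (the canned thoughts, the tiny knowledge base, the `lookup[...]` action syntax) is invented for illustration; a real agent calls an actual model and real tools.

```python
# Minimal ReAct-style loop with a scripted "LLM" and one lookup tool.
KNOWLEDGE = {"capital of France": "Paris"}

def tool_lookup(query):
    return KNOWLEDGE.get(query, "not found")

def scripted_llm(transcript):
    """Stand-in policy: think, act, then answer from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    obs = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the answer.\nFinal Answer: {obs}"

def react_loop(question, max_steps=3):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_llm(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.rsplit("Final Answer: ", 1)[1]
        # Parse the action, run the tool, append the observation.
        query = step.rsplit("Action: lookup[", 1)[1].rstrip("]")
        transcript += f"\nObservation: {tool_lookup(query)}"
    return None

print(react_loop("What is the capital of France?"))  # Paris
```

The key design point is that the growing transcript is fed back to the model each turn, so reasoning can react to observations and vice versa.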

  • LoRA: Low-Rank Adaptation of Large Language Models + GitHub Repo
    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen
    ICLR 2022
      LoRA is now wrapped into the 🤗 PEFT library
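The core LoRA idea, a frozen weight matrix plus a trainable low-rank correction, fits in a few lines of plain Python. The toy dimensions and values below are made up for illustration; real LoRA applies this to attention/MLP weights with learned A and B.

```python
# LoRA in miniature: instead of updating a frozen weight matrix W
# directly, learn a low-rank correction B @ A (rank r << d) and
# compute y = (W + B @ A) x.
def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

d, r = 4, 1                                   # full dim 4, LoRA rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]              # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]                    # r x d, trainable
x = [1.0, 2.0, 3.0, 4.0]

delta = matmul(B, A)                          # rank-1 update, d x d
W_adapted = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]
print(matvec(W_adapted, x))                   # base output plus low-rank shift
```

The payoff: only the d·r + r·d entries of B and A are trained (8 numbers here versus 16 in W), and the update can be merged into W at inference with no extra latency.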

  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela
    NeurIPS 2020

Neural Reasoning & Decision Making

Contrastive Learning

Chemistry

Med Chem

My papers