Like a todo-list but for knowledge. This page is (hopefully) auto-updated by the curating agent…
16 Dec 2024
-
Inconsistency of LLMs in Molecular Representations
Bing Yan, Angelica Chen, Kyunghyun Cho
ChemRxiv
2024-12-16
The paper investigates the consistency of large language models (LLMs) across molecular representations such as SMILES strings and IUPAC names. Despite finetuning on a dual-representation dataset and adding a Kullback-Leibler (KL) divergence consistency loss during training, the models exhibited less than 1% consistency and no improvement in accuracy. The findings highlight the limitations of LLMs in understanding chemistry.
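As a minimal sketch of the kind of consistency objective described (not the paper's exact training code), the snippet below computes a symmetric KL divergence between the next-token distributions a model produces for the same question posed with a SMILES input versus an IUPAC-name input; all names and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def consistency_kl_loss(logits_smiles: torch.Tensor,
                        logits_iupac: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between next-token distributions for the
    same prompt phrased with SMILES vs. IUPAC (shape: [batch, seq, vocab])."""
    log_p = F.log_softmax(logits_smiles, dim=-1)
    log_q = F.log_softmax(logits_iupac, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    kl_pq = F.kl_div(log_p, log_q.exp(), reduction="batchmean")
    kl_qp = F.kl_div(log_q, log_p.exp(), reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

# Hypothetical total loss: cross-entropy on each representation plus the
# consistency term, weighted by a tunable lambda_kl.
# loss = ce_smiles + ce_iupac + lambda_kl * consistency_kl_loss(ls, li)
```
-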
LlaMa meets Cheburashka: impact of cultural background for LLM quiz reasoning
Bogdan Protsenko, Mikhail Lifar, Daniil Kupriianenko, Nazar Chubkov, Kirill Kulaev, Alexander Guda, Alexander Soldatov, Irina Piontkovskaya
ChemRxiv
2024-12-16
The paper investigates the reasoning abilities of the LlaMa3-405B language model in non-English quiz contexts, using questions from the Russian-speaking "What? Where? When?" community. It finds that while the model performs well linguistically, its cultural knowledge is lacking, leading to decreased performance. The study also highlights the importance of reasoning strategies, achieving a 6% accuracy improvement over baseline methods.
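The summary does not specify which reasoning strategies were used; as a generic illustration of one common strategy, here is a sketch of self-consistency voting, where several reasoning chains are sampled and the majority answer wins. `generate_answer` is a hypothetical stand-in for an actual LLM call.

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str,
                     generate_answer: Callable[[str], str],
                     n_samples: int = 8) -> str:
    """Sample several independent answers to the same quiz question
    and return the most frequent one (majority vote)."""
    answers = [generate_answer(question).strip().lower()
               for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```
-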
PathInHydro, a set of machine learning models to identify unbinding pathways of gas molecules in [NiFe] hydrogenases
Farzin Sohraby, Jing-Yao Guo, Ariane Nunes-Alves
ChemRxiv
2024-12-16
The paper presents PathInHydro, a machine learning framework for identifying the unbinding pathways of gas molecules from [NiFe] hydrogenases in molecular dynamics simulations. Trained on CO and H2 unbinding trajectories from Desulfovibrio fructosovorans, it generalizes to other gas molecules and enzyme mutants. The framework automates pathway identification, and the associated code and datasets are available online.
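A minimal sketch of the supervised setup such a framework implies: featurize MD frames (for instance, distances from the gas molecule to tunnel-lining residues) and train a classifier to label the unbinding pathway. The random data, feature choice, and random-forest model are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: one row per trajectory frame; columns stand in for
# distances from the gas molecule to residues lining each tunnel.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))        # placeholder frame features
y = rng.integers(0, 3, size=5000)      # placeholder pathway labels (3 tunnels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"pathway classification accuracy: {clf.score(X_test, y_test):.2f}")
```
-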
AlchemBERT: Exploring Lightweight Language Models for Materials Informatics
Xiaotong Liu, Xingchen Liu, Xiaodong Wen
ChemRxiv
2024-12-16
The paper presents AlchemBERT, a lightweight BERT model (110 million parameters) for materials informatics, demonstrating its effectiveness at predicting material properties on the Matbench benchmark. AlchemBERT achieves performance comparable to far larger models such as GPT and LLaMA, excelling on structure-based tasks when structures are supplied as CIF text. It outperforms state-of-the-art models, indicating that fine-tuned language models can capture significant materials insights.
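A sketch of what fine-tuning a 110M-parameter BERT for scalar property regression on text-encoded structures could look like; it uses the stock bert-base-uncased checkpoint plus a made-up CIF snippet and target value, not the paper's released model or data.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# num_labels=1 with problem_type="regression" puts a single regression
# head on BERT; the model applies MSE loss when float labels are given.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression")

cif_text = "data_example\n_cell_length_a 3.84\n..."  # a CIF file as plain text
inputs = tokenizer(cif_text, truncation=True, max_length=512,
                   return_tensors="pt")
labels = torch.tensor([[1.23]])  # hypothetical target, e.g. formation energy

out = model(**inputs, labels=labels)
out.loss.backward()  # one gradient step of a standard fine-tuning loop
print(float(out.loss))
```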