Predicting peptide aggregation with protein language model embeddings


Abstract

Amyloid fibrils, a form of peptide aggregate, are associated with multiple diseases and hinder the development of therapeutics. The experimental characterization of aggregating peptides is resource-intensive and data are scarce, which limits the development of accurate models. We present a deep-learning model, PALM (Predicting Aggregation with Language Model embeddings), which uses transfer learning to predict aggregation from embeddings extracted from a pretrained protein language model (pLM). PALM is trained on the WaltzDB-2.0 dataset to classify peptides and to identify aggregation-prone regions within a sequence at single-residue resolution. Compared with existing models, it performs strongly on held-out experimental datasets. We find that PALM fails to identify single mutations that increase the aggregation rate of the amyloid beta peptide; however, training the PALM architecture on a larger dataset, CANYA NNK1-3, substantially improves performance on this task. These results show that transfer learning with pLM embeddings improves performance when training on small datasets, but they also highlight that challenging tasks, such as predicting the effect of single mutations, require more experimental data.
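The abstract does not describe PALM's architecture in detail, so the following is only a minimal sketch of the general transfer-learning pattern it names, under stated assumptions. A frozen pLM is assumed to supply one embedding per residue; here it is replaced by a hypothetical `toy_embed` function that returns fixed pseudo-random vectors, not a real model. A small linear head is trained on peptide-level labels, its per-residue scores give residue-resolution predictions, and an in-silico scan ranks single substitutions by their change in predicted aggregation probability. The names `toy_embed`, `ResidueHead` and `mutation_scan`, the mean-pooling choice, and the toy labels are all illustrative assumptions, not PALM's implementation or WaltzDB-2.0 data.

```python
import math
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Vector = List[float]


def toy_embed(sequence: str, dim: int = 16) -> List[Vector]:
    """HYPOTHETICAL stand-in for a frozen pretrained pLM.

    A real pLM returns context-dependent per-residue embeddings; here each
    amino acid maps to a fixed pseudo-random vector so the sketch runs
    without any model weights.
    """
    vectors = []
    for aa in sequence:
        rng = random.Random(ord(aa))  # deterministic per amino acid
        vectors.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return vectors


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class ResidueHead:
    """Linear head trained on frozen embeddings (the transfer-learning part)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.w = [0.0] * dim
        self.b = 0.0

    def residue_logits(self, emb: List[Vector]) -> List[float]:
        return [dot(self.w, e) + self.b for e in emb]

    def residue_scores(self, emb: List[Vector]) -> List[float]:
        # Per-residue aggregation propensity in (0, 1).
        return [sigmoid(z) for z in self.residue_logits(emb)]

    def peptide_prob(self, emb: List[Vector]) -> float:
        # ASSUMPTION: peptide logit = mean of residue logits.
        logits = self.residue_logits(emb)
        return sigmoid(sum(logits) / len(logits))

    def fit(self, data: List[Tuple[str, int]], epochs: int = 200, lr: float = 0.1) -> None:
        """SGD on peptide-level log loss; the embeddings stay frozen."""
        # Because the head is linear, the mean of residue logits equals the
        # logit of the mean embedding, so we pool embeddings once up front.
        pooled = []
        for seq, y in data:
            emb = toy_embed(seq, self.dim)
            pooled.append(([sum(col) / len(emb) for col in zip(*emb)], y))
        for _ in range(epochs):
            for mean_e, y in pooled:
                err = sigmoid(dot(self.w, mean_e) + self.b) - y
                self.w = [wi - lr * err * ei for wi, ei in zip(self.w, mean_e)]
                self.b -= lr * err


def mutation_scan(head: ResidueHead, sequence: str) -> List[Tuple[str, float]]:
    """Rank every single substitution by its change in peptide probability."""
    base = head.peptide_prob(toy_embed(sequence, head.dim))
    results = []
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            mutant = sequence[:i] + aa + sequence[i + 1:]
            delta = head.peptide_prob(toy_embed(mutant, head.dim)) - base
            results.append((f"{wt}{i + 1}{aa}", delta))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Made-up hexapeptide labels for illustration only; not WaltzDB-2.0 data.
    toy_data = [("VIVIVI", 1), ("KDEKDE", 0), ("FFVLIV", 1), ("NQSGDE", 0)]
    head = ResidueHead(dim=16)
    head.fit(toy_data)
    emb = toy_embed("GVIVKD", head.dim)
    print([round(s, 2) for s in head.residue_scores(emb)])
    print(round(head.peptide_prob(emb), 3))
    print(mutation_scan(head, "GVIVKD")[:3])
```

Because `toy_embed` ignores sequence context and the head is linear, every substitution here shifts the peptide logit by an amount that depends only on the wild-type and mutant residues, not on position or neighbours. A real pLM produces context-dependent embeddings, so mutation effects would not be additive in this way.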
