Toward De Novo Protein Design from Natural Language
Abstract
De novo protein design represents a fundamental pursuit in protein engineering, yet current deep learning approaches remain constrained by their narrow design scope. Here we present Pinal, a large-scale frontier framework comprising 16 billion parameters and trained on 1.7 billion protein-text pairs, that bridges natural language understanding with the protein design space, translating human design intent into novel protein sequences. Instead of straightforward end-to-end text-to-sequence generation, Pinal implements a two-stage process: it first generates protein structures from language instructions, then designs sequences conditioned on both the generated structure and the language input. This strategy constrains the search space by operating in the more tractable structural domain. Through comprehensive experiments, we demonstrate that Pinal outperforms existing approaches, including the concurrent work ESM3, while generalizing robustly to novel protein structures beyond the Protein Data Bank (PDB). The online demo is available at http://www.denovo-pinal.com/.
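One way to read the two-stage process is as a factorization of the text-to-sequence distribution through an intermediate structure. The notation here is an assumption introduced for illustration and does not come from the abstract: $t$ denotes the language instruction, $c$ a protein structure representation, and $s$ the amino acid sequence.

\[
p(s \mid t) \;=\; \sum_{c} p(c \mid t)\, p(s \mid c, t)
\]

Under this reading, the first stage corresponds to sampling $c \sim p(c \mid t)$ and the second to sampling $s \sim p(s \mid c, t)$. The identity itself is just the law of total probability; how Pinal parameterizes each factor is not specified in the abstract.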