Enhancing Strain-level Phage-Host Prediction through Experimentally Validated Negatives and Feature Optimization Strategies

Abstract

Background

Accurate prediction of phage-host interactions at the strain level is critical for understanding microbial ecology and for developing phage-based therapeutics. However, existing models are limited by the lack of experimentally validated negative interactions and inconsistencies in data construction strategies.

Results

In this study, we present a large-scale phage-host interaction dataset comprising 13,000 experimentally verified links between 125 Klebsiella pneumoniae (K. pneumoniae) phages and 104 K. pneumoniae strains. Using this unique resource, we systematically evaluate the impact of negative data construction methods, feature extraction strategies, and machine learning algorithms on predictive performance. We show that randomly generated negatives significantly inflate model accuracy, while models trained on experimental negatives yield more realistic and robust results. Furthermore, protein-derived features outperform DNA-based features across various data conditions. Notably, models using only tail protein sequences achieve performance comparable to those using full-genome sequences, offering a time-efficient alternative without compromising accuracy. Finally, interpretable machine learning reveals amino acid preferences in both phages and hosts that align with known infection mechanisms and suggest novel determinants such as anti-transcriptional proteins.
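
As a rough illustration of two ideas in the Results, the sketch below shows one way a protein-derived feature vector and a set of experimentally validated negatives might be built. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the abstract does not specify the feature encoding, so plain amino acid composition is assumed here, and the function names (`aa_composition`, `labelled_pairs`, `pair_features`), the toy sequences, and the toy assay results are all hypothetical. Note that 125 phages × 104 strains gives 13,000 pairs, which is consistent with every phage-strain pair having been assayed; the sketch assumes such a fully tested matrix, so negatives are read from the assay rather than sampled at random.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(protein_seq):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: amino acid composition stands in for the "protein-derived
    features" of the abstract; the study may use a different encoding.
    """
    counts = Counter(ch for ch in protein_seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def labelled_pairs(assay):
    """Split assayed (phage, host) pairs into positives and experimental negatives.

    `assay` maps (phage_id, host_id) -> True if infection was observed,
    False if the pair was tested and no infection was observed.
    """
    positives = sorted(pair for pair, infects in assay.items() if infects)
    negatives = sorted(pair for pair, infects in assay.items() if not infects)
    return positives, negatives


def pair_features(phage_seq, host_seq):
    """Concatenate phage and host composition vectors into one 40-value vector."""
    return aa_composition(phage_seq) + aa_composition(host_seq)


# Hypothetical toy data: two phage tail proteins, two host proteins, four assays.
tail_proteins = {"phage_A": "MKTAYIAKQR", "phage_B": "MSDNGTWLLV"}
host_proteins = {"strain_1": "MALWMRLLPL", "strain_2": "MKKLLPTAAA"}
assay = {
    ("phage_A", "strain_1"): True,
    ("phage_A", "strain_2"): False,
    ("phage_B", "strain_1"): False,
    ("phage_B", "strain_2"): True,
}

positives, negatives = labelled_pairs(assay)
X = [pair_features(tail_proteins[p], host_proteins[h]) for p, h in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
```

The resulting `X` and `y` could be passed to any standard classifier. A randomly generated negative set would instead pair phages and hosts without assay evidence, which is the construction the abstract reports as inflating accuracy.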

Conclusions

Our findings highlight best practices for constructing high-fidelity strain-level phage-host prediction models. The dataset and insights presented here provide a valuable benchmark for future studies and lay the foundation for more biologically grounded, interpretable modeling frameworks in viromics and microbiome research.
