Network-Regularized Accelerated Failure Time Models for Robust Biomarker Identification
Abstract
Background
High-dimensional genomic studies in cancer research face significant challenges when analyzing survival outcomes, particularly in p ≫ n settings where thousands of genes are measured in relatively small patient cohorts. Traditional penalized regression methods such as the lasso and ridge regression apply uniform shrinkage without accounting for the underlying biological relationships between genes, which often leads to unstable feature selection and reduced interpretability. Network-regularized methods that incorporate prior biological knowledge have emerged as promising alternatives, yet systematic comparisons of different network penalties in accelerated failure time (AFT) models remain limited.
Methods
We developed and evaluated Weibull AFT models incorporating two network-regularized approaches: inverse-degree penalties that apply stronger shrinkage to highly connected hub genes, and graph Laplacian penalties that promote smoothness across connected nodes. We conducted comprehensive simulations across varying signal strengths, censoring rates (30%, 50%, 80%), and noise levels, followed by real-data applications on three cancer datasets (TCGA-KIRC, TCGA-COAD, and IDH-wildtype gliomas) using both STRING protein-protein interaction (PPI) networks and co-expression networks. Performance was assessed using estimation and prediction mean squared error, false positive/negative rates, concordance index, and pathway enrichment analysis.
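As a rough illustration of the two penalty families described above, the following minimal Python sketch computes a graph Laplacian penalty and a degree-weighted L1 penalty on a toy network. The function names, the toy adjacency matrix, and the use of the unnormalized Laplacian are assumptions made for illustration only. The abstract does not give the exact formula that maps node degree to a shrinkage weight, so the weights are passed in by the caller rather than derived here.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A (assumed form, for illustration)."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency


def laplacian_penalty(beta, adjacency):
    """Smoothness penalty beta^T L beta.

    For an unweighted, undirected graph this equals the sum of
    (beta_i - beta_j)^2 over all edges (i, j), so it is small when
    connected genes have similar coefficients.
    """
    beta = np.asarray(beta, dtype=float)
    return float(beta @ graph_laplacian(adjacency) @ beta)


def degree_weighted_l1(beta, weights):
    """Weighted L1 penalty sum_j w_j * |beta_j|.

    The per-gene weights w_j would be derived from node degree; the exact
    mapping is not stated in the abstract, so it is left to the caller.
    """
    beta = np.asarray(beta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(beta)))


# Toy 4-gene network: gene 0 is a hub connected to genes 1, 2, and 3.
A = np.array(
    [
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
    ],
    dtype=float,
)
beta = np.array([1.0, 1.0, 0.5, 0.0])
degrees = A.sum(axis=1)  # node degrees, the input to any degree-based weighting

smoothness = laplacian_penalty(beta, A)
uniform_l1 = degree_weighted_l1(beta, np.ones_like(beta))  # reduces to the plain lasso penalty
```

In a penalized AFT fit, either quantity would be scaled by a tuning parameter and added to the negative log-likelihood; that fitting step is outside the scope of this sketch.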
Results
The inverse-degree penalty showed consistently more conservative feature selection, achieving substantially lower false positive rates than the Laplacian penalty while maintaining competitive predictive performance. In simulation studies under strong signal conditions, the inverse-degree method achieved a false positive rate of 0.126 versus 0.299 for the Laplacian penalty. Real-data applications showed comparable predictive accuracy between the two methods across cancer types (C-index: 0.61 to 0.73). Both methods identified biologically relevant pathways, including mTORC1 regulation and fatty acid metabolism in kidney renal clear cell carcinoma (KIRC) and IL-17/chemokine signaling in IDH-wildtype gliomas. Notably, in the colorectal adenocarcinoma (COAD) analysis, the inverse-degree method identified ten coherent pathways centered on inflammation, cell-cycle regulation, and proteasome function, whereas the Laplacian approach introduced potentially spurious neurotransmitter-release associations, suggesting that the inverse-degree penalty yields more biologically interpretable results.
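For readers unfamiliar with the C-index reported above, the sketch below shows one common way to compute it for right-censored data (Harrell's concordance index). It is an assumption that this variant matches the one used in the study; the function name and the toy data are hypothetical. Because an AFT model predicts survival time directly, a pair is counted as concordant when the patient who failed earlier also has the shorter predicted time.

```python
def concordance_index(times, events, predicted_times):
    """Harrell's C for right-censored data (simplified: tied observed times are skipped).

    times            observed follow-up times
    events           1 if the event was observed, 0 if censored
    predicted_times  model-predicted survival times (larger means longer survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data: four patients, the third is censored at time 6.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [1.0, 3.0, 5.0, 7.0])
c_reversed = concordance_index(toy_times, toy_events, [7.0, 5.0, 3.0, 1.0])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported range.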
Conclusions
Network-regularized AFT models with inverse-degree penalties offer a valuable tool for biomarker discovery in high-dimensional genomic survival analysis, providing conservative feature selection and superior biological interpretability while maintaining competitive predictive performance. The accompanying DegreeAFT R package makes these methods accessible to the broader research community, facilitating adoption in precision oncology applications where identifying reliable prognostic signatures is critical for treatment decision-making.