Penalised regression with multiple sources of prior effects

Authors

Armin Rauschenberger, Zied Landoulsi, Mark A. van de Wiel, Enrico Glaab

Abstract

In many high-dimensional prediction or classification tasks, complementary data on the features (co-data) are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information into penalised regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package transreg.

Data

Data for the application on cervical cancer are available from van de Wiel et al. (2016, doi:10.1002/sim.6732), as the data set dataVerlaat in the R package GRridge.

Data for the application on pre-eclampsia are available from Erez et al. (2017, doi:10.1371/journal.pone.0181468), in the supporting file pone.0181468.s001.csv.

For the application on Parkinson’s disease, the co-data are available from Nalls et al. (2019, doi:10.1016/S1474-4422(19)30320-5), in the online file nallsEtAl2019_excluding23andMe_allVariants.tab, and the target data are available on request from request.ncer-pd@uni.lu.

Source code

The analysis script is provided on GitHub in the Analysis section.