Penalised regression with multiple sources of prior effects


Armin Rauschenberger, Zied Landoulsi, Mark A. van de Wiel, Enrico Glaab

In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information (co-data) into penalised regression. If suitable co-data are available, this integration improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package transreg.
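To make the idea of prior effects concrete, the following is a minimal Python sketch, not the method implemented in transreg. It swaps in a deliberately simple technique: ridge regression that shrinks the coefficients towards a prior coefficient vector instead of towards zero. The function name ridge_toward_prior, the simulated data, the penalty value, and the equal-weight averaging of two prior sources are all assumptions made for illustration.

```python
import numpy as np


def ridge_toward_prior(X, y, prior, lam):
    """Closed-form minimiser of ||y - X b||^2 + lam * ||b - prior||^2.

    Hypothetical helper for illustration; with prior = 0 this is
    ordinary ridge regression.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    rhs = X.T @ y + lam * prior
    return np.linalg.solve(A, rhs)


# Simulated high-dimensional target data (n < p); all values are made up.
rng = np.random.default_rng(0)
n, p = 50, 200
beta = rng.normal(size=p)                  # true feature effects
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Two noisy sources of prior effects, e.g. coefficients from earlier studies.
prior_sources = np.column_stack([
    beta + rng.normal(scale=0.5, size=p),
    beta + rng.normal(scale=1.0, size=p),
])

# Assumption: combine the sources by a plain average (equal weights).
prior = prior_sources.mean(axis=1)

b_prior = ridge_toward_prior(X, y, prior, lam=10.0)         # uses co-data
b_plain = ridge_toward_prior(X, y, np.zeros(p), lam=10.0)   # ignores co-data
```

In this sketch the prior carries both the sign and the size of each effect, so a large penalty pulls the estimate towards the prior and a small penalty lets the target data dominate. How the sources are weighted and calibrated is exactly what a real method has to learn from the data; the equal-weight average here only stands in for that step.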

Data


Data for the application on cervical cancer are available from van de Wiel et al. (2016, 10.1002/sim.6732), as the data set dataVerlaat in the R package GRridge.

Data for the application on pre-eclampsia are available from Erez et al. (2017, 10.1371/journal.pone.0181468), in the supporting file pone.0181468.s001.csv.

For the application on Parkinson’s disease, the co-data are available from Nalls et al. (2019, 10.1016/S1474-4422(19)30320-5), in the online file nallsEtAl2019_excluding23andMe_allVariants.tab, and the target data are available upon request (request.ncer-pd@uni.lu).

Source code


The analysis script is provided in the Analysis section.