Penalised regression with multiple sources of prior effects
Armin Rauschenberger, Zied Landoulsi, Mark A. van de Wiel, Enrico Glaab
In many high-dimensional prediction or classification tasks, complementary data
on the features (co-data) are available, e.g. prior biological knowledge on
(epi)genetic markers. Here we consider tasks with numerical prior information
that provides insight into the importance (weight) and the direction (sign) of
the feature effects, e.g. regression coefficients from previous studies. We
propose an approach for integrating multiple sources of such prior information
into penalised regression. If suitable co-data are available, this improves the
predictive performance, as shown in simulations and applications. The proposed
method is implemented in the R package transreg.
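The abstract does not spell out the estimator, so the following Python sketch is only a rough illustration of the general idea. It is not the transreg algorithm, and transreg itself is an R package. The sketch uses a simple, swapped-in technique to feed several prior-effect vectors into penalised regression. First, each source is weighted by regressing the outcome on its prior-based linear predictor. Second, ridge regression shrinks the coefficients towards the weighted combination of priors rather than towards zero.

The function names `combine_priors` and `ridge_towards_prior` are hypothetical. The sketch assumes a centred outcome, standardised features and no intercept. It also estimates the weights on the same data as the final fit, which is a simplification. A more careful version would choose the weights and the penalty `lam` out of sample, e.g. by cross-validation.

```python
import numpy as np


def combine_priors(X, y, priors):
    """Weight several prior-effect vectors (one per co-data source).

    Hypothetical helper: each source k gives a prior-based linear predictor
    X @ z_k; the outcome is regressed on these predictors by ordinary least
    squares, and the fitted weights combine the sources into one prior vector.
    Illustrative choice only, not the transreg procedure.
    """
    Z = np.column_stack(priors)        # p x K matrix of prior effects
    eta = X @ Z                        # n x K prior-based linear predictors
    w, *_ = np.linalg.lstsq(eta, y, rcond=None)
    return Z @ w, w                    # combined prior (length p), weights


def ridge_towards_prior(X, y, beta0, lam):
    """Minimise ||y - X b||^2 + lam * ||b - beta0||^2 in closed form.

    Hypothetical helper. Assumes a centred outcome and standardised
    features (no intercept). With beta0 = 0 this is ordinary ridge regression.
    """
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return beta0 + np.linalg.solve(A, X.T @ (y - X @ beta0))


# Toy example with simulated data (p > n): one informative and one
# uninformative source of prior effects.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 1.0
y = X @ beta + rng.standard_normal(n)

informative = beta + 0.1 * rng.standard_normal(p)  # close to the true effects
uninformative = rng.standard_normal(p)             # unrelated to the outcome

beta0, w = combine_priors(X, y, [informative, uninformative])
lam = 10.0
b_hat = ridge_towards_prior(X, y, beta0, lam)
print("source weights:", np.round(w, 2))
```

On this toy data, one would expect the informative source to receive a clearly positive weight. That weight is somewhat below one because of the noise in its prior effects. The uninformative source should receive a weight close to zero. Shrinking towards a data-weighted prior instead of towards zero is what lets useful co-data help, while useless co-data are largely ignored.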
Data
Data for the application on cervical cancer are available from van de Wiel et
al. (2016, 10.1002/sim.6732), in the R package GRridge, in the data set
dataVerlaat.
Data for the application on pre-eclampsia are available from Erez et al.
(2017, 10.1371/journal.pone.0181468), in the supporting file
pone.0181468.s001.csv.
For the application on Parkinson’s disease, the co-data are available from
Nalls et al. (2019, 10.1016/S1474-4422(19)30320-5), in the online file
nallsEtAl2019_excluding23andMe_allVariants.tab, and the target data are
available upon request to request.ncer-pd@uni.lu.