#### Date

2011-05-20

#### Document Type

Poster

#### Description

**Introduction**: Technological advances facilitating the acquisition of large arrays of biomarker data have led to new opportunities to study disease progression based on individual-level characteristics. This creates an analytical challenge, however, because of the large number of potentially informative markers, the high degree of correlation among them, and changes that occur over time. To address these issues, we propose a mixed ridge (MR) estimator, which integrates ridge regression into the mixed model framework to account for both the correlation induced by repeatedly measuring the outcome on each individual over time and the potentially high correlation among predictor variables. An extension of the EM algorithm is described to account for unknown variance/covariance parameters. A simulation study is conducted to illustrate model performance, and a data example is provided.
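The abstract does not state the estimator explicitly. As a hedged sketch only, one common way to combine a ridge penalty with a linear mixed model is shown below. The notation ($X_i$, $Z_i$, $b_i$, $D$, $\sigma^2$, $\lambda$) and the identity-matrix penalty are assumptions introduced here for illustration and may differ from the formulation on the poster.

```latex
% Assumed linear mixed model for subject i with n_i repeated measures
y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \sigma^2 I_{n_i})

% Implied marginal covariance of y_i
V_i = Z_i D Z_i^{\top} + \sigma^2 I_{n_i}

% Ridge-penalized generalized least squares estimate of the fixed effects,
% for a given penalty \lambda \ge 0 and given variance components
\hat{\beta}_{\lambda} =
  \Bigl( \sum_{i} X_i^{\top} V_i^{-1} X_i + \lambda I_p \Bigr)^{-1}
  \sum_{i} X_i^{\top} V_i^{-1} y_i
```

Setting $\lambda = 0$ recovers the usual generalized least squares estimate from the mixed model. In practice $D$ and $\sigma^2$ are unknown, which is the role the abstract assigns to the EM extension.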

**Hypothesis:** We predict that the mixed ridge estimator will yield somewhat biased coefficients with smaller standard deviations than those of the mixed model without a ridge component. This will improve power over the mixed model when correlations among predictors are sufficiently high, while type I error rates are maintained at about 0.05 for both methods.

**Methods:**

- Motivation
- Mixed Ridge Model
- EM Algorithm
- Testing
- Simulation Study
- Data Example
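
The fixed-effects step of a mixed ridge fit can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the variance components are already known (the abstract handles the unknown case with an EM extension, which is not reproduced here), it assumes a simple identity-matrix ridge penalty, and the function name `mixed_ridge_beta` and all of its parameters are hypothetical.

```python
import numpy as np


def mixed_ridge_beta(X_list, y_list, Z_list, D, sigma2, lam):
    """Ridge-penalized GLS estimate of fixed effects (illustrative sketch).

    Assumes known random-effects covariance D and residual variance sigma2.
    X_list, y_list, Z_list hold one design matrix, outcome vector, and
    random-effects design matrix per subject.
    """
    p = X_list[0].shape[1]
    lhs = lam * np.eye(p)          # start from the ridge penalty lam * I
    rhs = np.zeros(p)
    for X, y, Z in zip(X_list, y_list, Z_list):
        # Marginal covariance of this subject's repeated measures
        V = Z @ D @ Z.T + sigma2 * np.eye(len(y))
        Vinv_X = np.linalg.solve(V, X)   # V^{-1} X without forming V^{-1}
        lhs += X.T @ Vinv_X              # accumulate X' V^{-1} X
        rhs += Vinv_X.T @ y              # accumulate X' V^{-1} y (V symmetric)
    return np.linalg.solve(lhs, rhs)


if __name__ == "__main__":
    # Toy data: 30 subjects, 3 visits each, 4 predictors, random intercept.
    rng = np.random.default_rng(1)
    X_list = [rng.normal(size=(3, 4)) for _ in range(30)]
    Z_list = [np.ones((3, 1)) for _ in range(30)]
    beta_true = np.array([0.2, 0.0, 0.0, 0.0])
    y_list = [X @ beta_true + rng.normal() + rng.normal(size=3) for X in X_list]
    print(mixed_ridge_beta(X_list, y_list, Z_list, np.array([[1.0]]), 1.0, lam=1.0))
```

With `lam=0` the function reduces to ordinary generalized least squares, and with `D` set to zero it reduces to standard ridge regression, which makes the two ingredients of the approach easy to check separately.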

**Results:**

- Table 1: Comparison of the mixed ridge (MR) and mixed models in the simulation study. As the correlation among predictor columns increases, the power of the mixed model decreases more rapidly than that of MR. MR coefficients tend to have smaller variance and slight bias; however, type I error rates are roughly the same.
- Figure 1: Plot of power against correlation when β = 0.20. At about ρ = 0.80, MR begins to significantly outperform the mixed model.
- Table 2: Comparison of MR and mixed model for data example. At the 0.05 level, MR finds 2 variables to be significant, compared with 1 variable for the mixed model. Correlations among predictors are as high as 0.90, so we expect the addition of the ridge component to improve prediction ability.
- Figure 2: Normal QQ plot of t-statistics for MR. Points circled in red are significant for MR, while the point circled in gray is significant for the mixed model.

**Conclusions:** MR outperforms the mixed model without a ridge component when correlations among predictor variables are sufficiently large. The simulation study shows that when correlations are greater than about 0.80, the power of MR is higher than that of the mixed model, without a significant increase in the type I error rate. At lower correlations, MR works just as well as the mixed model. The GENE study data set included predictors with correlation coefficients as high as 0.95, and subjects were measured 2 to 4 times each. Because of the high correlation, mixed modeling resulted in inflated coefficient variances and thus low power. The MR approach identified APOb as significantly associated with BP over time, whereas the usual mixed modeling approach was unable to detect this association.
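
To give a rough sense of why high predictor correlation inflates coefficient variances, the sketch below computes variance inflation factors for equally correlated, standardized predictors in ordinary least squares. This is a simplification introduced here, not an analysis from the poster: it ignores the repeated-measures structure, and the equicorrelation pattern and the function name `variance_inflation` are assumptions.

```python
import numpy as np


def variance_inflation(rho, p=2):
    """Variance inflation factors for p standardized predictors that all
    share pairwise correlation rho (hypothetical equicorrelated design).

    The factors are the diagonal of the inverse correlation matrix; for
    p = 2 this equals 1 / (1 - rho**2).
    """
    R = np.full((p, p), float(rho))
    np.fill_diagonal(R, 1.0)
    return np.diag(np.linalg.inv(R))


if __name__ == "__main__":
    # Correlation levels mentioned in the abstract, plus an uncorrelated baseline
    for rho in (0.0, 0.80, 0.95):
        print(rho, variance_inflation(rho))
```

Under these assumptions the inflation grows without bound as the correlation approaches 1, which is consistent with the abstract's observation that the unpenalized mixed model loses power at high correlations.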

#### DOI

10.13028/hs1h-3933

#### Rights and Permissions

Copyright the Author(s)

#### Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

#### Repository Citation

Eliot MN, Foulkes AS, Reilly MP, Ferguson J. (2011). Ridge regression for longitudinal data with application to biomarkers. UMass Center for Clinical and Translational Science Research Retreat. https://doi.org/10.13028/hs1h-3933. Retrieved from https://escholarship.umassmed.edu/cts_retreat/2011/posters/5

#### Included in

Ridge regression for longitudinal data with application to biomarkers
