Semiparametric time to event models in the presence of error-prone, self-reported outcomes - with application to the Women's Health Initiative

UMMS Affiliation

Department of Medicine, Division of Preventive and Behavioral Medicine

Publication Date


Document Type



Clinical Epidemiology | Epidemiology | Longitudinal Data Analysis and Time Series | Women's Health


The onset of several silent, chronic diseases such as diabetes can be detected only through diagnostic tests. Due to cost considerations, self-reported outcomes are routinely collected in lieu of expensive diagnostic tests in large-scale prospective investigations such as the Women's Health Initiative. However, self-reported outcomes are subject to imperfect sensitivity and specificity. Using a semiparametric likelihood-based approach, we present time to event models to estimate the association of one or more covariates with an error-prone, self-reported outcome. We present simulation studies to assess the effect of error in self-reported outcomes with regard to bias in the estimation of the regression parameter of interest. We apply the proposed methods to prospective data from 152,830 women enrolled in the Women's Health Initiative to evaluate the association of statin use with the risk of incident diabetes mellitus among postmenopausal women. The current analysis is based on follow-up through 2010, with a median duration of follow-up of 12.1 years. The methods proposed in this paper are readily implemented using our freely available R software package icensmis, which is available at the Comprehensive R Archive Network (CRAN) website.
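To make the role of sensitivity and specificity concrete, the following is a minimal sketch of how one subject's sequence of self-reports could enter a likelihood. It is not the icensmis implementation, and the abstract does not give the exact form used in the paper. The sketch assumes that self-reports are collected at a fixed set of visits, that reports are independent given the true disease status, and that sensitivity and specificity are constant over time. The function name `report_likelihood` and its arguments are hypothetical, and covariates are left out for brevity.

```python
from math import prod


def report_likelihood(reports, interval_probs, sensitivity, specificity):
    """Likelihood of one subject's binary self-reports (illustrative sketch).

    reports        : list of 0/1 self-reports at visits 0..K-1.
    interval_probs : K+1 probabilities; entry j is the probability that the
                     true event falls in interval j, where interval j ends at
                     visit j and interval K means "after the last visit".
    Assumes reports are conditionally independent given true status and that
    sensitivity and specificity do not change over time.
    """
    if len(interval_probs) != len(reports) + 1:
        raise ValueError("need one more interval probability than reports")

    total = 0.0
    for j, p_interval in enumerate(interval_probs):
        terms = []
        for k, report in enumerate(reports):
            # The subject is truly diseased at visit k if the event
            # happened in an interval ending at or before that visit.
            diseased = j <= k
            p_positive = sensitivity if diseased else 1.0 - specificity
            terms.append(p_positive if report == 1 else 1.0 - p_positive)
        # Weight the report probabilities by the chance of this interval.
        total += p_interval * prod(terms)
    return total


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.4]  # made-up interval probabilities
    # Error-free reports versus reports with imperfect accuracy.
    print(report_likelihood([0, 0, 1], probs, 1.0, 1.0))
    print(report_likelihood([0, 0, 1], probs, 0.6, 0.99))
```

With perfect sensitivity and specificity the sum collapses to the probability of the single interval consistent with the reports, which is the usual interval-censored contribution. With imperfect reports every interval contributes, which is why ignoring the error can bias the estimated regression parameter.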

DOI of Published Version



Ann Appl Stat. 2015 Jun;9(2):714-730.

Journal/Book/Conference Title

The Annals of Applied Statistics

PubMed ID