Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales
UMass Chan Affiliations
Department of Quantitative Health Sciences
Document Type
Journal Article
Publication Date
2003-12-04
Keywords
Adolescent
Adult
Aged
Calibration
Data Collection
Headache
Humans
Middle Aged
Quality of Life
*Questionnaires
*Sickness Impact Profile
Telephone
United States
Biostatistics
Epidemiology
Health Services Research
Abstract
BACKGROUND: Item response theory (IRT) scoring of health status questionnaires offers many advantages. However, to ensure 'backwards comparability' and to facilitate interpretation of results, we need the ability to express the IRT score in the metrics of the traditional scales.
OBJECTIVES: To develop procedures to calibrate IRT-based scores on the Headache Impact Test (HIT) into the metrics of the traditional headache scales. To assess the degree to which the calibrated HIT scores agree with the observed traditional scores and lead to the same conclusions in group comparisons.
METHODS: We used telephone interview data (n = 1016) and Internet data (n = 1103) from general population surveys of recent headache sufferers. Analyses were conducted in four steps: (1) develop IRT models for all items, (2) for each IRT score level, calculate the expected score on each of the traditional scales (calibration), (3) adjust this calibrated score for measurement error in the IRT score, (4) for each of the traditional scales, assess agreement between calibrated HIT scores and observed scores using intraclass correlation (ICC) and evaluate the agreement of mean scores and the relative validity (RV) in discriminating among groups differing in migraine diagnosis, headache severity, and change in impact over time.
RESULTS: For the traditional categorical questionnaire items (the Migraine Specific Questionnaire (MSQ) and the Headache Disability Inventory (HDI)) the calibrated HIT agreed with the observed traditional scores: ICCs were between 0.80 and 0.94. In RV analyses the maximum mean difference between the observed and expected scores was 1.7 points on a 0-100 scale for comparisons at one point in time. Analyses of change over time and analyses calibrating scores from the fixed-form HIT-6 to the metric of other questionnaires were also satisfactory although less precise. Analysis of non-standard questionnaire items (e.g. 'On how many days in the past 3 months did you have a headache?', from the HIMQ and the MIDAS) required special IRT models. Agreement was weaker for these items: ICCs were between 0.56 and 0.61 and the maximum mean differences were 2.9 (on a 0-270 scale) and 3.8 (on a 0-450 scale) in RV analyses at one point in time. The ability of the calibrated scale scores to discriminate between groups was at least as good as the ability of the observed sum scales and often remarkably better.
CONCLUSION: The theoretical advantage of IRT models in scale calibration is supported by our results. This approach to achieving comparability of new and widely-used scales and accelerating the accumulation of interpretation guidelines based on previous work warrants testing for measures of other generic and disease-specific concepts.
Source
Qual Life Res. 2003 Dec;12(8):981-1002. Link to article on publisher's site
DOI
10.1023/A:1026123400242
Permanent Link to this Item
http://hdl.handle.net/20.500.14038/47450
PubMed ID
14651417
Related Resources
Link to Article in PubMed
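Illustrative sketch
Step (2) of the methods in the abstract, calculating the expected traditional-scale score at each IRT score level, might look roughly like the Python sketch below. It is not the authors' implementation. It assumes a generalized partial credit model (the abstract does not name the IRT model used), and the item parameters in `HYPOTHETICAL_ITEMS`, along with all function names, are invented for illustration. The measurement-error adjustment of step (3) is not shown.

```python
import math


def gpcm_category_probs(theta, slope, thresholds):
    """Category probabilities for one polytomous item.

    Assumes a generalized partial credit model: category 0 has logit 0,
    and each higher category k adds slope * (theta - thresholds[k - 1]).
    """
    logits = [0.0]
    for threshold in thresholds:
        logits.append(logits[-1] + slope * (theta - threshold))
    # Subtract the maximum logit before exponentiating for numerical stability.
    largest = max(logits)
    weights = [math.exp(z - largest) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_scale_score(theta, items):
    """Expected raw sum score on a traditional scale at IRT score theta.

    `items` is a list of (slope, thresholds) pairs, one per scale item,
    with response categories scored 0, 1, 2, ...
    """
    score = 0.0
    for slope, thresholds in items:
        probs = gpcm_category_probs(theta, slope, thresholds)
        score += sum(category * p for category, p in enumerate(probs))
    return score


def calibration_table(items, thetas):
    """Map each IRT score level to its expected traditional-scale score."""
    return [(theta, expected_scale_score(theta, items)) for theta in thetas]


# Hypothetical parameters for a three-item scale, each item scored 0-3.
HYPOTHETICAL_ITEMS = [
    (1.2, [-1.0, 0.0, 1.0]),
    (0.8, [-0.5, 0.5, 1.5]),
    (1.5, [-1.5, -0.5, 0.5]),
]

if __name__ == "__main__":
    for theta, expected in calibration_table(HYPOTHETICAL_ITEMS, [-2, -1, 0, 1, 2]):
        print(f"theta = {theta:+d}  expected sum score = {expected:.2f}")
```

Under these assumptions the expected sum score rises monotonically with the IRT score and stays within the scale's raw range (0 to 9 here), which is what allows an IRT score to be reported in the metric of the traditional scale.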