An investigation of the impacts of different generalizability study designs on estimates of variance components and generalizability coefficients
Office of Medical Education; Department of Family Medicine and Community Health; Meyers Primary Care Institute
Analysis of Variance; *Clinical Competence; Computer Simulation; Confidence Intervals; Data Collection; Educational Measurement; Humans; Students, Medical
Life Sciences | Medicine and Health Sciences | Women's Studies
In recent years, performance assessments have become increasingly popular in medical education. While the term “performance assessment” can be applied to many different types of assessments, in medical education this term usually refers to some sort of simulated patient encounter, such as an objective structured clinical examination (OSCE) or a computer simulation of an encounter. These types of assessments appeal to many educators because the tasks or items used are often seen as more realistic than items on multiple-choice examinations. However, this increased “realism” or apparent authenticity comes at a cost: performance examinations are typically more time-consuming and expensive both to administer and to score.

On an OSCE, each encounter with a standardized patient is typically scored as a single item, often resulting in an examinee's completing only four to eight items in a two-hour testing period. In contrast, an examinee might complete 100 to 150 items during a two-hour multiple-choice examination. The fact that performance examinations are typically relatively short means that test users must pay particular attention to the reliability and validity of test scores.

Generalizability theory provides a framework for estimating the relative magnitudes of various sources of error in a set of scores. In most performance assessments, both items and raters are potential sources of error. Generalizability theory allows estimation of the error associated with each of these sources separately, as well as the relevant interaction effects. In a generalizability study (G study), the variance in a set of scores is partitioned in a manner similar to that used in the analysis of variance. However, in a G study the emphasis is not on testing for statistical significance, but rather on assessing the relative magnitudes of the variance components. Depending on the study design, different variance components can be estimated.
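Once the variance components are estimated, additional analyses can be conducted, such as estimating generalizability coefficients. The purpose of the present study was to examine the impacts of different G-study designs on estimates of variance components and generalizability coefficients.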
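
To make the variance partitioning concrete, the sketch below estimates variance components for the simplest crossed design, persons by items (p × i) with one score per cell, and then computes a relative generalizability coefficient for a chosen number of items. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the data layout, the truncation of negative estimates to zero, and the toy scores are all hypothetical, and the rater facet discussed above is omitted to keep the example short.

```python
from statistics import mean


def g_study_p_by_i(scores):
    """Estimate variance components for a crossed persons-by-items design.

    `scores` holds one row per examinee and one column per item, with
    exactly one score per cell (an assumed layout for this sketch).
    """
    n_p = len(scores)
    n_i = len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    item_means = [mean(row[j] for row in scores) for j in range(n_i)]

    # Sums of squares, partitioned as in a two-way ANOVA without replication.
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    # Negative estimates are set to zero (a common convention, assumed here).
    return {
        "person": max((ms_p - ms_res) / n_i, 0.0),
        "item": max((ms_i - ms_res) / n_p, 0.0),
        "residual": ms_res,  # person-by-item interaction confounded with error
    }


def g_coefficient(components, n_items):
    """Relative generalizability coefficient for a test of `n_items` items."""
    relative_error = components["residual"] / n_items
    return components["person"] / (components["person"] + relative_error)


# Made-up toy scores (4 examinees x 3 items), not data from the study.
toy_scores = [
    [4, 5, 3],
    [2, 3, 2],
    [5, 6, 4],
    [3, 4, 1],
]
components = g_study_p_by_i(toy_scores)
print(components)
print(g_coefficient(components, n_items=3))
print(g_coefficient(components, n_items=6))
```

Calling `g_coefficient` with different values of `n_items` shows how the projected coefficient changes with test length, which is the kind of follow-up analysis that becomes possible once the components are in hand. A design that also crosses raters would add rater, person-by-rater, and item-by-rater components to the same decomposition.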
Citation: Acad Med. 2000 Oct;75(10 Suppl):S21-4.