Article Type

Full-Length Paper

Publication Date




Abstract

In this paper we take an in-depth look at the curation of a large longitudinal survey and at the activities and procedures involved in moving the data from its generation to the state needed for scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about data objects that are “live,” i.e., still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in existing discussions of curation. On the one hand, there is an acknowledged need for active or upstream curation, in which curators are engaged close to the point of data creation. On the other hand, recommendations on how to do that are scattered across multiple domain-oriented data efforts.

In describing the complexities of active curation of survey data and providing general recommendations, we aim to draw attention to the practices of active curation; stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects; and encourage collaborations between libraries and other academic units.

Keywords

active curation, data management, longitudinal survey, survey research, data lifecycle

Data Availability

The paper documents a case study of curating an active dataset. The data and related products are under embargo until the end of the project.


The project is supported by the Indiana University Grand Challenge Precision Health Initiative through the Person to Person Health Interview Study (P2P). The P2P was supported by an award to the Indiana Clinical and Translational Science Institute’s Precision Health Initiative from Indiana University Grand Challenges and through a grant from the National Institutes of Health (Award Number UL1TR002529). The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the Principal Investigators of the P2P or of the Precision Health Initiative.

Corresponding Author

Inna Kouper, Indiana University, Smith Research Center 120, 2805 E 10th St, Bloomington, IN 47408, USA; inkouper@indiana.edu

Rights and Permissions

© 2021 Kouper et al. This is an open access article licensed under the terms of the Creative Commons Attribution License.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.