Article Type

Full-Length Paper

Publication Date





Abstract

Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation, in which data are curated and uploaded immediately after each data collection session so that they can be shared with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation, in which they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary, a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.


Keywords

active curation, video data, behavioral science, data curation, identifiable data

Data Availability

All of the methods are openly shared. All protocols for data collection, coding, and curation are on the PLAY Project website (http://play-project.org). Videos are shared via links to the appropriate volumes on Databrary.org (Planning: https://nyu.databrary.org/volume/254; Implemented Protocol: https://nyu.databrary.org/volume/876). Workflow tools are shared via repositories on GitHub (https://github.com/PLAY-behaviorome).


Funding

KCS, MX, OH, and CTL were supported by NICHD R01HD094830; KEA was supported in part by DARPA N66001-19-2-4035; ROG was supported in part by NSF OAS2032713; and SLG was supported in part by NIMH T32-MH019524-28.

Corresponding Author

Kasey C. Soska, New York University, 4 Washington Place, Room 409, New York, NY 10003, USA; kasey.soska@nyu.edu

Rights and Permissions

© 2021 Soska et al. This is an open access article licensed under the terms of the Creative Commons Attribution License.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

10(3)-1208_(Hyper)active-Data-Curation_Figure1.pdf (10540 kB)
Figure 1: Steps in hyperactive curation (left column) and specific curation actions in the PLAY project (right column).

10(3)-1208_(Hyper)active-Data-Curation_Figure3.pdf (987 kB)
Figure 3: Diagram of the PLAY data collection, annotation, and curation workflow.