Objectives: This study evaluates the quality and usability of supplementary data files deposited in our university’s institutional repository between 1971 and 2015. Understanding the extent to which content historically deposited in digital repositories remains usable by today’s researchers can help inform current digital preservation and documentation practices.
Methods: As a first pass at identifying documents that included supplementary data files, I identified all graduate-level theses and dissertations (GTDs) in the institutional repository with multiple files. These GTDs were then individually examined, and files that were artifacts of either the upload or digitization process were removed. The remaining “true” supplementary files were then individually opened and evaluated following elements of the DATA rubric of Van Tuyl and Whitmire (2016).
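The two-pass screening described in the methods can be sketched roughly as follows. This is an illustrative sketch only, not the workflow used in the study: the `Submission` record, the `ARTIFACT_HINTS` name fragments, the assumption that the main document is listed first, and the sample file names are all hypothetical, and the automated `is_artifact` check merely stands in for the manual, file-by-file examination the study reports.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record shape; the study does not describe its data structures.
@dataclass
class Submission:
    title: str
    files: List[str] = field(default_factory=list)

# Assumed name fragments that mark upload or digitization artifacts (illustrative only).
ARTIFACT_HINTS = ("_ocr", "thumbnail", "license")

def is_artifact(filename: str) -> bool:
    """Stand-in for manual review: flag files whose names match an artifact hint."""
    name = filename.lower()
    return any(hint in name for hint in ARTIFACT_HINTS)

def candidate_submissions(submissions: List[Submission]) -> List[Submission]:
    """First pass: keep only submissions with more than one file."""
    return [s for s in submissions if len(s.files) > 1]

def supplementary_files(submission: Submission) -> List[str]:
    """Second pass: skip the main document (assumed to be first) and drop artifacts."""
    return [f for f in submission.files[1:] if not is_artifact(f)]

# Made-up sample records, not data from the study.
submissions = [
    Submission("Thesis A", ["thesis_a.pdf"]),
    Submission("Thesis B", ["thesis_b.pdf", "thesis_b_ocr.txt"]),
    Submission("Thesis C", ["thesis_c.pdf", "plots.xls", "sites.dbf"]),
]

results: Dict[str, List[str]] = {
    s.title: supplementary_files(s) for s in candidate_submissions(submissions)
}
# Keep only submissions that still have "true" supplementary files after filtering.
with_data = {title: files for title, files in results.items() if files}
print(with_data)
```

In this sketch, a submission with a single file is dropped in the first pass, and a submission whose extra files are all artifacts is dropped in the second, leaving only submissions whose files would go on to be opened and scored against the rubric.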
Results: Supplementary files dating back to 1971 were discovered in 116 GTD submissions, totalling more than 25,000 files. Most GTD submissions included fewer than 30 files, though some submissions included thousands of individual data files. The most common file types submitted were imagery, tabular data, and databases, along with a very large number of files of unknown type. Overall, levels of documentation were poor, while actionability of datasets was generally middling.
Conclusions: The results of this study suggest that legacy data submitted to our institutional repository with GTDs are generally in poor shape with respect to Transparency and in somewhat better shape with respect to Actionability. It is clear from this study and others that researchers have a long road ahead when it comes to sharing data in a way that makes them potentially usable by other researchers.
institutional repository, research data management, data sharing, metadata, data obsolescence
I would like to thank Amanda Whitmire (Stanford University) and Chris Diaz (Northwestern University) for early and late (respectively) comments on this manuscript, and also the Research Data Access and Preservation community for supporting an early version of this work.
Van Tuyl S. What’s in the Box? Assessing the potential usability of four decades of thesis and dissertation supplementary files. Journal of eScience Librarianship 2019;8(1): e1142. https://doi.org/10.7191/jeslib.2019.1142. Retrieved from https://escholarship.umassmed.edu/jeslib/vol8/iss1/2
Rights and Permissions
Copyright Van Tuyl © 2019
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.