•  
  •  
 

Article Type

Full-Length Paper

Publication Date

2021-08-11

DOI

10.7191/jeslib.2021.1205

Abstract

Objective: To increase data quality and ensure compliance with appropriate policies, many institutional data repositories curate data that is deposited into their systems. Here, we present our experience as an academic library implementing and managing a semi-automated, cloud-based data curation workflow for a recently launched institutional data repository. Based on our experiences we then present management observations intended for data repository managers and technical staff looking to move some or all of their curation services to the cloud.

Methods: We implemented tooling for our curation workflow in a service-oriented manner, making significant use of our data repository platform’s application programming interface (API). With an eye towards sustainability, a guiding development philosophy has been to automate processes following industry best practices while avoiding solutions with high resource needs (e.g., maintenance), and minimizing the risk of becoming locked-in to specific tooling.

Results: The initial barrier for implementing a data curation workflow in the cloud was high in comparison to on-premises curation, mainly due to the need to develop in-house cloud expertise. However, compared to the cost for on-premises servers and storage, infrastructure costs have been substantially lower. Furthermore, in our particular case, once the foundation had been established, a cloud approach resulted in increased agility allowing us to quickly automate our workflow as needed.

Conclusions: Workflow automation has put us on a path toward scaling the service and a cloud based-approach has helped with reduced initial costs. However, because cloud-based workflows and automation come with a maintenance overhead, it is important to build tooling that follows software development best practices and can be decoupled from curation workflows to avoid lock-in.

Keywords

data curation, data repository, cloud-based workflow

Corresponding Author

Fernando Rios, 1510 E. University Blvd, PO Box 210055, Tucson, AZ 85721-055, USA; frios@arizona.edu

Rights and Permissions

© 2021 Rios & Ly. This is an open access article licensed under the terms of the Creative Commons Attribution License.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Appendix_1205.pdf (724 kB)
Appendix

Share

COinS