Journal of eScience Librarianship

Data Management Plans (DMPs) are often required for grant applications. But do strong DMPs lead to better data management and sharing practices? Several recent research projects in the Library and Information Science field have investigated data management planning and practice through DMP content analysis and data-management-related interviews. However, research hasn’t yet shown how DMPs ultimately affect data management and data sharing practices during grant-funded research. The research described in this article contributes to the existing literature by examining the impact of DMPs on grant awards and on Principal Investigators’ (PIs) data management and sharing practices. The results of this research suggest the following key takeaways: (1) Most PIs practice internal data management in order to prevent data loss, to facilitate sharing within the research team, and to seamlessly continue their research during personnel turnover; (2) PIs still have room to grow in understanding specialized concepts such as metadata and policies for use and reuse; (3) PIs may need guidance on practices that facilitate FAIR data, such as using metadata standards, assigning licenses to their data, and publishing in data repositories. Ultimately, the results of this research can inform academic library services and support stronger, more actionable DMPs.


Introduction and Background
In our data-centric era, research data have become valuable resources for encouraging research reproducibility, supporting information equity, accelerating science, and amplifying the impact of research (Vision 2010). Acknowledging the value of research data, federal funding agencies increasingly require that data management plans (DMPs) be included in grant proposals to support good data stewardship practices and to promote data sharing and reuse (Holdren 2013). In response to these trends, academic libraries are seeing increased demand to provide research data services (Tenopir et al. 2017; Bryant, Lavoie, and Malpas 2017; Tenopir, Sandusky, Allard, and Birch 2014). Common library research data services include education and training in data management best practices; consultations on writing DMPs for grants; and publication services for open datasets (Cox, Kennan, Lyon, and Pinfield 2017; Yoon and Schultz 2017). The ultimate goal of DMPs and library research data services is to support good data stewardship practices. Therefore, the DMP requirement from funding agencies is a commitment to the idea that strong data management planning can help ensure that published research data adhere to community standards like the FAIR Data Principles (Wilkinson et al. 2016). The FAIR principles support the idea that data must be carefully structured and described in order to be Findable, Accessible, Interoperable, and Reusable.
DMPs have been researched from various angles. A review of DMPs from the University of Illinois revealed that the proposed storage venue for datasets did not have a statistically significant effect on successful funding of National Science Foundation (NSF) proposals (Mischo, Schlembach, and O'Donnell 2014). A study at Georgia Tech found that faculty widely share and reuse language from previous DMPs when producing new DMPs (Parham and Doty 2012). The Data Management Plans as a Research Tool (DART) project has produced a rubric for analyzing DMPs, as well as an analysis of data management practices across domains. Researchers at the University of Houston conducted interviews with NIH and NSF grant recipients to illuminate data management needs on their campus (Peters and Dryden 2011). Most recently, Berman (2017) analyzed 35 DMPs and conducted six interviews in order to understand campus needs and inform development of future library research data services.
At Montana State University Library, research data services have been offered since 2014. These services include assistance with writing DMPs for grants. The purpose of this research is to build an evidence-based understanding of the impact of DMPs on grant-funded projects. This research also aims to inform library research data services, helping Principal Investigators (PIs) efficiently create useful, actionable DMPs that support FAIR research data.

Research Questions and Methods
A 2013 memo from the White House Office of Science and Technology Policy (OSTP) (Holdren 2013) suggests that publicly accessible research data can accelerate discovery and more broadly share the benefits of scientific research. The memo directs "each Federal agency with over $100 million in annual conduct of research and development expenditures to develop a plan to support increased public access to the results of research funded by the Federal Government" (Holdren 2013, p. 2). DMPs are one method for supporting public access to research data, by asking researchers to plan ahead for data management and data sharing. NSF began requiring DMPs in January 2011, before the 2013 OSTP memo. In theory, DMPs should help PIs build strong policies and procedures that support FAIR data. This research aims to understand how DMPs work in practice. Specifically, this research aims to determine the utility of DMPs by asking two key questions:
Q1a. Are grant proposals with more complete/detailed DMPs more likely to be funded?
Q1b. Which sections within DMPs have the most and least complete/detailed information?
Q2. Does writing a DMP affect PIs' data management and data sharing practice for grant-funded projects?
The research team consisted of the author and one student research assistant. To answer Q1a and Q1b, the author trained the student research assistant to analyze the content of 186 DMP documents from awarded and declined NSF grant proposals at Montana State University. To answer Q2, semi-structured interviews were conducted with 17 PIs who were selected from among the PIs whose DMPs came from awarded grants. These interviews were also used to provide additional insight into the results of the DMP content analysis when answering Q1a and Q1b.

DMP Content Analysis
Q1a. Are grant proposals with more complete/detailed DMPs more likely to be funded?

Q1b. Which sections within DMPs have the most and least complete/detailed information?
This research was designated as exempt by the Montana State University Institutional Review Board. DMPs were collected through Montana State University's online approval system for all grant proposals. The online approval system only requires that a proposal abstract be uploaded to the system, but some PIs choose to upload their grant proposal in full. Therefore, not all entries in the online approval system included a DMP. Since NSF's DMP requirement went into effect on January 18, 2011 (NSF 2018), results were filtered to contain only NSF proposals from 2011 or later. Each proposal was then examined by hand, and full proposals were downloaded if they were present. Since 2011, 88 awarded full proposals and 110 declined full proposals have been uploaded to the online approval system. Therefore, the dataset consisted of 198 full proposals. The DMP was pulled from each full proposal, and the remainder of the proposal was discarded. Of the 198 DMPs, 11 reported that no data would be produced, which left a total of 187 DMPs that were suitable for analysis. All the PIs whose DMPs were included in the dataset were sent an email offering an opportunity to opt out of the research. One PI opted out of the research, producing a final dataset of 186 DMPs.
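The sample attrition described above can be retraced as simple arithmetic. The following Python sketch uses only the counts reported in this section; the variable names are illustrative and are not part of the original study.

```python
# Counts reported in the text; variable names are illustrative.
awarded_full_proposals = 88
declined_full_proposals = 110
no_data_produced = 11   # DMPs stating that no data would be produced
opted_out = 1           # PIs who opted out after the email notice

collected = awarded_full_proposals + declined_full_proposals
suitable = collected - no_data_produced
final_dataset = suitable - opted_out

print(collected, suitable, final_dataset)  # 198 187 186
```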
A Qualtrics survey available from the Data Management Plans as a Research Tool (DART) project was used to conduct content analysis of the 186 DMP documents. The survey is based on the DART Rubric, which is organized according to DMP components outlined by NSF: 1. "the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; 4. policies and provisions for re-use, re-distribution, and the production of derivatives; 5. plans for archiving data, samples, and other research products, and for preservation of access to them" (NSF 2017).
The DART rubric shortens each NSF component as follows: 1. types of data produced; 2. standards for data and metadata; 3. policies for access and sharing; 4. policies and provisions for re-use and redistribution; and 5. plans for data archiving and preservation of access.
For each section, the rubric suggests standards to evaluate whether each DMP: 1. provides complete/detailed information; 2. addresses the issue, but information was incomplete; or 3. does not address the issue.
For example, in the section "standards for data and metadata," the rubric asks whether the DMP "identifies metadata standards and/or metadata formats that will be used for the proposed project." To qualify as complete/detailed, the DMP must clearly state and describe a metadata standard that will be followed; if no disciplinary standard exists, the DMP should clearly describe a project-specific approach. See Figure 1 for more detail. See DART Rubric Guidance for complete information (Whitmire, Carlson, Westra, Hswe, and Parham 2017).
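As an illustration of how rubric scores of this kind might be recorded and tallied, the Python sketch below models the five shortened section names and the three scoring levels. It is a simplification under stated assumptions: the actual DART rubric may score several criteria within each section, and the `Level` enum, the `tally` helper, and the two example records are hypothetical and are not the study's instrument or data.

```python
from collections import Counter
from enum import Enum


class Level(Enum):
    """The three scoring levels described for each rubric section."""
    COMPLETE = "complete/detailed"
    INCOMPLETE = "addressed issue but incomplete"
    NOT_ADDRESSED = "did not address the issue"


# Shortened section names as listed in the text.
SECTIONS = (
    "types of data produced",
    "standards for data and metadata",
    "policies for access and sharing",
    "policies and provisions for re-use and redistribution",
    "plans for data archiving and preservation of access",
)


def tally(scored_dmps, section):
    """Count how many DMPs received each level for one section."""
    counts = Counter(dmp[section] for dmp in scored_dmps)
    return {level: counts.get(level, 0) for level in Level}


# Two hypothetical scored DMPs, for illustration only (not study data).
example = [
    {s: Level.COMPLETE for s in SECTIONS},
    {**{s: Level.COMPLETE for s in SECTIONS},
     "standards for data and metadata": Level.NOT_ADDRESSED},
]
metadata_tally = tally(example, "standards for data and metadata")
print({level.value: n for level, n in metadata_tally.items()})
```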

Q2. Does writing a DMP affect PIs' data management and data sharing practice for grant-funded projects?
Interview request emails were sent to all PIs with DMPs in the dataset that came from awarded grants. Seventeen PIs responded positively to the request to be interviewed. The semi-structured interviews examined whether PIs used DMPs during their grant-funded research process, whether they shared data from their grant project, and whether they considered writing the DMP to be a helpful exercise. The full interview instrument is available as Appendix A. The author conducted and recorded the semi-structured interviews, which were transcribed in full by the Montana State University Human Ecology Learning and Problem Solving (HELPS) Lab. The author and the student research assistant conducted a conventional qualitative content analysis of the interview transcripts using an inductive coding approach, as outlined in Zhang and Wildemuth (2009). After reviewing the research questions, each coder individually identified chunks of text in the interview transcripts that represented key themes of the research. After coding each transcript, the two coders convened to compare and normalize themes, in accordance with the constant comparative method (Glaser and Strauss 1967). Through an iterative process, as the two coders continued to code interview transcripts, coding consistency was checked through assessment of intercoder agreement. If any disagreement occurred, the two coders discussed and resolved the disagreements, as suggested by Schilling (2006). Once sufficient coding consistency was achieved, the coding rules were then applied to all 17 interview transcripts.
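The article does not state which statistic was used to assess intercoder agreement. As one common option, the Python sketch below computes Cohen's kappa for two coders who labelled the same transcript segments. The function name, theme labels, and segment codes are all hypothetical and are shown only to illustrate the calculation.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders labelling the same text segments."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(codes_a)
    # Proportion of segments on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical theme labels for six transcript segments (not study data).
coder_1 = ["utility", "sharing", "sharing", "helpful", "utility", "concerns"]
coder_2 = ["utility", "sharing", "concerns", "helpful", "utility", "concerns"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))  # 0.778
```

In this toy example the coders agree on five of six segments, and chance agreement is 0.25, giving a kappa of 7/9.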
Four key themes emerged from the interviews, and PI responses relating to these key themes were grouped into subthemes. See Table 1 for an overview of key themes and subthemes.

Q1a. Are grant proposals with more complete/detailed DMPs more likely to be funded?
The difference in completeness between DMPs from awarded and declined proposals was not statistically significant. This result is underscored by interview responses. Some interviewees had experience serving on grant review panels (n=2, 12%), and suggested that while the DMP was always reviewed, it was usually simply deemed either adequate or inadequate. Two PIs (n=2, 12%) suggested that the DMP is viewed as a supplementary document in the proposal, less important than the scientific research at the core of the proposal. While the DMP must be adequate to pass review, the completeness of a DMP does not appear to influence the success of a grant proposal.
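The article does not name the statistical test behind this result. One plausible approach for a single rubric section would be a chi-square test of independence between funding outcome and scoring level, sketched below in Python. The counts are HYPOTHETICAL placeholders, not the study's data, and the function name is illustrative. The closed-form p-value holds only for a table with exactly two degrees of freedom, as here.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# HYPOTHETICAL counts, not the study's data. Rows: awarded, declined.
# Columns: complete/detailed, incomplete, not addressed.
hypothetical = [
    [40, 30, 10],
    [45, 40, 15],
]
stat, dof = chi_square_independence(hypothetical)
# For exactly 2 degrees of freedom the chi-square survival function
# has the closed form exp(-x / 2), so no statistics library is needed.
p_value = math.exp(-stat / 2)
print(dof, round(stat, 3), round(p_value, 3))
```

With these placeholder counts the p-value is well above 0.05, which is the pattern a "not statistically significant" result would show.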

Q1b. Which sections within DMPs have the most and least complete/detailed information?
The DMP content analysis and subsequent PI interviews revealed some trends in the completeness/detailedness of the DMPs. Figures 2-6 show the completeness of DMPs in the five main sections of the DART rubric. Standards for "complete/detailed," "addressed issue but incomplete," and "did not address the issue" are defined by the DART Rubric (Whitmire, Carlson, Westra, Hswe, and Parham 2017).

Data Types
82% (n=153) of DMPs provided complete lists of data types. 16% (n=29) of DMPs had incomplete lists, and 2% (n=4) of DMPs didn't address the type of data they would be collecting. Figure 2 shows the completeness/detailedness of DMPs when addressing the types of data that would be produced by the grant-funded project.

Metadata Standards
This research suggests that PIs would benefit from metadata guidance. 37% (n=69) of DMPs offered complete/detailed information about the metadata they planned to use in their grant-funded project, 40% (n=75) addressed the issue, but incompletely, and 23% (n=42) did not address the issue. However, some departments' DMPs tended to provide more detailed metadata information than others (see Figure 3). Figure 3 shows results from DMPs organized by academic department at Montana State University; please note that some departments are represented at higher rates than others. Figure 3 illustrates that metadata standards information was more complete in DMPs from Chemical and Biological Engineering (62%, or 13 out of 21 DMPs, were complete/detailed), Computer Science (53%, or 10 out of 19 DMPs, were complete/detailed), and Earth Sciences (69%, or 11 out of 16 DMPs, were complete/detailed).

How the Data will be Publicly Shared
94% (n=175) of the DMPs in this analysis included plans for making data publicly accessible (see Figure 4). 55% (n=103) of DMPs had "complete/detailed" information about the plan to make data public, and 39% (n=72) of DMPs "addressed the issue, but incomplete." This result suggests that PIs at Montana State University know that they are required to share their data publicly.
However, this research also suggests that many PIs do not use data repositories to facilitate long-term access and reuse. Most DMPs (56%, n=104) indicated that data would be shared via informal venues like a personal website or wiki, through a file sharing service like Dropbox or Google Drive, or upon request. Of the 186 DMPs analyzed, only 44% (n=82) proposed to share data via a repository or as a supplement to a manuscript.

Policies for Use and Reuse
The DMPs in this analysis also rarely provided policies for the use and reuse of data, as shown in Figure 5. 62% (n=115) of DMPs did not address the issue of policies for use and reuse. Such policies are a key part of FAIR data, enabling future users to actually use and reuse the data.

Data Storage and Archiving
Most DMPs showed that PIs understand how to store and archive their data over the long term (see Figure 6) with 73% (n=136) of DMPs either providing complete/detailed information, or addressing the issue incompletely.
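The percentages reported for each rubric section can be cross-checked against the counts and the dataset size of 186. The Python sketch below does this with the counts given in the text. The "not addressed" count for public sharing (11) is derived here as 186 minus 175 and is an inference, not a figure stated in the article. The dictionary layout and helper name are illustrative.

```python
TOTAL_DMPS = 186

# Counts as reported in the text for each rubric section.
reported = {
    "data types": {"complete": 153, "incomplete": 29, "not addressed": 4},
    "metadata standards": {"complete": 69, "incomplete": 75, "not addressed": 42},
    # The "not addressed" count for sharing is derived as 186 - 175 and is
    # an inference from the text, not a figure stated by the author.
    "public sharing": {"complete": 103, "incomplete": 72, "not addressed": 11},
}


def percent(count, total=TOTAL_DMPS):
    """Share of the dataset as a whole-number percentage."""
    return round(100 * count / total)


summary = {}
for section, counts in reported.items():
    # Each three-way breakdown should account for every DMP in the dataset.
    assert sum(counts.values()) == TOTAL_DMPS, section
    summary[section] = {level: percent(n) for level, n in counts.items()}
    print(section, summary[section])

# Figures reported without a full three-way breakdown.
policies_not_addressed = percent(115)
storage_addressed = percent(136)
print(policies_not_addressed, storage_addressed)  # 62 73
```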

Q2. Does writing a DMP affect PIs' data management and data sharing practice for grant-funded projects?
Research question Q2 was answered using the semi-structured interviews with PIs. Four key themes emerged from the interviews, which are explained in more detail here. For more information about methods and themes, please see Research Questions and Methods and Table 1, above.

Utility of the DMP during the grant-funded project
None of the PI interviewees used the DMP as a guiding document for data management in their lab. However, some (n=4, 24%) reported that the process of writing the data management plan did inform policies to facilitate sharing within their research team and to help with personnel transitions. Most (n=13, 76%) reported that while they generally followed the protocol outlined in the DMP, this was only because the DMP describes their usual activities and standard data management practices in their field. Some PIs (n=5, 29%) said that they adhered to the spirit of their DMP, but their actual data management practices were continually being updated to account for efficiency, changing best practices, and new research strategies. For example, one PI said that the DMP is "an earnest best effort to describe what you are going to do. It's not a contract, it's a grant. [NSF] wants good science. And if you're going to do good science a different way, it's like 'whatever is best for the science.'" Another PI said, "to the letter the DMP isn't exactly what we are doing, but this is not a change in scope so much as because we ended up using different models to answer the same questions." Some PI interviewees (n=2, 12%) reported that they used generic language in their DMP such as "we will comply with standards in the field," and noted that these standards are constantly changing. One PI joked that after writing the DMP to get the grant, they "look back once a year to say, 'oh, that was eye opening.'" Figure 8 illustrates the subthemes that emerged while discussing the use of DMPs during grant-funded projects.

Plans for publicly sharing data
Overall, the PI interviewees understood that data sharing is required by NSF. However, some PIs (n=3, 18%) reported that they generally aim to share as little data as possible to prove the experiment or published conclusions. As one PI said, "it seems like a big extra bit of work to get all that done, especially if no one is asking for it." Many interviewees (n=10, 59%) reported concrete plans for sharing their data; however, about a third of those interviewees (n=3, 18%) planned to share upon request, which is less sustainable than sharing using a data repository or supplementary material. Of the interviewees who offered reasons for sharing their data (n=5, 29%), some cited the NSF data sharing mandate (n=3, 18%), and some suggested that sharing supports replication and validation of research results (n=2, 12%). While the majority of interviewees (n=15, 88%) planned to share at least some of their data publicly, many (n=9, 53%) also voiced concern over data sharing requirements, as shown in Figure 9.
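The interview figures above mix two denominators: percentages are given relative to all 17 interviewees, while "about a third" refers to the subgroup with concrete sharing plans. The short Python sketch below makes that distinction explicit using the counts from the text. The helper and variable names are illustrative.

```python
INTERVIEWEES = 17


def pct(count, base):
    """Whole-number percentage of `count` relative to `base`."""
    return round(100 * count / base)


concrete_plans = 10      # interviewees reporting concrete sharing plans
share_on_request = 3     # subset of those planning to share upon request

# Reported percentages use all 17 interviewees as the denominator.
of_all_interviewees = pct(share_on_request, INTERVIEWEES)
# "About a third" refers to the subgroup with concrete plans.
of_those_with_plans = pct(share_on_request, concrete_plans)

print(pct(concrete_plans, INTERVIEWEES), of_all_interviewees, of_those_with_plans)  # 59 18 30
```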

Writing the DMP as a helpful or unhelpful exercise
Some PIs (n=5, 29%) reported that writing the DMP felt like yet another task to complete at the end of a long grant-proposal-writing process, and irrelevant to their actual data management practices. For example, one PI said, "I thought that [writing the DMP] was busy work, because I do this anyway. The only way for me to communicate what I am doing is by publishing, by saving the data, by sharing it with colleagues. And I do that anyway. I don't need to write a data management plan to tell someone I do it." However, most PIs (n=13, 76%) reported that the DMP-drafting process encouraged them to reflect on their own data management practices. Illustrative quotes include:
• The DMP ensured that "the data itself would be backed-up, so we wouldn't lose it. We followed protocols established for data management for that [purpose]."
• The DMP was an opportunity to "spend a little time describing what we actually do."
• "DMPs are an important piece to try to instill a sense of importance on the data that is collected, as well as its legacy impact on science. To train … faculty and students on how to maintain that structure, so that ten years from now you could go back to that data and it could still be useful."
• "You have to kind of tailor for the discipline, so if I need a data management plan for an engineering project, it's probably a lot different than for social psychology and human subjects."
In addition, one PI reflected upon the bigger picture of data reuse, saying "I think that [writing a DMP is] incredibly important because I want to see more open access to science and data, and the usability of other people's data is important."

Discussion and Recommendations for Practice
By understanding which sections of the DMP are most and least complete/detailed, libraries can offer tailored resources that guide PIs in the areas of the DMP that tend to be the least complete/detailed. This research indicates that library data management planning services should aim to streamline the DMP writing process, focusing on the following areas.

Understanding what PIs Already Know
This analysis reveals that PIs have a strong grasp on what types of data will be produced over the course of their research (as illustrated in Figure 2, above). It also shows that PIs know how to document their data to facilitate work by a rotating research staff. This analysis also shows that the DMP writing process helps PIs consider data archiving and sharing requirements, and many of them include complete/detailed information in their DMPs describing archiving and publicly sharing data (as illustrated in Figures 4 and 6 above).

Specialized Concepts in the DMP
PIs need help navigating the structure of the DMP, and understanding how concepts like metadata, data sharing, and data archiving match with the internal data management work that they are already doing. Two of the PI interviewees also suggested that a lack of understanding of key terms such as "metadata," "archiving," and "data licensing" was a barrier to writing better DMPs. Referring to the sections of the DMP on metadata and data storage, one PI said that their DMP had "a lot of wording about computers" and that their own "lack of computer literacy made [them] self-conscious."

Facilitating FAIR Data
PIs need more information about how to facilitate findability, accessibility, interoperability, and reuse for their data, including metadata, licensing, and selecting a data repository. This DMP analysis revealed that only 37% (n=69) of DMPs addressed the issue of metadata standards completely/in detail. This research additionally shows that metadata guidance may be especially useful if tailored to faculty in the departments whose metadata information was least complete in this analysis, such as Chemistry and Biochemistry (in which 92%, or 22 out of 24 DMPs either addressed the issue of metadata standards incompletely, or did not address the issue). Libraries should provide guidance on licenses such as Creative Commons that govern use and reuse of research data, since 62% (n=115) of DMPs in this analysis did not include policies for use and reuse.
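To illustrate how department-level results could inform where metadata guidance is targeted, the Python sketch below ranks the four departments named in this article by the share of DMPs with complete/detailed metadata information. The Chemistry and Biochemistry figure is derived (24 total minus 22 incomplete or not addressed), and only departments mentioned in the text are included, so this is a partial view and not a reproduction of Figure 3.

```python
# (complete/detailed, total DMPs) per department, as reported in the text.
# Chemistry and Biochemistry is derived: 24 total minus 22 that were
# incomplete or did not address metadata standards.
departments = {
    "Chemical and Biological Engineering": (13, 21),
    "Computer Science": (10, 19),
    "Earth Sciences": (11, 16),
    "Chemistry and Biochemistry": (24 - 22, 24),
}


def complete_share(complete, total):
    """Percentage of a department's DMPs with complete metadata details."""
    return round(100 * complete / total)


# Lowest share first: a possible order for prioritising outreach.
ranked = sorted(departments, key=lambda d: departments[d][0] / departments[d][1])
for name in ranked:
    print(name, complete_share(*departments[name]))
```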

Stronger, more Actionable DMPs
These recommendations for practice aim to help data librarians create tailored DMP resources that respond to PI needs regarding data management planning and public sharing of FAIR data. Such tailored services support stronger, more actionable DMPs.

Sampling
The DMPs and PI interviews were both convenience samples. DMP documents were selected according to which PIs had uploaded a full proposal to the online proposal approval system, producing a total of 198 DMPs to analyze. Of those 198 DMPs, 11 reported that no data would be produced, and one PI opted out of the research, producing a final dataset of 186 DMPs. The 17 PI interviewees were volunteers.

DMP Scoring
A single student research assistant scored the DMP documents. The DART tool has been tested for intercoder consistency; the author also provided in-depth training and conducted spot checks on the completed rubric scoring. However, some inconsistencies may exist in the DMP scoring data.

Conclusion
This analysis of DMP documents and semi-structured interviews with PIs produced the following answers to the research questions.
Q1a. Are grant proposals with more complete/detailed DMPs more likely to be funded?
The difference in completeness between DMPs from awarded and declined proposals was not statistically significant.

Q1b. Which sections within DMPs have the most and least complete/detailed information?
The sections of the DMPs that had the most complete/detailed information were the sections on data types, data archiving and public data sharing (although most DMPs (56%, n=104) indicated that data would be shared via informal venues, rather than data repositories). DMPs tended to be least complete/detailed when discussing specialized ideas such as metadata standards and policies for use and reuse.

Q2. Does writing a DMP affect PIs' data management and data sharing practice for grant-funded projects?
This research suggests that PIs generally value the DMP writing process as a moment of reflection about data management and sharing. For internal data management, this moment of reflection is simply an opportunity to refine the activities already being done in their labs, not a major learning experience; PIs continue using their already-established practices. For data sharing, the moment of reflection prompted by the DMP writing process did appear to help PIs consider how they would share their data.
The DMP content analysis and the subsequent PI interviews suggest a few key takeaways. (1) Most PIs practice internal data management in order to prevent data loss, to facilitate sharing within the research team, and to seamlessly continue their research during personnel turnover; (2) PIs still have room to grow in understanding specialized concepts such as metadata and policies for use and reuse; (3) PIs may need guidance on practices that facilitate FAIR data, such as using metadata standards, assigning licenses to their data, and publishing in data repositories. The insights produced by this research can inform how library data services are delivered, in order to help PIs get more value out of the DMP writing process, and to support FAIR data practices from the early stages of a grant-funded project.

Supplemental Content
Appendix A: An online supplement to this article can be found at http://dx.doi.org/10.7191/jeslib.2018.1155 under "Additional Files".

Data Availability
Data associated with this article are available from Zenodo at https://doi.org/10.5281/zenodo.2432419.