From Plan to Action: Successful Data Management Plan Implementation in a Multidisciplinary Project

Objective and Setting: While data management planning becomes more commonplace, moving from planning into implementation remains a hurdle for many researchers. With little specific guidance from funding agencies and libraries in the early stages of developing services to assist researchers, insights into what contributes to successful data management are sorely needed. The objective of this study was to document how a multidisciplinary research team, after consultation with the University of Illinois Library, took steps to implement a data management plan.

Design and Methods: A case study was designed to gather insights from the research group through semi-structured interviews. Questions focused on which of the recommended data management strategies were adopted and how those strategies affected the project in terms of cost, time, effectiveness, and long-term data use.

Results: From these interviews five major themes emerged as important: intentional staffing, addressing essential data management elements, iterative improvement, training and mentorship, and increased efficiency and peace of mind.

Conclusions: Despite the initial investment that data management requires, researchers report significant benefits.

Correspondence: Margaret H. Burnette: phburn@illinois.edu


Introduction
Academic research has come under fire in recent years as high-profile studies have failed to hold up under scrutiny. While multiple issues contribute to rigor, transparency, and reproducibility of research, the stewardship and sharing of research data has been one area of dedicated improvement efforts. In response to calls for better management of digital data that results from federally funded research projects in the U.S., Data Management Plans (DMPs) are becoming a ubiquitous component of grant applications. Although highly variable between agencies or even sub-agencies in length, detail, and extent of review (Mischo, Schlembach, and O'Donnell 2014; Whitmire et al. 2015), DMPs are generally short, high-level descriptive plans that prescribe the data to be generated by a research project, how that data will be stored (securely, as required), who will have access, what documentation and metadata will be created with the data, and preservation intentions if the data are to be preserved long-term. The U.S. National Institutes of Health (NIH) implemented data sharing requirements for grants of $500,000 and above in 2003, and the U.S. National Science Foundation (NSF) began requiring DMPs in 2011. As a result, the University Library at the University of Illinois at Urbana-Champaign implemented consultation services in 2011 to assist researchers with data creation and review as researchers prepared DMPs for grant applications. After one such consultation, a researcher contacted the Library in search of assistance with implementation of the DMP created for an externally funded project. The timing was ideal as Illinois had just formally established a Research Data Service (RDS) and was expanding services and outreach.
The externally funded project involved three University of Illinois faculty members working collaboratively across disciplines from their respective departments to carry out a complex study involving hundreds of human subjects. Prior to this project, the collaborators had conducted a pilot study funded through a campus seed grant, which served as the basis for the main project proposal. The seed grant project functioned as a learning process that allowed the collaborators to pilot their study design and refine execution while enabling critical collection of preliminary data. As a locally funded pilot study at a time when data management planning was just beginning to emerge as an important issue, the researchers did not develop an explicit data management plan at that time. Investigators did consult with the RDS for the main externally funded study, however. Because managing data requires an investment in funds and time (Holdren 2013), it is essential to document outcomes, to assess success, and demonstrate return on investment for data management. The RDS team anticipated that the ad hoc data management in the pilot study juxtaposed against the proactive data management of the main project presented a unique opportunity to examine outcomes and return on investment (ROI) from the researcher's point of view.
The externally funded main project exemplified the challenges that face multidisciplinary research: many actors (in this case research subjects, undergraduates, graduates, staff, and faculty) requiring highly effective communication and cooperation for successful project coordination, multiple primary and secondary data forms to be documented and managed securely, and complex analysis that must transcend individual disciplines to contribute to the larger conclusions. As with most grant projects, the lead researcher (referred to here as the "principal investigator" or PI) bore ultimate responsibility for project management, direction, and execution of the overall grant along with responsibility for carrying out the research specific to the PI's area of expertise. The two Co-PIs assumed responsibility for direction and execution within their own respective areas. In this study the project team recruited research subjects for face-to-face encounters. The resulting primary data included multimedia recordings as well as session notes and surveys.

Successful DMP Implementation in a Multidisciplinary Project JeSLIB 2016; 5(1): e1101 doi: 10.7191/jeslib.2016.1101
In response to the request for assistance with DMP implementation, the Illinois RDS assembled a team that included a data librarian in the life sciences, a subject specialist in health sciences, and the director of the Illinois RDS. Three in-person consultations were held with the project research team between August 2014 and November 2014. The first of these was held with just the PI and the subsequent two meetings were held with the PI and the project managers as the study commenced. The researchers requested guidance about identifying computational resources on campus, appropriate strategies for data documentation and organization, how to manage multiple people accessing and manipulating files, and how to increase the long-term viability of the data (e.g., preservation formats for data files). To assist the researchers with these questions, the RDS team crafted a report with recommendations and resources, which they provided to the PI in November 2014. The report included the following information:

The RDS team followed up with researchers post-report to determine the usefulness of the recommendations and inquire if other issues had surfaced. As the main project progressed it was clear that the researchers took data management seriously and were successfully implementing strategies, an outcome we'd like to see readily repeated across our campus and beyond. The Illinois RDS team proposed to the PI to document the project's data management implementation in the form of a case study. The PI vetted the idea with Co-PIs and all were supportive of the case study. As a result, the Illinois RDS team, represented as the authors of this paper, carried out semi-structured interviews with the PI, Co-PIs, and two project managers.

Background
The importance of data management in academic research is not a new concept. Michener first described "information entropy" nearly three decades ago as the process by which information naturally degrades over time as human memory weakens (Michener 2006). Although Michener's concept arose from the ecological sciences, its broad (and seemingly immortal) relevance speaks to the challenges across research as a whole. However, within recent years data management has garnered national visibility that has in some cases reflected negatively on academic research. For example, in addition to making appearances in academic news sources, the Vines et al. 2014 article in Current Biology on data loss was picked up by such disparate sources as The Telegraph and Smithsonian.com, and was even mentioned in Vanity Fair (Vines et al. 2014; Rennison 2016).
Likewise, U.S. government agencies have increasingly paid closer attention to data management. As mentioned above, NSF began mandating inclusion of DMPs for all grant applications in 2011. Following a memorandum from the Office of Science and Technology Policy (OSTP) in 2013, all U.S. agencies providing federal Research and Development funds in excess of $100 million were required to develop a public access implementation plan that would account for the management of digital data (Holdren 2013). Although the timeline for plan development and implementation is notably behind schedule, many other U.S. agencies, including the NSF, now require DMPs with all grant submissions, and given this trend, over a dozen more are expected to follow suit (Scholarly Publishing and Academic Resources Coalition 2016).
Reviews of multiple agency public access implementation plans reveal wide variation in depth and scope (Dietrich et al. 2012; Whitmire et al. 2015). More often than not researchers are left to their own devices as to how to meet requirements for tracking and sharing research. For example, the NSF FAQs give only vague advice as to what even constitutes data by indicating that it will be "determined by the community of interest through the process of peer review and program management" (U.S. National Science Foundation 2010). Likewise, the Department of Energy leaves it up to the PI to "determine which data should be the subject of the DMP" (U.S. Department of Energy 2014). In comparison to the staggering variety of research and data that exists, few implementation plans include specific guidelines or standards for data format, description, metadata schemas, or data deposit.
With the national conversation as a backdrop, increased emphasis on research data management planning and sharing has catapulted data literacy to the forefront of e-science support efforts in academic libraries and their respective institutions. One major driver of this trajectory is that, while the requirements for DMPs have grown, researchers have struggled to provide such plans given the lack of clear guidelines, details, and standards (Dietrich et al. 2012; Steinhart et al. 2008). Efforts to assist researchers are multi-layered and include an array of data consultation and education programs and/or the development of data technologies and infrastructure. In some cases, these initiatives represent campus collaborations that variously include the libraries, offices of research, and technology centers (Soehner, Steeves, and Ward 2010). In other cases, libraries have taken highly proactive lead roles for data management support, data literacy education, and/or infrastructure tools and programs to support researchers.
It is widely acknowledged in the LIS literature that librarians are uniquely qualified as experienced knowledge management professionals to take on a variety of roles that are required for quality data management. Garritano and Carlson note that this skill set includes traditional library science expertise, subject specialization, the ability to foster collaboration both within and outside the institution, acumen in the area of grant-funded research, and the ability to balance one's workload (Garritano and Carlson 2009). Researchers can feel ill-equipped in their ability to adequately manage, share, and deposit data (Brandt 2007), all of which are activities that align well with LIS skill sets. Furthermore, librarians' strengths in information description, organization, and access position them to contribute significantly to the development of policies and procedures for managing institutional research data (Gabridge 2009; Salo 2010; Steinhart and Lowe 2007).
In many cases, librarians are already up to speed given "the ability to collect, organize, describe, curate, archive, and disseminate data and information" (Brandt 2007). For example, subject specialists who are deeply engaged in research disciplines and can speak the researchers' language may be able to provide appropriate descriptive ontologies and metadata standards, which can be unique to a discipline (Gabridge 2009; Brandt 2007). Additional opportunities beyond those that naturally fall under LIS, including creation, processing, and use of data, may also be available (Brandt 2007). On an even broader scale, data literacy education and handling confidential and private data represent areas ripe for program development. Librarians are adept at gathering and evaluating resources and documentation that guide the selection of data and metadata standards and inform preservation decisions, which contributes to researcher awareness and institutional policy development (Dietrich et al. 2012). However, despite the clear alignment between data management support and library programs, barriers to developing programs for data management within libraries are not insignificant. It takes strong commitment and investment in personnel and time to be able to meet the varied demands of research data support across the data lifecycle.
Given the commitment and investment required, libraries must consider the effectiveness of the support provided. Librarians and archivists have relied on researcher input to inform the development of data management tools and support (O'Meara 2008; Peters and Dryden 2011). Likewise, resources such as the Data Curation Profiles have provided a solid foundation by thoroughly documenting researcher practices and needs (Witt et al. 2009). However, gauging the return on investment is elusive. Case studies frequently discuss the outcomes and effectiveness of data management interactions, although usually from the perspective of librarians, not the researchers (Bracke and Fosmire 2015; Carlson and Stowell-Bracke 2013). The case study in this paper aims to build on these earlier works that explore the effectiveness of implemented data management strategies and support, but from the researcher perspective. While mandates, policies, and advocacy may influence behavior, the extent of data management success ultimately depends on researchers' level of commitment to the issue. We anticipate that documenting researcher-identified benefits during the successful implementation of data management strategies, regardless of whether those strategies came from our preliminary recommendations, will allow us to develop a more actionable RDS service model based on local researcher feedback.

Methodology
After we provided a series of data management consultations and recommendations, we designed this study to gather insights and feedback from the research group about the group's data management strategies. Eight semi-structured interview questions were developed (see Appendix), which focused on the recommended data management strategies that were actually implemented by the group, and how those strategies affected the project in terms of cost, time, effectiveness, and long-term data use. All of the interviews were conducted in July 2015, approximately eight months after we provided formalized data management recommendations to the research group.
We contacted the five main research group members via email to request their participation in this study, and all agreed to participate. Three interviewees were faculty members, the PI and the two Co-PIs, and they were interviewed individually in their offices. Two additional interviewees were project managers; one was the outgoing project manager who had participated in the consultation meetings and was instrumental in implementing the recommended strategies, and the other was the incoming project manager who was learning the data management strategies from the outgoing project manager. We initially interviewed the outgoing project manager individually, and later we interviewed the incoming project manager with the outgoing project manager in attendance.
Two members of the RDS team (not always the same two) participated in each interview. Both interviewers asked questions and took notes. We then independently reviewed merged notes from each interview and identified five overarching themes.

Staffing with Data Management Duties in Mind
In their meta-analysis of the research project management literature, Brocke and Lippe propose four proactive measures for effective project management, including "Appoint a skilled technical/scientific project manager: The project manager should offer strong knowledge-broker and dialogue skills, a diplomatic attitude, an excellent degree of technical awareness, and a delegating and participative leadership style" (Brocke and Lippe 2015).
Indeed, through our interactions with the research group during consultations and the subsequent interviews it became clear how important it was that the collaboration had hired a dedicated project manager who, along with other duties such as managing participant registration, also had explicit responsibilities for data management. Based on experience with the pilot project that preceded the main project, the PI felt strongly that access to the data needed to be controlled and that one person needed to oversee data collection and file management. The phrase "too many cooks in the kitchen" came up independently in two of the three interviews with the PI and Co-PIs. For example, in the pilot project one individual student had independently and inadvertently disrupted the file directory organization which resulted in both hours of time lost hunting down the moved files and the corresponding stress associated with the potential of lost data.
The PI also emphasized the importance of hiring the right person, this being someone who listens well and demonstrates that they can be trusted with the responsibility of the project, a point echoed by the Co-PIs as well. As the PI noted, "a good project manager is worth their weight in gold." Notably, this project did not need staff dedicated solely to the management of data and at no point did data management constitute a full-time job in and of itself, although the planning was front loaded as the project got off the ground (vide infra). Instead, they sought a project manager who took on many responsibilities and had prior experience in the research area, but during the hiring process they explicitly sought out someone with superior attention to detail and organizational skills. One Co-PI commented that if you need organized data, then you must find people who are good at organization. The outgoing project manager, who held the initial responsibility of leading implementation of the DMP, estimated that they spent 15-20 hours at the beginning of the project developing organizational structure and associated documentation and approximately 5 hours per week thereafter implementing and maintaining data management processes and procedures.

Important Data Management Elements (Planning, Communication, Documentation, Checks)
Although inclusion was not uniform, Dietrich et al. identified seventeen distinct elements collected from major funders' data policies (Dietrich et al. 2012). Broad element areas such as standards, access and preservation, and publications exemplify the need for support to enable best practices and decision-making.
One critical element is thoughtful and collaborative project planning, the importance of which cannot be overstated. Clearly defined processes are even more critical for joint projects that include multiple disciplines and multiple PIs. More than one interviewee commented that much of the data management effort was front loaded at the beginning of the project, but each felt that this was effort well spent. The project PI commented that initial workflow charting was helpful in ensuring that everything was in place from the beginning, mitigating the need for mid-study adjustments. As previously mentioned, hiring a highly organized and competent project manager was key, and it was essential that the project manager's job duties included responsibility for the overall quality of the data.
Regular communication throughout the project included weekly meetings about progress and process ("nitty gritty stuff"), which contributed to smooth execution of data collection and processing. Although each PI had clearly defined roles and delineated responsibilities, they communicated regularly to ensure everything was being done according to protocol and that no pieces were missing. One of the lessons learned from the previous pilot study concerned the lack of proper process documentation: the failure on the part of one individual to record how files were processed resulted in hours of time spent figuring out what was done and when. With this failure in mind, one of the Co-PIs stressed the importance of establishing a well-defined workflow.
For this project, all protocols were documented and a clear hierarchy of personnel and data was established. Every spreadsheet had a ReadMe file ("what I am, where did my data come from, all variable data") that also recorded which data files were used, when, and by whom. In addition to an overall workflow table, the project team had specific workflow tables for each step, including a checklist. A data dictionary established all variables and was embedded in data documentation. One of the Co-PIs noted that project data analysis required "small real-time changes that have big consequences" and stressed the need to record what was done and how.
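As an illustration only, the kind of per-spreadsheet ReadMe and data dictionary described above could be produced with a short script. The Python sketch below is not the research team's actual tooling, which the interviews did not describe at this level of detail; the `Variable` fields, the `build_readme` helper, and the sample file and variable names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variable:
    """One data dictionary entry (field names are illustrative assumptions)."""
    name: str
    description: str
    units: str


def build_readme(data_file: str, source: str, variables: List[Variable],
                 processed_by: str, processed_on: str) -> str:
    """Return plain-text ReadMe content stating what the file is, where its
    data came from, which variables it holds, and who processed it and when."""
    lines = [
        f"File: {data_file}",
        f"Source: {source}",
        f"Processed by: {processed_by} on {processed_on}",
        "",
        "Variables:",
    ]
    for var in variables:
        lines.append(f"  {var.name} [{var.units}]: {var.description}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made up for the example.
    print(build_readme(
        data_file="session_survey.csv",
        source="Post-session participant survey",
        variables=[
            Variable("participant_id", "De-identified participant code", "n/a"),
            Variable("session_minutes", "Length of the session", "minutes"),
        ],
        processed_by="project manager",
        processed_on="2015-01-15",
    ))
```

Keeping the data dictionary as structured entries rather than free text means the same definitions can be reused in every ReadMe, which is one way to keep variable descriptions consistent across files.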
Failsafe measures are a necessity. As noted by one Co-PI, "you need to have double-checks in place." Regular "data audits" ensured adherence to file conversion processes, file naming conventions, storage and backup routines, and ongoing documentation. Using strict controls for who has access to specific data files contributed to overall data quality and accountability. In addition to the project manager's regular data audits, the PIs periodically examined project data files to assess the quality of data analysis and adherence to protocols and conventions. Checklists ensure that steps are completed no matter what the process.
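One part of such a data audit, checking adherence to file naming conventions, lends itself to automation. The following Python sketch assumes a hypothetical convention of the form `<participant>_<datatype>_<YYYYMMDD>.<ext>`; the project's real convention is not given in the interviews, so the pattern, the `audit_file_names` helper, and the sample file names are assumptions for illustration.

```python
import re
from typing import Iterable, List

# Hypothetical convention: P<3 digits>_<datatype>_<YYYYMMDD>.<ext>,
# for example "P012_survey_20150115.csv". This pattern is an assumption,
# not the convention the research team actually used.
NAME_PATTERN = re.compile(
    r"^P\d{3}_(audio|video|notes|survey)_\d{8}\.(csv|txt|wav|mp4)$"
)


def audit_file_names(file_names: Iterable[str]) -> List[str]:
    """Return the names that do NOT follow the naming convention."""
    return [name for name in file_names if not NAME_PATTERN.match(name)]


if __name__ == "__main__":
    # Made-up directory listing for the example.
    found = [
        "P012_survey_20150115.csv",
        "final_FINAL_v2.xlsx",
        "P3_notes_20150115.txt",
    ]
    for bad in audit_file_names(found):
        print(f"Naming convention violation: {bad}")
```

A check like this only flags deviations; deciding how to rename or relocate a flagged file would still rest with the person responsible for the data, consistent with the access controls described above.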

Iterative Improvement of Data Management Enables Higher Data Quality
The quality of data management in and of itself was stressed by all three faculty members. The PI commented that "management is part of data quality," and all PIs relayed how high-quality data management is important not only for the short term but also for long-term use of the data. One Co-PI said that managing the data gives confidence that the data will be reusable both internally and by others, and noted that transparency is important to future science. The other Co-PI emphasized that it's not good science unless the data is managed well since "you are only as good as your data." Comparing the pilot data with the main study data, the PI noted that the quality of the main study data is better overall, although this is not entirely due to the quality of the data management per se; many facets of the experience gained from the pilot project improved the subsequent work. A Co-PI likewise recognized that the pilot study served as a training opportunity to learn how to acquire cleaner data, but noted lessons were learned across the entire data management spectrum, from data organization to data collection.
The need to iteratively improve upon the quality of data management came up repeatedly during the interviews. The project manager noted that while the PI had excellent data management in place overall, the processes had improved over time. Reflecting on past experiences, one Co-PI acknowledged that it can be easy to not get involved in data management and instead trust that good habits are in place. For example, prior to these two collaborative projects a Co-PI lost data due to a student's crashed computer in one case, and experienced momentary uncertainty when data was requested several years after a research publication in another case. Because of these experiences, the faculty member realized the need to proactively "force" better data management habits. While this started with mandatory backups, this Co-PI saw a need for increasing sophistication of procedures, as well as checks and balances, to ensure sufficient management is in place. The Co-PI noted that it's important for advisors to get their "fingers in there" and actually check the organization of the data on students' computers. This type of oversight enables assessment of the quality of the data management itself, although typically advisors focus solely on the veracity of the data.

Training and Mentorship for Good Data Management Practice
Data literacy education efforts are still new and researchers simply don't know what they don't know when it comes to best practices for managing and sharing research data (Schumacher and VandeCreek 2015; Sapp Nelson 2015). For many institutions the demands of data management support begin with internal training that can then extend to students and faculty (Brandt 2007). Gabridge advocates for "seminars and other support mechanisms (Web page, tutorials) to help student researchers understand what to do with their data and increase their awareness of library resources" (Gabridge 2009). For graduate students, having a PI as mentor for quality data management sets the stage for lifelong research practices.
The PI and Co-PIs all talked about the value of training and mentorship for good data management practice, but the Co-PIs especially emphasized its importance. One Co-PI noted that, although a seasoned researcher, there was still a learning curve with this project, because the Co-PI did not have a background in experimental science. This individual did not have the benefit of a mentor to provide guidance on how to manage heterogeneous data for an interdisciplinary research project. With exposure to a relative data management "enlightenment," this Co-PI anticipates beginning future projects with the good data management practices gained through this collaboration. Furthermore, the Co-PI felt the data and personnel management experience gained with this project could be recycled for future grant projects.
The other Co-PI came into the project with more data management experience, although again with no formal training. As a graduate student, the Co-PI did not have the benefit of training to learn good practices for managing large-scale projects or data. Instead, this individual picked up skills by being "fortunate" to have collaborated with colleagues who manage data well. The Co-PI stressed that data management training should be part of graduate programs.

Time, Energy and Peace of Mind
Good data management requires significant time and effort that researchers may be unwilling to invest without specific mandates or demonstrated ROI (Dietrich et al. 2012). Standardization, best practices, and the availability of institutional support for data management help to establish the value of planning.
Nearly all of the interviewees commented that the data management strategies saved time and energy and provided peace of mind. The PI, one of the Co-PIs, and a project manager noted that the time devoted to data management activities was concentrated in the early stages of the project (when file naming conventions and file folder organization were established), but this solid foundation saved effort later in the project. Thereafter, active management of the data took a fraction of the project manager's time, namely in the form of checking and enforcing adherence to conventions and protocols. The project managers did not need to think about where files were saved and rarely received questions about file locations, because everyone knew and understood the organizational framework. The outgoing project manager noted that without the data management plan, "I feel like I would have gone crazy" because of the number of protocols and files that were integral to the project.
Some of the interviewees acknowledged that it was difficult to quantify how much time was saved by the data management strategies. The PI said it was difficult to estimate the amount of time saved but believed that a significant amount of time would have been wasted without the high level of organization in place. One Co-PI commented that the actual return on the time invested in data management cannot be known, because as the individual said, it is a "mess that never happened." Even if the time saved is negligible, the Co-PI felt that the time spent resolving a data problem or crisis could be spent "more pleasantly" establishing strong data management processes.
For the PI, having data management strategies not only saved time but also provided peace of mind. The other Co-PI expressed the weight of pressures that many faculty members face and emphasized that the time saved and efficiencies gained through good data management strategies can help reduce those pressures. The Co-PI further remarked that anything that saves time matters.

Conclusion
The data management spotlight illuminates multiple challenges researchers encounter in complying with funding mandates for data sharing and preservation that contribute to new and faster discoveries and accelerated scholarly output (Steinhart et al. 2008). Data policies vary widely and many lack sufficient detail or guidance. Additionally, policies often stress access to research data at the expense of preservation (Dietrich et al. 2012). Researchers may fail to appreciate that deliberate ongoing management of research data contributes to the ability to share outputs. Furthermore, the current rhetoric about the benefit of data management often abstracts to the "tax payer" investment or the "greater good" of science instead of addressing tangible benefits of data management to those closest to the research.
The researcher point of view in this case study provides unique insights into the benefits of research support consultation for proactive data management. The case study highlights how a team of collaborators were able to build off of a pilot study and, with library support, implement more robust data management for a subsequent study. Notably, creation of the DMP for the main study proposal prompted the PIs to contact us early in the study to implement data management practices. PIs found the information provided through the RDS team consultation and subsequent report, including concrete examples, helpful in moving them from being more passively "aware" of data management best practices towards being able to identify and implement strategies for themselves.
As proactive and intentional data management becomes an explicit part of researchers' responsibilities, evaluation of data management outcomes from the perspective of the researchers is crucial to being able to understand, and convey to more reluctant researchers, the value and benefit of data management. Encouragingly, not only did the researchers in this case study find our assistance useful, but the data management strategies they implemented were also not arduous or difficult to maintain. By taking the time and devoting personnel to the development of realistic and functional strategies at the beginning, the project simply functioned more smoothly. Interestingly, while more tangible outcomes, such as a reduction in incidents of lost data, were apparent, less tangible outcomes, such as reduced stress and anxiety, were also highly valued and emerged from the interviews. One potential avenue for future exploration may be to study data management in the context of stress management.
This study likewise provides insight into service development for RDS programs. In our own programming, as we approach researchers during outreach or as researchers approach us for consultations or DMP reviews, we now have these experiences to draw on. We have begun to blend these themes more intentionally into our outreach and consultations. For example, as DMP reviews come in we emphasize the need to define and assign roles, as we engage with new faculty we emphasize the need for mentorship in developing good practices, and when researchers come to us overwhelmed and frustrated with their data management mishaps we emphasize iterative improvement. While none of these themes struck us as shocking, having empirical evidence affords a new level of credibility and confidence, both for the researchers and ourselves, as to the importance and effectiveness of data management planning and training.
As anticipated, the juxtaposition between the pilot study and the main project clearly provided the PI and Co-PIs with markedly different experiences, which influenced their perspectives on the importance of data management. Indeed, we note that a key factor in the success of the implementation was the engagement of the faculty involved. Each recognized the importance of quality data management and went far beyond thinking of a DMP as a "check box" for compliance. For researchers who may be less eager to assume such a proactive approach, this case study provides a compelling example by showing how implementing a data management plan can provide very genuine benefits. Armed with this preliminary success, we hope that we, and others within the data community, are able to undertake more formal and intensive studies, perhaps via similar ad hoc vs. proactive scenarios, in order to further inform our understanding of what factors lead to data management successes.