Curriculum Data Deep Dive: Identifying Data Literacies in the Disciplines

Objective: Evaluate and examine Data Literacy (DL) in the supported disciplines of four liaison librarians at a large research university.
Methods: Using a framework developed by Prado and Marzal (2013), the study analyzed 378 syllabi from a two-year period across six departments (Criminal Justice, Geography, Geology, Journalism, Political Science, and Sociology) to see which classes included DLs.
Results: The study was able to determine which classes hit on specific DLs and where those classes might need more support in other DLs. The most common DLs being taught in courses are Reading, Interpreting, and Evaluating Data, and Using Data. The least commonly taught are Understanding Data and Managing Data skills.
Conclusions: While all disciplines touched on data in some way, there is clear room for librarians to support DLs in the areas of Understanding Data and Managing Data.

Correspondence: Chrissy Klenke: cklenke@unr.edu
Received: June 29, 2019 Accepted: October 3, 2019 Published: February 3, 2020
Copyright: © 2020 Klenke, Schultz, Tokarz, and Azadbakht. This is an open access article licensed under the terms of the Creative Commons Attribution License.
Data Availability: Data associated with this article is shareable upon request.
Disclosures: The authors report no conflict of interest.

Full-Length Paper
Christina M. Klenke, Teresa Auch Schultz, Rayla E. Tokarz, and Elena Azadbakht
University of Nevada, Reno, Reno, NV, USA


Introduction
Data is a major discussion topic and trend in academic institutions around the world. The Association of College & Research Libraries' (ACRL) Research Planning and Review Committee (2018) found that data-related topics were among the "2018 top trends in academic libraries" and higher education. The Committee stated that the trends and issues impacting academic libraries include "learning analytics, data collection, and ethical concerns" and "research datasets acquisitions, text mining, and data science." At the national level, the White House Office of Science and Technology Policy (OSTP) ordered most federal departments, including the National Science Foundation (NSF) and the National Institutes of Health (NIH), to require all awardees to make their research and data openly available (Holdren 2013). Prado and Marzal (2013) explain that "The Open Data movement, heir in part to the Open Source and Open Access movements, encourages the free publication of data from different domains under licenses and favor reuse" (123). Open access to research data has made it easier to access and use this data for new purposes. The creation, manipulation, use, and evaluation of data is key to making decisions on a global scale.
The data trend has placed considerable emphasis on the importance of data literacy (DL) and competencies across disciplines. Prado and Marzal (2013, 126) define DL "as the component of information literacy that enables individuals to access, interpret, critically assess, manage, handle and ethically use data." In many fields, educators and researchers must use data analysis or visualization in their work. Graduate students and even undergraduate students are often responsible for finding and obtaining data, reading and interpreting, managing, organizing and storing, and running statistical analysis on data. However, are students being taught data skills necessary to complete even the most basic tasks? If not, how can libraries support these DL needs?
To answer these questions and to better understand DL needs at their institution, the authors, all liaison librarians, conducted a syllabi review to study how the classes in their respective departments incorporated data in assignments and other classroom activities.
In December 2018, the authors' institution was elevated to the 'R1' Carnegie classification, which brought greater significance and necessity for research and big data support. "Big Data is predicted to transform almost everything about how we live our lives. From healthcare to finance, from advertising to entertainment, unprecedented amounts of data are being generated and stored, and skilled professionals are needed to analyze that data for important insights" (University of Nevada, Reno n.d.).
The university is a land-grant, public research institution that offers 72 undergraduate degrees and programs, 64 master's degrees, and 44 Ph.D. programs, and it was awarded a total of $188 million in research grants in fiscal year 2018. The university had a student population of 19,911 (16,520 undergraduates and 3,011 graduate students) as well as 2,069 faculty (1,086 academic faculty and 983 administrative faculty) as of Fall 2018.

Literature Review
Data literacy is a complex term. Many studies note that the term has been interpreted in several ways (Bowler et al. 2017), has no standard list of proficiencies (Dechman and Syms 2014), and has no concrete definition (Borgman 2016). DL has been defined as "an application of information literacy in the context of research" (Carlson and Johnson 2015, 29), "the ability to understand, find, collect, interpret, visualize, and support arguments using quantitative and qualitative data" (Deahl 2014, 41), and "a specific skill set and knowledge base, which empowers individuals to transform data into information and into actionable knowledge by enabling them to access, interpret, critically assess, manage, and ethically use data" (Koltay 2017, 10). Each of these definitions focuses on the application of data in a research context. Mandinach and Gummer (2013) broadly define it as an "ability to understand and use data effectively to inform decisions" (30). There are also discipline-specific DLs, such as science DL, which "adds an emphasis on scientific inquiry through collecting, transforming, managing, and using data" (Qin and D'Ignazio 2010, 189).
Researchers have developed frameworks that define proficiencies in order to assess DL competencies. Developed as a "starting point for discussing national standards," the Data Literacy Competencies Matrix was created to aid in DL education (Ridsdale et al. n.d.). The matrix consists of 23 competencies and 64 tasks/skills. Prado and Marzal (2013) developed a DL framework organized around five broad core competencies. Maybee and Zilinski (2015) used different DL models to create "data informed learning," which "guides data-related course content, encourages coursework relevance, and supports lifelong learning" (3).
These frameworks have helped librarians and other researchers determine DL needs in general education. Several studies have used Prado and Marzal's DL framework, including a syllabi project identifying DL skills in research assignments (Brodsky 2017) and a project promoting the use of curriculum integrated data information literacy (Macy and Coates 2016). Other studies have also used Carlson et al.'s framework, including to pilot a DL program in an agriculture course (Carlson and Bracke 2015), to develop a new competency matrix for data management skills (Nelson 2017), and to create a data credibility checklist for STEM undergraduate students (Zilinski and Nelson 2014). These frameworks have helped further DL education at the college level.
Many of the framework studies have focused on the analysis of course syllabi to determine assignment and project needs. Brodsky (2017) analyzed syllabi of business courses, while another study looked at information and DL needs in nutrition science and political science courses. Other studies have reviewed syllabi to look at broader information literacy needs, including Boss and Drabinski (2014), O'Hanlon (2007), and Dinkelman (2010). Another study reviewed all courses for undergraduate students in one semester to determine their alignment with the ACRL standards (McGowan et al. 2016).
While many of the previous syllabi studies focused on either a few disciplines or identifying information literacy needs, the purpose of this study is to look at disciplines that have been overlooked in known DL studies. By analyzing syllabi from six disciplines and focusing on DL needs, the authors aimed to examine DL in a different context and identify ways that subject librarians can refine and grow their support in data education.

Methodology
The authors conducted a syllabus review focusing on six University of Nevada, Reno departments: Criminal Justice, Geology, Geography, Journalism, Political Science, and Sociology. Each author approached the departments they liaise with and requested all syllabi for classes taught from Summer 2016 through Spring 2018, including both undergraduate and graduate classes. Five of the departments shared digital copies with the authors. Criminal Justice shared some of its syllabi as digital copies and, for those that were not available digitally, allowed the authors to view the physical syllabi in person. One department, Journalism, only shared syllabi from faculty who specifically agreed to take part. The authors did not press the departments to ensure all syllabi for that time period were shared. This resulted in a total of 378 syllabi: 39 in Criminal Justice, 45 in Sociology, 47 in Geography, 66 in Geology, 90 in Journalism, and 91 in Political Science. In comparison, full-time enrollment (FTE) in these departments in Fall 2017 was 172.9 students in Geography, 188.3 students in Geology, 228.8 students in Journalism, 237.3 students in Political Science, 277.9 students in Criminal Justice, and 348.5 students in Sociology.
The authors initially set out to evaluate the syllabi using the Data Literacy Competencies Matrix by Ridsdale et al. (n.d.). However, after further review of the framework, the authors agreed the frames were too specific and overlapped with each other too much for the purposes of this study. Tracking so many specific competencies proved difficult, especially as many syllabi approached DL at a general level.
The authors then looked at other possible frameworks and eventually settled on the one by Prado and Marzal (2013), focusing on five broad DL core competency categories (Table 1). Although the differences between the categories were not always clear (thus making it sometimes confusing as to which frame a specific component of a syllabus fell under), the authors found the framework was broad enough to allow for variances in the language that professors used.

Table 1. DL core competency categories and their definitions (Prado and Marzal 2013).

Understanding Data
• What is data, or what do we mean by data?
• The role of data in society: how it is generated, by whom, and its possible applications.

Finding and/or Obtaining Data
• Knowing sources of data, how to evaluate them, and select the most relevant one.
• Knowing when current data is not enough and how to collect data.

Reading, Interpreting, and Evaluating Data
• Knowing the different forms data can take, their conventions, and how to interpret them.

Managing Data
• Using metadata and reference management tools to manage data, and knowing data management repositories and their policies.

Using Data
• Preparing data for analysis, analyzing data in line with the results sought, knowing how to use needed tools.
• Synthesizing and representing data analysis results in a format suitable for your audience.
• Using data ethically, acknowledging sources, and reporting results honestly.
Curriculum Data Deep Dive JeSLIB 2020; 9(1): e1169 https://doi.org/10.7191/jeslib.2020.1169

The four authors sought to establish interrater reliability by evaluating the same 56 syllabi. Two or three syllabi from each department were chosen at a time, and the authors worked in batches of 12 or 21. After coding one batch individually, they met to discuss their differences and agree on how each part of a syllabus should be coded. The authors went through four iterations of this process, attempting to reach an interrater reliability rate of 90 percent. However, despite continued efforts, their agreement rate topped out in the high 70 percent range. A large part of this was due to the vagueness of some of the syllabi and the different levels of disciplinary knowledge among the authors, who were often better versed in the language and habits of their own liaison departments than in those of others.
The authors then agreed to divide the remaining 322 syllabi evenly by random assignment, then rate them individually, and mark any possible questions for further group discussion. All four authors discussed and then came to an agreement on how to code any areas marked as questionable by an individual author.
The coder marked a syllabus as a "yes" or "no" for each frame based on whether the syllabus indicated that the frame was touched on in some way in the class. The authors did not count the number of instances per frame. For example, if geology students had a field assignment that involved using maps and compasses to gather data, that was counted as a "yes" for Finding and/or Obtaining Data. In most cases where a syllabus did not provide enough information to make a determination, the frame was tagged as a "no." However, the authors had prior experience with some of the classes and had knowledge of the discipline. They used this expertise to clarify the information in the syllabi and to inform the frame determinations.
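The source does not state how the agreement rate was calculated, so the following is only a minimal sketch under one possible assumption: simple percent agreement, where a yes/no decision counts as agreed only when all coders gave the same answer for that syllabus and frame. The coder names, syllabus IDs, and codings are hypothetical toy values, not the study's data.

```python
# The five frames from Prado and Marzal (2013), as listed in Table 1.
FRAMES = [
    "Understanding Data",
    "Finding and/or Obtaining Data",
    "Reading, Interpreting, and Evaluating Data",
    "Managing Data",
    "Using Data",
]


def code(yes_frames):
    """Build one coder's yes/no record for one syllabus (hypothetical helper)."""
    return {frame: frame in yes_frames for frame in FRAMES}


def percent_agreement(codings):
    """Percent of (syllabus, frame) decisions on which every coder agrees.

    codings maps coder -> syllabus id -> frame -> bool.
    ASSUMPTION: the study may have used a different agreement measure.
    """
    coders = list(codings)
    syllabus_ids = list(codings[coders[0]])
    total = 0
    agreed = 0
    for sid in syllabus_ids:
        for frame in FRAMES:
            votes = {codings[coder][sid][frame] for coder in coders}
            total += 1
            agreed += len(votes) == 1  # unanimous yes or unanimous no
    return 100.0 * agreed / total


# Toy data: four coders, two syllabi, two disagreements out of ten decisions.
toy = {
    "coder_a": {"S1": code({"Using Data"}), "S2": code(set())},
    "coder_b": {"S1": code({"Using Data"}), "S2": code(set())},
    "coder_c": {"S1": code({"Using Data", "Managing Data"}), "S2": code(set())},
    "coder_d": {"S1": code({"Using Data"}), "S2": code({"Understanding Data"})},
}

print(percent_agreement(toy))  # 80.0
```

Unanimous agreement is a strict criterion. Averaging pairwise agreement or using a chance-corrected statistic would give different numbers for the same codings.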

Limitations
Data, and research data in particular, can be hard to define. The types of data generated and analyzed vary across and even within academic disciplines. Research data may be numerical in nature or come in the form of focus group transcripts, images, or laboratory specimens. At times, this fluid understanding of research data made the task of evaluating the syllabi more challenging than it appeared at the outset. Similarly, no existing DL framework is perfect. There is some overlap between the five overarching categories that Prado and Marzal delineate, which made it hard to determine where a particular lesson, classroom activity, or assignment fit within their framework. The authors regularly checked in with one another during the syllabi review process, but this likely did not eliminate all ambiguities.
Because there is no central university repository of syllabi, the authors relied on the individual departments for access to the syllabi. It is possible that some additional syllabi were unintentionally left out of the pool collected for this study.
Moreover, the syllabi themselves are not standardized and vary in length and style. Some instructors provided very detailed information about the course content and assignment requirements, while others were less explicit. Because the authors relied on what was written in a syllabus, this might have led the authors to rate several courses as less data-intensive than what was actually covered.
Likewise, each of the study's authors liaises with several departments across campus and, as a consequence, is familiar with the courses, assignments, jargon, and disciplinary conventions surrounding data in their areas. Although syllabi from various fields were distributed evenly among the four authors, this "insider knowledge" might have affected how the authors analyzed syllabi language from, or closely related to, the disciplines they serve. A vague assignment description or difficult-to-parse jargon that would confuse one author might make sense to another author reviewing the same syllabus if she is the subject specialist for that discipline. Conversely, liaison librarians are not experts in the subjects they work with, or at least not all of them are.

Results
To analyze the data, each frame was coded as 1 ("yes," there is evidence of the DL) or 0 ("no," there is no evidence of the DL). The 1s were summed for each category or topic, and the percentage was calculated by dividing the total number of 1s by the total number of syllabi in that category. Overall, 378 syllabi were reviewed. The Reading, Interpreting, and Evaluating Data competency (24.87%) had the highest percentage of mentions in syllabi, followed by Using Data (22.49%). Managing Data (2.66%) had the fewest mentions across disciplines (Figure 1). The number of DLs per course syllabus was also counted (Table 2). About 64% of the syllabi reviewed showed no evidence of data (Figure 1). About 11% of the courses had evidence of exactly one DL. The share of courses went down slightly as the number of literacies detected increased. Out of the total 378 syllabi received, only six courses (1.58%) had evidence of all five DLs (Figure 3).
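The calculation described above (sum the 1s for a frame, then divide by the number of syllabi) can be illustrated with a short sketch. The rows below are hypothetical toy codings, not the study's data, and the function names are illustrative assumptions.

```python
from collections import Counter

# The five frames from Prado and Marzal (2013), as listed in Table 1.
FRAMES = [
    "Understanding Data",
    "Finding and/or Obtaining Data",
    "Reading, Interpreting, and Evaluating Data",
    "Managing Data",
    "Using Data",
]


def row(*yes_frames):
    """One syllabus coded as 1 (yes) or 0 (no) per frame (hypothetical helper)."""
    return {frame: int(frame in yes_frames) for frame in FRAMES}


def competency_percentages(rows):
    """Percent of syllabi coded 1 for each frame: sum of 1s / number of syllabi."""
    n = len(rows)
    return {f: round(100.0 * sum(r[f] for r in rows) / n, 2) for f in FRAMES}


def literacies_per_syllabus(rows):
    """How many syllabi show 0, 1, 2, ... of the five DLs."""
    return Counter(sum(r[f] for f in FRAMES) for r in rows)


# Toy data: four syllabi (two with no DLs, one with two, one with all five).
toy_rows = [
    row(),
    row(),
    row("Reading, Interpreting, and Evaluating Data", "Using Data"),
    row(*FRAMES),
]

percentages = competency_percentages(toy_rows)
distribution = literacies_per_syllabus(toy_rows)
print(percentages["Using Data"])     # 50.0
print(percentages["Managing Data"])  # 25.0
print(distribution[0], distribution[5])  # 2 1
```

Because a syllabus can be coded 1 for several frames at once, the per-frame percentages need not sum to 100, whereas the per-syllabus distribution does account for every syllabus exactly once.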

By Discipline
The authors found that overall, Geology (69.7%) and Geography (53.2%) had the highest percentage rates of DLs mentioned in their syllabi (Figure 4). These two departments had a number of courses that hit all five core literacies. These disciplines regularly use a variety of data for analysis, both in research and traditional careers, and teach data-heavy courses like Geographic Information Systems (GIS). Sociology (2.1%) had the smallest overall percentage of courses that mentioned DLs and competencies in their syllabi.
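The source does not spell out how the per-discipline figures were computed. As a hedged illustration only, the sketch below assumes the rate is the share of a department's syllabi that show at least one DL. The records are hypothetical toy values, not the study's data.

```python
from collections import defaultdict


def share_with_any_dl(records):
    """Percent of each department's syllabi with at least one DL.

    records is an iterable of (department, number_of_dls_in_syllabus).
    ASSUMPTION: "rate of DLs" is read here as "syllabi with one or more DLs".
    """
    totals = defaultdict(int)
    with_dl = defaultdict(int)
    for department, n_dls in records:
        totals[department] += 1
        if n_dls > 0:
            with_dl[department] += 1
    return {d: round(100.0 * with_dl[d] / totals[d], 1) for d in totals}


# Toy data: three Geology syllabi and four Sociology syllabi.
toy_records = [
    ("Geology", 5),
    ("Geology", 2),
    ("Geology", 0),
    ("Sociology", 0),
    ("Sociology", 0),
    ("Sociology", 1),
    ("Sociology", 0),
]

rates = share_with_any_dl(toy_records)
print(rates["Geology"])    # 66.7
print(rates["Sociology"])  # 25.0
```

The same grouping works for the semester breakdown by replacing the department label with a semester label.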

By Semester
The spring semesters had a higher rate of courses offered with data needs (Figure 5). Again, the three most prominent literacies found were Finding and/or Obtaining Data; Reading, Interpreting, and Evaluating Data; and Using Data. There was also a slight increase in Understanding Data. This could be the result of more courses being offered during those semesters than in the fall semesters. In addition, more high-level courses, in which working with data was prevalent, were offered during spring semesters. Some syllabi did not specify a semester and were excluded from Figure 5.

Discussion
By far the most noticeable finding was the lack of incorporation of the competency Managing Data into classes. This was true even in classes that taught the other DLs. There could be several reasons for this. For instance, faculty might not be aware of best practices around research data management (RDM), which have been spurred in the past decade by funder mandates for data management plans (Gold 2010). Studies have shown that researchers self-report needing help with RDM (Barone, Williams, and Micklos 2017; Steinhart et al. 2012; Johnston, Lafferty, and Petsan 2012), indicating it is tacit knowledge and not something they learn in a formal setting. There is also the possibility that professors did touch on data management but did not include it in their syllabi, which could reflect data management's status as an unspoken disciplinary norm.
Understanding Data was also underrepresented in many courses despite its focus on an introduction to data. Professors might assume that students already have a basic understanding of what data is and how it is used in society. Some classes are starting from the middle of the data cycle and might require a prerequisite data course that covered this competency. In other cases, professors might have unintentionally left this competency out of the syllabus, believing it was an obvious part of the course content.
Another issue was determining whether the course had an actual data need due to the use of the term "data." Because the term itself is interpreted in many ways, it was difficult to determine its meaning. Many syllabi used the term "data" but did not mean it in the way the framework had defined it. Some syllabi lacked sufficient information to clarify the use of the term. Other syllabi used the term in a more existential way, alluding to the notion that everything is data. The authors had hoped for more clarification than their syllabi provided. For example, classes could have focused on several DLs, but syllabi often left out detailed assignments, projects, and lecture topics. This made it impossible to determine the full range of DLs addressed in the classroom.
When people think of disciplines that utilize data, they often think of the physical sciences. This assumption held true for this study as Geology classes proved to be the largest data users from selected disciplines, followed by Geography, which places a large emphasis on physical geography and tends to straddle the physical and social sciences. However, Criminal Justice, Political Science, and Journalism did show noticeable evidence of Using Data. On the other hand, it was sometimes difficult to determine if the type of "data" was in fact what the authors defined as "data." It is important to note that the authors did not conduct a qualitative assessment of the syllabi; that is, they did not judge to what extent a syllabus included any of the DLs or what part of a competency they included. For example, some of the classes that included Finding and/or Obtaining Data might have only briefly touched on it, while other classes went more in depth. This could affect how liaisons approach these classes, as some could use additional support in these competencies, while others might only need scaffolding support in the other competencies.

Data Literacies
With the results of this study in mind, the authors are brainstorming ways in which they and their colleagues might become more involved in DL in the classroom. Like many librarians across the country, the authors and their colleagues are already working to establish Research Data Management (RDM) services. One of the authors created a Canvas module on the basics of RDM, available to all faculty and staff at the University of Nevada, Reno. Two of the authors have offered workshops on RDM to graduate students and early-career researchers. Another routinely provides support for GIS-related research projects. Growing these initial offerings and adapting them for use in other contexts will be important to fostering students' DL skills across campus.
Most of these efforts have taken place outside of the classroom. In order to directly put to use the analysis and knowledge gained from the syllabi, the authors will need to reach out to course instructors within their liaison areas and persuade them to make time for librarian-led DL instruction that supplements what students are learning about data in these courses. Based on the study's outcomes, the authors will likely focus on teaching Understanding Data when working with undergraduate courses but will focus on Managing Data skills in upper-level courses. Pairing information from the syllabi with curriculum mapping activities is a potential first step to engaging in the type of outreach that is required. The authors can also use this study as a baseline to further track the growth of DL competencies being taught in these departments.

Conclusion
While this study's findings are perhaps most valuable to the University of Nevada, Reno Libraries and the liaison librarians who work with the departments the syllabi came from, they do highlight gaps in DL instruction that are likely common at other research institutions, namely that not enough attention is being paid to data management and, to some extent, the "big picture" of data. Although DL was most present in the STEM fields under study, it was evident throughout the social science disciplines as well.
Even though they are time-consuming, syllabus review projects such as this one can help librarians see the extent of DL education at their institutions and within their disciplines so that they can tailor their outreach efforts to address in-demand data skills as well as any gaps in the curriculum. Future studies might examine how DL needs evolve over time or include a much wider range of disciplines and institutions so that any insight gained could be applied more broadly. Other future work could combine a syllabus analysis with faculty interviews to clarify assignment descriptions and to better understand DL gaps within courses.
As the creation, manipulation, analysis, and use of research data grow in prominence, DL will capture the attention of more and more librarians. DL frameworks, such as the one developed by Prado and Marzal, can help librarians address DL more holistically by defining the core competencies of a data-savvy learner. Librarians can also use one of the frameworks to guide discussions with instructors regarding the library's role in DL instruction. A formal document has the potential to increase faculty buy-in. Many different courses throughout the disciplines incorporate data to some degree, presenting librarians with an opportunity to serve their communities in new ways. To ensure that they capitalize on this, a group such as ACRL or Research Data Access and Preservation (RDAP) could develop and adopt a universally recognized DL framework.

Data Availability
Data associated with this article is located in an institutional Google Drive and is shareable upon request.