Data Infrastructure and Local Stakeholder Engagement with Biodiversity Conservation Research

Biodiversity research that informs conservation action is increasingly data intensive. Cutting-edge projects at large institutions use massive aggregated datasets to build dynamic models and conduct novel analyses of natural systems. Most of these research institutions are geographically distant from the highest-priority conservation areas, which are found in South America, Africa, and Southeast Asia. There, data is typically collected by or with the help of local residents hired as field assistants. These field assistants have few meaningful opportunities to participate in biodiversity research and conservation beyond data logging. The literature indicates the data revolution has increased demand for impersonal and integrated large-scale systems that aggregate biodiversity data across sources with minimal friction. In this study, interviews were conducted with six active conservation workers to identify elements of these data systems that create barriers to field assistants’ engagement with the projects they make possible. As both creators and consumers of data, all six relayed frustration with various aspects of their data workflows. Regarding field assistant interaction with digital data systems, they observed that their field assistants engaged only at the initial point of data entry or not at all. Some suggested mobile apps as a good solution for field data collection.
Correspondence: Ali Krzton: alk0043@auburn.edu


Introduction
Global biodiversity, which lends integrity to the ecosystems that make human life possible, is in sharp decline. Estimates of the rates of species loss indicate that the sixth mass-extinction event in Earth's history has begun (Ceballos et al. 2015). Successful conservation of remaining species depends on a globally-coordinated interdisciplinary research effort. Fortunately, the data revolution has made that possible. Biodiversity conservation research has been transformed by the assemblage of worldwide aggregated datasets, permitting new analytical methods that take advantage of "big data"; examples include the Global Biodiversity Information Facility (GBIF; https://www.gbif.org), the Knowledge Network for Biocomplexity (KNB; https://knb.ecoinformatics.org), and the Map of Life (MOL; https://mol.org).
In contrast to dramatic changes in the pace and scale of biodiversity conservation research, the realities of field data collection remain largely the same. It is still necessary for trained individuals to venture out into natural areas and do surveys, observe plants and animals, and record other vital information about the environment. These field workers often collect heterogeneous data using a variety of tools; just one study of prairie fens in Michigan involved GPS units, digital cameras, audio recorders, and paper notebooks (Hackett 2019). Despite advances in remote sensing, data-driven biodiversity research requires more in-person environmental monitoring than ever. The human effort dedicated to these projects must be respected and rewarded in the interests of both sound science and ethical behavior. However, biodiversity conservation research has historically fallen short in this regard, owing to the geographic, social, and cultural distance between the scientists who design studies, analyze data, and publish papers and the residents of the areas under study who enable data collection.
While ecosystems deserving of protection and study are found all over the world, the zones of highest priority for conserving biodiversity are found in South America, Southeast Asia, and Africa (Powers and Jetz 2019). Since scientific capacity in these regions still trails that of countries in North America, Europe, and East Asia in terms of research infrastructure and productivity, studies conducted in critical conservation areas are typically done by foreign researchers (Perez and Hogan 2018). These projects cannot succeed without the cooperation of local residents who are hired or volunteer as field assistants (Figure 1). Field assistants are usually agriculturalists, herders, or others with intimate knowledge of the local landscape. When they are appropriately engaged as stakeholders in the research process, they can become fierce advocates for conservation, as in the case of former hunters who now protect Yunnan snub-nosed monkeys (Long 2017). Unfortunately, researchers can also exploit local assistants and dismiss their contributions. Biodiversity conservation initiatives that are implemented without consulting residents are counterproductive, straining research assistants' relationships with their community or directly harming their livelihood.
Aspects of modern data workflows may contribute to the alienation of field assistants from the projects they support and the ecosystems they protect when systems used for analysis and reporting are outside of their frame of reference. When local people are not included in discussions based on information shared through these systems, the data they collect effectively disappears. This worsens miscommunication and mistrust between communities living in biodiverse areas and authorities who set policies on land use, resource extraction, and wildlife protection. This study explores how data infrastructure could be changed to facilitate field assistants' participation in research beyond providing data. First, the current data landscape in biodiversity conservation is described using the existing literature. Qualitative data from informal interviews with several conservation professionals on the topic of data in biodiversity research are then reported. Finally, this information is synthesized to design more inclusive data practices and provide recommendations for improving the engagement of local field assistants, and their communities, with biodiversity research.
Data and Local Engagement with Biodiversity Conservation JeSLIB 2019;8(2): e1174 doi:10.7191/jeslib.2019.1174

Literature Review
Traditional field studies, especially longitudinal ones, require years of consistent effort. At the same time, the data revolution has dramatically increased the pace of biodiversity research, creating sustained demand for original data that contributes to model-building. This is especially true when huge datasets must be analyzed quickly in the service of a particular problem or policy goal (White et al. 2015). Transformative projects like the Map of Life facilitate actionable conservation forecasts while preserving transparency and public engagement (Jetz, McPherson, and Guralnick 2012; Powers and Jetz 2019), but they depend on a web of distributed and interdependent data sources. The most trusted, high-quality data products rely on ongoing, dedicated data management efforts. Given the potential and the precarity of this data infrastructure, researchers are eager to discover efficiencies. Recently, major conservation stakeholders, e.g. the IUCN (Lacher, Boitani, and da Fonseca 2012), have targeted interoperability and access as two major areas where data infrastructure improvements would accelerate scientific progress.

Interoperability
Field data collection in the service of biodiversity conservation is frequently difficult and expensive; the resultant data products are complex, spanning a variety of types, formats, and spatial and temporal scales. For these reasons, there has been a shift towards born-digital field data wherever possible. Advantages of born-digital environmental data include improved accuracy, reduced cost, and ready compatibility with remotely sensed landscape data (Travaini et al. 2007). As processing workflows for hand-collected data can be complicated and error-prone (Hackett 2019), reducing the steps needed to prepare data for analysis is desirable. Digital data formatted and organized at the time of collection can also be more easily ingested into other systems. One example is Cornell University's eBird project, which has contributed substantially to the GBIF (Groom, Weatherdon, and Geijzendorffer 2017).
Although mechanical data loggers and weather-resistant laptops have been used in field research for years, apps that run on smartphones, tablets, and other mobile hardware are now preferred for biodiversity monitoring. Mobile apps enable crowdsourcing of data on a scale that was once impossible by allowing "citizen scientists" to volunteer their time documenting biodiversity in their own neighborhoods. In wealthier countries, apps such as iNaturalist (https://www.inaturalist.org) and Map of Life (https://mol.org/mobile#) have proved popular with amateur observers. Icon-based apps have predominated in places where foreign scientists had to cope with language barriers or a lack of literacy on the part of their field assistants. Examples of successful deployment of icon-based apps include indigenous groups monitoring fisheries in the Brazilian Amazon (Oviedo and Bursztyn 2017) and residents of Congo discovering and reporting an outbreak of Ebola virus in wildlife before it spread to humans (Liebenberg et al. 2017). CyberTracker (https://www.cybertracker.org), an open-source project, allows researchers to build Android apps customized for particular environments and research goals. Data sourced via CyberTracker interfaces can also be automatically uploaded to remote servers.
The focus on maximal interoperability of data is a consequence of the increasing centralization of data repositories. Mandatory data publication in trusted central repositories has been hailed as a solution to patchwork communication of results via informal sharing networks (Costello et al. 2013). Large, aggregated datasets in standard machine-readable formats permit crawling and automated analysis, allowing gaps in species coverage to be more readily detected (Costello, Vanhoorne, and Appeltans 2015) and fine-grained global forecasts to be constructed (Powers and Jetz 2019). However, this approach risks the loss of important context as data is standardized to facilitate aggregation (Ganzevoort et al. 2017). Disagreements about terminology and classification can also hinder interoperability if large repositories or aggregators try to enforce a single, inflexible scheme on data providers (Franz and Sterner 2018).
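One way to limit the loss of context described above is to keep the collector's original wording alongside any standardized value rather than overwriting it. The Python sketch below is illustrative only: the record fields, the `FOLK_TO_STANDARD` lookup, and the placeholder names are assumptions made for this example and are not drawn from any repository's schema or from the studies cited.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from local names to standard names.
# Entries are placeholders; a real table would be built with field assistants.
FOLK_TO_STANDARD = {
    "local name a": "Placeholder species A",
    "local name b": "Placeholder species B",
}


@dataclass
class Observation:
    verbatim_name: str                    # exactly what the collector recorded
    notes: str = ""                       # free-text context from the collector
    standard_name: Optional[str] = None   # filled in during standardization
    needs_review: bool = False            # True when no standard match exists


def standardize(obs: Observation) -> Observation:
    """Add a standard name without discarding the verbatim entry or notes."""
    key = obs.verbatim_name.strip().lower()
    match = FOLK_TO_STANDARD.get(key)
    return Observation(
        verbatim_name=obs.verbatim_name,
        notes=obs.notes,
        standard_name=match,
        needs_review=match is None,
    )
```

Under these assumptions, an unmatched local term is flagged for human review instead of being dropped or forced into the nearest category, so the disagreement stays visible to both the researcher and the collector.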

Access
Owing to significant investment in their creation and their potential to be utilized in unforeseen ways, raw datasets are increasingly valuable in and of themselves, leading to concerns about their accessibility and long-term preservation (Costello et al. 2013; Renaut et al. 2018). Some biodiversity research is done by academics, who are motivated to publish peer-reviewed articles to advance their careers. Data-sharing mandates from journals, to the extent that they are enforced, can improve access to their results (Sholler et al. 2019). However, authors are reluctant to release data prior to formal publication (Huang et al. 2012), creating an uncertain future for data from unpublished studies.
The balance of environmental research is primarily carried out by government scientists or professionals working with conservation non-governmental organizations (NGOs). The results may be compiled into reports that are not released publicly or are technically available but difficult to find, a corpus referred to as gray literature. These kinds of access problems have hampered conservation planning for species such as the red panda (Thapa, Hu, and Wei 2018). While government data has become increasingly open as its problem-solving potential is acknowledged (De Giusti et al. 2017), agencies participate unequally and practices vary widely among countries. Conservation NGOs have also been known to withhold raw data (Groom, Weatherdon, and Geijzendorffer 2017), both to retain control and to raise additional funds.
Prompt online publication of stand-alone conservation data arguably mitigates such barriers to access (Costello, Vanhoorne, and Appeltans 2015). Online portals for biodiversity data offered by GBIF, KNB, MOL, and many others are designed to allow on-demand access to any interested party, but aggregate datasets introduce the problem of heterogeneous license terms from different providers. The volunteers who make crowdsourcing a viable data collection strategy frequently place restrictive terms on their contributions (Groom, Weatherdon, and Geijzendorffer 2017). Even volunteers who collect data with the understanding that it is a public good are skeptical of unrestricted sharing, with many concerned about data usage that is inconsistent with the purpose (conservation) for which it was collected (Ganzevoort et al. 2017).
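The licensing problem can be made concrete with a small sketch. Assuming each record in an aggregate carries a license label from its provider, a downstream user would need to partition records by what each license permits before any analysis. The labels, record fields, and the `ALLOWED_FOR_USE` table below are hypothetical and do not represent the actual terms used by GBIF, KNB, or MOL.

```python
from collections import defaultdict

# Hypothetical table: which intended uses each license label permits.
ALLOWED_FOR_USE = {
    "open": {"research", "commercial"},
    "noncommercial": {"research"},
    "restricted": set(),
}


def partition_by_license(records, intended_use):
    """Split records into usable and excluded groups for one intended use.

    Records with a missing or unrecognized license are excluded,
    since permission cannot be assumed.
    """
    usable = []
    excluded = defaultdict(list)  # license label -> records left out
    for rec in records:
        label = rec.get("license")
        if intended_use in ALLOWED_FOR_USE.get(label, set()):
            usable.append(rec)
        else:
            excluded[label].append(rec)
    return usable, dict(excluded)
```

Returning the excluded records grouped by label, rather than silently discarding them, is a deliberate choice in this sketch: it shows how much of an aggregate is unavailable for a given purpose and which providers' terms are responsible.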
Discussions surrounding data ownership and ethical use have not kept pace with the sheer volume of data made available online. As a result, questions of access are less dependent on the capabilities of the data infrastructure than they are on the culture of research and conservation. Investigations of data collectors' data literacy and access to aggregate data products have so far focused on citizen scientists who participate as a hobby, not local assistants hired for field work in biodiverse areas that are relatively less developed.

Methods
To assess the impact of trends in the data infrastructure on communities in biodiverse areas that host outside researchers for ecological and conservation studies, professionals with the relevant experience were recruited to discuss the topic. Potential interviewees were selected based on the following criteria: they had to 1) be currently active in biodiversity conservation as researchers or practitioners, 2) have conducted long-term research (at least one year) in a remote location, and 3) have personally cooperated with local assistants to collect field data.
From January to June of 2019, conservation professionals who fit the above criteria were selected opportunistically and invited to share their thoughts on how changes in the data infrastructure affected their resident collaborators. A semi-structured interview with each participant who agreed to talk was conducted over the phone, in person, or via email.
Interviewees were asked about their data collection experiences in the field, what they had observed regarding local people's interactions with the data, and their thoughts about the data infrastructure of biodiversity conservation more generally.
In total, six conservation workers were interviewed. Participants represented a range of career stages, from junior program scientists to leaders in international conservation NGOs. Half had academic appointments, while the other half had non-academic professional roles. Most of the interviewees worked in East or Southeast Asia, but two also had experience working in the Americas, Africa, and Oceania. Due to the small sample size and the differences between written and spoken interviews, systematic analysis of the responses was inappropriate. Instead, the interviews were studied qualitatively to gain insight into how local people and conservation professionals interact with data, and to generate ideas to improve data systems for the benefit of residents of biodiverse areas.

Results
All six conservation workers expressed frustration with data exchange using the systems currently in place. When asked specifically about local field assistants' interaction with the data infrastructure, five interviewees agreed that data workflows were a barrier to local people's engagement with research and should be improved, while the sixth did not know (Yu 2019). Each of the five who critiqued these workflows focused on a different problem. Li (2019) said that field assistants did not know the purpose of the studies for which they were collecting data and that data collection forms did not capture important information about natural conditions, remarking, "It is rarely considered why the form should be formulated in this way." Fuentes (2019) thought that predetermined data classification schemes could get in the way of rigor and genuine understanding, suggesting that ideal tools would translate between local languages/folk taxonomies and standard terminologies. Kirkpatrick (2019) was concerned that most field assistants lacked the general education level to effectively interact with data once it was within a system, which also posed a problem for scientific accuracy as they could not use their experience to reality-check researchers' results and interpretations at an early stage. Lepczyk (2019) described a mismatch between questions of interest to local residents and the questions that outside researchers choose to study, with the latter driving the design of data systems. Finally, Cheng (2019) said that even when data exchange was reciprocal and reports or maps were shared with local communities, the scientific perspective itself hindered communication because community members were so unfamiliar with it.
Regarding access, all six interviewees mentioned difficulties obtaining data and reports from other providers (for example, government bodies, conservation organizations, or individual researchers) when needed, in one case even when the work was done within the United States (Lepczyk 2019). When data was available, three acknowledged the difficulty of analyzing and interpreting it, expressing concern that it would not mean much to local nonspecialists (Cheng 2019; Kirkpatrick 2019; Li 2019). Li (2019) noted that processing data is difficult even for those with advanced degrees. Lepczyk (2019) mentioned efforts by university professors to help laypeople analyze environmental data, but the example given was in a US context. Two suggested that regional residents could be sent to attend nearby universities or go overseas, depending on funding, but either way more people living in the area needed scientific degrees and training to access the data infrastructure (Kirkpatrick 2019; Lepczyk 2019). Finally, three interviewees explicitly raised the issue of justice in data collection, emphasizing the need to recognize local people's efforts and refrain from appropriating community knowledge. One respondent's conservation organization was trying to identify rewards, material and otherwise, that were meaningful to field assistants in order to compensate them fairly. This required both active listening and creative problem solving, as needs changed from year to year and were not always what researchers expected (Cheng 2019). To affirm the harmony between good science and equitable treatment, Kirkpatrick (2019) asked rhetorically, "Can data, collected through oppressive means, ever really be 'right'?" Lepczyk (2019) warned against scientists becoming "ecological colonialists" and reinforced the importance of helping qualified residents of countries with less research infrastructure get PhDs.

Discussion
On the basis of information shared by the six conservation workers, improved interoperability and access within the data infrastructure have not significantly changed how field assistants relate to the data they collect. In fact, to the extent that the system design is outside of their experience, exclusively digital data workflows may worsen the disconnect between field assistants and the rest of the research process. Apps on mobile devices undoubtedly provide advantages in data collection, improving speed and delivering pre-formatted, standardized data, but the experiences of the interviewees suggest that researchers realize most of these benefits, rather than field assistants.
Interoperability facilitated by born-digital data and integrations between providers is intended to increase access to data, especially resources hosted remotely. If, as indicated in both the interviews and the literature, researchers themselves still have problems accessing data, it is unreasonable to think that releasing data via apps or web portals meets the obligation to share knowledge back to residents of high-priority conservation areas. The "citizen scientists" who make use of these tools in wealthier countries have a different demographic profile than local field assistants, particularly in age and education level (Ganzevoort et al. 2017); most are not only digitally literate, but also positioned to hold institutions accountable if they fail to live up to open data mandates. What constitutes access for them does not translate to access for field assistants living more traditionally. This is not an argument against training field assistants to collect data with mobile apps, or processing data to be cross-compatible at the point of input, or any other innovation that integrates raw biodiversity data more tightly with the rest of the infrastructure. These practices can enhance research and conservation. When it comes to engaging field assistants in the projects they support, however, a recurring theme from the interviews was that technology alone cannot solve this problem. Instead, researchers must take the initiative to unlock the value of data for residents of the landscapes they hope to conserve. The three strategies recommended below provide a vision for such a shift while specifying concrete indicators of improvement.

Support a bi-directional flow of information between field assistants and researchers
Field assistants should not be passive contributors to a wholly unfamiliar data infrastructure. Instead, researchers should seek their input on how data is collected from the start. As Kirkpatrick (2019) noted, field assistants' experience with the species they observe has the potential to improve initial data quality. Researchers, including graduate students who go to the field, should make it a point to demystify all data collection instruments. Checking for at least a basic understanding of why each field on a form or in an app is there would help to address Li's (2019) concern. Another critical question for field assistants is whether they believe any important information is being left out. There is no sense asking for feedback if suggestions will be ignored, however, so researchers should first reflect on whether their data collection instrument truly cannot be improved. When choosing mobile apps for data collection, ease of interface modification while in the field should be an important consideration.
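The check described above, that every field on a form or in an app has a reason that can be explained to a field assistant, could be built into the form definition itself. The following Python sketch assumes a hypothetical form format in which each field carries a plain-language `purpose`; the field names and wording are invented for illustration and do not come from any particular data collection app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    name: str
    purpose: str = ""  # plain-language reason the field is collected


def fields_missing_purpose(form):
    """Return names of fields whose purpose is empty or only whitespace."""
    return [f.name for f in form if not f.purpose.strip()]


# Hypothetical survey form; names and purposes are illustrative only.
survey_form = [
    FormField("species", "Which animal or plant was seen"),
    FormField("count", "How many individuals, to track changes over time"),
    FormField("canopy_code"),  # no explanation written yet
]
```

A researcher could run `fields_missing_purpose(survey_form)` before training sessions; any field it returns is one whose rationale has not yet been written down, and therefore one that probably cannot yet be explained to the people filling it in.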
In order for field assistants to give meaningful feedback, they should know the purpose of data collection in a general sense, both from the standpoint of the project and for themselves. Bidirectional information exchange entails identifying data relevant to local people's concerns and communicating it back to them in a meaningful way. Internal commitments, such as connection to nature and desire to protect the landscape, are effective motivators for data collectors (Ganzevoort et al. 2017), a point also raised by three of the conservation workers, so researchers should find connections between field assistants' values and what the data can tell them. Discussing preliminary conclusions from "first pass" data that has been minimally cleaned up and organized provides an opportunity to do this. This also gives field assistants an opportunity to catch unrealistic or highly unlikely interpretations early so that data can be re-checked.

Foster project ownership by reinforcing the connection between data and decision-making
Researchers can show field assistants and their communities the practical value of data by explicitly tying it to decision-making. When residents of biodiverse areas have no connection to the policy development process, they have no reason to support the resulting conservation plans, especially if they interfere with subsistence and economic activity. Researchers must be willing to translate data products, be they analyses or reports, so they are intelligible to local people. Cheng (2019) shared an example of this in practice, as communities that monitor wildlife for the research team are given maps that result from their work. Maps and other data visualizations can help researchers overcome linguistic barriers and a lack of scientific background in their audience.
Most importantly, researchers should explain that conclusions drawn from the data translate into conservation action. Valencia Gunder, a community activist from Miami, discovered the powerful role of data in policy after she used it to prove to a city planning committee that poor households were being systematically cleared out of neighborhoods on flood-resistant higher ground; before she had data, she said, she was ignored (Gunder 2019). Conservation workers should find creative ways to share this insight with field assistants and their communities.
Maintaining a local focus is also key. Data collectors want their work to support conservation initiatives where they live and are less interested in feeding information to large-scale models or contributing to abstract discoveries (Groom, Weatherdon, and Geijzendorffer 2017). Moreover, community engagement is necessary for long-term biodiversity protection. Conservation initiatives throughout Africa made substantial progress once local communities were given a stake in data collection and project design (Abrams et al. 2009). A research team studying Sanje mangabeys in Tanzania became more effective once they used their data to educate nearby residents about the primate, then made a point to include local people in monitoring efforts (Fernandez, Ehardt, and McCabe 2019).

Build a human network around the data network
Several of the conservation workers interviewed expressed concern that there were too many obstacles between field assistants and the data infrastructure, but they also questioned whether that was the correct way to frame the problem. Instead, they suggested a slower, more long-term process of building scientific and data literacy in the populations from which field assistants are drawn. Unfortunately, countries in the Global South, including those where the conservation of biodiversity is a priority, still trail the Global North in analyzing data, building models, and publishing papers (Habel et al. 2014; Malhado et al. 2014). Open access to research, including datasets, has helped to narrow the gap, but not enough. The interviewees' experiences with field assistants suggest one possible reason.
Impersonal interaction with data infrastructure is not normal or natural for most of the world. Researchers and educated citizen scientists may be able to pick up tasks like cleaning their data or finding datasets of interest through an online portal with minimal training, but those tasks are far removed from the life experience of typical field assistants. Some who are younger could pursue formal education and bring research skills back to their community, but not everybody who participates in science needs to be a scientist. Instead of expecting local people to conform to the conventions of big data, which emphasize instant and machine-driven connection to information, researchers should accept that for most, meaningful access to data is not possible unless it is mediated and human-centric. Relationships with those who are data literate or otherwise comfortable navigating digital systems will be the preferred mode of engagement for most local people. Therefore, capacity building for biodiversity conservation should not neglect embedding such individuals within the community or, better yet, training local young people to fill that role.

Conclusion
Fast action is required to conserve the world's remaining biodiversity, but doing the wrong thing can be worse than doing nothing. Powerful new models made possible by the data revolution can propagate errors quickly if the quality of data inputs is not monitored. When it comes to field collection of biodiversity data, inclusive practices that grow human networks alongside data networks improve scientific accuracy by reducing bias and providing context to information. Just as important, knowledge empowers local communities to participate in conservation planning on a more equal footing, reducing the chances that landscapes will be managed without regard for the well-being of their human populations.