Data Curation Implications of Qualitative Data Reuse and Big Social Research

Objective: Big social data (such as social media and blogs) and archived qualitative data (such as interview transcripts, field notebooks, and diaries) are similar, but their respective communities of practice are under-connected. This paper explores shared challenges in qualitative data reuse and big social research and identifies implications for data curation.

Methods: This paper uses a broad literature search and inductive coding of 300 articles relating to qualitative data reuse and big social research. The literature review produces six key challenges relating to data use and reuse that are present in both qualitative data reuse and big social research—context, data quality, data comparability, informed consent, privacy & confidentiality, and intellectual property & data ownership.

Results: This paper explores six key challenges related to data use and reuse for qualitative data and big social research and discusses their implications for data curation practices.

Conclusions: Data curators can benefit from understanding these six key challenges and examining data curation implications. Data curation implications from these challenges include strategies for: providing clear documentation; linking and combining datasets; supporting trustworthy repositories; using and advocating for metadata standards; discussing alternative consent strategies with researchers and IRBs; understanding and supporting deidentification challenges; supporting restricted access for data; creating data use agreements; supporting rights management and data licensing; developing and supporting alternative archiving strategies. Considering these data curation implications will help data curators support sounder practices for both qualitative data reuse and big social research.


Introduction
Big social data (such as social media and blogs) and archived qualitative data (such as interview transcripts, field notebooks, and diaries) are similar, but their respective communities of practice are under-connected. Research with both types of data repurposes existing data to advance discoveries in social science. However, despite these similarities, big social research has not yet been widely framed as a form of qualitative data reuse, and qualitative data reuse has only begun to be discussed through a big social data lens. This paper explores six key issues that are present in both big social research and qualitative data reuse, and outlines implications for data curation practices related to each issue. This paper suggests that by understanding shared challenges and data curation implications, these communities of practice, and the data curators who work with them, can inform each other for mutual benefit.

Defining qualitative data reuse and big social research
This paper investigates the similarities between qualitative data and big social data, aiming to provide guidance for data curators to make connections between these types of data, thus enhancing our practice. This section defines qualitative data reuse and big social data research, then highlights the similarities between these definitions.

Qualitative data reuse
A key defining element of qualitative data is that they are non-numeric, although they may be analyzed to produce numeric results such as code counts and statistics (Kitchin 2014; DuBois, Strait, and Walsh 2018; Greener 2011). There are four main strategies for conducting qualitative research:
1. Unstructured, relatively open-ended interactions or information gathered from respondents, resulting in data such as solicited diaries and focus group videos.
2. Structured interviews or information solicited from respondents, resulting in data such as interview transcripts, and questionnaire responses.
3. Direct observations of behavior and environments, resulting in data such as field notes and observational records.
4. Examination of existing data such as autobiographies, found diaries, correspondence, historical documents, photographs, and home videos (Bernard et al. 1986).
The above list suggests that qualitative data can be defined by the process of creating or collecting them; that is, qualitative data are produced by qualitative research (Heaton 2004; Bernard, Wutich, and Ryan 2017).
For the purpose of this paper, taking into account definitions by Bernard (1986), Corti (1999), and Heaton (2004), I define qualitative data as follows: Qualitative data are physical objects, images, sounds, moving images, and texts that are collected and analyzed by researchers for the purpose of qualitative analysis.
The term "secondary analysis" has been used since the mid-20th century to describe a research methodology using existing data, with its earliest definitions encompassing both quantitative and qualitative data. Thorne defines qualitative secondary analysis as "the reexamination of one or more existing qualitatively derived data sets in order to pursue research questions that are distinct from those of the original inquiries" (Thorne 2004). When researchers use archived qualitative data, they repurpose previously created data, introducing new contexts, asking new research questions, and potentially gathering new data to augment the archived data. As data sharing and data publication become more common practice, the focus is not necessarily on the distinct methodology of secondary analysis, but rather on the idea of data reuse for future research of many different types. Scholars have therefore increasingly begun to use the broader term "data reuse." Bishop and Kuula-Luumi suggest that "reuse provides an opportunity to study the raw materials of past research projects to gain methodological and substantive insights" (2017). van de Sandt et al. take a very broad view of reuse, concluding that reuse can be seen as equal to use. They define reuse as "the use of any research resource regardless of when it is used, the purpose, the characteristics of the data and its user" (2019).
Drawing on the preceding literature, this paper suggests the following working definition for qualitative data reuse: Qualitative data reuse is when researchers use existing qualitative data to gain new insights and produce new scholarship.

Big social research
Big social data are data derived from social media or other online environments where people share, contribute, and connect with one another. Big social data can reflect direct human interaction, usually as unstructured or semi-structured data such as text, videos, and audio that are created and shared online (Olshannikova et al. 2017), or they can reflect indirect human interaction, usually as structured metadata that reflect user behavior such as interactions with interfaces, or the spatial or temporal aspects of user behavior (Gandomi and Haider 2015).
Big social data can come in several formats:
• Digital self-representation data: Login data, profile pictures, biographical information
• Social interaction data: timeline posts, online forum posts, content sharing, commenting, direct messaging
• Digital relationships data: Follower/following data, "likes"
• Metadata: Timestamps, geospatial data, type of operating system, type of device, application used to post
(Adapted from Olshannikova et al. 2017)
Data Curation Implications of Qualitative Data JeSLIB 2021; 10(4): e1218 https://doi.org/10.7191/jeslib.2021.1218
It is possible to use social media to recruit participants, for example by conducting online ethnographies or directly contacting interview subjects via social media. However, this paper focuses on the use of big social data that is available online through web scraping, API access, or other methods that don't require direct contact with individual people.
Big social data research is most often conducted using computational social science methods. Computational social science blends theory and practice from computer science, statistics, and the social sciences, using computational methods to conduct research inquiry about society (Mason, Vaughan, and Wallach 2014, 257). Computational social science began in the 2000s, and uses methods such as topic modeling, sentiment analysis, network analysis, artificial intelligence, and deep learning techniques to support drawing conclusions from large corpora of text (Bankes, Lempert, and Popper 2002; Berkout, Cathey, and Kellum 2019).
Drawing from the preceding literature, this paper suggests the following working definition for big social research: Big social research is when researchers collect existing data from social media or other online social environments to gain insights and produce scholarship.
Qualitative data reuse and big social research: the connection
As illustrated above, qualitative data reuse and big social research are distinct in terms of data sources and methods of data analysis, but the two types of data also share key similarities that have implications for data curation. This paper draws upon the above definitions of qualitative data reuse and big social research:
Qualitative data reuse is when researchers use existing qualitative data to gain new insights and produce new scholarship.
Big social research is when researchers use existing data from social media or other online social environments to gain insights and produce scholarship.
These definitions help to illustrate the connection between qualitative data reuse and big social research. Both types of scholarship take data that has been created for one purpose, then repurpose the data to gain insights and produce scholarship. This paper highlights this similarity in its discussion of shared challenges.

Methods
Using the methods outlined by Creswell (2009)

Search and selection
For the literature search, I searched the library catalog and online databases using the following strings:
• "qualitative secondary analysis"
• "qualitative data reuse"
• "qualitative data archiving"
• "social media data"
• "social media data archiving"
• "big social data"
While reviewing initial articles, I identified further reading through backward and forward citation chaining (C. Cooper et al. 2017; Hu, Rousseau, and Chen 2011), a process of reviewing literature that has been cited in a particular article, as well as reviewing literature that cites that particular article. Articles were limited to those published in English.
I organized and coded approximately 300 articles. Publication dates ranged from 1934 to present, with most articles published in the past 30 years for qualitative data reuse, and the past 20 years for big social research.
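The backward and forward citation chaining described above can be sketched in code. This is only a minimal illustration, assuming a hypothetical citation mapping (`CITES`) with invented article labels; the `chain` helper is not the tooling used for this review, which was carried out by reading and following references manually.

```python
from collections import defaultdict

# Hypothetical toy data: each key is an article, each value lists what it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["D"],
    "E": ["A"],
    "F": ["E", "B"],
}


def build_cited_by(cites):
    """Invert the mapping so we can look up which articles cite a given article."""
    cited_by = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].append(source)
    return cited_by


def chain(seed, cites, rounds=1):
    """Collect articles found by backward chaining (an article's references)
    and forward chaining (articles that cite it), repeated for `rounds` steps."""
    cited_by = build_cited_by(cites)
    found = {seed}
    frontier = {seed}
    for _ in range(rounds):
        next_frontier = set()
        for article in frontier:
            next_frontier.update(cites.get(article, []))     # backward
            next_frontier.update(cited_by.get(article, []))  # forward
        frontier = next_frontier - found
        found |= frontier
    return found - {seed}


one_round = chain("A", CITES, rounds=1)
two_rounds = chain("A", CITES, rounds=2)
```

In this toy example, one round from article "A" reaches its references and the article that cites it; a second round repeats the process from each newly found article.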

Coding
I coded each article according to key themes, inductively creating the themes using Grounded Theory's constant comparative method (Glaser and Strauss 1967). My coding focused on (1) research objectives and methods; (2) discussions of theory, including epistemological and ethical issues; and (3) data curation practices. I focused on common themes between the literature on qualitative data reuse and the literature on big social data. Six central issues emerged in common between qualitative data reuse and big social data research: context, data comparability, data quality, informed consent, privacy and confidentiality, and intellectual property.

Benefits of data sharing
A key idea running through the literature is the emerging consensus that data sharing is beneficial to science and society. Benefits of data sharing can be grouped into three key categories: scientific, moral, and economic benefits (Mauthner 2012). Scientific benefits include building new knowledge, new hypotheses, new methodologies, comparative research, or strengthening existing theories; promoting interdisciplinary use of data; increasing citations and scholarly impact; and providing data for the purpose of teaching students. Moral benefits include reducing burden on research subjects; facilitating more research about rare, hard-to-reach, or inaccessible respondents; and supporting transparency and accountability in order to foster trust from the public and other researchers and to share the results of public research funding. Economic benefits include conserving time and resources, therefore supporting a higher return on investment. Each of these benefits has been further discussed in the literature (e.g., Piwowar et al. 2008; Logan, Hart, and Schatschneider 2021; Levenstein and Lyle 2018; Fienberg, Martin, and Straf 1985).

Challenges and implications for data curation
This paper discusses six key challenges relating to data use and reuse that are present in both qualitative data reuse and big social data research, and discusses data curation implications.
Below, I describe each of the challenges. For each challenge, I also outline data curation implications that are discussed in the literature.

Context
Issues of context are similar for both qualitative data reuse and big social research. For both types of research, there is concern that data may not be properly understood outside of their original context. When considering reuse of qualitative data, concerns center around whether data can be meaningful without the knowledge and expertise of the researchers who conducted the original research project. As Pasquetto, Borgman, and Wofford write, "removing data from their original context necessarily involves information loss" (2019, 23). Such loss includes small adjustments that may be made to the data during research, deep knowledge of the research that data creators hold but may not be able to communicate in a dataset description, and the de-contextualizing effects of deidentification efforts (Mauthner, Parry, and Backett-Milburn 1998; Fielding and Fielding 2000; Dale, Arber, and Procter 1988).
When conducting big social research, data often take the form of photos, videos, or short pieces of text, drawn from a larger context of personal and public life (Törnberg and Törnberg 2018; Boyd and Crawford 2012). This out-of-context effect is only compounded when data are amassed at a large scale. For big social research, the researcher may never speak to the people who created the data, know their identities, or be aware of other broader contexts. Marwick and Boyd (2011) also refer to a "context collapse" in big social research, in which multiple audiences are flattened into one, making the context and viewpoint of big social research difficult to discern: to whom is a user speaking when they post on social media? This context collapse can also apply to archived qualitative data. While the original audience and context are generally more concrete, when qualitative data are published openly, the future audience is unknown.
For both big social research and qualitative data reuse, the literature suggests that full context and meaning may never be accurately understood by qualitative data reusers/big data researchers. However, using data curation strategies to communicate as much context as possible can help support meaningful data use and reuse.

Clear documentation
• For qualitative data: Data curators can encourage contextual documentation throughout the research process, to be published alongside qualitative data. This could include documentation about research methods and practices, consent form, IRB approval numbers, information about the selection of interview subjects and interview setting, instructions given to interviewers, data collection instruments, steps taken to remove direct identifiers in the data, problems that arose during the selection and/or interview process and how they were handled, and interview roster (ICPSR 2012).
• For big social research: Data curators can encourage as much documentation as possible of the methods, communities, and platforms. Context can also be communicated through metadata such as geolocation, @-mentions, or hashtags.
• Initiatives such as Annotation for Transparent Inquiry (Karcher and Weber 2019), Open Context (Kansa and Kansa 2018), and the Data Curation Network (Johnston et al. 2018) all support researchers and data repositories in creating documentation to encourage contextual integrity for data reuse.
Archiving related data
• Repositories may also choose to archive (or link to archived versions of) web URLs, images, and other resources (Thomson and Beagrie 2016).

Data quality and trustworthiness
While challenges related to data quality exist for both qualitative data reuse and big social research, the challenges are relatively distinct for each type of data. For qualitative data, quality issues often relate to human error. Humans throughout the process could introduce errors through simple mistakes and inaccuracies. Errors can come at various stages in the research-from research subjects, reporters or recorders of field data, researchers, and data coders (Sherif 2018).
Data quality issues for big social research have additional complexities that introduce different types of errors. Because of the automated nature of data collection and analysis, there are fewer opportunities for simple mistakes in these phases. However, quality issues can result from the element of self-performance that is more present in big social research: users are not speaking directly to the researcher, but rather to a perceived online community (Hogan 2010; Manovich 2012). Other quality issues can result from the specific environment of online social platforms. Multiple accounts from one user, fake accounts, and bots can all introduce errors, bias, and distortion (Marwick and Boyd 2011; Shah, Cappella, and Neuman 2015; Varol et al. 2017). Additionally, big social data sampling is often biased because social media APIs may not return complete data and users of social media platforms may not be representative of society as a whole (Burgess and Bruns 2012). Some social media platforms, such as Facebook and Twitter, also tend to be overrepresented in big social research due to ease of access (Zimmer and Proferes 2014; Stoycheff et al. 2017).
For both types of data, systematic errors can be introduced as a result of bias, and when scaling up by reusing qualitative data, combining datasets, or collecting big social data, these errors can compound (Bernard et al. 1986; Morstatter and Liu 2017; M. Hammersley and Gomm 1997; Hargittai 2015). While data curation is not a simple solution to these challenges, clear documentation, use of trustworthy repositories, and linking to related datasets are all discussed in the literature as strategies to support data quality and trustworthiness.

Clear documentation
• Data curators can support documentation of the research process when sharing data, including documenting any potential bias, errors, or missing data.

Trustworthy repositories
• Data repositories and academic libraries can contribute to data quality and trustworthiness by supporting data management, curation, and metadata (Frank et al. 2017; Giarlo 2013; Yoon and Lee 2019).
• Trust in data can be enhanced by trust in the repository where it is archived.
• Data curators may also refer to the recently developed TRUST Principles, which are designed to complement the FAIR Principles to support trustworthy data stewardship for archived data (Lin et al. 2020).

Related and combined datasets
• Some researchers have attempted to create more representative datasets by blending big social data with smaller social datasets, a strategy that helps include a broader range of perspectives than are present in a single dataset (Croeser and Highfield 2020). Data curators could provide links between related datasets to support future use. However, combining datasets comes with its own set of challenges (see Data comparability, below).

Data comparability
When combining qualitative and big social datasets, researchers must determine whether each dataset can be understood to be applicable to another, also referred to as data "fit." Because qualitative research tends to produce data sets that are relatively unstructured, complex, and heterogeneous (Heaton 2004), it can be difficult to combine multiple qualitative datasets. Researchers can assess the comparability (or "fit") of the data by (1) identifying the extent of missing data; (2) identifying convergence of primary and secondary research questions; (3) assessing the methods used to produce the primary data (Heaton 2004; Hinds, Vogel, and Clarke-Steffen 1997; Thorne 1994).
Comparability of big social data is additionally affected by the issue of metadata interoperability. While standardized metadata such as Data Documentation Initiative (DDI Alliance 2019) are fairly commonly used for qualitative data, big social datasets have less standardized metadata. Social media platforms may use different metadata schemas, and it can be difficult and time-consuming to combine multiple big social datasets if the metadata are not interoperable. As Acker and Kriesberg note, "there are no data models for cross-walking or mapping like-with-like across platforms, for example a tweet, a Facebook post and a YouTube video that all link to the same content or event such as a townhall livefeed" (2017, 7). While the proprietary nature of many social media platforms may continue to impede metadata interoperability, there are some models for unified metadata schemas such as Schema.org (W3 2021) that could inspire similar community efforts for social media.
Comparability is an especially important issue for both qualitative data reuse and big social research. For both types of data, combining multiple datasets can support larger-scale studies, which is a particular focus for qualitative data, but can apply to both; combining data can be used as a strategy to better understand context and to enhance data quality, which is a particular focus for big social research, but can apply to both (see Data quality, above). The literature suggests that data curators can support data comparability by helping researchers create clear documentation, and by advocating for interoperable metadata standards.

Data comparability: Data curation implications
Clear documentation
• For both qualitative data reuse and big social research, data curators can support comparability by encouraging researchers who publish data to include clear documentation to address missing data, research questions, and methods.

Metadata standards
• For both types of data, data curators can adapt existing standards such as DDI to support better data comparability (DDI Alliance 2019).
• The research and data curation communities can advocate for interoperable metadata standards that can be adopted by social media platforms themselves, potentially including existing models such as Schema.org metadata (W3 2021).
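The kind of crosswalk that Acker and Kriesberg describe as missing can be illustrated with a small sketch. Everything below is an assumption for illustration: the platform names, the platform-specific field names, and the common field names are all hypothetical and are not drawn from any real platform API, DDI, or Schema.org.

```python
# Hypothetical crosswalk: platform-specific field name -> common field name.
# These platform names and field names are invented for illustration; they are
# not taken from any real platform API or published metadata standard.
CROSSWALK = {
    "platform_a": {
        "post_id": "identifier",
        "created": "date_created",
        "body": "text",
        "handle": "creator",
    },
    "platform_b": {
        "id": "identifier",
        "timestamp": "date_created",
        "message": "text",
        "author_name": "creator",
    },
}


def to_common(record, platform):
    """Map one platform-specific record onto the shared field names.

    Fields without a mapping are kept under "unmapped" instead of being
    dropped, so the loss can be documented alongside the dataset.
    """
    mapping = CROSSWALK[platform]
    common = {"source_platform": platform, "unmapped": {}}
    for field, value in record.items():
        if field in mapping:
            common[mapping[field]] = value
        else:
            common["unmapped"][field] = value
    return common


record_a = to_common(
    {"post_id": "1", "created": "2021-01-05", "body": "Town hall tonight",
     "handle": "user_a", "shares": 3},
    "platform_a",
)
record_b = to_common(
    {"id": "77", "timestamp": "2021-01-05", "message": "Watching the town hall",
     "author_name": "user_b"},
    "platform_b",
)
```

After mapping, both records expose the same field names and can be combined or compared, while anything that did not fit the shared schema remains visible for documentation.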

Informed consent
The issue of informed consent is similar with qualitative data reuse and big social research. In the case of shared qualitative data, some researchers are now including consent for data sharing and archiving in consent agreements. In fact, the 2019 revision of the Common Rule includes the idea of broad consent, in which participants agree to "future storage, maintenance, or research uses" of their data (U.S. Department of Health and Human Services 2017), and some IRBs now suggest language to support data reuse (Cornell Research Services 2019; Elman, Kapiszewski, and Lupia 2018). However, broad consent is not a perfect solution, especially when viewed through the lens of feminist and post-colonial theories, which consider power structures between researchers and research subjects. There is concern that broad consent could expose respondents to risk and reduce their agency, since the data may be used to ask any number of future research questions (Mauthner and Parry 2013). Tiered consent models could provide a middle ground, supporting more granular consent options than broad consent. In the tiered consent model, participants are given choices about the specifics of data sharing. For instance, a consent form could allow participants to opt out of sharing any of their data while still participating in the study; the consent form could give participants the option to share only a subset of their data; or the consent form could allow participants to share their data only with reusers who meet certain criteria (Meyer 2018).
In the case of big social research, social media terms of service may include user agreements that address data availability for research purposes. However, users generally don't read terms of service (Obar and Oeldorf-Hirsch 2020), and even if they do, they are not informed of the nature and extent of research that may be conducted with their data. The U.S. Health and Human Services' Secretary's Advisory Committee on Human Research Protections suggested in 2015 that guidance should be developed regarding consent standards for big data research, including methods such as focus groups or community advisory boards that could help big data researchers identify representative concerns of participant populations (Secretary's Advisory Committee on Human Research Protections 2015). However, such guidance is not codified in the Common Rule. Some have suggested that IRBs should review big social research even if it is not yet mandated by law (Schneble, Elger, and Shaw 2018). In practice, most big social research is classified as exempt by IRBs (Metcalf 2016).
Some projects have developed technology-mediated strategies to address the issue of consent for big social research. Two examples are pop-up messages gauging participants' willingness to share certain types of data on Facebook (Hutton and Henderson 2013), and software that provides structures to "ask participants (as normal procedure within qualitative and quantitative studies) if the researcher may retrieve and use the data in a specific research project" (Bechmann and Vahlstrup 2015). However, these strategies are rarely used. Such strategies are also made more difficult by the large scale and networked nature of big social data. For example, even if one user consents to their social media posts being used for research purposes, they may @-mention other members of their network or link to other profile or group pages; these other users would therefore be part of the research dataset, without having consented to the research (Mneimneh et al. 2021).
The literature suggests that if data curators can reach investigators early in the research process, they can help provide guidance for alternative consent strategies for qualitative data reuse and big social research.

Informed consent: Data curation implications
Alternative consent strategies for qualitative data reuse
• If data curators can connect with researchers early in the research process, they can help researchers draft broad consent language to support data reuse (Kirilova and Karcher 2017).
• Researchers, curators, and IRBs can also work together to support tiered consent models, allowing research participants to select the level of data sharing with which they are comfortable.
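The three tiered consent options described above (opting out of sharing entirely, sharing only a subset, or sharing only with reusers who meet certain criteria) can be sketched as a simple data structure. This is a hypothetical illustration, not a model taken from Meyer (2018) or from any repository's system; the `ConsentChoice` fields and the single "approved reuser" criterion are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentChoice:
    """One participant's tiered consent choices (hypothetical structure)."""
    participant_id: str
    share_any: bool              # False: takes part in the study, shares nothing
    shareable_items: frozenset   # the subset of items the participant agreed to share
    approved_reusers_only: bool  # True: share only with reusers meeting a criterion


def releasable_items(choice, collected_items, reuser_is_approved):
    """Return the items that may be released to a given reuser."""
    if not choice.share_any:
        return set()
    if choice.approved_reusers_only and not reuser_is_approved:
        return set()
    # Release only items that were both collected and consented to.
    return set(collected_items) & set(choice.shareable_items)


collected = {"interview", "diary", "photo"}
p1 = ConsentChoice("p1", True, frozenset({"interview", "diary"}), False)
p2 = ConsentChoice("p2", False, frozenset(), False)
p3 = ConsentChoice("p3", True, frozenset({"interview"}), True)
```

Recording consent choices in a structured form like this, alongside the data, is one way a repository could apply them consistently when access is requested.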
Alternative consent strategies for big social research
• If data curators can connect with researchers early in the research process, they can encourage strategies such as focus groups, community advisory boards, or software-supported strategies for obtaining individual informed consent within social media platforms.

Privacy and confidentiality
While privacy is a major issue for both qualitative data reuse and big social research, some specific elements of these concerns are distinct between the two types of data.
For qualitative data reuse, deidentification strategies are used to support data sharing. However, some argue that deidentification may compromise the integrity and quality of the data or remove important contextual information (Fielding 2004; Martyn Hammersley 1997; Stenbacka 2001). Moreover, deidentification may not be guaranteed to prevent deductive disclosure based on other contextual information, which is exactly the kind of contextual information that is necessary to understand and reuse the data in the first place (Mauthner, Parry, and Backett-Milburn 1998; Tsai et al. 2016). Qualitative researchers often study sensitive issues such as domestic abuse, substance use, and sexual practices (DuBois, Strait, and Walsh 2018); reidentification of such data could lead to additional social or physical harm for participants.
For big social data, some researchers argue that such data are public by nature, and deidentification is therefore unnecessary (Zimmer 2010; Wilkinson and Thelwall 2011). For example, in 2016, researchers scraped profiles from the online dating service OkCupid and released the data without any attempt at deidentification (Kirkegaard and Bjerrekaer 2016), asserting that the data were "already public" and required no special privacy considerations or user consent (Zimmer 2016). However, researchers are increasingly considering privacy when using big social data. Nissenbaum's theory of contextual integrity (2009), which suggests that expectations of privacy are context-dependent, has been widely used to understand privacy online. The literature suggests that people's strategies for protecting their privacy online are constantly changing and adapting, depending on a variety of factors, including physical environment, perceived audience, social status, motivation, and technologies or social media platforms in use (Palen and Dourish 2003). The idea of contextual integrity can explain why users might be fine with publicly sharing information in one context, but feel more protective of that same information in a different context (Reuter et al. 2019).
Even if researchers intend to deidentify shared big social data, the practice of deidentification may be difficult (Zimmer 2010; Schneble, Elger, and Shaw 2018). Comparing the identifiability of traditional qualitative research with that of big social research, Chu et al. point out that while it is common in qualitative studies to directly quote respondents in order to support key findings and highlight ideas of interest, the full-text indexing of social media platforms may cause any direct quotes to be easily identifiable (Chu et al. 2019).
For both qualitative data reuse and big social research, privacy should be more carefully considered when the research involves vulnerable populations (Clark et al. 2018), for whom reidentification could be especially damaging. In 1991, Sieber wrote that surveillance "is not a legitimate use of shared data and may be damaging to science" (Sieber 1991, 148). However, the intervening decades have seen a rise in technology-mediated surveillance. In the case of big social data, advertisers track social media user activities (Oboler, Welsh, and Cruz 2012), employers review the online presence of potential hires (Duffy and Chan 2019), and social media may be used by law enforcement for surveillance purposes (Jules, Summers, and Mitchell 2018). In the European Union, the General Data Protection Regulation (GDPR) went into effect in 2018 and includes the "right to be forgotten," that is, the opportunity for internet users to request their data be removed from online spaces (Voigt and von dem Bussche 2017). While the GDPR is a step forward for ethical online data practices, the ramifications for big social research are still not fully clear (Greene et al. 2019; Vestoso 2018).
To address some of the privacy challenges reviewed above, data curation and data repository services have been developed to provide deidentification support, restricted data access, and data use agreements.

Deidentification procedures
• Data curators can support deidentification procedures such as deleting names or replacing them with pseudonyms; removing potentially identifying details about participants' lives and experiences; and amalgamating or aggregating data.
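
As an illustration of the first procedure listed above (replacing names with pseudonyms), the following is a minimal sketch, not a complete deidentification workflow. It assumes the curator already holds a list of participant names; the function name `pseudonymize` and the "Participant N" labels are hypothetical. Indirect identifiers, such as details about participants' lives, would still require manual review.

```python
import re


def pseudonymize(text, names):
    """Replace each listed participant name with a stable pseudonym.

    `names` is a hypothetical, curator-supplied list of real names.
    Pseudonyms are assigned in list order: Participant 1, Participant 2, ...
    Returns the rewritten text and the name-to-pseudonym mapping.
    """
    mapping = {name: f"Participant {i}" for i, name in enumerate(names, start=1)}
    if not mapping:
        return text, mapping
    # Try longer names first so a short name never clips a longer one.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping


transcript = "Maria said that Maria's brother John moved away."
clean, key = pseudonymize(transcript, ["Maria", "John"])
print(clean)
```

The returned mapping is the reidentification key, so it would need to be stored separately from the shared data, or destroyed.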

Restricted access
• Data repositories may support embargoing data for a period of time or restricting access to the data.

Data use agreements
• Data curators and repositories can provide customizable data use agreements that dictate the conditions required for other researchers to access and reuse the data. The data use agreement includes terms that the user must agree to follow if they download the data. For example, the agreement may stipulate that the data be used for academic research purposes, that the research be approved by an institutional review board, or that the researcher not attempt to reidentify the data (ICPSR 2018; QDR 2019).
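
The interplay of embargoes and data use agreements described above can be sketched as a simple access check. This is an illustrative sketch only, not the mechanism of any particular repository; the class `AccessPolicy`, the function `may_download`, and the term labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessPolicy:
    """Hypothetical access conditions attached to a deposited dataset."""
    embargo_until: Optional[date] = None      # no downloads before this date
    required_terms: frozenset = frozenset()   # terms the user must accept


def may_download(policy, today, accepted_terms):
    """Allow a download only after the embargo ends and all terms are accepted."""
    if policy.embargo_until is not None and today < policy.embargo_until:
        return False
    # Every required term must appear among the terms the user accepted.
    return policy.required_terms <= set(accepted_terms)


policy = AccessPolicy(
    embargo_until=date(2022, 1, 1),
    required_terms=frozenset({"academic_use_only", "no_reidentification"}),
)
print(may_download(policy, date(2022, 6, 1),
                   {"academic_use_only", "no_reidentification"}))
```

In practice, conditions such as institutional review board approval call for human verification rather than a checkbox, so a check like this would supplement, not replace, curator review.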

Intellectual property and data ownership
Qualitative data are the shared intellectual property of the research participants and the researchers. For researchers to publish the text of participant responses, participants must either waive their rights or license their responses for use in the research study (Parry and Mauthner 2004). Participants may agree to data publication when signing the consent agreement; however, if the consent agreement did not specifically include data publication and reuse, publishing the data may not be allowable. In some cases, contacting participants for re-consent may be possible (Resnik 2009). Some also suggest that if data are sufficiently deidentified, it may be ethical to publish data without explicit consent from participants (DuBois, Strait, and Walsh 2018).
While universities generally claim ownership over research data created by affiliated researchers (Steneck 2007), strategies for addressing intellectual property and data ownership may vary according to how and where the data were collected. For example, when collecting data from Indigenous communities, additional considerations and guidelines come into play. Communities who participate in research are increasingly contributing to the development of protocols that inform the ethical use of data, "allowing contributors, as collectives, to have a say in how their data actually gets used" (Carroll et al. 2021).
Big social data sharing is made more complex by the fact that these data are often controlled by private, for-profit companies. In 2018, Facebook CEO Mark Zuckerberg testified before Congress, saying, "every piece of content that you share on Facebook, you own, and you have complete control over who sees it and-and how you share it, and you can remove it at any time" (Washington Post 2018). However, under United States law, intellectual property on social media is still a gray area (Blank 2018; Bosher and Yeşiloğlu 2019). Even if the contents of social media posts are the intellectual property of the users who posted them, social media companies may still implement terms of service that govern the behavior of users, developers, researchers, and archivists (Puschmann and Burgess 2014). Some social media companies have tried to prevent web scraping on their sites by invoking the Computer Fraud and Abuse Act (Neuburger 2020), thus far unsuccessfully. Social media terms of service may also prevent sharing big social data in the same manner as other research data. One example of data sharing restrictions is the case of Twitter, whose Terms of Service dictate that only Tweet ID numbers may be openly shared. In response, tools such as Documenting the Now's Hydrator have been developed; Hydrator uses the Twitter API to retrieve complete metadata for shared Tweet IDs (Summers 2017).

Intellectual property: Data curation implications
Rights management for both big social research and qualitative data reuse
• Data curators and data repositories can help researchers with rights management, that is, understanding how they can and cannot reuse shared data.
• For big social research, data curators can help researchers navigate terms of service to collect, archive, and share data in accordance with these terms.

Data licensing for qualitative data
• For qualitative data, data curators can encourage researchers to consider data licensing as part of initial consent agreements, and again at the point of data archiving and sharing.

Alternative archiving strategies for big social data
• If raw data cannot be archived, data repositories can archive associated information such as data workflows and code that can allow future users to replicate the data collection and analysis process (Hemphill, Leonard, and Hedstrom 2018).
• Data repositories may be able to archive representative metadata such as lists of Tweet IDs.
• Data curators can encourage inclusion of tools such as the Twitter Hydrator as part of the data deposit, to support usability for the archived data (Kinder-Kurlanda et al. 2017).
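
To make the Tweet ID strategy above concrete, the sketch below reduces a collection of full tweet records to a shareable list of Tweet IDs. It is a minimal illustration under stated assumptions: records are stored one JSON object per line, and each carries its identifier in an `id_str` or `id` field. The function name `dehydrate` is hypothetical, and this is not the Hydrator's own code; recovering full records from the IDs would be done separately with a tool such as Hydrator.

```python
import json


def dehydrate(jsonl_lines):
    """Return unique Tweet IDs, in first-seen order, from JSON-lines records.

    Assumes each record stores its identifier under "id_str" or "id".
    IDs are kept as strings so that large values are never rounded.
    """
    seen = set()
    ids = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        tweet_id = str(record["id_str"] if "id_str" in record else record["id"])
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


sample = [
    '{"id_str": "1001", "text": "first post"}',
    '{"id": 1002, "text": "second post"}',
    '{"id_str": "1001", "text": "first post"}',
]
print("\n".join(dehydrate(sample)))
```

Only the ID list would be deposited; the text and user metadata in the original records stay out of the shared file.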

Conclusions and Future Research
Big social research and qualitative data reuse both have the potential to reveal large-scale insights about human behavior. However, epistemological, ethical, and legal challenges arise when reusing qualitative data, conducting research with big social data, and sharing or archiving big social data. This paper outlines six key challenges gleaned from the literature: context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property. Data curators can benefit from understanding these six key challenges and examining data curation implications. Data curation implications from these challenges include developing strategies for: providing clear documentation; linking and combining datasets; supporting trustworthy repositories; using and advocating for metadata standards; discussing alternative consent strategies with researchers and IRBs; understanding and supporting deidentification challenges; supporting restricted access for data; creating data use agreements; supporting rights management and data licensing; and developing and supporting alternative archiving strategies. These data curation practices can help mitigate some of the challenges that are present with both data types. Future research could interview qualitative researchers, big social researchers, and data curators to verify and further investigate the challenges discussed here, and to develop data curation strategies that address these shared challenges. By investigating issues in qualitative data reuse and big social research side by side, data curation practices can be extended to support sounder practices for both qualitative data and big social research.