Epstein-Barr Virus Epitope-Major Histocompatibility Complex Interaction Combined with Convergent Recombination Drives Selection of Diverse T Cell Receptor alpha and beta Repertoires

Recognition modes of individual T cell receptors (TCRs) are well studied, but factors driving the selection of TCR repertoires from primary through persistent human virus infections are less well understood. Using deep sequencing, we demonstrate a high degree of diversity of Epstein-Barr virus (EBV)-specific clonotypes in acute infectious mononucleosis (AIM). Only 9% of unique clonotypes detected in AIM persisted into convalescence; the majority (91%) of unique clonotypes detected in AIM were not detected in convalescence and were seemingly replaced by equally diverse “de novo” clonotypes. The persistent clonotypes had a greater probability of being generated than nonpersistent clonotypes due to convergent recombination of multiple nucleotide sequences to encode the same amino acid sequence, as well as the use of shorter complementarity-determining regions 3 (CDR3s) with fewer nucleotide additions (i.e., sequences closer to germ line). Moreover, the two most immunodominant HLA-A2-restricted EBV epitopes, BRLF1 109 and BMLF1 280, show highly distinct antigen-specific public (i.e., shared between individuals) features. In fact, TCRα CDR3 motifs played a dominant role, while TCRβ played a minimal role, in the selection of the TCR repertoire to an immunodominant EBV epitope, BRLF1. This contrasts with the majority of previously reported repertoires, which appear to be selected either on TCRβ CDR3 interactions with peptide/major histocompatibility complex (MHC) or in combination with TCRα CDR3. Understanding of how TCR-peptide-MHC complex interactions drive repertoire selection can be used to develop optimal strategies for vaccine design or generation of appropriate adoptive immunotherapies for viral infections in transplant settings or for cancer.
IMPORTANCE Several lines of evidence suggest that TCRα and TCRβ repertoires play a role in disease outcomes and treatment strategies during viral infections in transplant patients and in cancer and autoimmune disease therapy. Our data suggest that it is essential that we understand the basic principles of how to drive optimum repertoires for both TCR chains, α and β. We address this important issue by characterizing the CD8 TCR repertoire to a common persistent human viral infection (EBV), which is controlled by appropriate CD8 T cell responses. The ultimate goal would be to determine if the individuals who are infected asymptomatically develop a different TCR repertoire than those that develop the immunopathology of AIM. Here, we begin by doing an in-depth characterization of both CD8 T cell TCRα and TCRβ repertoires to two immunodominant EBV epitopes over the course of AIM, identifying potential factors that may be driving their selection.
Citation Gil A, Kamga L, Chirravuri-Venkata R, Aslan N, Clark F, Ghersi D, Luzuriaga K, Selin LK. 2020. Epstein-Barr virus epitope–major histocompatibility complex interaction combined with convergent recombination drives selection of diverse T cell receptor α and β repertoires. mBio 11:e00250-20. https://doi.org/10.1128/mBio.00250-20.
Editor Jack R. Bennink, National Institute of Allergy and Infectious Diseases
Copyright © 2020 Gil et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Address correspondence to Katherine Luzuriaga, katherine.luzuriaga@umassmed.edu, or Liisa K. Selin, liisa.selin@umassmed.edu.
This article is a direct contribution from Katherine Luzuriaga, a Fellow of the American Academy of Microbiology, who arranged for and secured reviews by Paul Thomas, St. Jude Children's Research Hospital, and Immo Prinz, Hannover Medical School.
Received 5 February 2020. Accepted 11 February 2020.
RESEARCH ARTICLE Host-Microbe Biology

KEYWORDS repertoire, T cell receptor, TCR, Epstein-Barr virus, EBV, acute infectious mononucleosis
Over 95% of the world's population is persistently infected with Epstein-Barr virus (EBV) by the fourth decade of life. In the 30% of individuals who are EBV serologically negative upon entering college, primary infection can result in the syndrome acute infectious mononucleosis (AIM); the frequency of reported symptomatic disease has varied from 25 to 77% of these young adults (1,2). AIM symptoms can vary greatly in severity from a mild short flu-like illness to a more severe syndrome with sore throat, lymphadenopathy, splenomegaly, hepatomegaly, and debilitating fatigue, which may last for months (1,2). However, primary infection in the majority of individuals occurs in young childhood and is essentially asymptomatic, rarely developing into AIM. A rare 5% of the population appear to never acquire infection and remain EBV serologically negative; severe illness requiring hospitalization has been reported in individuals who acquire primary EBV infection late in life (3). A history of AIM has been associated with an increased risk of subsequent multiple sclerosis (MS) (4) or Hodgkin's lymphoma (5). EBV infection is also associated with Burkitt lymphoma, nasopharyngeal cancer, hairy leukoplakia in individuals with AIDS, and lymphoproliferative malignancies in transplant patients (5,6). EBV-associated posttransplant lymphoproliferative disorders can be prevented or treated by adoptive transfer of EBV-specific CD8 T cells (6-8). Defective CD8 T cell control of EBV reactivation may also result in the expansion of EBV-infected, autoreactive B cells in MS (9). Improvement of MS has followed infusion of autologous EBV-specific CD8 T cells in some patients but not others, suggesting that there may be qualitative differences in EBV-specific CD8 T cell responses that need to be better understood (4).
Altogether, these data indicate that EBV-specific CD8 T cells are important for viral control (10). The integration of computational biology and structural modeling approaches to identify T cell receptor (TCR) antigen specificity groups and TCR features associated with virologic control (11-16) would facilitate our understanding of how EBV-specific CD8 T cells control EBV replication and contribute to the development of a vaccine to prevent or immunotherapies to modify EBV infection (7,8,17).
One of the hallmarks of CD8 T cells is epitope specificity, conferred by the interaction of the T cell receptor (TCR) with virus-derived peptides bound to host major histocompatibility complex (pMHC) (18-21). The TCR is a membrane-bound, heterodimeric protein composed of α and β chains. Each chain arises from rearrangement of variable (V), diversity (D), joining (J), and constant (C) gene segments (22), resulting in a diverse pool of unique TCRα and TCRβ clonotypes. Additions or deletions of N nucleotides at the V(D)J junctions, specifically at the complementarity-determining region 3 (CDR3), and pairing of different TCRα and TCRβ segments further enhance the diversity of the TCR repertoire, estimated to range from 10^15 to 10^20 unique potential TCRαβ clonotypes (23,24). This diversity allows CD8 T cell responses to a myriad of pathogens.
The CD8 TCR repertoire is an important determinant of CD8 T cell-mediated antiviral efficacy or immune-mediated pathology (16,23,25-28). Defining the relationships between early and memory CD8 TCR repertoires is important to understanding structural features of the TCR repertoire that govern the selection and persistence of CD8 T cells in memory. Deep-sequencing techniques, combined with structural analyses, provide a high-throughput and unbiased approach to understanding antigen-specific TCRαβ repertoires. We (29) and others (30-33) have recently reported that TCRαβ repertoires of CD8 T cell responses to common viruses (influenza virus, cytomegalovirus [CMV], and hepatitis C virus) are highly diverse and individualized (i.e., "private") but that "public" clonotypes (defined as the same V, J, or CDR3 amino acid sequences in many individuals) are favored for expansion, likely due to selection for optimal structural interactions (34).
Studies of influenza A virus (IAV) in mice (35) and simian immunodeficiency virus (SIV) in rhesus macaques (36) have shown that the efficiency with which TCRβ sequences are produced via V(D)J recombination is an important determinant of the extent of TCRβ sharing between individuals (35,37). Shared TCRβ amino acid sequences required fewer nucleotide additions and were encoded by a greater variety of nucleotide sequences (i.e., convergent recombination). Both of these features are characteristics of TCRβ sequences that have the potential to be produced frequently (35-39) and are also observed in many public TCRs (29,30,38-41).
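As a minimal illustration of convergent recombination in the sense used above (several distinct nucleotide sequences encoding one CDR3 amino acid sequence), the following Python sketch counts, for each translated CDR3, how many different nucleotide sequences produce it. The function names and the toy sequences are hypothetical and are not taken from the study; the sketch assumes in-frame sequences and the standard genetic code.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; trailing partial codons are ignored."""
    nt_seq = nt_seq.upper()
    usable = len(nt_seq) - len(nt_seq) % 3
    return "".join(CODON_TABLE[nt_seq[i:i + 3]] for i in range(0, usable, 3))


def convergence_counts(cdr3_nt_sequences):
    """Map each CDR3 amino acid sequence to the number of distinct
    nucleotide sequences that encode it."""
    nt_by_aa = defaultdict(set)
    for nt_seq in cdr3_nt_sequences:
        nt_by_aa[translate(nt_seq)].add(nt_seq.upper())
    return {aa: len(nts) for aa, nts in nt_by_aa.items()}


# Toy, made-up nucleotide sequences (not data from the study).
toy_sequences = ["TGTGCTGTG", "TGCGCCGTA", "TGTGCAGTT", "TGTGCTAGT"]
counts = convergence_counts(toy_sequences)
print(counts)
```

In this toy input, three different nucleotide sequences translate to the same amino acid sequence, so that sequence would count as convergently encoded, while the fourth is encoded only once.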
To thoroughly evaluate molecular features of TCR that are important for driving repertoire selection over time following EBV infection, we used direct ex vivo deep sequencing of both TCR Vα and Vβ regions of CD8 T cells specific to two immunodominant epitopes, BRLF-1 109 (YVL-BR) and BMLF-1 280 (GLC-BM), isolated from peripheral blood during primary EBV infection (AIM) and 6 months later in convalescence (CONV). Each TCR repertoire had a high degree of diversity. However, we noted that persistent clonotypes accounted for only 9% of the unique clonotypes and yet they predominated in both the acute and convalescent phases of infection. An interesting corollary of this finding was that 91% of the unique clonotypes expanded in acute infection were not expanded in convalescence, appearing to be replaced in 6 months by an equally diverse set of de novo clonotypes. Expanded clonotypes detected in AIM and CONV were more likely to be generated in part as a result of convergent recombination than nonpersistent or de novo clonotypes and had distinct public features (meaning they are shared between donors), which varied by the specific epitope.

RESULTS
Patient characteristics. Three HLA-A*02:01+ individuals presenting with symptoms of AIM and laboratory studies consistent with primary infection were studied (see Table S1 in the supplemental material) at initial clinical presentation (AIM) and 6 months later (CONV). Direct tetramer staining of peripheral blood revealed that 2.1% ± 0.5% (mean ± standard error of the mean [SEM]) and 1.1% ± 0.3% of CD8 T cells were YVL-BR and GLC-BM specific, respectively, in AIM and declined to 0.3% ± 0.2% and 0.3% ± 0.1%, respectively, in CONV. Mean blood EBV load was 3.8 ± 0.9 log10 genome copies/10^6 B cells in AIM and 2.6 ± 0.7 log10 genome copies/10^6 B cells in CONV.
Persistent dominant clonotypes represent a small fraction of unique clonotypes, with TCRα and TCRβ repertoire diversity maintained by the development of de novo clonotypes. To examine features that drive selection of YVL-BR- and GLC-BM-specific TCRs in AIM and CONV, deep sequencing of TCRα and TCRβ repertoires was conducted directly ex vivo on tetramer-sorted CD8 T cells at both time points (Fig. 1, Fig. S1 and S2, and Table S2). YVL-BR- and GLC-BM-specific CD8 TCR repertoires in AIM demonstrated interindividual differences and were highly diverse; the mean (±SEM) number of unique clonotypes (defined as a unique DNA rearrangement) was not significantly different in CONV (Fig. 1). Each unique TCRα or TCRβ clonotype detected in AIM that was also detected in CONV was defined as a "persistent" clonotype. Clonotypes were regarded as "nonpersistent" or "de novo" if they were detected only during AIM or CONV, respectively. A high level of TCR diversity was maintained from AIM to CONV; however, the number of overlapping unique clonotypes detected in both AIM and CONV was small (Fig. 1, panels i). Only a small fraction of TCRα or TCRβ unique clonotypes specific to YVL-BR (6.6% ± 2.2%) and GLC-BM (9.1% ± 4.2%) that were present in AIM were maintained in CONV (YVL-BR, 8.7% ± 4.9%; GLC-BM, 18.5% ± 5.6%). However, they comprised 57.5% ± 26.2% (YVL-BR) or 75.5% ± 12% (GLC-BM) of the total CD8 T cell response when including their frequency (sequence reads) in AIM and 35.8% ± 10.2% (YVL-BR) or 55.8% ± 13.4% (GLC-BM) in CONV (Fig. 1, panels ii). While the clonotypic composition of YVL-BR- and GLC-BM-specific CD8 T cells changed over the course of primary infection, dominant TCR clonotypes detected during AIM tended to persist and dominate in CONV. Altogether, these data indicate that persistent clonotypes made up only a small percentage of unique clonotypes but were highly expanded in AIM and CONV.
Surprisingly, the vast majority (91%) of unique clonotypes were not detected following AIM and were seemingly replaced with de novo clonotypes in CONV.
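The definitions of persistent, nonpersistent, and de novo clonotypes given above reduce to set operations on the clonotypes detected at the two time points. The Python sketch below is one hypothetical way to express them and to contrast the share of unique clonotypes with the share of sequence reads; the function names, the dictionary layout (clonotype identifier mapped to read count), and the toy numbers are assumptions for illustration and are not the study's pipeline or data.

```python
def classify_clonotypes(aim_reads, conv_reads):
    """Split clonotypes into persistent (AIM and CONV), nonpersistent
    (AIM only), and de novo (CONV only) sets."""
    aim, conv = set(aim_reads), set(conv_reads)
    return aim & conv, aim - conv, conv - aim


def persistent_summary(aim_reads, conv_reads):
    """For each time point, return (fraction of unique clonotypes that are
    persistent, fraction of sequence reads carried by persistent clonotypes)."""
    persistent, _, _ = classify_clonotypes(aim_reads, conv_reads)

    def share(reads):
        unique_fraction = len(persistent) / len(reads)
        read_fraction = sum(reads[c] for c in persistent) / sum(reads.values())
        return unique_fraction, read_fraction

    return {"AIM": share(aim_reads), "CONV": share(conv_reads)}


# Toy, made-up read counts keyed by a clonotype identifier.
aim_toy = {"c1": 600, "c2": 100, "c3": 100, "c4": 100, "c5": 100}
conv_toy = {"c1": 50, "c6": 25, "c7": 25}

persistent, nonpersistent, de_novo = classify_clonotypes(aim_toy, conv_toy)
summary = persistent_summary(aim_toy, conv_toy)
print(sorted(persistent), sorted(nonpersistent), sorted(de_novo))
print(summary)
```

The toy example mirrors the qualitative pattern reported here: a single persistent clonotype is a minority of the unique clonotypes at each time point yet carries half or more of the reads.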
Persistent public clonotypes have an increased probability of generation; convergent recombination contributes to the selection of the persistent TCRα and TCRβ repertoire. In both the YVL-BR and GLC-BM TCR repertoires, the percentage of public clonotypes significantly increased (chi-square test: P < 0.0001) in the persistent (for YVL-BR, TCRAV, 34%, and TCRBV, 17%; for GLC-BM, TCRAV, 27%, and TCRBV, 22%) compared to the nonpersistent (for YVL-BR, TCRAV, 5%, and TCRBV, 2%; for GLC-BM, TCRAV, 4%, and TCRBV, 4%) or de novo (for YVL-BR, TCRAV, 5%, and TCRBV, 1%; for GLC-BM, TCRAV, 6%, and TCRBV, 7%) repertoire. This suggests that the persistent clonotypes may have TCR features that led to greater probability of generation. We tested this by directly calculating the generation probability of amino acid sequences in the CDR3 to determine if the public clonotypes are easier to generate than the private at both time points, acute and convalescent. This allowed a direct and rigorously quantitative test of whether the expanded persistent public clonotypes were of higher generation probability (39,42). The TCR sequences used by dominant public TCRAV of either GLC-BM- or YVL-BR-specific responses have a significantly greater probability of generation, while only the GLC-BM TCRBV public but not the YVL-BR public repertoire has a greater probability of being generated (Fig. 2). This might suggest that TCRAV is dominant and important in the selection of YVL-BR TCR repertoire, while both TCRAV and TCRBV contribute to the GLC-BM TCR repertoire.
To further study this issue, we examined whether convergent recombination played a role in the generation of these public persistent TCRs (39). Examination of memory antiviral TCRβ repertoires in humans, mice, and macaques suggests that convergent recombination plays an important role in the selection of public antigen-specific TCRs (i.e., those shared between individuals of the same haplotype) (35-37). Consistent with previous reports for epitope-specific CD8 TCRβ (37,43,44), our group found that convergent recombination plays an important role in EBV-specific TCRβ repertoire selection. We also demonstrated that convergent recombination plays a role in selection of persistent TCRα clonotypes specific for the two immunodominant EBV epitopes, YVL-BR and GLC-BM, during the course of a human viral infection. There was an increased usage of amino acids derived by multiple different nucleotide sequences in the CDR3α and CDR3β regions of persistent clonotypes compared to nonpersistent and
(i) Clonotypes that persist from the acute phase into memory represent only 6 to 18% of the unique clonotypes but contribute to 35 to 75% of the total CD8 T cell response. The highly diverse nonpersistent clonotypes are replaced by new (de novo) highly diverse clonotypes, which were not present in the acute response. The average frequency of unique clonotypes that persist into the memory phase (TRAV and TRBV) in total HLA-A2/YVL-BR-specific (A) and GLC-BM-specific (B) TCR repertoire is shown (i). The average numbers (±SEM) of unique clonotypes from the 3 donors are shown below the pie charts. Also shown in the pie charts is the percentage that these clonotypes contribute to the total CD8 T cell response in the HLA-A2/YVL-BR-specific (A) and GLC-BM-specific (B) TCR repertoire (ii). The average numbers (±SEM) of sequence reads are shown below the pie charts.
the TCR by predominantly germ line gene segments (39). This was indeed the case for YVL-BR- and GLC-BM-specific clonotypes (Fig. 4); the CDR3α of persistent YVL-BR- and GLC-BM-specific clonotypes had fewer nucleotide additions than nonpersistent clonotypes, and there was an increased number of nucleotide additions in de novo clonotypes of EBV-BR. However, the CDR3β of persistent YVL-BR- and GLC-BM-specific clonotypes did not have fewer nucleotide additions than nonpersistent clonotypes (Fig. 4A and D). Public clonotypes of each epitope-specific response also had fewer nucleotide additions than private clonotypes, except, interestingly, for YVL-BR CDR3β, where the private clonotypes had fewer (Fig. 4B and E). Interestingly, there was an increased usage of glycines in the longer CDR3 of the de novo TCR repertoire (Fig. 4C and F), which has been reported to be a feature associated with greater TCR promiscuity (45,46). Overall, these results suggest the use of shorter CDR3s with fewer nucleotide additions in the persistent TCRAV but not in the TCRBV clonotypes. Curiously, consistent with probability generation data (Fig. 2), the public TCRBV of EBV-BR were actually significantly longer with increased nucleotide additions.
CDR3 lengths are a major factor in the selection of the YVL-BR- and GLC-BM-specific TCRα and TCRβ repertoires. Differences in dominant YVL-BR- and GLC-BM-specific CDR3α and CDR3β lengths were also observed between the epitopes and from AIM to CONV and between persistent and nonpersistent or de novo clonotypes (Fig. 5). There were differences in preferential use of CDR3 lengths between YVL-BR and GLC-BM. For instance, the AIM YVL-BR-specific repertoire used more of the shorter 10-mer CDR3β than GLC-BM in both AIM and CONV (Fig. 5A, panel ii). Within the YVL-BR response, use of the shorter 9-mer CDR3α decreased from AIM to CONV (Fig. 5A, panel i).
Persistent YVL-BR-specific clonotypes used significantly more of the shorter 9-mer CDR3α and 10-, 11-, and 12-mer CDR3β than the nonpersistent clonotypes. In contrast, the de novo clonotypes favored the longer 12-mer CDR3α and focused more on 11-mer CDR3β length (Fig. 5B, panels i and ii). Significant changes in the GLC-BM-specific CDR3 length were also observed between AIM and CONV. For example, the frequencies of the longer GLC-BM-specific 12-mer CDR3α and CDR3β clonotypes significantly increased from 13.6% ± 6% and 6% ± 2.8%, respectively, in AIM to 24% ± 5% and 17.9% ± 8%, respectively, in CONV, while use of the shorter 11-mer CDR3α decreased (Fig. 5A, panels i and ii). The persistent clonotypes preferentially used 9- and 11-mer CDR3α while de novo clonotypes used longer 12- and 14-mer lengths (Fig. 5B, panels iii and iv). The persistent clonotypes also used 11- and 13-mer CDR3β, while de novo clonotypes used 12-mer lengths.
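A CDR3 length comparison of the kind described above can be sketched as a read-weighted tally of CDR3 amino acid lengths. The Python example below is a hypothetical illustration only: the function name and toy sequences are made up, and it assumes that length is simply the number of residues in the CDR3 string supplied, which may differ from the length convention used for the figures in this study.

```python
from collections import Counter


def cdr3_length_usage(clonotype_reads):
    """Return the percentage of sequence reads at each CDR3 amino acid length.

    clonotype_reads maps a CDR3 amino acid string to its read count.
    """
    reads_by_length = Counter()
    for cdr3_aa, reads in clonotype_reads.items():
        reads_by_length[len(cdr3_aa)] += reads
    total = sum(reads_by_length.values())
    return {
        length: 100.0 * n / total
        for length, n in sorted(reads_by_length.items())
    }


# Toy, made-up CDR3 strings and read counts (not data from the study).
toy_repertoire = {"ACDEFGHIK": 60, "LMNPQRSTV": 20, "ACDEFGHIKLM": 20}
usage = cdr3_length_usage(toy_repertoire)
print(usage)
```

Running the same function on the persistent, nonpersistent, and de novo subsets separately would give the per-subset length profiles that are then compared.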
Selection of the TCRα and TCRβ repertoires was based on the features of the specific epitope. To further elucidate factors that are driving selection of TCR specific to the two immunodominant EBV epitopes, the characteristics of the TCR repertoires for each of 3 donors were elucidated by systematically analyzing preferential TCRAV or BV segment usage hierarchy as presented in pie charts, CDR3 length analyses, V-J pairing by Circos plots of the clonotypes with the dominant CDR3 lengths, and dominant CDR3 motif; the last determines if there was an enrichment of particular amino acid residues at specific sites potentially important for ligand interaction. Enrichment for certain characteristics would suggest that these features are important for pMHC interaction (11,29,47-50).
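Enrichment of particular residues at specific CDR3 positions, as described above, can be illustrated by tabulating per-position amino acid frequencies among CDR3s of one length. The Python sketch below is a simplified stand-in, not the motif analysis used in the study; the function names, the 0.75 threshold, and the toy sequences are assumptions chosen for illustration, with "X" marking positions that lack a dominant residue.

```python
from collections import Counter


def positional_frequencies(cdr3s):
    """Per-position amino acid frequencies for CDR3s that share one length."""
    if len({len(s) for s in cdr3s}) != 1:
        raise ValueError("all CDR3 sequences must have the same length")
    n = len(cdr3s)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*cdr3s)
    ]


def consensus_motif(cdr3s, threshold=0.75):
    """Report the dominant residue at each position, or 'X' when no residue
    reaches the (arbitrary, illustrative) frequency threshold."""
    motif = []
    for freqs in positional_frequencies(cdr3s):
        residue, freq = max(freqs.items(), key=lambda item: item[1])
        motif.append(residue if freq >= threshold else "X")
    return "".join(motif)


# Toy, made-up 5-residue sequences (not data from the study).
toy_cdr3s = ["SARDG", "SARDT", "SARDV", "SVRDA"]
motif = consensus_motif(toy_cdr3s)
print(motif)
```

This version weights every unique sequence equally; weighting by sequence reads instead would emphasize expanded clonotypes.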
(i) The 9-mer TCR AV8.1-VKDTDK-AJ34 drives selection of YVL-BR-specific CD8 T cells. The YVL-BR-specific TCRα repertoire was focused on one dominant family, AV8, Table S3A); 87% ± 1.7% of the clonotypes using this motif were AV8.1, and 92% ± 1.7% were AJ34. Interestingly, this motif was present in multiple other AV and AJ pairs, including AV12, AV21, and AV3. Obligate pairing of the dominant AV8.1 response to AJ34 containing the highly conserved motif VKDTDK was observed in all donors from AIM through CONV, suggesting that the 9-mer AV8.1-VKDTDK-AJ34-expressing clones were highly selected. There was a preferential usage of BV20-BJ2.7 pairing within the dominant 11-mer response (Fig. 6B, panel ii, and Table S3B), the CDR3α motif LLGG was commonly used. Clonotypes with this motif were only a minor part of the overall responses in 2 donors (E1603, E1655) but composed 17.4% of the total YVL-BR TCRβ repertoire in E1632. Altogether, these results suggest that the 9-mer AV8.1-VKDTDK-AJ34-expressing clones were highly preferentially selected by YVL-BR ligand during AIM and CONV and that this TCRα could pair with multiple different TCRβs, as suggested by the fact that there was no such dominant TCRβ clonotype. These findings have been independently confirmed using single-cell sequencing (51).
Overall, despite individual changes, the dominant TCRV gene families and CDR3 motifs that were identified in AIM to drive the selection of YVL-BR- or GLC-BM-specific CD8 T cells were predominantly conserved in CONV, suggesting the strength of these TCR features in driving selection of the repertoire (Fig. 6 and 7 and Table S3).
Persistent, nonpersistent, and de novo clonotypes differ in selection factors. To address whether clonotypes that persisted into memory show characteristics similar to those that dominate in acute infection, YVL-BR and GLC-BM TCRα/β repertoires were compared between AIM and CONV. The TCR repertoires of persistent and nonpersistent clonotypes in AIM and de novo clonotypes in CONV were examined in order to identify selection factors that governed TCR persistence.
(i) YVL-BR persistent, nonpersistent, and de novo clonotypes have unique characteristics. Persistent YVL-BR clonotypes maintained the major selection factors that were identified in AIM (Fig. 8A, Fig. S3 and S4, and Table S4). Although some features were maintained in all 3 TCR subsets, there were significant structural differences in these repertoires.
The YVL-BR nonpersistent CDR3α clonotypes used AV8.1, but it was paired with many more AJ gene families (Fig. S3). Moreover, AV8.1-VKDTDK-AJ34 clonotypes, which were present in 42% ± 20% or 19% ± 11% of all persistent clonotypes during AIM or CONV, respectively, were present in the nonpersistent response at a much lower mean frequency (6% ± 1%) (Fig. 8A and Table S4A and B). The clonal composition of the CDR3β nonpersistent response varied greatly in BV family usage between donors (Table S4D and E) and lacked identifiable motifs, suggesting that for YVL clones expressing AV8.1-VKDTDK-AJ34 to persist, there may be some preferential if not obvious TCRβ characteristics that make them fit better. For de novo clonotypes, new selection factors appeared that may relate to either a decrease in antigen expression or a change in antigen-expressing cells over the course of persistent infection. For instance, in the YVL-BR 9-mer de novo clonotypes, the selection factor AV8.1-AJ34 was maintained in 2 of 3 donors and a new modified motif, VKNTDK, was identified (Fig. 8A; Fig. S3A, panel i; and Table S4C). The de novo 11-mer CDR3α response had increased usage of AV12 in all 3 donors (Fig. S3A, panel ii). In de novo BV clonotypes, the pattern of BV-BJ usage changed compared to that observed in AIM. Similarly, de novo 13-mer CDR3β clonotypes were also totally different with usage of a new motif, SALLGX, in 2 of 3 donors (Table S4F).
(ii) GLC-BM persistent, nonpersistent, and de novo clonotypes have unique characteristics. The persistent GLC-BM TCRα clonotypes maintained the major selection criteria that were identified in AIM with the 9-mer EDNNA motif, which strongly associated with AV5-1-AJ31, being present in a mean 5% ± 3.7% or 10% ± 8.6% of all persistent clonotypes during AIM or CONV, respectively, in all 3 donors (Fig. 8B and Table S4G). The fact that clonotypes using this motif were not present in nonpersistent clonotypes suggests that this motif, and not just the gene family, may be important in determining persistence of GLC-BM-specific clonotypes. The persistent GLC-BM TCRβ repertoire also maintained the major selection criteria that were identified in AIM, with the 11-mer SARD motif that strongly associated with BV20.1-BJ1 being present in a mean 16% ± 9.9% or 24% ± 13.7% of all persistent clonotypes during AIM or CONV, respectively, in all 3 donors. Two of the donors had the 11-mer SQSPGG motif (Table S4I) in a mean 40% ± 8% and 30% ± 25% of all persistent clonotypes during AIM or CONV, respectively.
Only the SARD motif clonotypes appeared in nonpersistent BV clonotypes during AIM but at a lower mean frequency of 3% ± 1% (Table S4J). The de novo clonotype selection appeared to be driven by different factors than that of the persistent clonotypes. Although there was much greater diversity and more variation between patients in de novo clonotypes (each donor is private) with recruitment of private AV families such as AV41 or AV24 in E1632 and E1655, there was still a preferential usage by 2 of 3 donors of AV5.1 (Fig. S5, panel i) and the appearance in 2 of 3 donors of a new 11-mer CDR3α motif, ELDGQ, which associated with AV5.1-AJ16.1 (Fig. 8B and Table S4H). De novo clonotypes were also diverse and private using uncommon BV families like BV7 and BV3 but also using common BV families such as BV20 (Fig. S6) expressing the SARD motif in 5% ± 2.9% of de novo clonotypes (Fig. 8B and Table S4K). In conclusion, the persistent clonotypes made up the vast majority of the AIM and CONV responses. For the most part, the nonpersistent clonotypes did not have a motif despite the observation that some of them used a public TCRα or TCRβ; this suggests that one of the strongest selection factors for persistence was the CDR3 motif. Additionally, the fact that persistent clonotypes retained features that were identified in AIM further supports their validity. Altogether, these results suggest that the HLA-A2-YVL-BR- or GLC-BM-specific structure contributes strongly to the selection of dominant persistent clonotypes.

DISCUSSION
This is the first study to use deep sequencing to comprehensively investigate the TCRα and TCRβ repertoires to two different EBV epitope-specific CD8 T cell responses over the course of primary infection. We show that while epitope-specific TCR repertoires are highly diverse and vary greatly between donors, they are dominated by distinct clonotypes with public features that persist into convalescence. These persistent clonotypes have distinct features specific to each antigen that appear to drive their peripheral selection; they account for only 9% of unique clonotypes but predominate in acute infection and convalescence, accounting for 57% ± 4% of the total epitope-specific response. Surprisingly, the majority of highly diverse unique clonotypes were not detected following AIM and are replaced in convalescence by equally diverse de novo clonotypes (43% ± 5% of the total response).
The deep-sequencing results show a highly diverse TCR repertoire in each epitope-specific response with 1,292 to 15,448 and 1,644 to 7,631 unique clonotypes detected within the YVL-BR- and GLC-BM-specific TCR repertoires, respectively. Such diversity has been underappreciated for the GLC-BM-specific TCR repertoire, with prior studies reporting an oligoclonal repertoire (52,53,55). Despite this enormous diversity, there was considerable bias. Although the TCR repertoire was individualized (i.e., each donor studied had a unique TCR repertoire), there was prevalent and public usage of particular TCRV families such as AV8 within the YVL-BR-specific responses and AV5, AV12, BV14, and BV20 within the GLC-BM-specific populations.
One mechanism which may lead to the dominant public usage and persistence of these clonotypes is that they have TCR features that increase their probability of generation, i.e., they are potentially easier to derive. One of these features, convergent recombination in both the TCRα and the TCRβ CDR3, appears to play a major role in the selection of these persistent clonotypes for expansion and maintenance into long-term memory. This is evidenced by persistent clonotypes using more amino acids that have multiple ways of being derived. A second feature is the usage of shorter germ line-derived CDR3s with fewer nucleotide additions. The selection of unique public TCR repertoire features, such as CDR3 length and particular TCRAV or BV family usage and motifs, for each epitope in clonotypes that dominate and persist suggests that these clones may be the best-fit TCR to recognize the pertinent pMHC complex. In contrast, the broad repertoire of unique clonotypes that are activated in AIM, which is marked by a high viral load and increased inflammation, may not fit as well and perhaps does not receive a TCR signal that leads to survival into memory. Interestingly, 6 months after the initial infection, a completely new (de novo) and similarly diverse TCR repertoire has expanded. Continued antigenic exposure in persistent EBV infection may contribute to the evolution of the TCR repertoire over time.
Prior studies using similar techniques to study influenza A virus (IAV) (not a persistent virus) HLA-A2-restricted IAV-M1 58-67 and cytomegalovirus (CMV)-pp65 epitope-specific memory responses showed a similar focused diversity of epitope-specific TCR repertoires, suggesting that this is a general principle of antigen-specific repertoire structure (29,30). Altogether, these studies suggest that the pMHC structure drives selection of the particular public featured dominant clonotypes for each epitope. The broad fluctuating private repertoires show the resilience of memory repertoires and may lend plasticity to antigen recognition, perhaps assisting in early cross-reactive CD8 T cell responses to heterologous new pathogens (28,56,57) while at the same time potentially protecting against T cell clonal loss and viral escape (58). It is, however, possible that this difference in the private diverse portion of the epitope-specific TCR repertoire between acute phase and convalescence may result from sampling error as we are not able to analyze the full blood volume of an individual. In order to at least partially address this, we have analyzed TCRAV and BV deep-sequencing data from tetramer-sorted influenza A-M1 58-specific CD8 T cells (not a persistent virus and thus not influencing TCR repertoire evolution) from one healthy donor of a similar age from two time points 1 year apart. We compared the TCR overlap of this antigen-specific population at two time points to the donors with AIM in this paper. We calculated the overlap between clonotypes at two distinct visits (v1 versus v7) using the Jaccard similarity coefficient J, which is defined as the size of the intersection divided by the size of the union of two sets of clonotypes A and B. The mean Jaccard similarity coefficient for TCRAV including both EBV epitopes during AIM was 0.075 ± 0.01 (n = 6) and for TCRBV was 0.075 ± 0.01 (n = 6).
A higher Jaccard similarity coefficient was observed in the healthy donor for TCRAV (0.172) and for TCRBV (0.208). The much higher Jaccard coefficients obtained for the healthy donor suggest that the low overlap between clonotypes observed for acute- versus convalescent-phase visits in EBV-infected individuals would not be due to sampling alone. Also, the significant differences in the characteristics of the TCR repertoires of the nonpersistent and de novo populations would suggest that these are different populations.
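The Jaccard similarity coefficient defined above, J = |A ∩ B| / |A ∪ B|, can be computed directly from the sets of unique clonotypes detected at two visits. The Python sketch below is a minimal illustration; the function name and the toy clonotype identifiers are hypothetical, and returning 0.0 for two empty sets is a convention chosen here, not something stated in the study.

```python
def jaccard(clonotypes_a, clonotypes_b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(clonotypes_a), set(clonotypes_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(union)


# Toy, made-up clonotype identifiers for two visits (not data from the study).
visit_1 = {"c1", "c2", "c3", "c4"}
visit_7 = {"c3", "c4", "c5", "c6"}
overlap = jaccard(visit_1, visit_7)
print(round(overlap, 3))
```

In the terminology of this paper, the intersection corresponds to the persistent clonotypes and the union to the persistent, nonpersistent, and de novo clonotypes combined, so a low J reflects a small persistent fraction relative to all clonotypes seen at either visit.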
There have been limited reports of the importance of TCRα in viral epitope-specific responses. Biased TRAV12.2 usage with CDR1α interaction with the MHC has been observed with the HLA-A2-restricted yellow fever virus epitope LLWWNGPMAV (59). HLA-B*35:08-restricted EBV BZLF1-specific responses appear to be biased in both TCRα and TCRβ usage, much like HLA-A2-restricted EBV-BR (60,61), with a strong preservation of a public TCRα clonotype, AV19-CALSGFYNTDKLIF-J34, which can pair with a few different TCRβ chains. TCRα chain motifs have also been described for HLA-A2-restricted influenza A virus M1 58-67 (IAV-M1), but these appear to make minor contributions to the pMHC-TCR interaction, which is almost completely dominated by CDR3β (29,45,46).
The TCR repertoire of the HLA-A2-restricted IAV-M1 epitope is highly biased toward the TRBV19 gene usage in many individuals and displays a strong preservation of a dominant XRSX CDR3β motif. Crystal structures of TCRs specific to this epitope have revealed that the TCR is β-centric, with the conserved arginine in the CDR3β loop being inserted into a pocket formed between the peptide and the α2 helix of the HLA-A2 (29,62). The TCRα has little role in pMHC engagement, and this helps explain the high degree of sequence variability in the CDR3α and conservation in the CDR3β. Similarly, previous studies using EBV-GLC-BM-specific CD8 T cells have documented that TCR-pMHC binding modes also contribute to TCR biases (63). The highly public HLA-A2-restricted EBV-GLC-BM-specific AS01 TCR is highly selected because of a few very strong interactions of its TRAV5- and TRBV20-encoded CDR3 loops with the peptide/MHC.
The present TCR deep-sequencing studies thus reinforce our previous report of an underappreciated role for TCRα-driven selection of the EBV-YVL-BR-specific repertoire (Fig. 6) (51). To the best of our knowledge, our combined studies are among the first to describe a TCR CDR3α-driven selection of viral epitope-specific TCRs with minimal contribution by the TCRBV. The AV8.1 family was used by all individuals and dominated the conserved 9-mer response; it obligately paired with AJ34 and had a predominant CDR3 motif, VKDTDK, representing 42% and 19% of the total persistent response in AIM and CONV, respectively. In contrast, the BV response was highly diverse without evidence of a strong selection factor, suggesting that AV8.1-VKDTDK-AJ34 could pair with multiple different BV and still successfully be selected by YVL-BR-MHC. Notably, we did not find any of these AV8.1-VKDTDK-AJ34-expressing TCRs in a deep-sequencing survey of sorted naive phenotype CD45RA+ CCR7+ CD8 T cells from 3 age-matched, healthy individuals (one EBV serologically negative and two EBV serologically positive). These results suggest that this clonotype is not inherently present at a high frequency in the naive repertoire but requires interaction with EBV-YVL-BR to be selected and expanded to these high frequencies.
In contrast, the selection of EBV-GLC-BM-specific TCR repertoire was driven by strong interactions with both chains of TCR, α and β, such as AV5.1-EDNNA-AJ31, BV14-SQSPGG-BJ2, and BV20.1-SARD-BJ1, previously identified public features (43,52,53,55). In a recent study comparing TCRα and TCRβ repertoires of various human and murine viral epitopes, none of the responses were primarily driven by interaction with TCRα alone; rather, they were predominantly driven by strong interactions with TCRβ or a combination of TCRα and TCRβ (11). This apparent preference of YVL-BR TCR repertoires for particular TCRαs may create a large repertoire of different memory TCRβs that could potentially cross-react with other ligands such as IAV-M1 58, which predominantly interacts with TCRβ (11,27,29).
Using single-cell paired TCRαβ sequencing of tetramer-sorted CD8 T cells ex vivo, we have previously reported that at the clonal level recognition of the HLA-A2-restricted EBV-YVL-BR epitope is mainly driven by the TCRα chain (51). The CDR3α motif KDTDKL resulted from an obligate AV8.1-AJ34 pairing. This observation, coupled with the fact that this public AV8.1-KDTDKL-AJ34 TCR pairs with multiple different TCRβ chains within the same donor (median 4; range, 1 to 9), suggests that there are some unique structural features of the interaction between the YVL-BR/MHC and the AV8.1-KDTDKL-AJ34 TCR that lead to this high level of selection. TCR motif algorithms identified a lysine at position 1 of the CDR3α motif that is highly conserved and likely important for antigen recognition. Crystal structure analysis of the YVL-BR/HLA-A2 complex revealed that the MHC-bound peptide bulges at position 4, exposing a negatively charged aspartic acid that may interact with the positively charged lysine of CDR3α. TCR cloning and site-directed mutagenesis of the CDR3α lysine ablated EBV-BR-tetramer staining and function. Interestingly, we had previously used TCR structural modeling of the EBV-YVL-BR/MHC complex to predict the occurrence of this important protuberant lysine which might impact TCR interaction (64). Future structural analyses will be important to ascertain whether the YVL-BR TCRα contributes the majority of contacts with the pMHC.
Altogether, our data provide several insights into potential mechanisms of TCR selection and persistence. First, prior studies have revealed that selective use of particular gene families can be explained in part by the fact that the specificity of TCR for a pMHC complex is determined by contacts made between the germ line-encoded regions within a V segment and the MHC (63,65). We show here a unique observation of a viral epitope-specific response being strongly selected based not only on a particular TCRAV usage but also on a highly dominant CDR3α motif and AV-AJ pairing (i.e., the YVL-BR-specific AV8.1-VKDTDK-AJ34 clonotype), with very little role for the TCRBV. Second, it has been suggested that public TCRs represent clonotypes present at high frequency in the naive precursor pool as they may be easier to generate in part as a result of bias in the recombination machinery (66) or convergent recombination of key contact sites (35,37,43,63). Our data demonstrate that convergent recombination of TCRα, as well as TCRβ, may play a dominant role in peripheral selection of clonotypes that are persistently detected through memory. As previously reported for TCRβ (35,37,43,63), public clonotypes had a greater probability of being generated. They used more convergent amino acids than private clonotypes, not only in the CDR3β but also in the CDR3α. YVL-BR TCRβ, which interestingly is not a strong selection factor for persistent clonotypes, did not have public clonotypes with features that led to greater probability of being generated. Finally, we have previously reported that TCR immunodominance patterns also seem to scale with the number of specific interactions required between pMHC and TCR (29). It would seem that TCRs that find simpler solutions to being generated and to recognizing antigen are easier to evolve and come to dominate the memory pool (29). Consistent with this, our data demonstrate that the dominant persistent clonotypes used shorter, predominantly germ line-derived CDR3α.
Despite the apparent nonpersistence of the vast majority of the initial pool of clones deployed during acute infection, clonotypic diversity remained high in memory as a result of the recruitment of a diverse pool of new clonotypes. In a murine model, adoptive transfer of epitope-specific CD8 T cells of known BV families from a single virus-infected mouse to a naive mouse, followed by viral challenge, resulted in an altered hierarchy of the clonotypes and the recruitment of new clonotypes, thus maintaining diversity (67). A highly diverse repertoire should allow resilience against loss of individual clonotypes with aging (45) and against skewing of the response after infection with a cross-reactive pathogen (68-71). The large number of clonotypes contributes to the overall memory T cell pool, enhancing the opportunity for protective heterologous immunity now recognized to be an important aspect of immune maturation (56,72,73). A large pool of TCR clonotypes could also provide increased resistance to viral escape mutants common in persistent virus infections (58). Finally, different TCRs may activate antigen-specific cell functions differently, leading to a more functionally heterogeneous pool of memory cells (74). In summary, our data reveal that apparent molecular constraints are associated with TCR selection and persistence in the context of primary EBV infection. They also show that TCR CDR3α alone can play an equally important role as CDR3β in TCR selection and persistence of important immunodominant responses. Thus, to understand the rules of TCR selection, both TCRα and TCRβ repertoires should be studied. Such studies could elucidate which of the features of the epitope-specific CD8 TCR are associated with an effective response and control of EBV replication or disease.

MATERIALS AND METHODS
Study population. Three individuals of the age of 18 years (E1603, E1632, and E1655) who presented with clinical symptoms consistent with acute infectious mononucleosis (AIM) and laboratory studies indicative of primary infection (positive serum heterophile antibody and EBV viral capsid antigen [VCA]-specific IgM) were studied as described previously (27). Blood samples were collected in heparinized tubes at clinical presentation with AIM symptoms (acute phase) and 6 months later (memory phase). Peripheral blood mononuclear cells (PBMC) were extracted with Ficoll-Paque density gradient medium.
Ethics statement. The Institutional Review Board of the University of Massachusetts Medical School approved these studies (IRB protocol no. H-3698). All human subjects were adult and provided written informed consent.
Analysis of TCRα and TCRβ CDR3s using deep sequencing. The total RNA isolated from a minimum of 10,000 tetramer+ CD8 T cells was reverse transcribed into cDNA and sent to Adaptive Biotechnologies for TCRα and TCRβ chain profiling following the protocols and standards for sequencing and error correction that comprise the ImmunoSEQ platform. In summary, PCR amplification of the CDR3 is performed using specialized primers that anneal to the V and J recombination regions. Unique molecular identifiers are added during library preparation to track template numbers. After sequencing, CDR3 nucleotide regions are identified and clonal copy numbers are corrected for sequencing and PCR error based on known error rates and clonal frequencies. Sequences of CDR3s were identified according to the definition established by the International ImMunoGeneTics collaboration. Deep-sequencing data of TCRα and TCRβ repertoires were analyzed using ImmunoSEQ Analyzer versions 2.0 and 3.0, which were provided by Adaptive Biotechnologies. Only productively (without stop codon) rearranged TCRα and TCRβ sequences were used for repertoire analyses, including sequence amino acid composition and gene frequency analyses. The frequencies of AV-AJ and BV-BJ gene combinations were analyzed with subprograms of the ImmunoSEQ Analyzer software and further processed in Microsoft Excel.
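The filtering and gene-pairing steps described above can be sketched in Python as follows. This is a minimal illustration, not the ImmunoSEQ Analyzer implementation: the record fields (`v`, `j`, `cdr3_nt`, `cdr3_aa`, `templates`), the use of `*` to mark a stop codon, the additional in-frame check, and the example records are all assumptions introduced here.

```python
from collections import Counter


def is_productive(cdr3_nt, cdr3_aa):
    """True if the CDR3 has no stop codon ('*'); the in-frame check is an added assumption."""
    return len(cdr3_nt) % 3 == 0 and bool(cdr3_aa) and "*" not in cdr3_aa


def vj_frequencies(records):
    """Fraction of productive templates carried by each (V gene, J gene) pair."""
    counts = Counter()
    for rec in records:
        if is_productive(rec["cdr3_nt"], rec["cdr3_aa"]):
            counts[(rec["v"], rec["j"])] += rec["templates"]
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Hypothetical records; the third contains a stop codon and is excluded.
records = [
    {"v": "AV8.1", "j": "AJ34", "cdr3_nt": "TGTGCCGTGAAAGACACCGACAAGCTCATCTTT",
     "cdr3_aa": "CAVKDTDKLIF", "templates": 30},
    {"v": "AV8.1", "j": "AJ34", "cdr3_nt": "TGTGCCGTGAAGGACACCGACAAGCTCATCTTT",
     "cdr3_aa": "CAVKDTDKLIF", "templates": 10},
    {"v": "AV5.1", "j": "AJ31", "cdr3_nt": "TGTGCCTGAGAC",
     "cdr3_aa": "CA*D", "templates": 60},
]

print(vj_frequencies(records))  # {('AV8.1', 'AJ34'): 1.0}
```

Frequencies here are weighted by template count, so a pairing used by a few large clones can outweigh one used by many small clones; counting unique clonotypes instead would answer a different question.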
Circos plots and motif analysis. The V and J gene segment combinations were illustrated as Circos plots (76) across different CDR3 amino acid sequence lengths. Motif analysis was performed using the Multiple EM for Motif Elicitation (MEME) framework (77). Consensus motifs were acquired across different CDR3 lengths, and statistics on those motifs were computed with an in-house program called motifSearch, available at http://github.com/thecodingdoc/motifSearch.
EBV DNA quantitation in B cells. B cells were purified from whole blood using the RosetteSep human B cell enrichment cocktail according to the manufacturer's recommendations (StemCell Technologies, Vancouver, BC, Canada). Cellular DNA was extracted using the Qiagen DNeasy blood and tissue kit (Valencia, CA). Each DNA sample was diluted to 5 ng/μl, and the Roche LightCycler EBV quantitation kit (Roche Diagnostics, Indianapolis, IN) was used to quantify EBV DNA copy number in the samples as recommended by the manufacturer. Reactions were run in duplicate. B cell counts in each sample were determined using a previously described PCR assay to quantify the copy number of the gene encoding CCR5 (two copies per diploid cell) (78). Samples were normalized to B cell counts, and EBV DNA copy number was calculated as DNA copy per 10^6 B cells.
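The normalization described above reduces to simple arithmetic: the number of cells is half the CCR5 copy number, and EBV copies are scaled to 10^6 cells. The Python sketch below illustrates this under stated assumptions: the function name and example values are hypothetical, and averaging the duplicate reactions is an assumption, since the text does not specify how duplicates were combined.

```python
from statistics import mean


def ebv_copies_per_million_b_cells(ebv_copies, ccr5_copies):
    """Normalize EBV DNA copies to 10^6 B cells.

    ebv_copies, ccr5_copies: replicate measurements from the same sample
    (averaging the replicates is an assumption of this sketch).
    """
    ccr5_mean = mean(ccr5_copies)
    if ccr5_mean <= 0:
        raise ValueError("CCR5 copy number must be positive")
    cells = ccr5_mean / 2.0  # two CCR5 copies per diploid cell
    return mean(ebv_copies) * 1e6 / cells


# Hypothetical duplicate reactions for one sample.
print(ebv_copies_per_million_b_cells([480, 520], [19000, 21000]))  # 50000.0
```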
Convergence analyses. The number of unique nucleotide sequences encoding an amino acid sequence of TCRAV and TCRBV regions specific for YVL-BR and GLC-BM epitopes was calculated across the pooled repertoires of all individuals. The number of nucleotide additions required to produce a TCRAV or TCRBV sequence was determined by aligning the germ line V gene at the 5′ end of the TCRAV or TCRBV sequence and then the J gene segment at the 3′ end of the TCR sequence. The germ line D genes were subsequently aligned with nucleotides in the junction between the identified V and J regions. Nucleotides identified in the junctions between the V, D, and J gene segments were considered to be nucleotide additions. The significance values are based on multivariant two-way analysis of variance (ANOVA).
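Both quantities described above can be illustrated with a simplified Python sketch. It is not the procedure used in the study: the function names are hypothetical, the germ line fragments and CDR3 sequences are toy examples rather than real gene sequences, alignment is reduced to exact prefix and suffix matching, and the D gene step (relevant for TCRβ) is omitted, so the second function applies only to a V-J junction.

```python
from collections import defaultdict


def convergence_counts(records):
    """Number of distinct nucleotide sequences encoding each CDR3 amino acid sequence.

    records: iterable of (cdr3_aa, cdr3_nt) pairs pooled across individuals.
    """
    nt_by_aa = defaultdict(set)
    for cdr3_aa, cdr3_nt in records:
        nt_by_aa[cdr3_aa].add(cdr3_nt)
    return {aa: len(nts) for aa, nts in nt_by_aa.items()}


def n_additions(junction_nt, v_germline, j_germline):
    """Nucleotides in a V-J junction not explained by germ line V or J.

    v_germline is matched from the 5' end and j_germline from the 3' end;
    matching stops at the first mismatch (a simplification of alignment).
    """
    v_len = 0
    for a, b in zip(junction_nt, v_germline):
        if a != b:
            break
        v_len += 1
    rest = junction_nt[v_len:]  # J may only claim what V has not matched
    j_len = 0
    for a, b in zip(reversed(rest), reversed(j_germline)):
        if a != b:
            break
        j_len += 1
    return len(junction_nt) - v_len - j_len


# Hypothetical pooled (amino acid, nucleotide) pairs from several donors.
pooled = [
    ("CAVKDTDKLIF", "TGTGCCGTGAAAGACACCGACAAGCTCATCTTT"),
    ("CAVKDTDKLIF", "TGTGCCGTGAAGGACACCGACAAGCTCATCTTT"),
    ("CAVKDTDKLIF", "TGTGCCGTGAAAGACACCGACAAGCTCATCTTT"),
    ("CAEDNNARLMF", "TGTGCAGAGGACAACAATGCCAGACTCATGTTT"),
]
print(convergence_counts(pooled))  # {'CAVKDTDKLIF': 2, 'CAEDNNARLMF': 1}

# Toy junction and toy germ line fragments (not real gene sequences).
print(n_additions("TGTGCCGTGAAAGACACC", "TGTGCCGTGAGA", "TTGACACC"))  # 2
```

A clonotype encoded by several nucleotide sequences and requiring few additions would, under this view, be easier to generate, which is the property the analysis associates with persistent clonotypes.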
Statistics. GraphPad Prism version 7.0 for Mac OSX (GraphPad Software, La Jolla, CA) was used for all statistical analyses.
Data availability. Raw TCR deep-sequencing data are in immuneACCESS and can be accessed at https://doi.org/10.21417/AG2020MBIO.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.