Genomic Diversity and phylogenetic relationships of human papilloma virus 16 (HPV16) in Nepal

Robert Makowsky1,2, Pema Lhaki3, Howard W Wiener 4, Madhav P Bhatta5, Michael Cullen6, Derek C Johnson4, Rodney T Perry4, Mingma Lama 3, Joseph F Boland6, Meredith Yeager6, Sarita Ghimire7, Thomas R Broker8, Sadeep Shrestha4


1Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Alabama, USA;

2 HTG Molecular Diagnostics, Arizona, USA (current);

3 NFCC International, Kathmandu, Nepal;

4Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Alabama, USA;

5Department of Biostatistics, Environmental Health Sciences, and Epidemiology, College of Public Health, Kent State University, Ohio, USA;

6Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD;

7Cancer Screening Center, Nepal Cancer Care Foundation, Lalitpur Nepal

8Department of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL, USA;


Objectives: HPV16 is the most common high-risk HPV that causes cervical cancer. Sequence variants in HPV16 confer difference in oncogenic potential; however, to date there have not been any HPV sequence studies performed in Nepal. The objective of this study was to characterize HPV16 viral genome sequence from Nepal and compare them with a reference sequence and determine the lineage. Additionally, we sought to determine if five High-grade Serous Intra-endothelial Lesion) HSIL subjects were genetically distinct from the non-HSIL subjects

Methods: DNA from exfoliated cervical cells from 17 individuals in Nepal who were previously identified to be HPV16. A custom HPV16 Ion Ampliseq panel of multiplexed degenerate primers was designed that generated 47 overlapping amplicons and covered 99% of the viral genome for all known HPV16 variant lineages. All sequence data were processed through a custom quality control and analysis pipeline of sequence comparisons and phylogeny.

Results: While there were high homologies across the genome, there were two indels. Compared to the PAVE reference HPV16 genome, there were up to 8, 3, 27, 23, 9, 6, 31, and 20 nucleotide variants in the E6, E7, E1, E2, E4, E5, Le, and L1 genes respectively . Based on sequence variants, HPV16 from Nepal can be classified into two lineages and we found no evidence of genetic distinctness between HSIL and non-HSIL subjects.

Conclusions: The evolutionary and pathological characteristics of the comprehensive HPV16 genome from Nepal provide the basis for further studies on the oncogenic potential.

Introduction

HPV is a double-stranded, 8 kb DNA virus with a diameter of 55 nm. It is a small virus, encoding only 9-10 genes, and it interacts with the host-cell proteins to complete a productive life cycle. The relative arrangement of 9-10 open reading frames (ORF) within the genome is conserved in all papillomavirus types. A unique feature of this virus is that the partly overlapping ORFs are arranged on only one DNA strand, so that all transcription is in the same direction. The viral genome is divided into three regions: the upstream regulatory region (URR), the early region (E), and the late region (L). Between different types, the size of the URR varies from 500-1000 bp. There are no ORFs in this area, but control elements regulate HPV-DNA replication and gene expression.1 Seven different genes in the “early region” have been designated E1, E2, E4, E5, E6, and E7. The early region encodes proteins predominantly involved in viral replication (E1 and E2), transcription (E2), and transmembrane up-regulation in cell cycling (E5). In high-risk types, this region also encodes proteins (E5, E6, and E7) that promote transformation of the host cell to cancer.2,3

HPV type 16 (HPV16) is the most prevalent HR HPV type worldwide and is found in the majority of cervical cancer cases. Since the first sequencing of HPV16 genome, accession K02718,4 various naturally occurring variants have been described. Phylogenetic analyses of primarily E6 and/or the LCR have described various variant lineages and sub-classifications. Some of these variants could result in differential viral persistence and development of cervical cancer. For example, the incidence of cervical cancer varies between 3.8 (Israel) and 48.2 (Columbia) per 100,000 women. 5 In the US, the rates are 14.2, 11.5 and 8.5 in Hispanics, African American and European Americans, respectively.6 However, African American women have the highest mortality from invasive cervical cancer, at 5.0 per 100,000 US women, compared to 2.4 for European Americans and 3.0 for Hispanics.6 Although, racial differences in incidence and mortality for cervical cancer can be viewed in relation to accessibility and use of screening, diagnosis, and treatment, different genetic makeups of both the host and virus and cellular mechanisms apparently regulate the biology of the disease. In particular, non-European HPV16 lineage contributes more to the persistence of infection and cervical cancer progression than European-lineage.7,8

The countries of South Asia contribute approximately one third of the world’s cervical cancer cases. With 24.2 cases per 100,000 individuals, Nepal has one of the highest rates of cervical cancer in South Asia. 9 Approximately 2,150 cases of cervical cancer occur in Nepal each year, resulting in over 1,100 deaths for a mortality rate of over 50%. 9 Despite the unknown yet potential epidemic of cervical cancer in Nepal, there is very little information on the prevalence of the prevailing HPV genotypes and lineage. To address the missing gap of HPV16 lineage in Nepal, we have sequenced the entire HPV in 17 individuals in two major regions of Nepal and examine the classification of HPV16 variant sublineages by using the known geographical references.

MATERIALS AND METHODS

Origin of clinical specimens . We have previously conducted health camps for HPV screening in two regions of Nepal, first in Accham district in far-west Nepal10 and second in Jhapa district in far-east Nepal. The protocols have been described in detail for Accham in previous report10 and same procedures were followed for Jhapa. Briefly, cervical specimen were a) self-collected using the APTIMA Cervical Specimen Collection and Transport (CSCT) kit (Hologic/Gen-Probe, San Diego, CA) collected by Auxiliary Nurse Midwives (ANMs) using ThinPrep® PreservCyt® medium (Hologic/Gen-Probe, San Diego, CA). Cervical specimens were tested for HPV genotype first using generic APTIMA® HR-HPV mRNA (APTIMA ® HPV) (Hologic/Gen-Probe, San Diego, CA), which detects the presence of E6/E7 mRNA for 14 types of HR-HPV (HPV 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, 68) and all positives were retested using APTIMA® HPV16 18/45 Genotype (Hologic/Gen-Probe, San Diego, CA) Assays.16 In Accham, 5/261 individuals tested positive for HPV 16 in both self-collected and clinician-collected samples.10 In Jhapa, xxx individuals tested positive for HPV16 in both, xxx in self-collected but not clinician-collected and xxx in clinician-collected but not self-collected samples. The remaining aliquots of all samples that tested positive for HPV16 were used for sequencing in this study. Cervical cytology was assessed for research purposes using clinician-collected ThinPrep® PreservCyt® medium (Hologic/Gen-Probe, San Diego, CA), with results classified according to the Bethesda criteria. In Accham, women had high-grade squamous intraepithelial lesion (HSIL) and one had squamous cell cancer (SCC).10,11 In Jhapa, only one individual had HSIL.

The geographic location of the husband’s migration was categorized as “Migrated within Nepal for work”, “Migrated outside of Nepal for work”, or “Has never migrated for work”. Further, in Jhapa since there is a Bhutanese refugee camp, several questions about immigration and place of birth was asked in the survey questionnaire to assess the refugee status of each individual. If the individual was born in Bhutan and answered that they lived in the Bhutanese refugee camp, they were considered “refugee”.

Institutional review boards from the University of Alabama at Birmingham and the Nepal Health Research Council approved this study.

DNA Isolation. DNA was extracted using QIAamp MiniElute Media Kit, following manufacturer’s manual (Qiagen, Valencia, CA).

Ion AmpliSeq Library Preparation and Sequencing. A custom Ion AmpliSeq HPV16 panel was employed to amplify the entire 7,906 bp HPV16 genome as 47 overlapping amplicons ranging in size from 181 bp to 375 bp, as previously described.12 Custom HPV16 degenerate primers were designed using a consensus sequence with ambiguity codes (IUPAC) derived from seven sequences representing major HPV16 lineages (2 European, 1 European/Asian, 1 African-1, 1 African-2 and 2 Asian).

Libraries were generated following the manufacturer’s Ion AmpliSeq Library Preparation kit 2.0-96LV protocol (Life Technologies, Part #4480441) with modifications as previously described12

PCR based viral sequencing was performed with 2 microliters of samples utilizing the Ion Torrent Personal Genome Machine (PGM; Life Technologies, Carlsbad CA). A custom HPV16 Ion Ampliseq panel, described above, covered 99% of the viral genome for all known HPV16 variant lineages. All sequence data were processed through a custom quality control and analysis pipeline of sequence comparisons and phylogeny.

Because the HPV genome is circular, one of the 47 amplicons was split bioinformatically to create 48 overlapping contigs that were mapped across the genome. Raw sequencing reads generated by the Ion Torrent sequencer were quality and adaptor trimmed by Ion Torrent Suite and then aligned to the HPV16R reference (7906bp) sequence13 using TMAP. To filter out human reads, all the reads were realigned to hg19-HPV16 hybrid reference genome and only HPV16 reads were kept.

Phylogenetic tree construction and statistical analyses . Before alignment, the following accessioned sequences were added: HQ644286, lineage A (European prototype (Ep)); HQ644251, sub lineage A-4 (European Asian (Ea)); HQ644238, lineage B (African 1 (AF-1)); AF472509, lineage C (African 2 (AF-2)); and HQ644255 lineage D (Asian American (AA)); and NC_001526 as the reference genome, all of which served as comparison sequences14 Maximum-likelihood was performed using MEGA software version 6.0 assuming the General Time Reversible modelwith Gamma rate distribution and invariant sites (GTR + I + G) using a maximum parsimony starting tree. Node support was assessed by bootstrapping with 1000 replications and nodes with corresponding values < 70 % were collapsed. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.1000)). The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 48.1926% sites). The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.

Logistic regressions were utilized to determine if the five subjects (Achham 1, Achham 2, Achham 3, Achham 4, and Jhapa 11) with HSIL or SCC had any differences that separated them from the remaining subjects. To accomplish this, we tried to approaches. First, we analyzed each variable nucleotide site singly. Second, each gene was translated and we looked to see if any combination of variable amino acids could perfectly classify the two groups. Finally, we calculated the odds ratio for the 350 G/T variant with HSIL given its previously reported importance.

RESULTS

The aligned dataset consisted of 23 nucleotide sequences and a total of 7,912. The sequences aligned perfectly to the reference genome with the exception on Lineages B, C, and D. Specifically, between sites 4,179 and 4,180 of the reference genome, lineages B and C had a “AT” insertion while lineage D had a “TTTAT” insertion. Between sites 4,197 and 4,202 of the reference genome, lineages C and D had a 4 base pair deletion. All of these insertions were in non-coding regions between E5 and L2. The maximum overall pairwise distance observed was 1.876% and Table 1 depicts the number of pairwise nucleotide differences between each subject and the reference genome per gene. Across the alignment of 7, 912 nucleotides, there were 286 variables sites (3.6%). Table 2 depicts the observed amino acid differences between each subject and the reference genome per gene.

The best ML tree had a log likelihood of -13,470.1366 (Figure 1). After collapsing nodes containing less bootstrap support proportions < 0.70, we observed two primary clades; A, comprised of Lineage A and close relatives, and non-A, comprised of lineages B-D and close relatives. The five HSIL subjects did not comprise a single monophyletic lineage but were instead spread out across the phylogenetic tree (Table 3).

There were no variable nucleotides or combination of amino acids within genes that would perfectly separate the HSIL from non-HSIL subjects. Three of the 11 subjects with the T variant exhibited HSIL while 2 of the 12 subjects with the G variant exhibited HSIL (odds ratio of developing HSIL with a T compared to a G of 1.875; 95% CI of 0.25 to 14.08).

DISCUSSION

This study sought to explore the genetic diversity and phylogenetic signal across HPV samples from people living in Nepal. Whole viral genomes were sequenced for 17 individuals located throughout the country with varying travel histories. We observed that HPV samples feel across two primary clades, which correspond to the common “Lineage A” and other lineages. Finally, we were unable to find any genetic variants that could explain HSIL status.

We found that genetic variation was not constant across genes. At the nucleotide level, % variability per gene ranged from 2.02% to 4.86% for the E7 and E4 genes, respectively. At the amino acid level, % variability per gene ranged from 2.04% to 6.03% for the E7 and E2 genes, respectively. Compared to the Maver et al (2014) study that assessed sequenced L1, LCR (L2 in this study), and in E6 in Slovenia, the variability in Nepal was much higher regardless of the HPV strain (40, 42, 43, or 44). The results from the Yang et al (2014) study in Southwest China were found slightly higher variability than this study. In the E6 gene they observed 17 variable sites compared to 15 in this study; In the E7 gene they observed 10 variable sites compared to 6 in this study

Our phylogenetic analyses produced a tree with two primary lineages, “A”, which includes the reference genome, and all other lineages. All but one of the samples from Jhapa samples grouped with the “A” lineage, while the remaining sample (Jhapa 11) grouped with Lineage “C”. The Achham samples were more diverse, with 3 samples falling within the “A” lineage and 2 grouping with lineage “D”. The subject’s reported travels, spouse’s travels, or refugee status did not consistently offer an explanation for these patterns, meaning that multiple lineages must have been actively spreading within Jhapa and Achham. Compared to the Xiao et al (2016) study of full HPV Type 11 lineages in China, the diversity in Nepal is much higher. Specifically, all samples in the Xiao study were from lineage A and only 49 sites were variable (compared to 286 in this Nepalese study).

Finding one or more genetic variants that are strongly associated with HSIL status could have important clinical ramifications. For example, the 350G/T variant has been found to be associated with HSIL, but the variant that is associated with higher HSIL risk is population dependent (Cornet et al., 2013). In Nepal, Cornet et al (2013) only observed HSIL in subjects with the T variant while in neighboring India, Sathish et al (2005) found the G variant to increase the odds of HSIL. In this study, the odds ratio of developing HSIL with a T compared to a G was greater than 1 (1.875), but the 96% CI included 1 and the p-value was far from statistically significant (0.5412). Therefore, our study finds no evidence that either variant is associated with greater odds of developing HSIL.

Given our small sample size (5 subjects with HSIL), we opted to not conduct significance tests but to instead look for perfect separation between HSIL and non-HSIL subjects. This was performed at both the univariate nucleotide level as well as the multivariate amino acid level within genes. At both levels, we did not find and variants (combinations thereof within genes) that perfectly grouped HSIL and non-HSIL subjects. Additionally, we found that HSIL subjects did not contain HPV lineages that clustered together phylogenetically, but instead were spread out across the phylogenetic tree and together accounted for 167 of the 286 total variable sites. Specifically, two of the five HSIL subjects (Achham 1 and 4) were infected with strains from the “A” lineage, one subject (Jhapa 11) was infected with a strain most closely related to the “C” lineage, and the remaining two (Achham 2 and 3) with stains most closely related to the “D” lineage”

The results from this study contribute to the field of molecular epidemiology of HPV16 in two regions in Nepal. Interestingly, even in a small sample-size, we observed samples closely related to HPV lineages A, C and D. It is important to correlate HPV variants with severity of pre-cancer and cancer regionally to understand and develop reliable diagnostics and effective therapeutics that are specific for Nepal. Studies in larger sample size as well as HPV variants in other high-risk HPV are warranted to fully understand circulating lineage in the region. This information can significantly contribute to the knowledge on the role of HPV variants in cervical neoplasia and vaccines against HPV.


ACKNOWLEDGEMENTS

We thank the study participants and the staff at Nepal Fertility Care Center (NFCC) who helped with the sample collection, particularly Pankaj Bhattarai who was instrumental in the health camps. The Family Health Division, within the Nepal ministry of health also assisted with the logistics of the health camp and data collection. The study was supported in part by the pilot fund from the Department of Epidemiology, UAB School of Public Health. Derek Johnson was supported by a National Institutes of Health Cancer Prevention and Control Training Grant (R25CA47888). Hologic/GenProbe Inc. (San Diego) donated reagents and kits and provided testing of HPV and liquid cytology.

Figure Legend

Figure 1 – Phylogenetic tree based on variants in 17 complete genomic sequences from two distinct regions in Nepal. NC_001526 has been used as the reference genome and the lineages are based on the following sequences: HQ644286, lineage A (European prototype (Ep)); HQ644251, sub lineage A-4 (European Asian (Ea)); HQ644238, lineage B (African 1 (AF-1)); AF472509, lineage C (African 2 (AF-2)); and HQ644255 lineage D (Asian American (AA)).


Table 1. Total number of nucleotide variants by gene per individual compared to the reference genome ( NC_001526 ) with overall summaries.

E6

E7

E1

E2

E4

E5

L2

L1

Achham1

0

1

1

0

0

0

5

2

Achham2

6

2

24

19

7

4

28

15

Achham3

6

3

25

19

7

4

29

17

Achham4

0

1

1

1

1

0

5

2

Achham5

0

0

3

3

1

1

2

2

Jhapa1

1

0

2

2

1

2

2

1

Jhapa2

1

0

1

1

1

2

2

1

Jhapa3

1

0

4

2

1

1

2

2

Jhapa4

1

0

2

1

1

2

4

3

Jhapa5

1

0

1

1

1

2

3

1

Jhapa6

1

0

1

1

1

2

3

3

Jhapa7

0

0

0

1

1

2

3

4

Jhapa8

1

0

3

3

1

1

2

3

Jhapa9

0

0

4

2

1

1

2

2

Jhapa10

1

0

1

1

1

3

3

2

Jhapa11

6

2

23

18

5

4

25

20

Jhapa12

1

0

1

1

1

3

5

1

LineageA

1

0

0

1

1

2

4

1

LineageB

7

2

13

20

7

6

22

19

LineageC

8

3

26

22

9

6

25

18

LineageD

6

3

27

23

9

6

31

18

Sub-lineageA-4

2

2

5

9

4

3

7

2

# Total Sites

477

297

1950

1103

288

252

1422

1596

# Variable Sites

15

6

47

38

14

11

64

39

% Variable Sites

3.14%

2.02%

2.41%

3.45%

4.86%

4.37%

4.50%

2.44%


Table 2. Total number of amino acid variants by gene per individual compared to the reference genome ( NC_001526 ) with overall summaries (AA=Amino Acid).

E6

E7

E1

E2

E4

E5

L2

L1

Achham1

0

1

1

0

0

0

3

1

Achham2

3

0

10

13

2

1

10

8

Achham3

3

0

10

13

2

1

11

8

Achham4

0

1

1

1

1

0

2

0

Achham5

0

0

1

1

0

1

1

1

Jhapa1

1

0

2

1

0

2

1

1

Jhapa2

1

0

1

1

0

2

1

1

Jhapa3

1

0

2

1

0

1

1

1

Jhapa4

1

0

1

1

0

1

3

2

Jhapa5

1

0

1

1

0

2

2

1

Jhapa6

1

0

1

1

0

1

2

1

Jhapa7

0

0

0

1

0

2

2

2

Jhapa8

1

0

1

1

0

1

1

1

Jhapa9

0

0

2

1

0

1

1

1

Jhapa10

1

0

1

1

0

2

2

2

Jhapa11

3

0

6

12

2

1

10

8

Jhapa12

1

0

1

1

0

2

2

1

LineageA

1

0

0

1

0

2

2

1

LineageB

4

0

3

16

3

3

8

5

LineageC

3

1

8

15

3

3

9

8

LineageD

3

0

10

16

3

3

11

9

Sub-lineageA-4

2

1

1

8

0

2

5

1

# Total AA

158

98

649

365

95

83

473

532

# Variable AA

9

2

15

22

5

4

27

15

% Variable AA

5.70%

2.04%

2.31%

6.03%

5.26%

4.82%

5.71%

2.82%


Table 3. Migration and refugee status and cervical cytology of individuals with HPV16 infection at two screening sites

Tree Name

Clade

Female Migration

Husband Migration

Cytology

Refuge

Achham1

A

No

No

HSIL*

NA

Achham2

Non-A

No

India

HSIL*

NA

Achham3

Non-A

No

India

SCC§

NA

Achham4

A

No

India

HSIL*

NA

Achham5

A

No

No

normal

NA

Jhapa1

A

NA

No

normal

Yes

Jhapa2

A

No

No

normal

Yes

Jhapa3

A

No

No

normal

No

Jhapa4

A

NA

Qatar

normal

No

Jhapa5

A

No

No

normal

No

Jhapa6

A

No

No

normal

Yes

Jhapa7

A

No

Kathmandu

normal

Yes

Jhapa8

A

No

No

normal

No

Jhapa9

A

No

NA

normal

No

Jhapa10

A

No

Qatar

normal

No

Jhapa11

Non-A

No

NA

HSIL*

No

Jhapa12

A

NA

Kathmandu

normal

No

HSIL – high-grade squamous intraepithelial lesions; SCC – squamous cell carcinoma

References

1. Iftner T. Papillomavirus genomes sequence analysis related to functional aspects . Boca Raton Florida: CRC Press; 1990.

2. de Villiers EM, Fauquet C, Broker TR, Bernard HU, zur Hausen H. Classification of papillomaviruses. Virology. 2004;324:17-27.

3. Munger K. The role of human papillomaviruses in human cancers. Frontiers in Bioscience. 2002;7:d641-649.

4. Seedorf K, Krammer G, Durst M, Suhai S, Rowekamp WG. Human papillomavirus type 16 DNA sequence. Virology. 1985;145:181-185.

5. Haverkos H, Rohrer M, Pickworth W. The cause of invasive cervical cancer could be multifactorial. Biomed Pharmacother. 2000;54:54-59.

6. Sanchez ML, Katsumata K, Atsumi T, et al. Association of HLA-DM polymorphism with the production of antiphospholipid antibodies. Annals of the rheumatic diseases. 2004;63:1645-1648.

7. Freitas LB, Chen Z, Muqui EF, et al. Human papillomavirus 16 non-European variants are preferentially associated with high-grade cervical lesions. PLoS One. 2014;9:e100746.

8. Sichero L, Ferreira S, Trottier H, et al. High grade cervical lesions are caused preferentially by non-European variants of HPVs 16 and 18. Int J Cancer. 2007;120:1763-1768.

9. Centre) WIICoHaCCHI. Human Papillomavirusband Related Cancers in Nepal. Summary Report 20102010.

10. Johnson DC, Bhatta MP, Smith JS, et al. Assessment of high-risk human papillomavirus infections using clinician- and self-collected cervical sampling methods in rural women from far western Nepal. PLoS One. 2014;9:e101255.

11. Johnson DC, Lhaki P, Bhatta MP, et al. Spousal migration and human papillomavirus infection among women in rural western Nepal. Int Health. 2016.

12. Cullen M, Boland JF, Schiffman M, et al. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection. Papillomavirus Res. 2015;1:3-11.

13. G. M, C. B, K. M, et al. Human papillomaviruses: a compilation and analysis of nucleic acid and amino acid sequences. : Los Alamos, N. Mex: Theoretical Biology and Biophysics Division, Los Alamos National Laboratory; 1997.

14. Smith B, Chen Z, Reimers L, et al. Sequence imputation of HPV16 genomes for genetic association studies. PLoS One. 2011;6:e21375.