Endogenous viral elements (EVEs) are remnants of past viral infections: viral sequences that became integrated into host genomes following infection of the germline. Once integrated, EVEs are passed on to progeny and can become fixed in a population. Long dismissed as “junk DNA”, EVEs are ubiquitous throughout vertebrate genomes and show intriguing patterns. Of the >134 known viral families, only four are commonly represented as EVEs in mammals: Retroviridae, Bornaviridae, Filoviridae and Parvoviridae. Moreover, only select genes from these viruses are widely endogenised, namely nucleocapsid and replicase genes.
We analysed 35 RNA sequencing datasets from thirteen Australian marsupial species to detect transcribed EVEs and to explore their potential function. We used a BLAST-based bioinformatics workflow to screen the transcriptomes for transcripts resembling viral genes. Identified transcripts were then mapped back to the corresponding marsupial genome, where possible (n = 5/13 species), to confirm their endogenous origin. For the koala (Phascolarctos cinereus), we also analysed small RNA sequencing data to determine whether any EVEs give rise to small RNAs. Using the identified EVEs as a reference, we mapped small RNA reads from ten koala datasets with HISAT2 and filtered the resulting alignments by length (18-21 nt = siRNA-like; 23-29 nt = piRNA-like). Additionally, we estimated the integration time of each transcriptionally active EVE by identifying orthologues in other marsupial genomes using BLASTn searches on Ensembl.
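The length-filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the HISAT2 alignments were written in SAM format, counts each read once by skipping unmapped (flag 0x4) and secondary (flag 0x100) records, and uses hypothetical function names (`classify_length`, `count_length_classes`). Only the length windows come from the text.

```python
from collections import Counter

# Length windows from the text (inclusive); class labels are illustrative.
SIRNA_LIKE = range(18, 22)  # 18-21 nt
PIRNA_LIKE = range(23, 30)  # 23-29 nt

# Standard SAM FLAG bits.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100


def classify_length(length: int) -> str:
    """Assign a small RNA read to a length class by its sequence length."""
    if length in SIRNA_LIKE:
        return "siRNA-like"
    if length in PIRNA_LIKE:
        return "piRNA-like"
    return "other"


def count_length_classes(sam_lines) -> Counter:
    """Count mapped reads per length class from SAM-formatted lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9]  # FLAG is column 2, SEQ column 10
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
            continue  # ignore unaligned reads and extra alignments of the same read
        counts[classify_length(len(seq))] += 1
    return counts
```

Because the two windows leave a gap at 22 nt, reads of that length land in "other" here; the text does not say how such reads were handled in the original analysis.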
Across all 13 Australian marsupials, bornavirus replicase (54/188) and filovirus nucleoprotein (56/188) EVEs were overrepresented among transcribed EVEs. The oldest of these is estimated to be 140 million years old, predating the divergence of South American and Australian marsupials. In the koala, replicase and nucleoprotein EVEs give rise to piRNA-length molecules with characteristics of both primary and secondary piRNAs. These small RNAs are enriched in testis and are predominantly antisense to the EVE transcripts, suggesting a possible regulatory or protective function.
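The text does not state how the primary and secondary piRNA characteristics or the antisense bias were assessed. A common approach, assumed here, is to measure the fraction of piRNA-length reads carrying a 5' uridine (1U, typical of primary piRNAs) or an adenine at position 10 (10A, typical of secondary piRNAs from ping-pong amplification), alongside the fraction aligned antisense to the EVE. This sketch assumes SAM input and that each EVE reference is stored in the sense orientation of its transcript, so reverse-strand alignments count as antisense. Because SAM stores reverse-strand reads reverse-complemented, the original read is restored before positions 1 and 10 are checked. All names are hypothetical.

```python
from dataclasses import dataclass

# Standard SAM FLAG bits.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class PiRNASummary:
    n_reads: int
    antisense_fraction: float
    first_u_fraction: float  # 1U signature (primary piRNAs)
    tenth_a_fraction: float  # 10A signature (secondary piRNAs)


def original_read(seq: str, flag: int) -> str:
    """Undo the reverse complement SAM applies to reverse-strand alignments."""
    if flag & FLAG_REVERSE:
        return seq.translate(COMPLEMENT)[::-1]
    return seq


def summarise_pirna_like(sam_lines, min_len: int = 23, max_len: int = 29) -> PiRNASummary:
    """Summarise strand bias and 1U/10A composition of piRNA-length reads."""
    n = antisense = first_u = tenth_a = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, seq = int(fields[1]), fields[9]
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
            continue  # count each aligned read once
        if not min_len <= len(seq) <= max_len:
            continue  # keep only piRNA-length reads
        read = original_read(seq.upper(), flag)
        n += 1
        # Assumption: EVE references are in transcript sense orientation.
        antisense += bool(flag & FLAG_REVERSE)
        first_u += read[0] == "T"  # uridine is sequenced as T
        tenth_a += read[9] == "A"
    if n == 0:
        return PiRNASummary(0, 0.0, 0.0, 0.0)
    return PiRNASummary(n, antisense / n, first_u / n, tenth_a / n)
```

A high 1U fraction on one strand together with a 10A enrichment on the opposite strand is the usual pattern read as evidence of ping-pong-derived secondary piRNAs; this summary only reports the pooled fractions.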