Introduction to Endogenous Retroviruses
Advances in biochemical technology since 2000 have allowed us to determine the full DNA sequences for humans and other animals. This new information has illuminated our evolutionary history. A number of patterns in our DNA are consistent with a common ancestry of humans and other primates.
One such genetic feature is the distribution of endogenous retroviruses (ERVs) in our genomes. As most readers know, viruses work by introducing their RNA or DNA into a host cell, and hijacking the host cell’s genetic machinery to start making more copies of the virus. Some viruses, called “retroviruses”, do this by having their RNA transcribed into DNA, which then gets inserted into the cell’s DNA genome. (This is considered “retro”, because normally in a cell DNA is transcribed into RNA, not the other way around). The HIV virus that causes AIDS is an example of a retrovirus. Once the virus’s DNA has been integrated into the host’s DNA, the viral genome is known as a prototype retrovirus, or provirus.
A virus may end up killing its host, or it may cause little damage. Various types of cells in an animal’s body can become infected with viruses. Most of us have experienced the common cold, where the virus thrives in the cells lining the respiratory tract. In some cases, a virus can infect a cell in the “germ line”. Germ line cells include the egg and sperm, as well as cells that produce the egg and sperm. If a retrovirus inserts its genetic load into a germ line cell of an animal, this viral DNA will then be passed down to all descendants of that animal, appearing as an “endogenous” retrovirus (ERV) in their genomes.
If that animal happened to be a common ancestor to two or more future species, all of these species would show this ERV at the same place in their genome, i.e. in orthologous (homologous) locations. Genomes mutate over time, and sometimes whole chunks of DNA get moved around, but there is generally enough genetic context to determine whether a location is homologous among the various primates. The ERVs themselves accumulate mutations that make them non-infectious and further degrade their sequences with time. Nevertheless, thousands of ERVs retain enough genetic identity to be clearly identified in the human genome.
The genetic signature of a retrovirus in the genome is very distinctive. ERVs have common features such as the genes that code for the viral coat protein and for the reverse transcriptase that copies the viral RNA genome into DNA. The ERV DNA codes for three groups of proteins, known as “gag” (matrix, capsid, nucleoproteins), “pol” (protease, reverse transcriptase, RNaseH, dUTPase, integrase) and “env” (surface and transmembrane subunits). This genetic core is flanked by long terminal repeat (LTR) sections. Finally, when the retrovirus tears open the host genome for insertion, some of the torn original host DNA is recopied on either side of the viral insert.
Here is what all this looks like for the insertion of a particular retrovirus from the CERV 30 family into the chimpanzee genome:
This happens to be an ERV that is in chimps, but not in the human genome at that location. In the corresponding spot in the human genome, there is a sequence of DNA bases of: A T T A T. In the chimp genome, this sequence at the point of insertion has become duplicated on either side of the ERV, as discussed above. The ERV shows the usual features of the gag, pol, and env genes, with the LTRs on the ends. More details about retroviral insertion are found here.
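The target-site duplication described above can be illustrated with a toy string model. This is only a sketch under stated assumptions: the flanking host bases, the bracketed placeholder standing in for the provirus, and the function name insert_with_tsd are all hypothetical. Only the five-base target ATTAT comes from the example above.

```python
def insert_with_tsd(host: str, target: str, provirus: str) -> str:
    """Insert `provirus` at the first occurrence of `target` in `host`,
    leaving one copy of `target` on each side of the insert (toy model)."""
    pos = host.find(target)
    if pos < 0:
        raise ValueError("target site not found in host sequence")
    left = host[:pos]
    right = host[pos + len(target):]
    # The target appears once before and once after the insert.
    return left + target + provirus + target + right


# Hypothetical flanking bases (GGC, CCA) around the 5-base target ATTAT.
human_like = "GGC" + "ATTAT" + "CCA"
# Placeholder for the LTR-gag-pol-env-LTR insert; not a real sequence.
provirus = "[LTR-gag-pol-env-LTR]"

chimp_like = insert_with_tsd(human_like, "ATTAT", provirus)
print(human_like)   # empty pre-insertion site
print(chimp_like)   # insert flanked by two copies of ATTAT
```

In this toy model the "human-like" sequence keeps a single copy of the target, while the "chimp-like" sequence carries the insert with a copy of the target on each side. That is the pattern used to recognize a genuine insertion.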
These distinctive features make it relatively straightforward to search through the human genome sequence and identify ERVs. At least 275 full-length ERVs can be observed. These ERVs are relatively recent (e.g. last ten million years) arrivals to the primate genomes. Older ERVs tend to get chopped up by the usual shuffling of genomes over time. About 200,000 entities in human DNA, constituting a full 8% of the genome, have been identified as being ERVs or chunks of ERVs. Most of these chunks are solitary LTRs. [cf. Lander, et al. (2001) and Seifarth, et al. (2005)].
All human ERVs except for one are found in all humans, indicating that they entered the ancestral human genome before Homo sapiens became a distinct species. The exception is in the HERV-K(HML2) family. By examining the DNA from a diverse set of people, Belshaw, et al. identified 113 elements of the HERV-K(HML2) family in the human genome. Most of these elements occur in all people. However, at least 8, and perhaps 11, of these elements are insertionally polymorphic: some human individuals have the insertion while other individuals have the empty, preinsertion site. This shows that this virus family was still actively inserting into the genome within the lifetime of the human species.
Where in the Genome Do Retroviruses Insert?
The human genome contains some three billion base pairs, but not every one of these sites is equally likely to be the place where a retrovirus inserts. For instance, Mitchell, et al. (2004) identified insertion sites in the human genome for three different retroviruses: human immunodeficiency virus (HIV), avian sarcoma-leukosis virus (ASLV), and murine leukemia virus (MLV). They analyzed a total of 3,127 integration sites. Some preferred types of locations were observed. HIV tended to insert in gene-rich regions, MLV favored integration near transcription starts, while ASLV showed only a weak preference for active genes.
It should be noted that these preferences for types of regions do not mean that a specific virus favors insertion into any one particular spot in the genome. For each virus, there are many thousands of sites at which it could insert. The figure below shows where these three viruses inserted on the first three out of 23 human chromosomes. The blue “lollipops” are the HIV, the purple are MLV, and the green are ASLV. These insertions are spread broadly across the chromosomes (including the other 20 chromosomes), rather than being focused in just one spot or a few regions.
These are not the only possible insertion sites for these viruses, but just the spots that showed up in this limited study. A more detailed study by Wang, et al. mapped 40,569 unique sites of HIV integration in the human genome. Thus, while the insertions of ERVs are not equally likely across all three billion sites of the genome, they may be characterized as quasi-random, since a given retrovirus will insert essentially randomly in one of many thousands of potential integration locations.
Effects of Retroviral DNA Insertions on Human Genetic Function
The DNA associated with retroviruses started out as functional genetic material, including the protein-coding gag, pol, and env genes, and the LTRs, which are rich in promoters. As these chunks of DNA get inserted at various spots in the human genome, they can have various effects on the human metabolism. Some of these effects are bad, and some are good.
The human genome is moderately tolerant towards mutations. At each generation, we inherit about 50 new mutations compared to our parents’ DNA, distinct from the usual allele rearrangement. If a generation is about 30 years, compared to people 3000 years ago each one of us has about 5000 mutations in our DNA. Often, these mutations are fatal. About 40% of fertilized eggs end up being spontaneously aborted as miscarriages, due in part to genetic defects. Of the babies who survive through birth, about 3% have genetic disorders such as congenital heart disease. However, the rest of us get along fairly well, and all the genetic shuffling occasionally produces a genius like Einstein, or a modified gene which gives resistance to cardiovascular disease.
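The arithmetic behind the 5000-mutation figure can be checked in a few lines. This is just a sketch using the round numbers quoted above (about 50 new mutations per generation, about 30 years per generation); the constant and function names are illustrative only.

```python
MUTATIONS_PER_GENERATION = 50   # approximate figure quoted in the text
YEARS_PER_GENERATION = 30       # approximate figure quoted in the text


def accumulated_mutations(years: int) -> int:
    """Rough count of new mutations accumulated along one line of descent."""
    generations = years // YEARS_PER_GENERATION
    return generations * MUTATIONS_PER_GENERATION


# 3000 years / 30 years per generation = 100 generations; 100 * 50 = 5000
print(accumulated_mutations(3000))
```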
If the insertion of a retrovirus in a particular spot in some human’s genome gave a very bad effect, that human would die without reproducing, and that particular genome would not be promulgated. However, sometimes the effects of the ERV are just moderately bad, producing disorders which are not immediately fatal. For instance, ERVs in humans have been tied to a number of cancers, including Hodgkin’s lymphoma, melanoma, and bladder and breast cancer. ERVs are also implicated in a number of autoimmune disorders, such as rheumatoid arthritis and lupus. (For more information see Katoh and Kurata, “Association of Endogenous Retroviruses and Long Terminal Repeats with Human Disorders”, 2013).
On the good side, the proteins expressed from the env genes of several retroviruses embedded in the human genome help with the development of the placenta. Barry Desborough discusses how the function of these genes in the human genome is similar to the function of these genes in the native retrovirus.
Also, many LTRs have retained regulatory activity, and have landed close to genes where they can influence the expression of proteins from those genes. Over 100 LTRs have been demonstrated to help control transcription of human genes, and several thousand other LTRs could potentially have that function as well.
The original viral replication functions of the ERVs found in humans have been disabled by mutations. The functionalities which are now observed for human ERVs are in general what would be expected for some 200,000 quasi-random insertions of chunks of DNA into the genome over tens of millions of years: most ERVs have no known effect, some cause genetic disorders, and some have useful interactions with the rest of the genome.
Young Earth (YE) creationists point to these instances of functionality as evidence that ERVs were purposefully placed in the genome by God when He created the first humans a few thousand years ago. However, ERVs bear all the marks of having come from functioning viruses. For many ERVs, we can recognize all or nearly all of the components of a retrovirus (viral gag, pol, and env genes, LTR sections, etc.), which would have the capacity to integrate into the genome if they had not been disabled by additional mutations. Even the plain LTRs are distinctive.
Another sign that ERVs were actual insertions is the duplication of some original DNA on either side of the ERV, as discussed above:
The hallmark of an insertion is a displacement of chromosomal DNA, and the hallmark of insertion by integrase is the presence of target site duplication, due to the way it attacks the 5′ and 3′ phosphodiester bonds with an offset of a few base pairs. Since full-length ERVs are accompanied by target site duplications and DNA displacement, they are necessarily endogenized/fixed proviral insertions. So any functional components are necessarily post-insertion exaptations, and the fact that they are necessarily insertion means that they cannot be part of any ‘original design.’ The issue of functionality is simply a red herring.
Moreover, if endogenous retroviruses were divinely-created portions of Adam’s DNA, all humans would possess the same set of ERVs. But HERV-K shows that this is not the case: for several instances of this ERV family, some people have them in their genomes, and some have the empty pre-insertion site. This shows that retroviruses are in fact inserted into human genomes to form ERVs.
Comparison of Human and Chimp ERV Locations
The first drafts of the complete human genome were published in 2001. This achievement was followed by sequencing the DNA of other animals, including chimpanzees. Humans and chimpanzees are thought to have diverged from a common ancestor around 6 million years ago.
As discussed in Three Layers of Endogenous Retroviral Evidence for the Evolutionary Model, there are two broad approaches to comparing the genomes of two different species. One is to examine variations in insertions and deletions (“indels”), while the other is to analyze the whole genome. The “Three Layers” article describes these analyses in moderate detail. The conclusion from both approaches is the same: “Less than 100 ERVs are human-specific and less than 300 ERVs are chimpanzee-specific.” Thus, out of some 200,000 ERVs in the human genome, “The percentage of ERVs in identical loci is greater than 99.9%.” In other words, nearly all of the many thousands of ERVs in the human genome occur in the same locations in the chimpanzee genome.
To assess the implications of this, let’s start by considering a very simple case, where only one ERV insertion was found in both humans and chimpanzees. Suppose further that this particular retrovirus, which we will call retrovirus A, could randomly insert in any one of 10,000 locations in the human genome, and also in the same 10,000 locations in the matching chromosomes of chimpanzees. If retrovirus A integrated into the genome of an ancestral chimp, and in a separate infection event also endogenized into the DNA of an ancestral human, there would be a 0.01% (1/10,000) chance that the resulting ERV A would be found in the same location in both species.
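The “greater than 99.9%” figure follows directly from the quoted counts. The sketch below is only a back-of-the-envelope check from the human side, using the round numbers given above; the variable names are illustrative.

```python
total_ervs = 200_000        # approximate count of ERV elements in the human genome
human_specific_max = 100    # "less than 100 ERVs are human-specific"

# Lower bound on the fraction of human ERVs that sit at loci shared with chimps.
shared_fraction_min = (total_ervs - human_specific_max) / total_ervs
print(f"{shared_fraction_min:.2%}")
```

With these inputs the lower bound comes out at about 99.95%, consistent with the quoted statement.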
Now, let’s extend this thought experiment to having two shared ERVs. If both species were independently infected with retrovirus B as well as with retrovirus A, the probability is only 1/100,000,000 that virus B, as well as virus A, would happen to end up in matching sites in humans and in chimps. This would constitute very strong evidence that these ERVs did not arrive at their locations through random, independent infection events in humans and in chimps. A more reasonable explanation is that humans and chimps both descend from a common ancestor, whose genome suffered the insertion of these two viruses in these two locations.
Moving now to the actual situation, there are at least 100,000 ERV insertions found in the same locations in humans and in chimps. There is essentially no chance that all these identical insertion points could have occurred by independent insertion events in the two lineages. Again, this shows that these insertions occurred in ancestors which are common to both humans and chimpanzees.
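The thought experiment above can be put into numbers. The sketch below is a simplified model, not a rigorous population-genetic calculation. It assumes that every insertion is an independent event and that each retrovirus picks uniformly among the same 10,000 candidate sites in both species. That site count is the illustrative figure used above, not a measured value, and the function names are hypothetical.

```python
from fractions import Fraction
import math


def prob_all_match(sites: int, shared: int) -> Fraction:
    """Exact probability that `shared` independent insertions each land on
    the same one of `sites` equally likely locations in both lineages."""
    return Fraction(1, sites) ** shared


def log10_prob_all_match(sites: int, shared: int) -> float:
    """Base-10 logarithm of the same probability, usable for huge `shared`."""
    return -shared * math.log10(sites)


print(prob_all_match(10_000, 1))              # 1/10000 (one shared ERV)
print(prob_all_match(10_000, 2))              # 1/100000000 (two shared ERVs)
print(log10_prob_all_match(10_000, 100_000))  # -400000.0
```

Under these assumptions, 100,000 independently matching insertions would have a probability of about 1 in 10 to the power 400,000. The logarithm is used for the last case because a number that small cannot be represented as an ordinary floating-point value.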
There are a few exceptions to this co-location of human and chimp ERVs, i.e. there are a few ERV families that appear in one species but not the other. For instance, out of 42 families of ERVs in chimps, 40 appear in the orthologous positions in the human genome and 2 do not. [Polavarapu, et al., 2006]. This is to be expected, since human and chimp lineages diverged some 6 million years ago. That is plenty of time for a few new ERV families to be introduced independently to humans and to chimps, or for some previously-shared ERVs to be lost from humans or from chimps due to well-known genetic processes such as genetic drift and incomplete lineage sorting. Anyone interested in understanding cases such as CERV 1/PTERV1 can google the subject to find valid scientific explanations of these issues.
Young earth creationists, of course, try to mount objections to the science described here. This article has answered some of the most common objections. Barry Desborough has answered additional questions.
Nested hierarchies of ERVs: More evidence for common ancestry
If all of today’s mammals evolved via a branching family tree from some common ancestral population, we would expect to find that species that are more closely related would share more genetic features of all kinds. This requirement of nested hierarchies is a mathematically rigorous test for evolution. I won’t go into it here, but these patterns show up with ERVs, as discussed at Three Layers of Endogenous Retroviral Evidence for the Evolutionary Model, and also at VWXYNot.
Theological Implications of Endogenous Retroviruses
The distribution of ERVs in human and chimpanzee genomes is powerful evidence of common ancestry and macroevolution. As described in The Pope Speaks on Creation and Evolution, the Roman Catholic Church has largely made its peace with evolution, as have liberal Protestants. The more conservative evangelical Protestants hold a high view of the Bible as trustworthy, divinely-inspired revelation. I happen to share that view of Scripture. However, it is one thing to affirm the Bible as infallible, and it is a quite different thing to claim that any particular interpretation of that Bible is infallible.
A little reflection will show that even within the world of Bible-believing Protestants, there are many points of doctrine which are the subject of intense disagreement. As one example, Pentecostal Christians affirm that spiritual gifts like prophecy and praying in “tongues” are meant to continue in the church today, while cessationists like John MacArthur denounce these practices as blaspheming the Holy Spirit. Both sides claim that Scripture is on their side. These disagreements show that it is possible for devout believers with the highest possible regard for the Bible to have fundamental disagreements in their interpretations of that Bible.
Unfortunately, many evangelicals in North America confuse their interpretation of God’s revelation with the revelation itself. Such is the case with Young Earth (YE) creationism. These folks hold that the only viable treatment of the Genesis creation narrative is a wooden literalism. Thus, the world was created in six 24-hour days about six thousand years ago, and Adam and Eve were specially created, not evolved from other primates. This view is promulgated by organizations such as Answers in Genesis.
Two Key Errors in Young Earth Creationism
YE creationism errs in several ways. First, it fails to take into account the pervasive use of figurative revelation throughout the Bible. In the Old Testament and in the book of Revelation, divine communication was often given in some indirect form, some picture or narrative which both concealed and revealed the underlying truth. The prophet in I Kings 20 confronting King Ahab and the prophet Nathan confronting King David both started off by telling a story which didn’t literally occur as though it were true. If one took a literalistic approach to interpretation like today’s YE creationists do, both of these prophets should have been reprimanded for speaking “error”. But to do so would be to completely miss the point of those narratives.
Likewise, telling stories that were not literally true was the primary teaching device of Jesus Christ: “He did not say anything to them without using a parable.” For most of Jesus’ parables, the hearer is expected to figure out that the story is not really about some son who ran away and fed pigs or about some unfortunate traveler who got mugged on the way to Jericho. The hearer needs to enter into the story and see that he or she is represented by one or more of the characters in it; that was the point of the parable, not whether the story itself ever actually happened. But all this is lost on YE creationists, who hold doggedly to simple literalism in Genesis as being somehow intrinsically more pious.
A second major theological error in YE creationism is its refusal to take seriously the evidence in God’s creation. Modern YE creationism stems from the publication of The Genesis Flood by Whitcomb and Morris in 1961. In the preface to the sixth printing, Whitcomb and Morris candidly reveal the basis of their thinking:
We believe that the Bible, as the verbally inspired and completely inerrant Word of God, gives us a true framework of historical and scientific interpretation, as well as of so-called religious truth. This framework is one of special creation of all things, complete and perfect in the beginning, followed by the introduction of a universal principle of decay and death into the world after man’s sin, culminating in a worldwide cataclysmic destruction of the “world that then was” by the Genesis Flood. We take this revealed framework of history as our basic datum, and then try to see how all the pertinent data can be understood in this context…the real issue is not the correctness of the interpretation of various details of the geological data, but simply what God has revealed in His Word concerning these matters.
On this telling, the authors KNOW that the earth was recently created, that decay and death only entered the world following Adam’s apple, and all terrestrial life was drowned apart from the humans and animals on Noah’s ark. Knowing this to be the case, they feel justified in distorting or ignoring whatever physical evidence points to an old earth – they know that old-earth evidence MUST be invalid, so they try to squash it into their young-earth model, and when that fails, simply ignore it: “We take this revealed framework of history as our basic datum, and then try to see how all the pertinent data can be understood in this context.”
As discussed in Exposing the Roots of Young Earth Creationism, this sort of solipsism runs counter to historic Protestant thought, which acknowledged the value of God’s revelation in His works as well as His word. Francis Bacon, who defined the modern scientific method, described this two-books approach: “There are two books laid before us to study, to prevent our falling into error; first, the volume of the Scriptures, which reveal the will of God; then the volume of the Creatures, which express His power.” In The Advancement of Learning (1605) Bacon wrote:
Let no man … think or maintain that a man can search too far, or be too well studied in the book of God’s word, or the book of God’s works, divinity or philosophy; but rather let men endeavor an endless progress or proficience in both; only let men beware that they apply both to charity, and not to swelling; to use, and not to ostentation; and again, that they do not unwisely mingle or confound these learnings together.
The Christian thinkers of the early 1800s followed Bacon’s advice to “not unwisely mingle or confound these learnings together”. Thus, when the physical evidence of the age of the earth contradicted their literal interpretation of Scripture, they did not try to suppress or distort those findings. Rather, they realized that their interpretation of Genesis was likely incorrect. As Davis Young notes, “Because the Christian naturalists of the era were unafraid of God-given evidence, they recognized that extrabiblical information provided a splendid opportunity for closer investigation of the biblical text in order to clear up earlier mistakes in interpretation.”
The reformer John Calvin wrote that in the Genesis creation narrative God accommodated the story to the limited understanding of common people, rather than giving a scientifically precise account. “He who would learn astronomy, and other recondite arts, let him go elsewhere”; that is, the Bible was not written for the purpose of telling us about the physical universe. In Calvin’s view, the way to understand the stars and the planets in a God-honoring manner was to go scientifically study them, not to rely on inferences from Biblical statements.
In their mistaken commitment to literalism, YE creationists overlook and minimize what the Bible does claim for itself. The clearest teaching of the Bible on the Bible is found in II Timothy 3:15-17:
… from infancy you have known the Holy Scriptures, which are able to make you wise for salvation through faith in Christ Jesus. All Scripture is God-breathed and is useful for teaching, rebuking, correcting and training in righteousness, so that the servant of God may be thoroughly equipped for every good work. (NIV)
The wording here is instructive: “wise for salvation”, “faith in Christ Jesus”, “for teaching, rebuking, correcting and training in righteousness.” This is all about doctrine and morals; nothing about geology or biology. Those who try to extend the range of the Bible’s authority to geology and biology think they are being faithful, but in fact are merely imposing their own fallible opinions on the infallible Word.
Various examples can be adduced which demonstrate that Scriptural statements about the physical world, which were appropriate and meaningful for the original audience, can be incorrect according to modern knowledge. To take a simple example, Jesus taught:
“What shall we say the kingdom of God is like, or what parable shall we use to describe it? It is like a mustard seed, which is the smallest of all seeds on earth. Yet when planted, it grows and becomes the largest of all garden plants, with such big branches that the birds can perch in its shade.” [Mark 4:30-32 NIV].
The literal statement here is that the mustard seed is the “smallest of all seeds on earth”. Jesus was speaking here proverbially, and the mustard seed is used elsewhere (e.g. Matt. 17:20) as an example of smallness. The context is sowing and growing. The mustard seed was the smallest seed that first-century Jewish farmers would sow in the earth, so this was an appropriate word picture for that audience to illustrate the growth of the kingdom from tiny beginnings. However, even in ancient Galilee folks were likely familiar with seeds from non-agricultural plants which were smaller than mustard seeds, and modern naturalists have found other seeds which are smaller yet.
If a Bible literalist were truly consistent, he should respond, “I don’t care what those godless scientists say, Jesus said that the mustard seed was the smallest seed, and that’s that. This is the infallible Word of God, so every statement regarding the natural world must be correct.” This would be to make the same mistake, of course, that Bible literalists make with Genesis 1. Most Christians understand that this parable was not really intended to teach horticultural facts; to obsess over whether Jesus taught “error” here would be to entirely miss the point of the passage.
The plain, literal meanings of a number of verses depict an unmoving earth and a moving sun (e.g. “He set the earth on its foundations; it can never be moved” Ps. 104:5; “…The world is firmly established; it cannot be moved” I Chron. 16:30; cf. Isa. 66:1, Eccl. 1:5, and Josh. 10:13). In the time of Galileo, Catholic theologians held that the only faithful way to interpret these verses was that the earth was in fact stationary. According to Cardinal Roberto Bellarmine (1615), “…to affirm that the sun is really fixed in the center of the heavens and the earth revolves swiftly around the sun is a dangerous thing, not only irritating the theologians and philosophers, but injuring our holy faith and making the sacred scripture false.”
Astronomical observations eventually led Christians to conclude that the verses that speak of a stationary earth and a moving sun were not intended to be teaching science. Today some fundamentalists try to claim that these verses were not really teaching a stationary earth. But that is how nearly all Christians understood these verses, until science forced a reinterpretation.
These examples where the plain, literal meaning of Bible passages must be set aside due to modern science demonstrate that Whitcomb and Morris are utterly mistaken in their assertion that the Bible gives us a “true framework of … scientific interpretation.” The Bible does not do that, never claimed to do that, and could not possibly do that if it were to be an effective means of communication to an ancient people with a pre-scientific world view.
YE creationists hold to this failed mind-set, and thus are forced to ignore, deny, or misrepresent the physical facts. I have documented some of these maneuvers in Evidences for a Young Earth.
When it comes to ERVs, a look at the discussions of this topic on the internet shows an ongoing effort by YE creationists to deny the obvious. They claim that ERVs are not really viral insertions, or that each ERV could only insert in one site in the genome, and so on. It is tedious to pick through all the misrepresentations, so I won’t do that here.
It seems impossible to get a YE creationist to really engage with the evidence, as long as he fears that acknowledging common ancestry entails abandoning all that he holds dear. The most fruitful way forward is for him to re-examine his assumptions on Bible interpretation. I have sketched out my reconciliation of evolution with Scripture in Evolution and Faith: My Story, Part 2. The BioLogos article, “Why should Christians consider evolutionary creation?” includes a number of good links to further testimonies and articles on this subject.
Made in God’s Image
A key concern about common ancestry is that it might threaten our status as being created by God and being bearers of God’s image. This calls for careful thought, not defensive pronouncements. Traditionalists are offended at the thought that we came from monkeys, but the reality is even more humiliating. We come not from monkeys, but from single-celled eggs. Every human alive today came into existence as a fertilized egg, like a fertilized chimpanzee egg but with slight differences in the sequences of nucleotides along the strands of DNA. This raises a host of questions:
Is the unfertilized egg (a single, microscopic cell with only the mother’s DNA) the image of God? Does it become the image of God the instant that a sperm cell delivers the other half of the DNA to this single cell? After the fertilized cell has divided a number of times to form a hollow sphere? When the heart first beats, but there is no real consciousness? At birth? How is Adam’s nature passed down? Which genes in our DNA were mutated to make us into sinners? If an egg from a donor mother is fertilized in vitro and implanted in a second woman, is original sin transmitted through the donor mother or the birth mother? It cannot be the case that God simply assigns a soul to an egg as soon as it is fertilized: identical twins result from the division of an egg after it is fertilized, yet presumably they each have their own soul.
Until the answers to these questions are clarified, there is no place for dogmatic pronouncements on evolution being incompatible with a Biblical view of man. We, today, are all made from chemicals (starting from the complexly organized egg and sperm), under the superintending providence of God. This is true of all humans now living, and their parents and grandparents. Therefore, exactly how God made the first humans (from dust or from other primates) is completely irrelevant to the status of us today: our humanity or value or image of God.