Phylogeography: the basics
Mitochondrial DNA (mtDNA) and the male-specific part of the Y chromosome (MSY) are the two uniparental, non-recombining genetic marker systems that led the way in genealogical and phylogeographic studies.
For both, the genetic material is transmitted down the generations as a single block, along the maternal line of descent for the mtDNA and the paternal line for the MSY. As the material is passed down, mutations accumulate in it. These mutations mark out distinct branches within the genealogy as they appear, and they lead to the formation of related clusters that we call haplogroups. Each haplogroup traces its descent back to a single common ancestor.
In the early days of the molecular revolution, Avise and colleagues argued that mtDNA was ‘not just another molecular marker'. Their reason was the exceptional opportunities it afforded for two things. The first was estimating intra-specific gene trees - in themselves an estimate of the genealogy (or ‘coalescent tree') of the locus (or stretch of DNA) under study. The second was detecting geographic patterns in the distribution and ages of clusters within the trees.[14] The same soon became true of the MSY. Although some population geneticists might disagree, we would argue that this remains true to a surprising extent in the age of complete human mtDNA genomes (‘mitogenomes'). Avise was writing in the days of short mtDNA control-region sequences (a few hundred base-pairs from the non-coding part of the mtDNA genome) or slightly higher-resolution restriction maps of the whole mtDNA genome. Control-region sequences now number more than 150,000. Since about 2000, whole mtDNA genomes have also been accumulating, and those publicly available already exceed 15,000.[15] Many more will be needed to address all the issues that archaeogeneticists have been trying to get their teeth into since the 1980s. Even so, these genomes already provide an exquisite degree of resolution of the maternal genealogy.
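A toy example can illustrate the nested structure of haplogroups. The Python sketch below assumes a tiny, entirely hypothetical tree; the haplogroup names and mutation labels are invented and are not real mtDNA or MSY markers. Each haplogroup is defined by its own mutations plus those of all its ancestors. A sequence is assigned to the deepest haplogroup whose full set of defining mutations it carries. Any mutations left over are the lineage's private variation, which is the raw material for genetic dating.

```python
from typing import Optional

# Hypothetical tree: haplogroup -> (parent, mutations defining this branch).
# All names and mutation labels are invented for illustration.
TREE: dict[str, tuple[Optional[str], frozenset[str]]] = {
    "A": (None, frozenset({"m100"})),
    "A1": ("A", frozenset({"m250"})),
    "A1a": ("A1", frozenset({"m300", "m410"})),
    "B": (None, frozenset({"m120"})),
}


def lineage(hg: str) -> list[str]:
    """Haplogroup hg followed by its ancestors, up to the root."""
    path = []
    node: Optional[str] = hg
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path


def path_mutations(hg: str) -> frozenset[str]:
    """Every mutation a carrier of hg is expected to have."""
    return frozenset().union(*(TREE[node][1] for node in lineage(hg)))


def assign(sample: set[str]) -> Optional[str]:
    """Deepest haplogroup whose full set of defining mutations the sample carries."""
    matches = [hg for hg in TREE if path_mutations(hg) <= sample]
    return max(matches, key=lambda hg: len(lineage(hg)), default=None)


print(assign({"m100", "m250"}))                          # A1
print(assign({"m100", "m250", "m300", "m410", "m999"}))  # A1a (m999 is private)
print(assign({"m120"}))                                  # B
print(assign({"m555"}))                                  # None
```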
An even greater level of genealogical resolution is starting to appear for the male line of descent,[16] since every complete genome sequenced from a man includes an MSY sequence, ready-made.
For the rest of the human genome, analytical techniques have so far barely kept pace with the astonishing progress of sequencing technology. This is probably because sequencing has been driven by medical science, whereas archaeogenetics remains something of a minority (and low-budget) pursuit. The autosomes (the 22 pairs of non-sex chromosomes, which make up most of the nuclear genome) recombine and are reshuffled in every generation. They are therefore much more difficult to analyse genealogically than mtDNA and the MSY. The new level of detail is providing fascinating portraits of human populations around the world. Even so, the flood of new data is often harder to interpret in terms of the questions that interest archaeologists, though progress is rapid.
As well as detailed phylogenetic reconstruction and an evolutionary history that broadly seems to reflect that of human populations as a whole, mtDNA retains the edge over other genetic systems in one further respect: genetic dating. The so-called ‘molecular clock' has been around since the 1960s and aroused intense controversy from the start. Different parts of the genome patently evolve at vastly different rates, and rates change along lineages over time. Rates are also affected by numerous processes that are not particularly well understood, such as selection. Dating any particular region of the genome thus brings numerous challenges. Microsatellites have often been targeted. These are short stretches of DNA in which a short sequence motif is repeated in tandem, with the number of repeats mutating up and down. However, the mechanisms by which they evolve are poorly understood, and their behaviour varies enormously from one microsatellite to the next. Conversely, stretches of unique autosomal sequence may show relatively little variation within blocks that have not undergone recombination, since the autosomal mutation rate is very low.
For the mtDNA, however, there is a wealth of variation within the non-recombining 16.5 kb (kilobase) unit. The effects of selection have been investigated and can be corrected for when estimating time depths, despite some claims to the contrary.[17] Moreover, the mtDNA clock has to some extent broken free of its reliance on the fossil-based human-chimp split as a calibration point, by incorporating known ages for island colonizations into the calibration. Suitable archaeological calibration points are harder to find than is generally appreciated. Some do exist, however, and they can help refine and corroborate rates calibrated on the human-chimp split (assumed to be around seven million years ago, on the basis of the estimated age of Sahelanthropus).[18] Here again the high variation in mtDNA helps, because it allows better-known, relatively recent events, such as the settlement of the Remote Pacific, to be used.[19] Although we still await full agreement on the mtDNA clock, the consensus is increasingly broad.
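Under a strict clock, both kinds of calibration reduce to simple arithmetic. The Python sketch below is illustrative only: it assumes a constant rate and ignores saturation and selection. Every numeric input except the roughly seven-million-year split is invented. For a split between two lineages, differences accumulate along both diverging branches, hence the factor of 2. For a dated colonization, the mean number of mutations accumulated since the founder is divided directly by the founder's age.

```python
def rate_from_split(differences: float, split_time_years: float) -> float:
    """Mutations per lineage per year, calibrated on a dated split between two lineages.

    Differences accumulate independently along both branches after the
    split, hence the factor of 2.
    """
    if split_time_years <= 0:
        raise ValueError("split time must be positive")
    return differences / (2.0 * split_time_years)


def rate_from_colonization(mean_mutations_since_founder: float, age_years: float) -> float:
    """Mutations per lineage per year, calibrated on a dated founder event.

    Diversity is measured from a single founder, so there is no factor of 2.
    """
    if age_years <= 0:
        raise ValueError("colonization age must be positive")
    return mean_mutations_since_founder / age_years


# Invented inputs: only the ~7-million-year split comes from the text.
deep = rate_from_split(1_400, 7_000_000)
recent = rate_from_colonization(0.3, 3_000)
print(f"split-based rate:        {deep:.2e} per year")
print(f"colonization-based rate: {recent:.2e} per year")
```

These invented inputs were chosen so that the two estimates agree; with real data they rarely coincide so neatly. How closely they agree is what makes a recent, archaeologically dated event useful for corroborating a deep fossil calibration.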
Phylogeography utilizes three variables: the reconstructed phylogenetic tree of descent (the genealogy); the geographic distribution of the lineages; and the time depth of the various clusters. It is based on the very simple notion that every new variant in a DNA sequence arises by mutation at a particular point in space and time. That point can in principle (if not always in practice) be pinned down by examining the distribution both of the lineages descending from the newly derived sequence and of those preceding it in the tree. For example, the mutations defining a new genealogical branch of the mtDNA, which we call haplogroup L3, arose around 70,000 years ago in East Africa.[20] L3 subsequently gave rise to various descendant lineages, two of which (haplogroups M and N) are found primarily outside Africa, throughout the rest of the world. This suggests a dispersal out of Africa. That dispersal can be dated to around 50,000–60,000 years ago by measuring the mutational variation accumulated on top of each of the founders in each part of the world.
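The dating step can be sketched with one widely used estimator, the ρ (rho) statistic. It is the mean number of mutations separating each sampled descendant from the founder of a cluster, converted to years with a calibrated rate. The text does not specify the estimator, so this choice is our assumption. The Python below also assumes a strict clock and uses a hypothetical sample and a placeholder rate rather than a published calibration.

```python
from statistics import fmean


def rho(distances_from_founder: list[int]) -> float:
    """Mean number of mutations between each sampled descendant and the founder."""
    if not distances_from_founder:
        raise ValueError("need at least one descendant sequence")
    return fmean(distances_from_founder)


def founder_age(distances_from_founder: list[int], years_per_mutation: float) -> float:
    """Strict-clock age of the founder: rho times the time per mutation."""
    return rho(distances_from_founder) * years_per_mutation


# Hypothetical: mutational distances of eight descendants from a
# reconstructed founder, and a placeholder rate (not a published value).
sample = [2, 3, 1, 4, 2, 3, 2, 3]
YEARS_PER_MUTATION = 5_000
print(rho(sample))                              # 2.5
print(founder_age(sample, YEARS_PER_MUTATION))  # 12500.0
```

Applied separately to the founders in each region, estimates of this kind allow a dispersal to be bracketed in time. A real analysis would also report an error estimate, which this sketch omits.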
This approach forms the basis for ‘founder analysis'.[21] Founder analysis is an attempt to formalize the phylogeographic approach to identifying colonization events, but it also exemplifies the approach more broadly. It is not designed for the standard population-genetics scenario of a single large population splitting into two daughters. Instead, it targets the more realistic situation in which a minority founder group breaks away from the main (source) population to found a new one (the sink). This is likely to have been the driving process in the dispersal of modern humans around the world, although the assumption needs to be explored in each case. Founder analysis assumes that we can distinguish samples from a source region and a sink region (often on non-genetic grounds). It then identifies founder lineages shared between the two and, in effect, subtracts source diversity from sink diversity. Only the variation that has accumulated in the sink since each founder arrived is used as a proxy for that founder's arrival time.
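A highly simplified version of this logic can be sketched in Python. The sketch assumes that haplotypes are coded as sets of mutations relative to a reference. It treats as a candidate founder any sink haplotype also observed in the source, a deliberately simple rule that ignores complications such as back-migration and recurrent mutation. Each sink sequence is attached to its closest candidate founder, and only the mutations gained since that founder are counted. All data and the rate are hypothetical.

```python
from statistics import fmean

# A haplotype is coded as the set of mutations it carries relative to a reference.
Haplotype = frozenset[str]


def candidate_founders(source: list[Haplotype], sink: list[Haplotype]) -> set[Haplotype]:
    """Simplified rule: any sink haplotype also present in the source."""
    return set(source) & set(sink)


def founder_ages(
    source: list[Haplotype], sink: list[Haplotype], years_per_mutation: float
) -> dict[Haplotype, float]:
    founders = candidate_founders(source, sink)
    clusters: dict[Haplotype, list[int]] = {f: [] for f in founders}
    for seq in sink:
        # Founders the sequence could descend from: it carries all their mutations.
        ancestors = [f for f in founders if f <= seq]
        if not ancestors:
            continue  # no founder identified; ignored in this sketch
        nearest = max(ancestors, key=len)  # closest founder (ties broken arbitrarily)
        # Founder mutations were already present in the source, so subtract them
        # and count only the mutations that arose in the sink.
        clusters[nearest].append(len(seq - nearest))
    # rho per founder cluster, converted to years with a strict clock.
    return {f: fmean(d) * years_per_mutation for f, d in clusters.items() if d}


# Hypothetical data and placeholder rate, for illustration only.
source = [frozenset(), frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]
sink = [
    frozenset({"a"}), frozenset({"a", "x"}), frozenset({"a", "y"}),
    frozenset({"a", "x", "z"}), frozenset({"c"}), frozenset({"c", "w"}),
]
ages = founder_ages(source, sink, years_per_mutation=5_000)
for founder, age in sorted(ages.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(founder), age)
# ['a'] 5000.0
# ['c'] 2500.0
```

Set difference is the design choice here. It makes the "subtraction" explicit, because mutations a founder already carried in the source never contribute to its estimated arrival time in the sink.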
Phylogeographic approaches receive more than their fair share of criticism within population genetics. However, critics often forget that the methodologies have been validated in relatively simple situations where the demographic history is quite well understood, such as Polynesia and southern Africa. Many scenarios proposed on the basis of mtDNA evidence have also subsequently been confirmed by genome-wide analyses, the most obvious being the origins and dispersal of modern humans. A simplistic view of how the analyses are carried out has led to the suggestion that phylogeographic interpretations are essentially ‘story-telling' - in contrast to ‘more robust' approaches such as ‘interpretation of population statistics' or ‘explicit modelling of population history'. Procedures that better estimate the uncertainty of phylogeographic conclusions may be on the way. They have not yet arrived, however, in a form useful for the kinds of question we are interested in here. In fact, as the authors of one major statistical critique of phylogeographic analyses point out, ‘phylogeographic analyses have been a tremendously powerful tool in the analysis of population genetic data'. This may be, they suggest, because key assumptions (essentially the operation of the founder effect) may in practice have been largely correct.[11]
In any case, exploratory analysis (and even ‘story-telling') and hypothesis testing should not be seen as hard-and-fast opposites, and phylogeography has been used to test hypotheses and draw inferences in a number of ways.
We advocate an interdisciplinary (and even trans-disciplinary) approach in which hypotheses are evaluated within the framework of models supplied by archaeology, palaeoanthropology, and palaeoclimatology. Improving techniques for recovering DNA from archaeological remains can only enhance this approach. Unfortunately, ancient DNA has so far been brought to bear in this way only in Europe, on which we therefore focus in order to bring out the underlying methodological issues. We then discuss more briefly a few of the better-studied situations in other parts of the world.