Race and genetics

Since the work of Charles Darwin and Gregor Mendel, scientists have striven to understand human genetic variation and its relationship to human evolution. Race and genetics is a broad multidisciplinary set of studies that attempt to use the sciences of human genetics and evolution to inform our understanding of race.

Early history

Blood groups

geographic distribution of blood group A

Prior to the discovery of DNA as the hereditary material, scientists used blood proteins to study human genetic variation. Research by Ludwik and Hanka Hirszfeld during World War I found that the frequencies of blood groups A and B differed greatly from region to region. For example, among Europeans, 15% were group B and 40% were group A. Eastern Europeans and Russians had higher frequencies of group B, with people from India having the highest proportion.

The Hirszfelds concluded that humans were made up of two different "biochemical races," each with its own origin. It was hypothesized that these two pure races later became mixed, resulting in the complex pattern of groups A and B. This was one of the first theories of racial differences to include the idea that visible human variation did not necessarily correlate with invisible genetic variation.

It was expected that groups that had similar proportions of the blood groups would be more closely related in racial terms, but instead it was often found that groups separated by large distances, such as those from Madagascar and Russia, had similar frequencies. This confounded scientists who were attempting to learn more about human evolutionary history. The next big advance in biological description of human variation would come with the discovery of more blood groups and proteins.

Blood proteins and molecular evolution

Techniques based on molecular evolution principles were used in early studies of presupposed racial differences. One major technique in the field is to use mutations in individual proteins or genetic sequences as a molecular clock indicating the evolutionary relatedness of various species or groups.

Luigi Luca Cavalli-Sforza and Anthony Edwards later incorporated these techniques into the field of population genetics. Using computer-based statistical analysis to average across several blood group systems, they were able to produce a phylogenetic relationship of the various populations around the world.

In 1972 Richard Lewontin performed a statistical analysis of the available data on blood proteins. His results showed that the majority of genetic differences between humans, about 85%, were found within a population. A further 7% of genetic differences were found between populations within a race, and only 8% on average differentiated the various races.
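As a quick arithmetic check, the three shares quoted above add up to all of the measured variation, and the two between-group components together make up the remaining 15%. The sketch below only restates the figures from the paragraph; the variable names are illustrative and are not taken from Lewontin's paper.

```python
# Shares of human genetic variation as quoted in the text (whole percentages).
shares = {
    "within populations": 85,
    "between populations within a race": 7,
    "between races": 8,
}

total = sum(shares.values())
# Everything that is not within-population variation.
between_groups = total - shares["within populations"]

for level, share in shares.items():
    print(f"{level}: {share}%")
print(f"total: {total}%, between-group components combined: {between_groups}%")
```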

Defining race


The most widely used human racial categories are based on various combinations of visible traits such as skin color, eye shape and hair texture. However, many of these traits are non-concordant in that they are not necessarily expressed together. For example, skin color and hair texture vary independently. This caused problems for early anthropologists who were attempting to classify race based on visible traits. Some examples of non-concordance include:
  • There are many people in Africa and all over the world affected by albinism who have very light skin.
  • Skin color varies all over the world in different populations. People from the Indian subcontinent are classified as Caucasian although most have dark skin.
  • Epicanthic folds are typically associated with East Asian populations but are found in populations all over the world, including many Native Americans, the Khoisan, the Saami, and even some isolated groups such as the Andamanese.
  • Lighter hair colors are typically associated with Europeans, especially Northern Europeans, but blond hair is found amongst a small number of the dark-skinned populations of the South Pacific, particularly in the Solomon Islands and Vanuatu.

Genetic distance

The 0.1% genetic difference that differentiates any two random humans is still the subject of much debate. The discovery that only 8% of this difference separates the major races led some scientists to proclaim that race is biologically meaningless. They argue that since genetic distance increases in a continuous manner, any threshold or definition would be arbitrary. Any two neighboring villages or towns will show some genetic differentiation from each other and thus could be defined as races. Any attempt to classify races would therefore impose an artificial discontinuity on what is otherwise a naturally occurring continuous phenomenon.

However, other scientists disagree, claiming that the assertion that race is biologically meaningless is politically motivated and that genetic differences are significant. Neil Risch states that numerous studies over past decades have documented biological differences among the races with regard to susceptibility to, and natural history of, chronic disease. Effectively, Risch is attempting to redefine "race" for human populations to represent that small proportion of variation that is known to vary between continental populations. It is well established that the level of differentiation between the continental human groups, as measured by the statistic FST, is about 0.06-0.1 (6-10%), with about 5-10% of variation at the population level (that is, between different populations occupying the same continent) and about 75-85% of variation within populations (Risch et al., 2002; Templeton, 1998; Ossorio and Duster, 2005; Lewontin, 2005). Templeton (1998) states that a differentiation level of 0.25-0.3 (25-30%) is normally required in the biological literature for a population to be considered a race or subspecies:
"A standard criterion for a subspecies or race in the nonhuman literature under the traditional definition of a subspecies as a geographically circumscribed, sharply differentiated population is to have FST values of at least 0.25 to 0.3 (Smith et al. 1997). Hence as judged by the criterion in the nonhuman literature, the human FST value is too small to have taxonomic significance under the traditional subspecies definition." (Templeton, 1998)
Indeed, Risch himself avoids defining race. When asked to respond to the comment "Genome variation research does not support the existence of human races," he replied:
"What is your definition of races? If you define it a certain way, maybe that's a valid statement. There is obviously still disagreement.... Scientists always disagree! A lot of the problem is terminology. I'm not even sure what race means, people use it in many different ways." (Gitschier, 2005)
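To make the FST statistic discussed above more concrete, here is a minimal sketch using the standard heterozygosity-based definition, FST = (HT - HS) / HT, for a single locus with two alleles. The function names and the allele frequencies are hypothetical, and equally sized subpopulations are assumed; published estimates average over many loci and use more careful estimators.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a two-allele locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs: list[float]) -> float:
    """FST = (HT - HS) / HT for equally sized subpopulations.

    HS is the mean heterozygosity within subpopulations; HT is the
    heterozygosity of the pooled population.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    if h_t == 0.0:
        return 0.0  # the locus is fixed everywhere, so no differentiation is measurable
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations (not real data).
print(round(fst([0.4, 0.6]), 3))
```

With these made-up frequencies the value comes out at 0.04, well below the 0.25 to 0.3 threshold that Templeton cites for subspecies.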

Clusters controversy

A computer program called STRUCTURE is used by some scientists to determine clusters of human populations. It is a statistical program that works by placing individuals into one of two clusters based on their overall genetic similarity; many possible pairs of clusters are tested per individual to generate multiple clusters. These clusters are based on multiple genetic markers that are often shared between different human populations, even over large geographic ranges. The notion of a genetic cluster is that people within the cluster have, on average, allele frequencies more similar to each other than to those in other clusters (Edwards, 2003).

The results obtained by clustering analyses are dependent on several criteria:
  • The clusters produced are relative clusters and not absolute clusters. Each cluster is the product of comparisons between sets of data derived for the study, so results are highly influenced by sampling strategies. (Edwards, 2003)
  • The geographic distribution of the populations sampled. Because human genetic diversity is marked by isolation by distance, populations from geographically distant regions will form much more discrete clusters than those from geographically close regions. (Kittles and Weiss, 2003)
  • The number of genes used. The more genes used in a study, the greater the resolution produced and therefore the greater the number of clusters that will be identified. (Tang, 2005)

A study by Noah Rosenberg and Jonathan K. Pritchard, geneticists from the laboratory of Marcus W. Feldman of Stanford University, assayed 377 polymorphisms (i.e. gene types) in more than 1,000 people from 52 ethnic groups in Africa, Asia, Europe and the Americas. They concluded that without using prior information about the origins of individuals, they were able to identify six main genetic clusters, five of which correspond to major geographic regions, and subclusters that often correspond to individual populations. The clusters corresponded to Africa, Europe and the part of Asia south and west of the Himalayas, East Asia, Oceania, the Kalash (of Pakistan) and the Americas. (Rosenberg, 2002 and Rosenberg, 2005)

A study by Neil Risch in 2005 used 326 microsatellite markers and self-identified race/ethnic group (SIRE), namely white, African-American, Asian and Hispanic (individuals involved in the study had to choose one of these categories), to represent discrete "populations", and showed distinct and non-overlapping clustering of the white, African-American and Asian samples. The results confirmed the integrity of self-described ancestry: "We have shown a nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14%." But the authors also warned that: "This observation does not eliminate the potential for confounding in these populations. First, there may be subgroups within the larger population group that are too small to detect by cluster analysis. Second, there may not be discrete subgrouping but continuous ancestral variation that could lead to stratification bias. For example, African Americans have a continuous range of European ancestry that would not be detected by cluster analysis but could strongly confound genetic case-control studies." (Tang, 2005)

Additionally, two studies of European population clusters have been produced. Seldin et al. (2006) identified three European clusters using 5,700 genome-wide polymorphisms. Bauchet et al. (2007) used 10,000 polymorphisms to identify five distinct clusters in the European population, consisting of a south-eastern European cluster (including samples from southern Italian, Armenian, Ashkenazi Jewish and Greek "populations"); a northern European cluster (including samples from German, eastern English, Polish and western Irish "populations"); a Basque cluster (including samples from Basque "populations"); a Finnish cluster (including samples from Finnish "populations") and a Spanish cluster (including samples from Spanish "populations"). Most "populations" contained individuals from clusters other than the dominant cluster for that population, and there were also individuals with membership of several clusters. The results of this study are presented on a map of Europe. (Bauchet, 2007)

Race-based medicine

Because of the correlation between self-identified race and genetic clusters, medical treatments whose results are influenced by genetics often have varying rates of success between self-defined racial groups. For this reason, some doctors consider a patient’s race while attempting to identify the most effective possible treatment, and some drugs are marketed with race-specific instructions. However, because of the inexact nature of the correlation between self-defined races and genetic clusters, as well as because of the large amount of genetic variation within ethnic groups, the prevailing view among medical researchers is that when individual assessment of the relevant genes becomes available, it will probably prove more useful than race in medical decision-making.

Criticism of the clusters study

Though the authors of the study do not equate the clusters with race, there are some who view the studies on clusters as evidence of the existence of biological races. Hence these studies have attracted considerable controversy. Critics argue that using genetic information to determine an individual's continent of origin is not a new concept. Using the ABO, Rh and MNS blood groups, scientists in the 1950s could already determine continent of origin based on known frequencies of these traits.

Critics argue that any attempt to divide humanity will always produce artificial results. They point to the fact that when six clusters were used in the study, an additional cluster (race) appeared which consisted solely of the Kalash of Pakistan. Several groups in the study also appeared in two races, such as Ethiopians, Hazara of Pakistan, and Uyghur from Pakistan and western China. Joseph Graves argues that the people sampled in the study were from regions separated by large distances, such as South African Bantu and Russians. He argues that if more people had come from the regions that bridge the continents, the results might have been different. For example, Armenians would cluster with both Asia and Europe, and Somalians or Yemenites might cluster with both Africa and Asia.

Others say the bulk of human variation is continuously distributed and, as a result, any categorization schema attempting to meaningfully partition that variation will necessarily create artificial truncations. It is for this reason, they argue, that attempts to allocate individuals into ancestry groupings based on genetic information have yielded varying results that are highly dependent on methodological design.

Nicholas Wade, who often cites the work on clusters in articles for the New York Times, says that even if individuals can be assigned to a continent of origin based on their genotype (genes), this is not an indication of phenotype. This is because the SNPs used in the clustering study are selectively neutral, i.e. they lie in stretches of junk DNA that have no known function. Since they do not code for any protein or have a regulatory function, mutations can occur without interfering with normal cell function. Over time these mutations can accumulate much more quickly in local populations, and thus they can be used to identify continent of origin. Therefore the SNPs that can be used to differentiate continental populations are not known to influence intelligence, behavior, susceptibility to disease or ability in sports. Wade argues that it is possible that even though the sites used are nonworking sections of DNA, mutations in them may serve as a proxy for mutations in genes that influence intelligence and behaviour. However, he admits that at the moment there is no known relationship between mutations in junk DNA and mutations in genes.

Human genetic variation

Complexities of the human genome

Many human phenotypes are polygenic, meaning that they depend on the interaction among many genes. Polygenicity makes the study of individual phenotypic differences more difficult. Additionally, phenotypes may be influenced by environment as well as by genetics. The measure of the genetic role in phenotypes is heritability.

Different genes may also produce the same phenotype. For example, the gene that causes light skin color in Europeans is different from the gene that causes light skin in East Asians. Europeans have a different version of the SLC24A5 gene than East Asians, possibly indicating that the two groups evolved light skin independently. A recent asthma study found that genes that defined susceptibility to asthma in blacks were different from the genes that defined susceptibility in whites, which were in turn different from the genes that defined susceptibility to asthma in Hispanics.

Epigenetic inheritance describes a phenomenon where traits are passed on to the next generation based on environmental effects or experience. These traits are inherited without being written into the DNA sequence. In some cases traits are passed on to the next generation by the switching off or on of various genes that are already present. The implication of this is that having the same genotype at a locus does not necessarily mean having the same phenotype.

Positive selection plays an important role in shaping genetic variation. Most notable is its role in influencing physical appearance. Dark skin appears to be under strong selection because the protein that causes it varies very little in African populations but is free to vary in populations found outside Africa. This indicates that dark skin was selected to protect against the harmful effects of UV radiation, which causes birth defects through the destruction of the B vitamin folate. UV radiation also causes sunburn and skin cancer. When people left the sun-intensive regions of Africa, the protein was free to vary and, as a result, lighter skin color emerged in populations around the world.

Light skin color was probably an advantage in very cold and wet climates, because it aids the manufacture of vitamin D in the skin by sunlight.

Immunoglobulins or antibodies are also under strong selection in response to local diseases. For example, people who are Duffy negative tend to have higher resistance to malaria. Most Africans are Duffy negative and most non-Africans are Duffy positive.

Native Americans are almost exclusively blood group O, at about 98%. Some scientists believe this widespread distribution indicates strong selection, possibly for resistance to syphilis. During the European invasion of the Americas, millions of Native Americans died of diseases to which they were not immune, such as smallpox and influenza. Europeans had become resistant to these diseases after suffering several series of deadly plagues (such as the Plague of Justinian and the Black Death). In turn, the Europeans contracted syphilis, to which they had no immunity.

Other factors include genetic drift and founder effect.

Total human-to-human genetic variation is approximately 0.5%. Single-nucleotide polymorphisms (SNPs) are single base-pair DNA differences and account for 0.1% variation. Of this 0.1% difference, 85% is found within any given population, 7% is found between populations within a continent and only 8% is found on average between the various continental populations. Based on this observation, evolutionary biologist Richard Lewontin has claimed that accurate racial classification of humans is impossible and can have no taxonomic utility. However, this view has been rejected by geneticist A. W. F. Edwards in his paper "Human Genetic Diversity: Lewontin's Fallacy" (2003). Edwards argues that accurate classification of humans is possible because most of the information that distinguishes populations lies in correlations between allele frequencies, although these classifications vary depending on a number of criteria, such as sampling strategy, type of locus, distribution of loci around the genome and number of loci.

Nonetheless, Witherspoon et al. (2007) demonstrate that even when accurate classification of human populations is achieved, individuals classified into different groups are often more genetically similar to each other than to members of their own group. This seems to be due to the fact that multi-locus clustering does not take into account the genetic similarities between individuals, and only uses population-level traits for comparison. They conclude that accurate classification of individuals drawn from a continuously varying human population may be impossible.

Compared with most other species, the amount of genetic diversity among humans is relatively small. For example, two random chimpanzees are expected to differ by about 1 in 500 DNA base pairs, equivalent to double the diversity amongst humans. This may indicate that chimpanzees have existed as a species much longer than humans.

Ancestry-informative markers (AIMs) are stretches of DNA which have several polymorphisms that exhibit substantially different frequencies between different populations. Using AIMs, scientists can determine a person's ancestral continent of origin based solely on their DNA. AIMs can also be used to determine someone's admixture proportions.
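One simple way to picture how markers with different frequencies across populations can point to an origin is a likelihood comparison. The sketch below is an illustration only: the population names, marker frequencies and function names are invented, and it assumes Hardy-Weinberg proportions and independent markers. It is not the procedure of any particular study or software package.

```python
import math

# Hypothetical frequencies of a reference allele at three markers in two
# reference populations. These numbers are invented for illustration only.
REFERENCE_FREQS = {
    "population_1": [0.9, 0.8, 0.7],
    "population_2": [0.2, 0.3, 0.1],
}


def genotype_log_likelihood(genotype: list[int], freqs: list[float]) -> float:
    """Log-likelihood of a genotype coded as 0, 1 or 2 reference-allele copies
    per marker, assuming Hardy-Weinberg proportions and independent markers."""
    total = 0.0
    for copies, p in zip(genotype, freqs):
        # Binomial probability of `copies` reference alleles out of two.
        prob = math.comb(2, copies) * p**copies * (1.0 - p) ** (2 - copies)
        total += math.log(prob)
    return total


def most_likely_population(genotype: list[int]) -> str:
    """Return the reference population under which the genotype is most probable."""
    return max(
        REFERENCE_FREQS,
        key=lambda name: genotype_log_likelihood(genotype, REFERENCE_FREQS[name]),
    )


print(most_likely_population([2, 2, 1]))
print(most_likely_population([0, 1, 0]))
```

Each marker on its own is only weakly informative here; it is the product over markers that separates the two hypothetical populations, which echoes the point about correlations between allele frequencies made above.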

Genetic distance

There are several methods used to model human genetic variation. Genetic distance is a measure used to quantify the difference between two populations in relation to the frequency of a particular trait. It is based on the principle that trait frequency indicates relatedness, and is measured by the difference in frequencies of a particular trait between two populations. For example, the frequency of Rh negative alleles is 50.4% among Basques and 41.2% among the French. Thus, the genetic distance between the Basques and the French in terms of the Rh(D) trait is calculated as 9.2%.
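The single-trait calculation described above is just a difference of two frequencies. A minimal sketch using the Rh figures from the text follows; the function name is illustrative.

```python
def single_trait_distance(freq_a: float, freq_b: float) -> float:
    """Absolute difference between two populations' trait frequencies (in percent)."""
    return abs(freq_a - freq_b)


# Rh-negative allele frequencies (percent) quoted in the text above.
basque_rh_negative = 50.4
french_rh_negative = 41.2

distance = single_trait_distance(basque_rh_negative, french_rh_negative)
print(f"{distance:.1f}%")
```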

When the relative frequencies of any one trait are compared, the results often demonstrate no significant genetic difference between populations. For example, the frequency of the blood group B allele in Russia is the same as in Madagascar, yielding a 0% genetic distance. To offset these uninformative results, average values of several polymorphic traits are compared together as clusters to estimate both genetic distances and phylogenetic relationships between populations.
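The effect of averaging over several traits can be sketched as follows. The trait names and frequencies are hypothetical, and the plain mean of absolute differences used here is a simplification chosen for illustration; published studies use more elaborate distance measures.

```python
def mean_trait_distance(pop_a: dict[str, float], pop_b: dict[str, float]) -> float:
    """Average absolute frequency difference over the traits both populations share."""
    shared = sorted(set(pop_a) & set(pop_b))
    if not shared:
        raise ValueError("populations have no traits in common")
    return sum(abs(pop_a[t] - pop_b[t]) for t in shared) / len(shared)


# Hypothetical frequencies (percent) for two unnamed populations; not real data.
pop_x = {"trait_1": 15.0, "trait_2": 40.0, "trait_3": 10.0}
pop_y = {"trait_1": 15.0, "trait_2": 25.0, "trait_3": 40.0}

print(mean_trait_distance(pop_x, pop_y))
```

In this made-up example the two populations are identical for trait_1, so that trait alone would give a distance of 0%, yet the average over all three traits is clearly non-zero.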

Tree analysis

Linkage tree and distance matrix for 9 population clusters.
Tree analysis attempts to reconstruct population separations and movements over time through the comparison of genetic distances for one or more traits. A landmark study by Cavalli-Sforza evaluated the genetic distances between 42 native populations from around the world based on 120 blood polymorphisms. These 42 populations can be grouped into 9 main clusters, which Cavalli-Sforza termed African (sub-Saharan), Caucasoid (European), Caucasoid (extra-European), Northern Mongoloid (excluding Arctic populations), Northeast Asian Arctic, Southern Mongoloid (mainland and insular Southeast Asia), New Guinean and Australian, and American (Amerindian). Though the clusters show varying degrees of homogeneity, the 9-cluster model represents a majority (80 out of 120) of single-trait trees and is useful in demonstrating the historic phylogenetic relationship between these populations.
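To illustrate how a tree can be derived from a table of genetic distances, the sketch below implements UPGMA (average-linkage clustering), one common distance-based method. The text does not say which method was used in the study described above, so this is an assumption made for illustration, and the population labels and distances are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a tree by repeatedly merging the two closest clusters (UPGMA).

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    Returns a nested tuple describing the order of the merges.
    """
    # Each cluster is a pair (tree, members).
    clusters = [(label, [label]) for label in labels]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Hypothetical pairwise distances between four unnamed populations.
example_distances = {
    frozenset({"A", "B"}): 2.0,
    frozenset({"A", "C"}): 6.0,
    frozenset({"B", "C"}): 6.0,
    frozenset({"A", "D"}): 10.0,
    frozenset({"B", "D"}): 10.0,
    frozenset({"C", "D"}): 10.0,
}

print(upgma(["A", "B", "C", "D"], example_distances))
```

In this toy input A and B are closest and merge first, C joins them next, and D, which is equally distant from all the others, joins last.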

Geographic analysis

Geographic analysis attempts to identify the places of origin of specific mutations and the possible selective factors involved in their spread. Genetic distance correlates significantly with geographic distance between populations, a phenomenon referred to as "isolation by distance". Genetic distance can also be the result of physical boundaries which naturally restrict gene flow, such as islands, deserts, mountain ranges or dense forests.

In Cavalli-Sforza's geographic analysis of the above-mentioned 42 populations, some admixed populations, such as those of North Africa and West Asia (Non-European Caucasoid), were omitted for the purpose of simplicity.

The largest genetic distance between any two continents is between Africa and Oceania at 24.7. Based on physical appearance this may be counterintuitive, since Australians and New Guineans resemble Africans with dark skin and sometimes frizzy hair. This resemblance is probably an example of convergent evolution. This large figure for genetic distance reflects the relatively long isolation of Australia and New Guinea since the end of the last glacial maximum when the continent was further isolated from mainland Asia due to rising sea levels.

The next largest genetic distance is between Africa and the Americas at 22.6. This is expected since the longest geographic distance by land is between Africa and South America. The shortest genetic distance at 8.9 is between Asia and the Americas indicating a more recent separation.

Africa is the most genetically divergent continent, with all other groups being more related to each other than to Sub-Saharan Africans. This is expected under the recent single-origin hypothesis. When the Non-European Caucasoids of Northern Africa and Western Asia are omitted from the analysis, Europe demonstrates the shortest genetic distance of all continents to Africa. However, this short distance is possibly the result of significant interaction and gene exchange between Africa and Europe in the not-so-distant past. Genetic variation in Europe is in general about one third of that of other continents, and the genetic contribution of Asia and Africa to Europe is thought to be 2/3 and 1/3 respectively.

Linguistic analysis

Linguistic analysis reveals a very strong correlation between populations and language families. As a general rule, the degree of genetic similarity between populations which belong to the same linguistic family is high. The notable exceptions to this rule are Lapps, Ethiopians and Tibetans, who are genetically associated with populations which speak languages belonging to different linguistic families. For example, the Lapps speak a Uralic language yet are genetically associated with populations which speak Indo-European languages. This kind of situation is thought to be a result of hybridization.

References


  1. The Seven Daughters of Eve By Sykes, Bryan Chapter 3 ISBN 0393020185
  2. RACE - The Power of an Illusion . Background Readings | PBS
  3. "Genetic Similarities Within and Between Human Populations" (2007) by D.J. Witherspoon, S. Wooding, A.R. Rogers, E.E. Marchani, W.S. Watkins, M.A. Batzer and L.B. Jorde. Genetics. 176(1): 351–359.
  4. Racial Differences in the Response to Drugs — Pointers to Genetic Differences. New England Journal of Medicine, Volume 344:1393-1396, May 3, 2001.
  5. Bloche, Gregg M. Race-Based Therapeutics. New England Journal of Medicine, Volume 351:2035-2037, November 11, 2004.
  6. Drug information for the drug Crestor. Warnings for this drug state, "People of Asian descent may absorb rosuvastatin at a higher rate than other people. Make sure your doctor knows if you are Asian. You may need a lower than normal starting dose."
  7. Jorde, Lynn B. and Stephen P. Wooding. "Genetic variation, classification and 'race'". Nature Genetics, Vol. 36 Num. 11, November 2004.
  8. Back with a Vengeance: the Reemergence of a Biological Conceptualization of Race in Research on Race/Ethnic Disparities in Health Reanne Frank
  9. understanding human genetic variation
  10. Cavalli-Sforza (1997:7721).
  11. Cavalli-Sforza (1994:80).
  12. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa
  13. Cavalli-Sforza (1997:7720).
  14. Cavalli-Sforza (1997:7722).


  • Bauchet, M., Brian McEvoy, Laurel N. Pearson, Ellen E. Quillen, Tamara Sarkisian, Kristine Hovhannesyan, Ranjan Deka, Daniel G. Bradley, Mark D. Shriver. (2007) Measuring European Population Stratification using Microarray Genotype Data. Am J Hum Genet. 80(5):948-56.
  • Gitschier J (2005) The Whole Side of It—An Interview with Neil Risch. PLoS Genet 1(1): e14
  • Jackson, F. L. C. (2004). Book chapter: Human genetic variation and health: new assessment approaches based on ethnogenetic layering British Medical Bulletin 2004; 69: 215–235 DOI: 10.1093/bmb/ldh012. Retrieved 29 December 2006.
  • Keita, S. O. Y. (1993) The subspecies concept in zoology and anthropology, a brief historical review and test of a classification scheme. Journal of Black Studies 23:416-445.
  • Keita, S. O. Y., Kittles, R. A., Royal, C. D. M., Bonney, G. E., Furbert-Harris, P., Dunston, D. M., and Rotimi, C. M. (2004). Conceptualizing human variation: Nature Genetics 36, S17 - S20 (2004)
  • Kittles and Weiss RACE, ANCESTRY, AND GENES: Implications for Defining Disease Risk Annu. Rev. Genomics Hum. Genet. 2003. 4:33–67
  • Lewontin, R. C. (2005). Confusions About Human Races from Race and Genomics, Social Sciences Research Council. Retrieved 28 December 2006.
  • Leroy, A. M. (2005). A Family Tree in Every Gene. Published: March 14, 2005, The New York Times, p. A23. Retrieved 8 January 2006.
  • Long and Kittles (2003). Human genetic variation and the nonexistence of human races: Human Biology, V. 75, no. 4, pp. 449–471. Retrieved 10 January 2007.
  • Miththapala, S., Seidensticker, J., O’Brien, S.J. (1996). "Phylogeographic Subspecies Recognition in Leopards (Panthera pardus)": Molecular Genetic Variation. Conservation Biology 10:1115-1132.
  • Ossorio and Duster (2005). Race and Genetics: Controversies in Biomedical, Behavioral, and Forensic Sciences. American Psychologist 60(1):115–128.
  • Parra, Kittles and Shriver. (2004) Implications of correlations between skin color and genetic ancestry for biomedical research. Nature Genetics Supplement 36: 11 S54-S60
  • Pigliucci, Kaplan On the Concept of Biological Race and Its Applicability to Humans
  • Rohde, Olson and Chang (2004) Modelling the recent common ancestry of all living humans Nature 431: 562-566 . Retrieved 5 March 2007.
  • Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, (2002) Genetic structure of human populations. Science 298: 2381–2385.
  • Rosenberg, N. A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J. K., Feldman, M. W. (2005) Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. PLoS Genet 1(6): e70
  • Seldin, Michael F., Russell Shigeta, Pablo Villoslada, Carlo Selmi, Jaakko Tuomilehto, Gabriel Silva, John W. Belmont, Lars Klareskog, Peter K. Gregersen (2006): European Population Substructure: Clustering of Northern and Southern Populations. PLoS Genetics 2 (9): e143
  • Serre, D and Pääbo, S. (2004) Evidence for Gradients of Human Genetic Diversity Within and Among Continents. Genome Research 14:1679-1685. Retrieved 8 January 2006.
  • Tang, Hua, Tom Quertermous, Beatriz Rodriguez, Sharon L. R. Kardia, Xiaofeng Zhu, Andrew Brown, James S. Pankow, Michael A. Province, Steven C. Hunt, Eric Boerwinkle, Nicholas J. Schork, and Neil J. Risch. (2005) Genetic Structure, Self-Identified Race/Ethnicity, and Confounding in Case-Control Association Studies. Am. J. Hum. Genet. 76:268–275.
  • Witherspoon DJ, Wooding S, Rogers AR, Marchani EE, Watkins WS, Batzer MA, Jorde LB. (2007) Genetic similarities within and between human populations. Genetics. 176(1):351-9.
