Genetics of origin. Population composition

We continue our series on the genetics of origin. In a previous article, we covered what haplogroups are, how they are studied, and how migration routes are reconstructed from them. Today, Atlas shares information on populations: how they differ from peoples, why their gene pool is difficult to study, and how a genetic test determines membership in a population.



Why geneticists use the concept of population


Instead of nationality or ethnicity, the genetics of origin uses the concepts of population and population composition. This is because nationality refers more to political identification than to ethnic origin.

Ethnicity or nationality is determined more by cultural norms than by genetics. A person who grew up in a certain cultural environment may identify with one nation while their ancestors actually came from another. Scientists therefore talk about populations: groups that have existed for many generations and in which more than half of all marriages take place within the group.

A population is easier to identify by geographical and ethnic characteristics, because marriages usually take place between nearby residents of the same group. Most peoples are also populations. However, there are peoples in which more than 50% of marriages are with members of another group; such peoples cannot be considered populations.
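The "more than half of marriages within the group" criterion can be written down as a short check. This is only an illustrative sketch: the function names, the representation of a marriage as a pair of group labels, and the toy records are all assumptions made for the example, not data or code from the article.

```python
def within_group_share(marriages, group):
    """Fraction of marriages involving `group` where both spouses belong to it."""
    involving = [pair for pair in marriages if group in pair]
    if not involving:
        raise ValueError(f"no marriages recorded for {group!r}")
    inside = sum(1 for a, b in involving if a == b == group)
    return inside / len(involving)


def is_population(marriages, group):
    """Apply the criterion from the text: strictly more than half are within the group."""
    return within_group_share(marriages, group) > 0.5


# Hypothetical records: each tuple holds the group labels of the two spouses.
marriages = [
    ("A", "A"), ("A", "A"), ("A", "A"), ("A", "B"),
    ("B", "C"), ("B", "C"), ("B", "B"),
]

print(is_population(marriages, "A"))  # 3 of 4 marriages are within A
print(is_population(marriages, "B"))  # 1 of 4 marriages is within B
```

In this toy input, group A meets the criterion and group B does not, which mirrors the case of a people that is not a population.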


Table: differences between peoples and populations.


Why populations are difficult to determine


Genetic differences between groups of humans are small compared with those in other primates. The genomes of chimpanzees from East and West Africa differ more from each other than the genomes of any two people on the planet, wherever they live. This is what makes it difficult to determine which population a person belongs to.

Another complication is that throughout history people, especially Europeans, have constantly migrated and married members of other populations, so the genes of the parents were mixed. The more people of one population moved and mixed with another, the more diverse the DNA of the following generations became, and the harder it is to find reference DNA for them.



Due to recombination, children may not inherit some variants of genes that are characteristic of a particular population.

Human DNA can carry information about ancestral populations from different periods: both those that lived recently and those that lived hundreds of years ago. To determine which populations these were, sections of the user's chromosomes are compared with samples from representatives of different groups.


How populations are studied


Principal component analysis (PCA) is used to select reference samples. This algorithm searches genotyping data for patterns without supervision and lets you split the samples into clusters in an N-dimensional space, usually a two-dimensional one. An example visualization of such an analysis can be found here. With it, we sift out intermediate samples and keep only those that are characteristic of a particular population.



In this example, you can see how clusters of different populations form. Intermediate samples that fall into the zone between clusters are eliminated.
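To make the PCA step more concrete, here is a minimal sketch that projects a handful of samples onto their first two principal components. It rests on several assumptions not stated in the article: genotypes are coded as 0, 1 or 2 copies of an alternative allele per marker, the data are invented, and the components are found by plain power iteration with deflation so that the example needs only the standard library. Real pipelines work with far more markers and optimized numerical libraries.

```python
import math
import random


def center(rows):
    """Subtract the per-marker mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Marker-by-marker covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_axis(cov, iterations=200, seed=0):
    """Power iteration: axis of largest variance and the variance along it."""
    rng = random.Random(seed)
    v = [rng.random() for _ in cov]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left along any axis
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def pca_2d(genotypes):
    """Return (PC1, PC2) coordinates for every sample."""
    centered = center(genotypes)
    cov = covariance(centered)
    axis1, value1 = leading_axis(cov)
    # Deflation: remove the variance explained by the first axis, then repeat.
    size = len(cov)
    deflated = [[cov[i][j] - value1 * axis1[i] * axis1[j] for j in range(size)]
                for i in range(size)]
    axis2, _ = leading_axis(deflated, seed=1)
    return [(sum(a * b for a, b in zip(row, axis1)),
             sum(a * b for a, b in zip(row, axis2))) for row in centered]


# Invented toy data: 6 samples x 5 markers, values are allele counts.
samples = [
    [0, 0, 1, 2, 2], [0, 1, 0, 2, 2], [0, 0, 0, 2, 1],  # hypothetical group A
    [2, 2, 1, 0, 0], [2, 1, 2, 0, 0], [2, 2, 2, 0, 1],  # hypothetical group B
]
coords = pca_2d(samples)
for pc1, pc2 in coords:
    print(f"PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

On this toy input the first component places the two hypothetical groups on opposite sides of zero. Which side each group lands on is arbitrary, because the sign of a principal axis carries no meaning.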

As a result, we obtain clusters whose sizes and boundaries depend on how similar the samples within each group are. We compare genotyping or whole-genome sequencing data against them and assign the data to the most similar cluster.

Not all of the DNA is compared at once, only individual segments of it. For each segment, the closest sample in the database is selected. Since some segments may be similar in different populations, the data undergo additional verification: we check all neighboring segments. For example, if among several segments assigned to populations of Northern Europe we find one that matches a sample from East Asia, we check it again.
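One simple way to picture "drop the intermediate samples, then assign to the most similar cluster" is a nearest-centroid rule in the two-dimensional PCA plane. The article does not say which rule is actually used, so everything here is an assumption made for illustration: the `margin` parameter, the coordinates, and the labels are hypothetical.

```python
import math


def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def build_clusters(labelled, margin=2.0):
    """Return one centroid per label after dropping intermediate samples.

    A reference sample is kept only if it is at least `margin` times closer
    to its own group's centroid than to any other group's centroid.
    Needs at least two labelled groups.
    """
    groups = {}
    for label, point in labelled:
        groups.setdefault(label, []).append(point)
    rough = {label: centroid(pts) for label, pts in groups.items()}

    kept = {}
    for label, pts in groups.items():
        for p in pts:
            own = math.dist(p, rough[label])
            other = min(math.dist(p, c) for l, c in rough.items() if l != label)
            if own * margin <= other:
                kept.setdefault(label, []).append(p)
    # Recompute centroids from the samples that survived the filter.
    return {label: centroid(pts) for label, pts in kept.items()}


def assign(point, centers):
    """Label of the centroid closest to `point`."""
    return min(centers, key=lambda label: math.dist(point, centers[label]))


# Hypothetical PCA coordinates of reference samples.
reference = [
    ("north", (0.0, 0.0)), ("north", (0.2, 0.1)), ("north", (0.1, -0.1)),
    ("north", (2.4, 0.0)),  # lies between the groups and gets filtered out
    ("south", (5.0, 0.0)), ("south", (5.2, 0.1)), ("south", (4.9, -0.1)),
]
centers = build_clusters(reference)
print(assign((0.5, 0.3), centers))
print(assign((4.0, 1.0), centers))
```

The filter makes each cluster tighter, at the cost of discarding reference samples; a stricter `margin` removes more of them.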
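The neighbor check can be sketched as a pass over the per-segment labels that flags any segment whose two neighbors agree with each other but not with it. This is an assumed simplification: the function name is hypothetical, the labels are invented, and the article does not describe what the repeated check itself consists of, so the sketch only marks segments for re-examination.

```python
def flag_for_recheck(segment_labels):
    """Indices of segments that disagree with two matching neighbors."""
    flagged = []
    for i in range(1, len(segment_labels) - 1):
        left = segment_labels[i - 1]
        here = segment_labels[i]
        right = segment_labels[i + 1]
        if left == right and here != left:
            flagged.append(i)
    return flagged


# Hypothetical best-match labels for consecutive segments of one chromosome.
labels = [
    "Northern Europe", "Northern Europe", "Northern Europe",
    "East Asia",
    "Northern Europe", "Northern Europe",
]
print(flag_for_recheck(labels))  # [3]
```

Only the isolated East Asia segment is flagged; a longer run of consecutive segments with the same label would be left alone by this rule.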



The more samples a cluster contains, the more accurately the algorithm determines populations. Accuracy also depends on the source data: whole-genome sequencing gives more information for comparison than genotyping. However, it is important to remember that even a full genome does not give a 100% result. The reference samples collected from different populations are mostly genotyping data, since genotyping is much cheaper. The algorithm for determining populations of origin will become more accurate as reference samples with whole-genome sequencing results appear.

There is a misconception that only the Y chromosome and mitochondrial DNA are analyzed to determine population composition, which would mean that for women it could be determined only along the maternal line. This is not true. The Y chromosome and mitochondrial DNA yield only haplogroup information, and haplogroups are not tied to specific populations. For example, haplogroup R1a, which is often found among Russians, is widespread among Western and Eastern Slavs as well as among populations of North India.

A population cannot be associated with a single haplogroup because, as a rule, other haplogroups are also common in it. Haplogroups do, however, help to understand the history of how a population formed as a whole. Read more about haplogroups in our previous article.


How it looks in your account


In the Geography section, the user sees the percentage split between parts of the world, for example between Europe, Asia and Africa.



Clicking the More button takes the user to the population page, which shows a detailed percentage for each population. The map also shows the approximate area of each group.
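As a rough illustration of how the two levels of percentages relate, the sketch below turns per-segment assignments into population shares and then sums them by part of the world. How the service actually computes these figures is not described in the article. Weighting by segment length, the function names, the region mapping, and the numbers are all assumptions.

```python
from collections import defaultdict


def composition(segments):
    """Percentage of total segment length assigned to each population."""
    totals = defaultdict(float)
    for population, length in segments:
        totals[population] += length
    whole = sum(totals.values())
    return {population: 100.0 * t / whole for population, t in totals.items()}


def roll_up(percentages, region_of):
    """Sum population percentages into parts of the world."""
    regions = defaultdict(float)
    for population, pct in percentages.items():
        regions[region_of[population]] += pct
    return dict(regions)


# Hypothetical assignments: (population, segment length in arbitrary units).
segments = [
    ("Northern Europe", 30.0), ("Eastern Europe", 30.0),
    ("Northern Europe", 20.0), ("East Asia", 20.0),
]
# Hypothetical mapping from population to part of the world.
region_of = {
    "Northern Europe": "Europe",
    "Eastern Europe": "Europe",
    "East Asia": "Asia",
}

by_population = composition(segments)
by_region = roll_up(by_population, region_of)
print(by_population)
print(by_region)
```

With these invented numbers the detailed view would list three populations, and the summary view would show only the split between Europe and Asia.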



In the next article, Atlas will tell you in detail about Neanderthals: what percentage of their genes is found in different populations, how their genes affect the health of modern humans, and which other ancient humans lived at that time.
