Family DNA Results

I have already posted my own genetic ancestry results. Now we have also had my parents, my sister and my wife tested with 23andme, so I thought a comparison would be interesting.

Here’s the ancestry painting from 23andme which uses three reference populations: Yoruba from Nigeria, Chinese and Japanese, and Utahns of Northwestern European descent.

Ancestry    Dad      Mom      Sister   Me       Wife
African     0.56%    0.95%    0.96%    0.34%    0.00%
Asian       8.68%    6.63%    8.00%    6.58%    10.18%
European    90.76%   92.42%   91.04%   93.09%   89.82%

You can basically use my wife as a sort of reference for Punjabi ancestry here (which makes up three-quarters of our ancestry too). Also, my wife and I are unrelated.

As you can see, our results are close. Within my family, my mom and sister have the most African ancestry and I have the least, while my wife has none.

And here are the similarity numbers for us with different reference populations.

Reference Population     Dad     Mom     Sister  Me      Wife
Central & South Asians   67.13   67.09   67.05   67.12   67.12
Northern Europeans       66.97   66.92   66.92   66.91   66.94
Southern Europeans       66.97   66.88   66.92   66.90   66.85
Near Easterners          66.85   66.76   66.81   66.79   66.72
Siberians                66.59   66.50   66.48   66.52   66.77
Eastern Asians           66.52   66.41   66.42   66.45   66.70
North Americans          66.48   66.40   66.38   66.44   66.69
South Americans          66.46   66.37   66.40   66.40   66.76
Oceanians                66.39   66.41   66.39   66.35   66.62
Northern Africans        66.17   66.10   66.15   66.13   65.94
Eastern Africans         64.08   64.06   64.11   64.10   63.89
Southern Africans        63.96   64.00   64.06   64.00   63.77
Central Africans         63.93   63.93   64.00   63.97   63.74
Western Africans         63.91   63.91   63.97   63.94   63.70

Compared to my wife, we are closer to Africans and farther from Eastern Asians, Native Americans (who are really a branch of East Asians) and Oceanians. That’s expected because of the 25% Egyptian ancestry we have.
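The comparison can be checked directly against the similarity table. Here is a rough sketch in Python. Averaging the four family members is just one convenient summary chosen for this illustration, not something 23andme reports, and the differences are small enough that they should be read as a direction rather than a measurement.

```python
# Similarity scores copied from the table above: (Dad, Mom, Sister, Me, Wife).
similarity = {
    "Central & South Asians": (67.13, 67.09, 67.05, 67.12, 67.12),
    "Northern Europeans": (66.97, 66.92, 66.92, 66.91, 66.94),
    "Southern Europeans": (66.97, 66.88, 66.92, 66.90, 66.85),
    "Near Easterners": (66.85, 66.76, 66.81, 66.79, 66.72),
    "Siberians": (66.59, 66.50, 66.48, 66.52, 66.77),
    "Eastern Asians": (66.52, 66.41, 66.42, 66.45, 66.70),
    "North Americans": (66.48, 66.40, 66.38, 66.44, 66.69),
    "South Americans": (66.46, 66.37, 66.40, 66.40, 66.76),
    "Oceanians": (66.39, 66.41, 66.39, 66.35, 66.62),
    "Northern Africans": (66.17, 66.10, 66.15, 66.13, 65.94),
    "Eastern Africans": (64.08, 64.06, 64.11, 64.10, 63.89),
    "Southern Africans": (63.96, 64.00, 64.06, 64.00, 63.77),
    "Central Africans": (63.93, 63.93, 64.00, 63.97, 63.74),
    "Western Africans": (63.91, 63.91, 63.97, 63.94, 63.70),
}


def family_minus_wife(scores):
    """Mean of the four family scores minus the wife's score."""
    *family, wife = scores
    return sum(family) / len(family) - wife


differences = {pop: family_minus_wife(s) for pop, s in similarity.items()}

# Positive: the family is more similar to that population than my wife is.
for pop, diff in sorted(differences.items(), key=lambda kv: kv[1]):
    side = "family closer" if diff > 0 else "wife closer"
    print(f"{pop:24s} {diff:+.3f}  ({side})")
```

The African rows come out positive and the Siberian, East Asian, American and Oceanian rows come out negative, which is the pattern described above.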

Finally, here are our Dodecad Project results.

Component           Dad      Mom      Sister   Me       Wife
East_European       4.96%    5.71%    4.59%    4.19%    6.28%
West_European       7.43%    9.59%    8.98%    8.97%    11.10%
Mediterranean       11.10%   9.28%    10.99%   9.24%    5.77%
Neo_African         1.36%    1.12%    1.45%    1.15%    0.26%
West_Asian          23.86%   22.41%   22.88%   23.88%   19.81%
South_Asian         33.94%   37.24%   33.15%   36.57%   45.64%
Northeast_Asian     2.53%    1.64%    1.79%    1.95%    3.22%
Southeast_Asian     3.04%    2.85%    3.95%    2.61%    3.35%
East_African        1.86%    2.18%    3.06%    2.30%    0.00%
Southwest_Asian     7.49%    5.57%    5.75%    6.57%    4.56%
Northwest_African   1.90%    1.49%    2.32%    1.57%    0.00%
Palaeo_African      0.53%    0.92%    1.10%    1.01%    0.00%

Similar results but interesting differences.
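One way to look at those differences is to compare each child with the average of the two parents. A child gets half of their autosomal DNA from each parent, so the parents' average is the expected value for each component, but which half gets passed on is random and the admixture estimates themselves are noisy. The sketch below is only an illustration of that comparison using the Dodecad numbers above; the names in it are made up for the example.

```python
# Dodecad percentages copied from the table above: (Dad, Mom, Sister, Me).
dodecad = {
    "East_European": (4.96, 5.71, 4.59, 4.19),
    "West_European": (7.43, 9.59, 8.98, 8.97),
    "Mediterranean": (11.10, 9.28, 10.99, 9.24),
    "Neo_African": (1.36, 1.12, 1.45, 1.15),
    "West_Asian": (23.86, 22.41, 22.88, 23.88),
    "South_Asian": (33.94, 37.24, 33.15, 36.57),
    "Northeast_Asian": (2.53, 1.64, 1.79, 1.95),
    "Southeast_Asian": (3.04, 2.85, 3.95, 2.61),
    "East_African": (1.86, 2.18, 3.06, 2.30),
    "Southwest_Asian": (7.49, 5.57, 5.75, 6.57),
    "Northwest_African": (1.90, 1.49, 2.32, 1.57),
    "Palaeo_African": (0.53, 0.92, 1.10, 1.01),
}

# Expected value for a child: the average of the two parents.
midparent = {name: (dad + mom) / 2 for name, (dad, mom, _, _) in dodecad.items()}

# How far each child lands from that expectation, in percentage points.
deviation = {
    name: (sister - midparent[name], me - midparent[name])
    for name, (_, _, sister, me) in dodecad.items()
}

print(f"{'Component':18s} {'Midparent':>9s} {'Sister':>8s} {'Me':>8s}")
for name in dodecad:
    sister_dev, me_dev = deviation[name]
    print(f"{name:18s} {midparent[name]:9.2f} {sister_dev:+8.2f} {me_dev:+8.2f}")
```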

My Biogeographical Ancestry

There are several different ways to figure out your genetic ancestry. One way that 23andme shows your ancestry is by comparison with reference populations of the HGDP (Human Genome Diversity Project) dataset. I have listed how similar I am to various groups in the table below:

Reference Population     Similarity   Groups Included
Central & South Asians   67.14        Pathan, Makrani, Kalash, Hazara, Balochi, Sindhi, Brahui and Burusho
Northern Europeans       66.94        western Russia, France, Orkney Islands
Southern Europeans       66.93        northern Italy, Tuscany, Sardinia, French Basque
Near Easterners          66.82        Palestinian, Druze, Bedouin
Siberians                66.55        Yakut
Eastern Asians           66.48        Japan, Cambodia, China (Dai, Daur, Han, Hezhen, Lahu, Miaozu, Mongola, Naxi, Oroqen, She, Tu, Tujia, Uygur, Xibo, Yizu)
North Americans          66.47        Pima, Maya
South Americans          66.43        Surui, Karitiana, Piapoco, Curripaco
Oceanians                66.38        Papuans, Melanesians
Northern Africans        66.16        Mozabite
Eastern Africans         64.13        Kenya
Southern Africans        64.04        San, Bantu speaking South Africans
Central Africans         64.01        Biaka, Mbuti Pygmies
Western Africans         63.98        Mandenka, Yoruba

My numbers are not too different from anyone from the northwestern part of the South Asian subcontinent.

One thing to consider here is that you are being compared to a specific set of populations. As you can see, there are no Indian reference populations here. Similarly, Near Easterners are represented only by samples from Israel and North Africans by one Algerian population. I wonder what the results would be if they had Egyptians or Ethiopians etc. in their reference.

Another way to look at your genetic ancestry is with a PCA (Principal Component Analysis) plot. With the same reference populations mentioned above, 23andme calculated the two dimensions of largest variation among that data. These two axes don’t completely describe the variation across the samples, but being the two largest components they can be used to project your genetic data in that space. At the world level, I am the green marker in the middle of the Central/South Asian cluster.

In the South Asian PCA plot, I am in the middle of the Pathan cluster and right at the top edge of the Sindhi one.

Now this doesn’t make me a Pathan. For one thing, 23andme’s reference populations do not have any Punjabis. I am sharing with a number of North Indians and Pakistanis, including several Punjabis, and they all lie around me in the plot.

There is another problem with a PCA plot, though. We are looking at only the two most significant dimensions, but there are other dimensions too, and combined they could account for a lot of the variation among people’s genomes. Also, let’s say we have someone who is the child of a European parent and an East Asian parent. That person, who is 50% East Asian and 50% European, would be placed about midway between the East Asian and European clusters. That’s where the Uygur and Hazara clusters are. So we can’t say that someone is Uygur just because they are placed in the Uygur cluster in a PCA plot.
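The midway effect is easy to see in a toy simulation. The sketch below is not real PCA on real genotypes. It invents allele frequencies for two hypothetical populations, A and B, and uses the line joining their centres as a stand-in for a principal component. Every name and number in it is an assumption made for illustration.

```python
import random

random.seed(1)
N_SNPS = 2000

# Made-up allele frequencies for two hypothetical source populations, A and B.
freq_a = [random.uniform(0.05, 0.95) for _ in range(N_SNPS)]
freq_b = [random.uniform(0.05, 0.95) for _ in range(N_SNPS)]
# A hypothetical third population whose frequencies sit halfway between A and B.
freq_mid = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]


def sample_person(freq_parent1, freq_parent2):
    """Allele counts (0, 1 or 2): one allele drawn from each parent's population."""
    return [
        int(random.random() < p) + int(random.random() < q)
        for p, q in zip(freq_parent1, freq_parent2)
    ]


# Axis from the centre of A to the centre of B (expected genotype = 2 * frequency).
axis = [2 * (b - a) for a, b in zip(freq_a, freq_b)]
centre_a = sum(2 * a * w for a, w in zip(freq_a, axis))
centre_b = sum(2 * b * w for b, w in zip(freq_b, axis))


def project(genotype):
    """Position along the axis, scaled so A's centre is 0 and B's centre is 1."""
    raw = sum(g * w for g, w in zip(genotype, axis))
    return (raw - centre_a) / (centre_b - centre_a)


people = {
    "unadmixed A": sample_person(freq_a, freq_a),
    "unadmixed B": sample_person(freq_b, freq_b),
    "child of an A parent and a B parent": sample_person(freq_a, freq_b),
    "member of the in-between population": sample_person(freq_mid, freq_mid),
}
positions = {label: project(g) for label, g in people.items()}
for label, pos in positions.items():
    print(f"{label:38s} {pos:.2f}")
```

The first-generation child and the member of the in-between population both land near the middle of the axis, so position alone cannot tell them apart.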

There are other ways to look at your genetic ancestry and I have been exploring a bunch of them. We’ll talk about them next.

Ancestry Painting

23andme has a feature called ancestry painting which gives you the percentages of different populations you are admixed from. In my case, I got the following:

European 91.22%
Asian 8.69%
African 0.09%

Ignore the precision (to 2 decimal places). I showed that because I wanted to highlight the nonzero African ancestry percentage which I will talk about in more detail some other day.

The ancestry painting also shows you the segments on your chromosomes and which ancestral group you inherited them from. Here’s an image showing mine:

My 23andme ancestry painting

Does this mean I am 91% European and 9% Asian? Not quite! My results are about typical for someone from Punjab.

Also, the results depend on which reference populations were used as the exemplar European, Asian and African populations.

23andMe takes advantage of publicly available data for four populations studied extensively via the International HapMap project (hapmap.org). That project obtained the genotypes for 60 individuals of western European descent from Utah, 60 western African individuals from Nigeria, and 90 eastern Asian individuals, 45 from each of Japan and China. Because the two eastern Asian populations are geographically near one another and relatively similar at the genetic level, 23andMe combines these to form a single eastern Asian reference population. For more information on why these regions were used, please see (Why are these three populations used?)

So they are comparing your DNA segments to those of the three populations from the HapMap dataset. Using more reference populations would give you more fine-grained results (which is something I plan to do in my Harappa Ancestry Project).

Using the technique described by Eurogenes, you can check which chromosomal segments are classified as European (C), Asian (A) or African (Y). Here are my results for chromosome 9:

Chromosome   Segment (start, end)     Ancestry
9            36587, 97974029          CC
9            97976425, 99363907       AC
9            99367419, 100530260      CC
9            100536329, 104679442     AC
9            104680472, 106598880     CC
9            106602625, 108990980     AC
9            108993234, 133447401     CC
9            133447580, 138437690     AC
9            138443022, 140147760     CC

The number of Asian segments on my homozygous chromosome 9 makes me doubt that it comes from my Egyptian great-grandmother. Maybe it’s from my great-grandfather.
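To put a number on the chromosome 9 segments listed above, their lengths can be added up. This is a quick sketch in Python and rests on two assumptions: the two numbers in each row are start and end positions in base pairs, and each letter of the label stands for one of the two copies of the chromosome. The small gaps between consecutive segments are ignored.

```python
# Chromosome 9 segments copied from the table above: (start, end, label).
# C = European, A = Asian, Y = African, one letter per chromosome copy.
segments = [
    (36587, 97974029, "CC"),
    (97976425, 99363907, "AC"),
    (99367419, 100530260, "CC"),
    (100536329, 104679442, "AC"),
    (104680472, 106598880, "CC"),
    (106602625, 108990980, "AC"),
    (108993234, 133447401, "CC"),
    (133447580, 138437690, "AC"),
    (138443022, 140147760, "CC"),
]

total_length = sum(end - start for start, end, _ in segments)
ac_length = sum(end - start for start, end, label in segments if "A" in label)

# Each AC segment is Asian on only one of the two copies, so weight by letter count.
asian_share = sum(
    (end - start) * label.count("A") for start, end, label in segments
) / (2 * total_length)

print(f"Painted length:        {total_length:,} bp")
print(f"Length labelled AC:    {ac_length:,} bp")
print(f"Asian share of chr 9:  {asian_share:.1%}")
```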

The African segments are on chromosome 8:

Chromosome   Segment (start, end)     Ancestry
8            154984, 4074371          CY
8            140917074, 142173290     CY

I need to do ancestry painting on my own in more detail.

Paternal and Maternal Lines

We men inherit the Y chromosome from our fathers who got it from their fathers. So the Y chromosome can be used to trace your paternal lineage. Different sequences of alleles and mutations can be assigned to haplogroups where a haplogroup signifies common descent on the uniparental line.

According to my 23andme results, I belong to the paternal haplogroup R1a1a. This group is very common in Eastern Europe as well as South Asia. The distribution of R1a1a can be seen in the map below.

Similarly, we all inherit mitochondrial DNA from our mothers. The sequence of alleles and mutations on the mitochondrial DNA (mtDNA) is also organized into a phylogenetic tree.

I can trace my maternal line to Egypt (my great-grandmother) and thus I expected a maternal haplogroup common in the eastern Mediterranean. It turns out I belong to haplogroup H, which everyone and their mother in Europe belongs to, as can be seen in this map.

According to Wikipedia,

Haplogroup H is the most common mtDNA haplogroup in Europe. About one half of Europeans are of mtDNA haplogroup H. The haplogroup is also common in North Africa and the Middle East. The majority of the European populations have an overall haplogroup H frequency of 40%–50%. Frequencies decrease in the southeast of the continent, reaching 20% in the Near East and Caucasus, 17% in Iran, and <10% in the Persian Gulf, Northern India and Central Asia.

Since 23andme didn’t tell me which subgroup of H I belonged to, I used mthap by James Lick:

Your rCRS differences found:

HVR2: 263G
CR: 750G 1438G 4769G 15326G
HVR1: (16519C)

Best mtDNA Haplogroup Matches:

1) H
2) H26
2) H(16192)
2) H35
2) H24
2) H10
2) H25
2) H(195)
2) H33
3) H19

Amber’s maternal haplogroup is M4a, which is mainly found in South Asia.

You can see the Y-DNA haplogroup tree and the mtDNA tree online.

Harappa Ancestry Project

I have become interested in (some would say obsessed with) genetics recently. I wrote about getting my DNA test done and there’s a lot more about my own results that I plan to bore you with.

One fun application of genetic testing is inferring ancestry: Which ancestral group are you descended from? Can we estimate the admixture of the different population groups you are descended from?

Most DNA testing companies provide information about ancestry, and genetic genealogy has taken off. With several genome databases (HapMap, HGDP, etc.) and software (like plink, admixture, Structure) publicly available, the days of the genome bloggers are here. And I am trying to be the latest one.

In starting this project, I have been inspired by the Dodecad Ancestry Project by Dienekes Pontikos and Eurogenes Ancestry Project by David Wesolowski. The catalyst for this project was my friend Razib who I bug whenever I need to talk genetics.

What is the Harappa Ancestry Project?
It is a project to analyze (autosomal) genetic data of participants of South Asian origin for the purpose of providing detailed ancestry information. So the focus of the project is on South Asians: Indians, Pakistanis, Bangladeshis and Sri Lankans.

The project will collect 23andme raw genetic data from participants to better understand the ancestry relationships of different South Asian ethnicities.

I have named it after Harappa, an archaeological site of the Indus Valley Civilization in Punjab, Pakistan.

Participation
People of South Asian origin, or from neighboring countries, are eligible to participate. The countries of origin I am accepting are as follows:

  • Afghanistan
  • Bangladesh
  • Bhutan
  • Burma
  • India
  • Iran
  • Maldives
  • Nepal
  • Pakistan
  • Sri Lanka
  • Tibet

Right now, I am only accepting raw data samples from people who have tested with 23andme.

Please do not send samples from close relatives. I define close relatives as 2nd cousins or closer. If you have data from yourself and your parents, it might be better to send the samples from your parents (assuming they are not related to each other) and not send your own sample.

If you are unsure if you are eligible to participate, please send me an email (harappa@zackvision.com) to inquire about it before sending off your raw data.

What to send?
Please send your All DNA raw data text file (zipped is better) downloaded from 23andme to harappa@zackvision.com along with ancestral background information about you and all four of your grandparents. Background information would include where they were born, mother tongue, caste/community to which they belonged, etc. Please provide as much ancestry information as possible and try to be specific. Do especially include information about any ancestry from outside South Asia.
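If you are curious what is inside that file, here is a minimal sketch of reading it in Python. It assumes the usual layout of a 23andme raw data download: commented header lines starting with #, then one tab-separated line per SNP with rsid, chromosome, position and genotype, and "--" for a no-call. The function name and the sample rows are made up for illustration and are not real data.

```python
import io

# Autosomes only, since the project analyzes autosomal data.
AUTOSOMES = {str(n) for n in range(1, 23)}


def read_23andme(handle):
    """Yield (rsid, chromosome, position, genotype) from a raw-data text stream."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        rsid, chromosome, position, genotype = line.split("\t")
        yield rsid, chromosome, int(position), genotype


# Made-up rows for illustration only; these are not real SNPs or genotypes.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t9\t2000\t--\n"
    "rs0000003\tX\t3000\tCC\n"
    "rs0000004\tMT\t4000\tG\n"
)

records = list(read_23andme(sample))
autosomal = [r for r in records if r[1] in AUTOSOMES]
called = [r for r in autosomal if r[3] != "--"]

print(len(records), "SNPs read,", len(autosomal), "autosomal,",
      len(called), "with a genotype call")
```

For a real download you would pass an opened file instead of the in-memory sample, for example `open("genome.txt")`, where the file name is whatever your unzipped download is called.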

Data Privacy
The raw genetic data and ancestry information that you send me will not be shared with anyone.

Your data will be used only for ancestry analysis. No analysis of physical or health/medical traits will be performed.

The individual ancestry analysis published on this blog will be done using an ID of the form HRPnnnn known to only you and me.

What do you get?
All results of ancestry analysis (individual and group) will be posted on this blog under the Harappa Ancestry Project category. This will include admixture analysis as well as clustering into population groups etc.

I suggest you read about Dienekes’ analysis on South Asians for an idea about what to expect.

You can access all blog posts related to this project from the Harappa Ancestry Project link on the navigation menu on every page of my website. You can also subscribe to the project feed.

Personal Genomics: DNA Test

Last year in April, 23andme were having a sale for DNA Day, selling their 550,000 SNP test with ancestry and health information for $99 instead of its regular price of $499 at that time. So I decided to take the plunge and sent my spit from the East coast to the West to be analyzed.

Then, when 23andme had another sale ($99 again, but with the catch of a minimum one-year subscription at $5/month on top), I got my wife and my sister to do it on 23andme’s new version 3 genotyping chip with more than a million SNPs.

I got my results in May 2010 and have been having fun with them since. So let’s take a look.

There are reports for your genetic risk of a bunch of diseases. Those are interesting and useful in some cases, but there is still a lot of work to be done in the area of genetic associations of diseases. For now, except for a few important discoveries, family history is probably a better predictor of your disease risk than genetic testing. Oh yeah, and there are a couple of scary-looking numbers in my reports.

The health reports also show carrier status and drug response.

In terms of other traits, it’s mostly information I already knew like:

  • I can taste bitter tastes
  • I have wet earwax
  • My eye color is brown
  • I have curlier than average hair

One thing that was a surprise was that I am likely to be lactose intolerant. It’s possible I am somewhat tolerant due to environmental reasons.

Since I wanted more analysis than the 23andme reports gave, I downloaded Promethease, a free program that uses all the information at SNPedia to create a report about your SNPs and what features, traits and health factors are influenced by them. The report it generates is long and interesting, though not formatted very well.

PS. Yes, this is the sort of topic I alluded to in my return announcement.

While there is more navel-gazing coming (mostly about ancestry and genetics), there are going to be posts of more general interest too. Let me just go ahead and say that the friend Razib mentioned is me.