Genome Similarity

23andMe has a feature that shows how similar your genes are to those of the friends and family you are sharing with on the site. The result is a bar list with percentages showing similarity.

The number next to each person in the bar list is a measure of similarity. Specifically, it is the percentage of matching genotypes for all of the SNPs on our chip that are located in the genes or regions of interest.

If person 1 has AA and person 2 has AA at a particular position, they are 100% similar. If person 1 has AA and person 2 has AG, they are 50% similar at that position. If person 1 has AA and person 2 has GG, they are 0% similar at that position. We then average the percent similarity over all positions included in that comparison.
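The per-position rule above can be sketched in a few lines of Python. This is only an illustration of the description, not 23andMe's actual code. The function names, the rsID-style keys and the handling of no-calls (skipping any genotype containing "-") are my assumptions.

```python
from collections import Counter

def position_similarity(g1, g2):
    """Fraction of alleles shared by two unphased two-letter genotypes."""
    # Multiset intersection counts shared alleles regardless of order,
    # so "AG" vs "GA" shares 2 alleles and "AA" vs "AG" shares 1.
    shared = Counter(g1) & Counter(g2)
    return sum(shared.values()) / 2

def genome_similarity(person1, person2):
    """Average per-position similarity, as a percentage, over SNPs called in both people."""
    scores = []
    for snp, g1 in person1.items():
        g2 = person2.get(snp)
        # Assumption: SNPs missing in either person, or marked as no-calls, are skipped.
        if g2 is None or "-" in g1 or "-" in g2:
            continue
        scores.append(position_similarity(g1, g2))
    if not scores:
        raise ValueError("no SNPs in common")
    return 100 * sum(scores) / len(scores)

# Hypothetical genotypes keyed by made-up SNP names.
person1 = {"rs1": "AA", "rs2": "AA", "rs3": "AA", "rs4": "CT"}
person2 = {"rs1": "AA", "rs2": "AG", "rs3": "GG", "rs4": "--"}
print(genome_similarity(person1, person2))  # 50.0
```

The three usable positions score 100%, 50% and 0%, which average to 50%.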

You can also calculate these similarity measures, known as Identical by State (IBS) distances, using PLINK if you have someone’s genetic data.

As for the similarity percentages to expect, I figured out that siblings and parents generally have similarity measures in the mid-80s. For South Asians, the similarity percentage with unrelated South Asians usually seems to be close to 74%, especially within similar ethnic or geographic groups. Please note that the specific number of 74% is valid only for the set of SNPs included on the 23andMe v2 chip. On the v3 chip, people are getting higher numbers.

However, even after looking far and wide and sharing data with 81 people, my highest similarity percentage came out to be only 73.22%, with Amber, followed closely by a Bihari guy at 73.2% and a couple of other Punjabis. While most of my top matches are South Asian, with a large number of Punjabis, there is no particular pattern, as several South Indians and Biharis match highly too. My top non-South Asian matches are Iranians.

Because a quarter of my ancestry is non-South Asian, I expected my similarity percentages to be lower, as they turned out to be. So that wasn’t a surprise.

I asked my parents, uncles and aunts about my great-grandmother’s ancestry. I knew she was from Egypt, but I found out that her ancestors had arrived in Egypt with Muhammad Ali Pasha. Since Muhammad Ali Pasha was an Albanian who worked for the Ottoman Sultanate, my relatives deduced that we have some Turkish and/or Balkan ancestry.

So I asked a number of Turks, Southern Europeans and North Africans to share their data with me. I expected my similarity with them to be lower than my similarity with South Asians, since three-quarters of my ancestry outweighs one-quarter. But I found something strange.

Let’s take any random person on my list who is not South Asian. If I compare how similar that person is to me against how similar he is to any South Asian on my list, it turns out he’s more similar to the South Asian. This holds for every pairing of a non-South Asian with a South Asian. In brief, all the East Africans, North Africans, Southern Europeans, Northern Europeans, Turks and Iranians in my friends list are more similar to every South Asian in my friends list than they are to me.
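To make the comparison concrete, here is a rough Python sketch of the check I am describing. The function name, the labels and all of the percentages are invented for illustration. They are not my real 23andMe numbers.

```python
def less_similar_to_me_than_to_all(sim, me, outsiders, south_asians):
    """Return the outsiders who are less similar to `me` than to every listed South Asian."""
    def pct(a, b):
        # Similarity is symmetric, so each pair is stored under an unordered key.
        return sim[frozenset((a, b))]

    return [
        x for x in outsiders
        if all(pct(x, me) < pct(x, s) for s in south_asians)
    ]

# Made-up percentages for illustration only.
sim = {
    frozenset(("me", "iranian_friend")): 71.0,
    frozenset(("sa_friend_1", "iranian_friend")): 71.5,
    frozenset(("sa_friend_2", "iranian_friend")): 71.8,
    frozenset(("me", "turkish_friend")): 70.2,
    frozenset(("sa_friend_1", "turkish_friend")): 70.9,
    frozenset(("sa_friend_2", "turkish_friend")): 70.6,
}

result = less_similar_to_me_than_to_all(
    sim,
    me="me",
    outsiders=["iranian_friend", "turkish_friend"],
    south_asians=["sa_friend_1", "sa_friend_2"],
)
print(result)  # ['iranian_friend', 'turkish_friend']
```

In my real data, every non-South Asian in my friends list would end up in that returned list.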

I found this weird. If I had someone in my friends list who belonged to my great-grandmother’s ethnicity or a closely related one, then I should be more similar to that friend than any random South Asian is, instead of being the least similar, as I was in all cases.

Clearly there was a need for better ancestry analysis in my case.

By Zack