Tag Archives: 23andme

Outbreeding Works in a Single Generation

As I have mentioned before, cousin marriages were fairly common in my family. My parents are first cousins. So are my father’s parents. My mother’s parents are second cousins once removed. So instead of 32 great-great-great grandparents, I have only about 18.

Since my wife and I are not related, I wondered how my inbred genome had been transmitted to our daughter.

Using David Pike’s ROH utility, I computed the regions of homozygosity for my parents, me, my wife, and my daughter, all tested by 23andme.

I used the default settings for the utility. The total Mb gives the total size in megabases of the long autosomal regions where both alleles are the same. The longest ROH gives the size of the longest such region. Percent Homozygous is the percentage of the genome where the two alleles are the same.
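To make these three metrics concrete, here is a minimal Python sketch of how they could be computed from genotype calls on a single chromosome. It is not David Pike’s utility: the input layout, the function name and the `min_snps` and `min_mb` thresholds are assumptions chosen for illustration, and the real tool’s defaults and its handling of no-calls may differ. Here the percentage is taken over genotyped SNPs, which is also an assumption.

```python
def roh_summary(calls, min_snps=3, min_mb=1.0):
    """Summarise runs of homozygosity (ROH) on one chromosome.

    calls: list of (position_bp, genotype) sorted by position, where a
    genotype is a two-letter string such as "AA" or "AG".
    min_snps and min_mb are illustrative thresholds, not the tool's defaults.
    """
    runs_mb = []   # lengths in Mb of the runs that pass both thresholds
    run = []       # positions of the SNPs in the current homozygous run
    n_hom = 0      # homozygous calls seen overall
    # The empty-genotype sentinel closes a run that reaches the last SNP.
    for pos, gt in list(calls) + [(None, "")]:
        is_hom = len(gt) == 2 and gt[0] == gt[1] and gt[0] in "ACGT"
        if is_hom:
            run.append(pos)
            n_hom += 1
            continue
        # A heterozygous call, a no-call or the sentinel ends the run.
        if len(run) >= min_snps:
            length_mb = (run[-1] - run[0]) / 1e6
            if length_mb >= min_mb:
                runs_mb.append(length_mb)
        run = []
    return {
        "total_mb": sum(runs_mb),
        "longest_mb": max(runs_mb, default=0.0),
        "pct_homozygous": 100.0 * n_hom / len(calls) if calls else 0.0,
    }


# Tiny made-up example: three homozygous SNPs in a row, then a heterozygous one.
calls = [(1_000_000, "AA"), (1_500_000, "GG"), (2_500_000, "CC"),
         (3_000_000, "AG"), (3_200_000, "TT")]
summary = roh_summary(calls)
print(summary)
```

A whole-genome figure would sum `total_mb` over the autosomes and take the largest `longest_mb`.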

I included the worst chromosome column because of my chromosome 9, which is beyond crazy. This column gives the percent homozygosity of the worst chromosome.

| Person | Total Mb | Longest ROH (Mb) | % Homozygous | Worst chromosome (%) |
|---|---|---|---|---|
| Dad | 297.45 | 57.4 | 72.498% | 76.921% |
| Mom | 112.13 | 22.99 | 70.662% | 79.802% |
| Me | 402.78 | 71.38 | 73.588% | 93.542% |
| Wife | 37.33 | 9.64 | 70.003% | 72.411% |
| Daughter | 42.40 | 8.82 | 69.936% | 71.759% |

As you can see, my Dad has higher levels of homozygosity than my Mom, as expected, and I have the highest levels. My wife is not inbred at all, and our daughter’s ROH results are about the same as my wife’s. So one generation of marrying someone unrelated, even someone of the same or a similar ethnicity, has removed all the long runs of homozygosity built up over generations. Good news!


Zack Ajmal Phased Genome

A few months ago, I made my DNA genotyping results from 23andme public.

Since I got results for both my parents as well, I have now used BEAGLE to phase my genetic data. In simple words, I have been able to separate my Dad’s and my Mom’s contributions to my DNA.
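The basic idea behind phasing with both parents can be illustrated with plain Mendelian logic at a single SNP. The sketch below is only that illustration: it is not how BEAGLE works internally (BEAGLE uses a statistical haplotype model and can resolve sites this simple rule cannot), and the function name is hypothetical.

```python
def phase_site(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP, or None.

    Each genotype is a two-letter string such as "AG". None means the
    trio alone cannot decide, or the calls are not Mendelian-consistent.
    """
    a, b = child
    if a == b:
        # Homozygous child: each parent must have passed on the same allele.
        return (a, b) if a in father and a in mother else None
    # Heterozygous child: try both assignments of alleles to parents.
    options = [(pat, mat) for pat, mat in ((a, b), (b, a))
               if pat in father and mat in mother]
    return options[0] if len(options) == 1 else None


print(phase_site("AG", "AA", "GG"))  # father can only have given A
print(phase_site("AG", "AG", "AG"))  # all three heterozygous: ambiguous
```

Sites where all three people are heterozygous are exactly where a statistical method that looks at neighbouring SNPs is needed.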

I am making my phased genome public too. It’s in PLINK format.

I haven’t made much use of the phased genome yet. So if you have any ideas about what can be done with a phased genome, please let me know.

I have also pledged to make my full sequenced genome public when genome sequencing becomes cheaper and I get it done.


Family DNA Results

I posted my genetic ancestry results. Now, we’ve got my parents, my sister and my wife tested with 23andme. So I thought a comparison would be interesting.

Here’s the ancestry painting from 23andme which uses three reference populations: Yoruba from Nigeria, Chinese and Japanese, and Utahns of Northwestern European descent.

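| Reference population | Dad | Mom | Sister | Me | Wife |
|---|---|---|---|---|---|
| African | 0.56% | 0.95% | 0.96% | 0.34% | 0.00% |
| Asian | 8.68% | 6.63% | 8.00% | 6.58% | 10.18% |
| European | 90.76% | 92.42% | 91.04% | 93.09% | 89.82% |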

You can basically use my wife as a sort of reference for Punjabi ancestry here (which is three-quarters of our ancestry too). Also, my wife and I are unrelated.

As you can see, while our results are close, my mom and sister have the most African and, within my family, I have the least.

And here are the similarity numbers for us with different reference populations.

| Reference population | Dad | Mom | Sister | Me | Wife |
|---|---|---|---|---|---|
| Central & South Asians | 67.13 | 67.09 | 67.05 | 67.12 | 67.12 |
| Northern Europeans | 66.97 | 66.92 | 66.92 | 66.91 | 66.94 |
| Southern Europeans | 66.97 | 66.88 | 66.92 | 66.90 | 66.85 |
| Near Easterners | 66.85 | 66.76 | 66.81 | 66.79 | 66.72 |
| Siberians | 66.59 | 66.50 | 66.48 | 66.52 | 66.77 |
| Eastern Asians | 66.52 | 66.41 | 66.42 | 66.45 | 66.70 |
| North Americans | 66.48 | 66.40 | 66.38 | 66.44 | 66.69 |
| South Americans | 66.46 | 66.37 | 66.40 | 66.40 | 66.76 |
| Oceanians | 66.39 | 66.41 | 66.39 | 66.35 | 66.62 |
| Northern Africans | 66.17 | 66.10 | 66.15 | 66.13 | 65.94 |
| Eastern Africans | 64.08 | 64.06 | 64.11 | 64.10 | 63.89 |
| Southern Africans | 63.96 | 64.00 | 64.06 | 64.00 | 63.77 |
| Central Africans | 63.93 | 63.93 | 64.00 | 63.97 | 63.74 |
| Western Africans | 63.91 | 63.91 | 63.97 | 63.94 | 63.70 |

Compared to my wife, we are closer to Africans and farther from Eastern Asians, Native Americans (who are really a branch of East Asians) and Oceanians. That’s expected because of the 25% Egyptian ancestry we have.

Finally, here are our Dodecad Project results.

| Component | Dad | Mom | Sister | Me | Wife |
|---|---|---|---|---|---|
| East_European | 4.96% | 5.71% | 4.59% | 4.19% | 6.28% |
| West_European | 7.43% | 9.59% | 8.98% | 8.97% | 11.10% |
| Mediterranean | 11.10% | 9.28% | 10.99% | 9.24% | 5.77% |
| Neo_African | 1.36% | 1.12% | 1.45% | 1.15% | 0.26% |
| West_Asian | 23.86% | 22.41% | 22.88% | 23.88% | 19.81% |
| South_Asian | 33.94% | 37.24% | 33.15% | 36.57% | 45.64% |
| Northeast_Asian | 2.53% | 1.64% | 1.79% | 1.95% | 3.22% |
| Southeast_Asian | 3.04% | 2.85% | 3.95% | 2.61% | 3.35% |
| East_African | 1.86% | 2.18% | 3.06% | 2.30% | 0.00% |
| Southwest_Asian | 7.49% | 5.57% | 5.75% | 6.57% | 4.56% |
| Northwest_African | 1.90% | 1.49% | 2.32% | 1.57% | 0.00% |
| Palaeo_African | 0.53% | 0.92% | 1.10% | 1.01% | 0.00% |

Similar results but interesting differences.


Genome in the Wild

I tested with 23andme in April 2010 and then upgraded to their version 3 chip with almost a million SNPs last Christmas.

Now I am releasing my personal genome in the public domain.

CC0
To the extent possible under law, Zack Ajmal has waived all copyright and related or neighboring rights to Zack Ajmal 23andme v3 Genome. This work is published from: United States.

You can download my genome data in zipped files:

Razib has a list of people who have made their 23andme genomes public.

When Blaine Bettinger released his genome into the public domain, he issued a challenge:

So, I’m challenging everyone who reads this to download my data and analyze it to find the most interesting or surprising results. For example, you could use my most recent 23andMe V3 data.

I’ve already done a fair amount of analysis myself, including the Promethease reports above (and see here), and a recent blog post about my vastly increased Type 2 Diabetes risk. However, perhaps there’s a recent but relatively study that applies, or perhaps there’s a story you can weave with a handful of SNPs. Or, even better, what can you tell me about my ancestry other than mtDNA and Y-DNA haplogroups? Don’t worry about the strength of the study, reproducibility, etc. – I’m aware of the uncertainties associated with this type of research, and my goal here is to make people aware of possibilities.

Please post your findings in the comments below, and in two weeks I’ll pick the most surprising or interesting findings and make them the focus of a new blog post.

Can you surprise me with my own genome?

I have done a fair amount of analysis on my genome. For example, here’s my Promethease report. My ID is DOD128 in Dodecad, PKEG1 in Eurogenes and HRP0001 in Harappa.

My challenge for you would be to find interesting information about my chromosome 9 which is 93% homozygous.

If you analyze my genome, it would be great if you could let me know about what you found as I am always hungry for more information.


Dodecad Oracle

Dodecad has come up with a new version (v3) of its admixture results. Here are my results:

South Asian 37.4%
West Asian 23.3%
Mediterranean 9.8%
West European 9.6%
Southwest Asian 6.2%
East European 3.5%
Southeast Asian 2.4%
East African 2.2%
Northeast Asian 1.9%
Northwest African 1.5%
Neo African 1.1%
Palaeo African 1.0%

Dodecad also has a fun tool to check one’s results against different population averages. My closest populations are:

| Rank | Population | Distance |
|---|---|---|
| 1 | Pathan | 7.2021 |
| 2 | Bene Israel Jews | 8.6822 |
| 3 | Sindhi | 10.0479 |
| 4 | Punjabi Arain | 10.0926 |
| 5 | Kashmiri Pandit | 10.5778 |
| 6 | Burusho | 11.179 |
| 7 | Balochi | 11.6705 |
| 8 | Brahui | 13.0208 |
| 9 | Makrani | 15.6735 |
| 10 | Cochin Jews | 18.1403 |

If I make use of mixed mode, the tool tries to find a combination of two ethnic groups with differing percentages that fits my results best.

| Rank | Two Population Mix | Distance |
|---|---|---|
| 1 | 17.3% Palestinian + 82.7% Sindhi | 3.0122 |
| 2 | 17% Morocco Jews + 83% Sindhi | 3.1181 |
| 3 | 17.3% Palestinian + 82.7% Punjabi Arain | 3.1228 |
| 4 | 17.2% Egypt + 82.8% Punjabi Arain | 3.1846 |
| 5 | 82.9% Sindhi + 17.1% Egypt | 3.288 |
| 6 | 17% Lebanese + 83% Sindhi | 3.4994 |
| 7 | 16.7% Jordanians + 83.3% Sindhi | 3.5238 |
| 8 | 16.7% Jordanians + 83.3% Punjabi Arain | 3.5608 |
| 9 | 15.8% Samaritians + 84.2% Sindhi | 3.6356 |
| 10 | 16.9% Ashkenazi + 83.1% Sindhi | 3.7077 |

This actually fits reasonably well with my actual ancestry (75% Punjabi + 25% Egyptian).
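For readers curious how a two-population fit like this could work, here is a rough Python sketch. I don’t know how the Dodecad tool is actually implemented, so the Euclidean distance, the grid search over mixing fractions and the function name are all assumptions, and the three-component population vectors are made-up toy numbers rather than Dodecad averages.

```python
from itertools import combinations
from math import dist


def best_two_way_mixes(target, populations, steps=1000, top=3):
    """Rank two-population mixtures by Euclidean distance to target.

    populations: dict mapping name -> admixture vector (percentages).
    Returns tuples (distance, name_a, fraction_of_a, name_b).
    """
    results = []
    for (name_a, a), (name_b, b) in combinations(populations.items(), 2):
        best = None
        for i in range(steps + 1):
            f = i / steps  # fraction contributed by population a
            mix = [f * x + (1 - f) * y for x, y in zip(a, b)]
            d = dist(target, mix)
            if best is None or d < best[0]:
                best = (d, f)
        results.append((best[0], name_a, best[1], name_b))
    return sorted(results)[:top]


# Toy populations (hypothetical values) and a target built as 75% A + 25% B.
populations = {
    "PopA": [80.0, 15.0, 5.0],
    "PopB": [10.0, 60.0, 30.0],
    "PopC": [30.0, 30.0, 40.0],
}
target = [62.5, 26.25, 11.25]
for d, name_a, f, name_b in best_two_way_mixes(target, populations):
    print(f"{100 * f:.1f}% {name_a} + {100 * (1 - f):.1f}% {name_b}  {d:.4f}")
```

With real data the best distance is never exactly zero, which is why the tool reports a ranked list of near-equivalent mixes.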


Genetics and Health

When the doctor told me I had ureterolithiasis, I logged into 23andme to check my genetic risk. There was only one SNP (rs4293393) listed there. The G allele increases the risk by 14%, but I have AA, so my odds are typical.

Next step was checking SNPedia where I found 8 SNPs, of which some are given below.

rs219780 (23andme): The high-risk genotype is CC, but I wasn’t genotyped at this location. CC is also the most common genotype, so it is quite likely that I have it.

rs219778 (23andme): Carriers of TT have a slightly increased risk and that’s what I have.

rs9310709 (23andme): Risk allele is C and I have CC.

rs10941694: I was not genotyped.

rs13070584: I was not genotyped.

More important than these, though, is the simple fact that my Dad had it too. Thus, if there is a genetic association, I am likely to be at higher than typical risk.


Punjabi and What?

Looking at the admixture analysis for my Harappa Ancestry Project, I count 6 Punjabis (not including my sister and me). Let’s average those six and compare them to me and my sister.

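| Ancestral Pop | Average Punjabi | Me | My sister | NSA1 | NSA2 |
|---|---|---|---|---|---|
| South Asian | 46% | 38% | 35% | 7% | 7% |
| Balochistan/Caucasus | 35% | 35% | 34% | 32% | 30% |
| Kalash | 5% | 5% | 2% | -1% | 0% |
| Southeast Asian | 1% | 0% | 1% | 1% | 1% |
| Southwest Asian | 0% | 10% | 12% | 43% | 40% |
| European | 11% | 6% | 7% | -7% | 0% |
| Papuan | 0% | 0% | 1% | 2% | 2% |
| Northeast Asian | 0% | 0% | 0% | 0% | 0% |
| Siberian | 2% | 2% | 2% | 4% | 3% |
| Eastern Bantu | 0% | 0% | 0% | 0% | 0% |
| West African | 0% | 1% | 2% | 6% | 6% |
| East African | 0% | 3% | 3% | 12% | 11% |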

As you can see, the major difference between the Punjabis and me is my Southwest Asian and African percentages.

Now, I know that a quarter of my ancestry is not South Asian. So let’s try to estimate the admixture percentages for that ancestry. Being three-quarters Punjabi (I think), let’s average my sister’s and my results and then subtract three-quarters of the average Punjabi results from them. Then multiply by 4 and you get NSA1 in the table above, which supposedly represents my non-South-Asian ancestor.

Since some of the percentages are negative, I set those to zero and rescaled all the others so that they summed to 100%. That is NSA2.
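The arithmetic behind NSA1 and NSA2 can be written out as a short Python sketch. It uses the rounded whole percentages from the table above, so its output differs by a point or two from the NSA columns, which were presumably computed from unrounded values. The variable names are mine.

```python
components = ["South Asian", "Balochistan/Caucasus", "Kalash",
              "Southeast Asian", "Southwest Asian", "European", "Papuan",
              "Northeast Asian", "Siberian", "Eastern Bantu",
              "West African", "East African"]
# Rounded percentages copied from the table, in the order listed above.
punjabi = [46, 35, 5, 1, 0, 11, 0, 0, 2, 0, 0, 0]
me      = [38, 35, 5, 0, 10, 6, 0, 0, 2, 0, 1, 3]
sister  = [35, 34, 2, 1, 12, 7, 1, 0, 2, 0, 2, 3]

# NSA1: remove the three-quarters Punjabi share from the sibling average,
# then scale the remaining quarter up to a whole ancestor.
sibling_avg = [(m + s) / 2 for m, s in zip(me, sister)]
nsa1 = [4 * (avg - 0.75 * p) for avg, p in zip(sibling_avg, punjabi)]

# NSA2: set negative components to zero and rescale so the sum is 100%.
clipped = [max(x, 0.0) for x in nsa1]
total = sum(clipped)
nsa2 = [100 * x / total for x in clipped]

for name, x1, x2 in zip(components, nsa1, nsa2):
    print(f"{name:22s} {x1:6.1f} {x2:6.1f}")
```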

I computed the average admixture results for a bunch of reference populations. Let’s compare NSA1 and NSA2 to those populations.

The top five populations most similar to NSA1 are:

  1. Yemenese
  2. Jordanians
  3. Palestinian
  4. Egyptians
  5. Syrians

For NSA2, we get the same five populations but the Egyptians and Palestinians exchange places.

This shows roughly that my quarter ancestry is most likely from the Middle East region of Egypt, Arabia or the Levant.

Let’s look at it another way. If I know that I have Punjabi and Egyptian ancestry, I can use the average Punjabi and average Egyptian admixture percentages to calculate my percentage of both ancestries since that has to sum to 100%. So:

Zack = p * Punjabi + (1-p) * Egyptian

And we solve for p using least squares.

I got 81.3% Punjabi for myself and 75.8% for my sister. On average, that’s 78.5% Punjabi and 21.5% Egyptian, which is pretty close to our genealogical information.
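With a single unknown p, the least-squares problem has a closed-form answer: writing the model as z = b + p(a − b) and minimizing the squared error gives p = (z − b)·(a − b) / (a − b)·(a − b). Here is a minimal Python sketch of that formula. The average Egyptian admixture vector is not given in this post, so the example uses made-up three-component vectors purely to show the mechanics; the function name is hypothetical too.

```python
def mixing_fraction(z, a, b):
    """Least-squares p for the model z ~= p*a + (1-p)*b.

    z, a and b are admixture vectors of equal length. The result is not
    clamped to [0, 1]; a value outside that range means a poor model.
    """
    diff = [x - y for x, y in zip(a, b)]
    num = sum((zi - bi) * d for zi, bi, d in zip(z, b, diff))
    den = sum(d * d for d in diff)
    return num / den


# Toy vectors (hypothetical values, not real population averages).
punjabi_toy = [50.0, 40.0, 10.0]
egyptian_toy = [10.0, 30.0, 60.0]
person_toy = [42.0, 38.0, 20.0]   # built as 80% of the first, 20% of the second

p = mixing_fraction(person_toy, punjabi_toy, egyptian_toy)
print(f"{100 * p:.1f}% first population, {100 * (1 - p):.1f}% second")
```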


Harappa Clustering

As I had computed the admixture percentages for myself, my sister and other participants in my Harappa Ancestry Project, I decided to do some clustering analysis on them to see which persons clustered together. The resulting tree for a hierarchical clustering is as follows. It shows which persons are the most similar.

I am HRP0001 and my sister is HRP0035. As expected, we cluster together and then with a half-Sindhi half-Balochi guy and finally with all the Punjabis.
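For anyone who wants to try this kind of analysis, here is a small Python sketch of agglomerative (hierarchical) clustering on admixture vectors. The Euclidean distance and average linkage are assumptions, as the post does not say which settings produced the tree. The only inputs are the three vectors available on this blog: my results and my sister’s from the Harappa admixture table, plus the rounded average Punjabi figures, so it shows the mechanics rather than reproducing the full tree.

```python
from math import dist


def average_linkage(samples):
    """Agglomerative clustering; returns the merges in the order made.

    samples: dict mapping label -> admixture vector.
    Each merge is (members_a, members_b, distance).
    """
    clusters = [(name,) for name in samples]
    merges = []

    def gap(c1, c2):
        # Average pairwise Euclidean distance between two clusters.
        pairs = [dist(samples[a], samples[b]) for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        d, i, j = min((gap(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Component order: South Asian, Balochistan/Caucasus, Southwest Asian,
# European, Kalash, East African, West African, Siberian, Papuan,
# Northeast Asian, Southeast Asian, Eastern Bantu.
samples = {
    "HRP0001": [37.9, 34.7, 9.8, 5.9, 4.5, 3.1, 1.4, 2.4, 0.2, 0.2, 0.0, 0.0],
    "HRP0035": [34.8, 34.2, 12.3, 7.2, 2.3, 3.0, 1.8, 1.8, 1.0, 0.3, 1.3, 0.0],
    "Average Punjabi": [46, 35, 0, 11, 5, 0, 0, 2, 0, 0, 1, 0],
}
merges = average_linkage(samples)
for left, right, d in merges:
    print(left, "+", right, f"at distance {d:.2f}")
```

The first merge joins the two siblings, and the Punjabi average joins that pair afterwards, which mirrors the order described above.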

Since I have a lot of reference populations in my data, I did the same cluster analysis using the average admixture results for each reference population. Here’s the section of the tree containing my sister and me.

This time we cluster with the Bene Israel, a Jewish tribe from Bombay, India, though our similarity with them is not that great. Then with the Punjabis, Sindhis and Pathan.

Doing the same analysis with individual samples from my references gives a weak clustering with Burusho!

If I use PCA (Principal Component Analysis) results to compute hierarchical clusters, you can see that I am an outlier among the South Asian participants.

If you look at my PCA coordinates, you’ll realize that among the 365 South Asians I used in the analysis, I am one of the five complete outliers.

However, when I use model-based clustering on the PCA results, I end up in a really weird, loose cluster (CL9) with a Kashmiri, 5/21 Balochis, 2 Bene Israel Jews, 3/23 Brahui, 1/25 Burusho, 8/17 Makranis, 2/21 Pathans and 3/22 Sindhis. This is mostly a group of outliers and those who have some African admixture.


My Harappa Project Results

I have been blogging up a storm on Harappa Ancestry Project with more than 50 posts since I last linked to it.

Let’s see what I have found out about myself there. Here are the admixture results for me and my sister:

| Ancestral Pop | Me | My sister |
|---|---|---|
| South Asian | 37.9% | 34.8% |
| Balochistan/Caucasus | 34.7% | 34.2% |
| Southwest Asian | 9.8% | 12.3% |
| European | 5.9% | 7.2% |
| Kalash | 4.5% | 2.3% |
| East African | 3.1% | 3.0% |
| West African | 1.4% | 1.8% |
| Siberian | 2.4% | 1.8% |
| Papuan | 0.2% | 1.0% |
| Northeast Asian | 0.2% | 0.3% |
| Southeast Asian | 0.0% | 1.3% |
| Eastern Bantu | 0.0% | 0.0% |

You can see the results of all the participants in a spreadsheet or in a nice interactive bar chart. I am HRP0001 and my sister is HRP0035.

Interestingly, both McDonald and 23andme ancestry painting show my sister to have more African admixture than me, but here I have about the same East African as her and even her West African percentage is only a tiny bit higher.

To figure out what these ancestral populations mean, do read the post about the reference population analysis.


Your Genes, Regulated?

The FDA had a meeting the last two days:

FDA is convening this two-day meeting to seek the Panel’s expert opinion and input on scientific issues concerning Direct to Consumer (DTC) genetic tests that make medical claims.

This meeting is focused specifically on issues regarding clinical genetic tests that are marketed directly to consumers (DTC clinical genetic tests), where a consumer can order tests and receive test results without the involvement of a clinician.

The American Medical Association of course wants to limit genetic testing so that you would need a doctor to supervise everything.

We urge the Panel to offer clear findings and recommendations that genetic testing, except under the most limited circumstances, should be carried out under the personal supervision of a qualified health care professional, and provide individuals interested in obtaining genetic testing access to qualified health care professionals for further information.

23andme had two presentations at the meeting which they have posted on their blog.

In our presentations, we take the position that all genetic testing services, whether ordered by a physician or offered through direct access, should adhere to the same standards. We simultaneously request that the FDA consider redefining and establishing regulatory standards, including some fundamental definitions, to accommodate large-scale genetic testing and support innovation of its technologies and applications. We also request that regulation be based upon evidence and not fear of potential harm to individuals which, to date, has not been demonstrated. In fact, growing numbers of participating individuals and independent studies focused on this issue provide preliminary evidence that the vast majority of people understand the information presented and experience no significant negative effects.

Genomics Law Report had an overview of the issues beforehand as well as a Twitter roundup of the meeting. Here are the author’s thoughts after the first day:

First and foremost, I fully expect the MCGP (Molecular and Clinical Genetics Panel) to note, likely more than once, that given the complexity of the questions put to it by the FDA it should be afforded far more time to deliberate and research prior to making any recommendations.

If taking time out for further debate isn’t an option, what is the MCGP likely to recommend? Based on today’s deliberations, I think it’s a safe bet that the MCGP will advise the FDA to (1) demand clear proof of analytical and clinical validity for all genetic tests and (2) require that most, or perhaps even all, genetic tests with demonstrated or potential clinical significance be (to use the FDA’s terminology) “routed through a clinician.”

In other words, I think the odds strongly favor an MCGP recommendation to the FDA that clinical (as defined by the FDA, which is itself a separate issue) direct-to-consumer genetic testing, when offered without a requirement that a clinician participate in the ordering, receipt and interpretation of the test, be removed from the marketplace. At least for the time being.

If you read my blog, you probably know my politics are quite liberal. I do, however, think that any regulation has to be shown to provide a tangible benefit or to prevent actual harm. A regular Joe simply misinterpreting genetic results and causing hypothetical harm is not enough justification.

So what can you do? Razib Khan is already on the task.

1) I am going to release my own 23andMe sequence into the public domain soon. I encourage everyone to download it. I would rather have someone off the street know my own genetic information than be made invisible by the government. That is my right. For now that right is not barred by law. I will exercise it.

2) Spread word of this video via social networking websites and twitter. The media needs to get the word out, but they only will if they know you care. Do you care? I hope you do. This is a power grab, this is not about safety or ethics. If it was, I assume that the “interpretative services” would be provided for free. I doubt they will be.

3) Contact your local representative in congress. I’ve never done this myself, but am going to draft a quick note. They need to be aware that people care, that this isn’t just a minor regulatory issue.

4) The online community needs to get organized. We’re not as powerful as a million doctors and a Leviathan government, but we have right on our side. They’re trying to take from us what is ours.

5) Plan B’s. We need to prepare for the worst. Which nations have the least onerous regulatory regimes? Is genomic tourism going to be necessary? How about DIYgenomics? The cost of the technology to genotype and sequence is going to crash. I know that the Los Angeles DIYbio group has a cheap cast-off sequencer. For those who can’t afford to go abroad soon we’ll be able to get access to our information in our homes. Let’s prepare for that day.

Here are the links to contact your House Representative and your Senators.
