Punjabi and What?

Looking at the admixture analysis for my Harappa Ancestry Project, I count 6 Punjabis (not including my sister and me). Let’s average those six and compare them to me and my sister.

Ancestral Pop          Average Punjabi   Me    My sister   NSA1   NSA2
South Asian            46%               38%   35%         7%     7%
Balochistan/Caucasus   35%               35%   34%         32%    30%
Kalash                 5%                5%    2%          -1%    0%
Southeast Asian        1%                0%    1%          1%     1%
Southwest Asian        0%                10%   12%         43%    40%
European               11%               6%    7%          -7%    0%
Papuan                 0%                0%    1%          2%     2%
Northeast Asian        0%                0%    0%          0%     0%
Siberian               2%                2%    2%          4%     3%
Eastern Bantu          0%                0%    0%          0%     0%
West African           0%                1%    2%          6%     6%
East African           0%                3%    3%          12%    11%

As you can see, the major difference between the Punjabis and me is in my Southwest Asian and African percentages.

Now, I know that a quarter of my ancestry is not South Asian. So let’s try to estimate the admixture percentages for that ancestry. Being three-quarters Punjabi (I think), I averaged my sister’s and my results, subtracted three-quarters of the average Punjabi results, and multiplied the remainder by 4. That gives NSA1 in the table above, which is supposedly the admixture profile of my non-South-Asian quarter.

Since some of the percentages are negative, I set those to zero and rescaled all the others so that they summed to 100%. That is NSA2.
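The arithmetic behind NSA1 and NSA2 can be sketched in a few lines of Python. This is only a sketch using the rounded percentages from the table above, so its output differs by a point or two from the NSA1 and NSA2 columns, which likely reflect unrounded inputs. The function and variable names are illustrative.

```python
# Rounded percentages from the table above, in row order.
COMPONENTS = [
    "South Asian", "Balochistan/Caucasus", "Kalash", "Southeast Asian",
    "Southwest Asian", "European", "Papuan", "Northeast Asian",
    "Siberian", "Eastern Bantu", "West African", "East African",
]
PUNJABI = [46, 35, 5, 1, 0, 11, 0, 0, 2, 0, 0, 0]
ME = [38, 35, 5, 0, 10, 6, 0, 0, 2, 0, 1, 3]
SISTER = [35, 34, 2, 1, 12, 7, 1, 0, 2, 0, 2, 3]


def residual_ancestry(me, sister, punjabi, punjabi_share=0.75):
    """NSA1: what is left after removing the assumed Punjabi share, scaled up."""
    sibling_avg = [(m + s) / 2 for m, s in zip(me, sister)]
    scale = 1 / (1 - punjabi_share)  # 4 when the Punjabi share is 3/4
    return [scale * (a - punjabi_share * p) for a, p in zip(sibling_avg, punjabi)]


def clip_and_rescale(values):
    """NSA2: set negatives to zero, then rescale so the total is 100%."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [100 * v / total for v in clipped]


nsa1 = residual_ancestry(ME, SISTER, PUNJABI)
nsa2 = clip_and_rescale(nsa1)

for name, a, b in zip(COMPONENTS, nsa1, nsa2):
    print(f"{name:22s} {a:6.1f} {b:6.1f}")
```

The three-quarters Punjabi share is the assumption stated in the text; changing `punjabi_share` shows how sensitive the residual is to it.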

I computed the average admixture results for a bunch of reference populations. Let’s compare NSA1 and NSA2 to those populations.

The top five populations most similar to NSA1 are:

  1. Yemenis
  2. Jordanians
  3. Palestinians
  4. Egyptians
  5. Syrians

For NSA2, we get the same five populations but the Egyptians and Palestinians exchange places.
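A ranking like this could be produced along the following lines. The similarity measure is not stated here, so this sketch assumes plain Euclidean distance between admixture vectors. The three reference profiles are made-up placeholders (RefA to RefC), not real population averages.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length admixture vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(target, references, top=5):
    """Return up to `top` reference names, most similar to target first."""
    ranked = sorted(references, key=lambda name: euclidean(target, references[name]))
    return ranked[:top]


# Placeholder three-component profiles, for illustration only (not real data).
references = {
    "RefA": [10.0, 30.0, 60.0],
    "RefB": [50.0, 40.0, 10.0],
    "RefC": [20.0, 35.0, 45.0],
}
target = [12.0, 31.0, 57.0]

ranking = rank_by_similarity(target, references)
print(ranking)
```

With real data, `references` would hold the average admixture vector of each reference population and `target` would be NSA1 or NSA2.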

This suggests that my quarter ancestry is most likely from the Middle East: Egypt, Arabia or the Levant.

Let’s look at it another way. If I know that I have Punjabi and Egyptian ancestry, I can use the average Punjabi and average Egyptian admixture percentages to calculate my percentage of both ancestries since that has to sum to 100%. So:

Zack = p * Punjabi + (1-p) * Egyptian

And we solve for p using least squares.

I got 81.3% Punjabi for myself and 75.8% for my sister. On average, that’s 78.5% Punjabi and 21.5% Egyptian, which is pretty close to our genealogical information.
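With a single unknown, the least-squares fit has a closed form. Rewriting the model as Zack - Egyptian = p * (Punjabi - Egyptian), the best p is the dot product of the two differences divided by the squared length of (Punjabi - Egyptian). Below is a minimal sketch, assuming an unweighted fit with no constraint forcing p between 0 and 1. The function name is hypothetical, and the vectors in the check are synthetic, built so that the right answer is known to be 0.75.

```python
def mixture_fraction(target, source_a, source_b):
    """Least-squares p for target ~ p * source_a + (1 - p) * source_b."""
    diff_sources = [a - b for a, b in zip(source_a, source_b)]
    diff_target = [t - b for t, b in zip(target, source_b)]
    denom = sum(d * d for d in diff_sources)
    if denom == 0:
        raise ValueError("the two source profiles are identical")
    return sum(t * d for t, d in zip(diff_target, diff_sources)) / denom


# Synthetic check: build an exact 75/25 mixture and recover p.
source_a = [60.0, 30.0, 10.0]
source_b = [10.0, 40.0, 50.0]
mixed = [0.75 * a + 0.25 * b for a, b in zip(source_a, source_b)]
p = mixture_fraction(mixed, source_a, source_b)
print(round(p, 3))
```

For the fit described above, `target` would be an individual's admixture vector and the two sources would be the average Punjabi and average Egyptian vectors.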

Harappa Clustering

As I had computed the admixture percentages for myself, my sister and other participants in my Harappa Ancestry Project, I decided to do some clustering analysis on them to see which persons clustered together. The resulting tree for a hierarchical clustering is as follows. It shows which persons are the most similar.

I am HRP0001 and my sister is HRP0035. As expected, we cluster together and then with a half-Sindhi half-Balochi guy and finally with all the Punjabis.

Since I have a lot of reference populations in my data, I did the same cluster analysis using the average admixture results for each reference population. Here’s the section of the tree containing my sister and me.

This time we cluster with the Bene Israel, a Jewish community from Bombay, India, though our similarity with them is not that great. Then we join the Punjabis, Sindhis and Pathans.

Doing the same analysis with individual samples from my references, I get:

A weak clustering with Burusho!

If I use PCA (Principal Component Analysis) results to compute hierarchical clusters, you can see that I am an outlier among the South Asian participants.

If you look at my PCA coordinates, you’ll realize that among the 365 South Asians I used in the analysis, I am one of the five complete outliers.

However, when I use model-based clustering on the PCA results, I end up in a really weird, loose cluster (CL9) with a Kashmiri, 5/21 Balochis, 2 Bene Israel Jews, 3/23 Brahui, 1/25 Burusho, 8/17 Makranis, 2/21 Pathans and 3/22 Sindhis. This is mostly a group of outliers and those who have some African admixture.

My Harappa Project Results

I have been blogging up a storm on the Harappa Ancestry Project, with more than 50 posts since I last linked to it.

Let’s see what I have found out about myself there. Here are the admixture results for me and my sister:

Me My sister
South Asian 37.9% 34.8%
Balochistan/Caucasus 34.7% 34.2%
Southwest Asian 9.8% 12.3%
European 5.9% 7.2%
Kalash 4.5% 2.3%
East African 3.1% 3.0%
West African 1.4% 1.8%
Siberian 2.4% 1.8%
Papuan 0.2% 1.0%
Northeast Asian 0.2% 0.3%
Southeast Asian 0.0% 1.3%
Eastern Bantu 0.0% 0.0%

You can see the results of all the participants in a spreadsheet or in a nice interactive bar chart. I am HRP0001 and my sister is HRP0035.

Interestingly, both McDonald and the 23andMe ancestry painting show my sister as having more African admixture than me, but here I have about the same East African percentage as her, and even her West African percentage is only a tiny bit higher.

To figure out what these ancestral populations mean, do read the post about the reference population analysis.

Your Genes, Regulated?

The FDA held a meeting over the last two days:

FDA is convening this two-day meeting to seek the Panel’s expert opinion and input on scientific issues concerning Direct to Consumer (DTC) genetic tests that make medical claims.

This meeting is focused specifically on issues regarding clinical genetic tests that are marketed directly to consumers (DTC clinical genetic tests), where a consumer can order tests and receive test results without the involvement of a clinician.

The American Medical Association of course wants to limit genetic testing so that you would need a doctor to supervise everything.

We urge the Panel to offer clear findings and recommendations that genetic testing, except under the most limited circumstances, should be carried out under the personal supervision of a qualified health care professional, and provide individuals interested in obtaining genetic testing access to qualified health care professionals for further information.

23andMe gave two presentations at the meeting, which they have posted on their blog.

In our presentations, we take the position that all genetic testing services, whether ordered by a physician or offered through direct access, should adhere to the same standards. We simultaneously request that the FDA consider redefining and establishing regulatory standards, including some fundamental definitions, to accommodate large-scale genetic testing and support innovation of its technologies and applications. We also request that regulation be based upon evidence and not fear of potential harm to individuals which, to date, has not been demonstrated. In fact, growing numbers of participating individuals and independent studies focused on this issue provide preliminary evidence that the vast majority of people understand the information presented and experience no significant negative effects.

Genomics Law Report had an overview of the issues beforehand as well as a Twitter roundup of the meeting. Here are its author’s thoughts after the first day:

First and foremost, I fully expect the MCGP (Molecular and Clinical Genetics Panel) to note, likely more than once, that given the complexity of the questions put to it by the FDA it should be afforded far more time to deliberate and research prior to making any recommendations.

If taking time out for further debate isn’t an option, what is the MCGP likely to recommend? Based on today’s deliberations, I think it’s a safe bet that the MCGP will advise the FDA to (1) demand clear proof of analytical and clinical validity for all genetic tests and (2) require that most, or perhaps even all, genetic tests with demonstrated or potential clinical significance be (to use the FDA’s terminology) “routed through a clinician.”

In other words, I think the odds strongly favor an MCGP recommendation to the FDA that clinical (as defined by the FDA, which is itself a separate issue) direct-to-consumer genetic testing, when offered without a requirement that a clinician participate in the ordering, receipt and interpretation of the test, be removed from the marketplace. At least for the time being.

If you read my blog, you probably know that my politics are quite liberal. I do, however, think that any regulation has to be shown to deliver tangible benefits or prevent actual harm. Hypothetical harm from a regular Joe misinterpreting his genetic results is not enough justification.

So what can you do? Razib Khan is already on the task.

1) I am going to release my own 23andMe sequence into the public domain soon. I encourage everyone to download it. I would rather have someone off the street know my own genetic information than be made invisible by the government. That is my right. For now that right is not barred by law. I will exercise it.

2) Spread word of this video via social networking websites and twitter. The media needs to get the word out, but they only will if they know you care. Do you care? I hope you do. This is a power grab, this is not about safety or ethics. If it was, I assume that the “interpretative services” would be provided for free. I doubt they will be.

3) Contact your local representative in congress. I’ve never done this myself, but am going to draft a quick note. They need to be aware that people care, that this isn’t just a minor regulatory issue.

4) The online community needs to get organized. We’re not as powerful as a million doctors and a Leviathan government, but we have right on our side. They’re trying to take from us what is ours.

5) Plan B’s. We need to prepare for the worst. Which nations have the least onerous regulatory regimes? Is genomic tourism going to be necessary? How about DIYgenomics? The cost of the technology to genotype and sequence is going to crash. I know that the Los Angeles DIYbio group has a cheap cast-off sequencer. For those who can’t afford to go abroad soon we’ll be able to get access to our information in our homes. Let’s prepare for that day.

Here are the links to contact your House Representative and your Senators.

Eurogenes

Davidski of Eurogenes is also a genome blogger. In his admixture analysis for West, South and Central Asians, I am PKEG1 and my results are as follows:

European 4%
Siberian 1%
Caucasus 32%
Sub-Saharan African 4%
Middle Eastern 9%
East Asian 1%
South Asian 50%

Here’s a chart showing some of the reference samples and Eurogenes participants closest to me. As initially sorted, the list goes from most similar to me to least similar from top to bottom.

You can sort the bar chart by the different ancestral components by clicking on the legend on the right.

As you can see, Pathans and Punjabi Jatts are most similar to me in their admixture results.

Eurogenes also did a supervised admixture analysis by choosing 11 reference populations as the ancestral populations. Here are my results:

Pathan + Sindhi 86.39%
Middle Eastern (Jordanian + Palestinian) 10.78%
Sub-Saharan African (Mandenka + Yoruba) 2.82%
Anatolian + Caucasus (Armenian + Georgian) 0.00%
North Slavic (Polish + Belorussian) 0.00%
Western/Southern European (French) 0.00%
Balochi + Brahui + Makrani 0.00%
Burusho 0.00%
North Kannadi + Sakilli + Selected Gujarati 0.00%
East Asian (Han Chinese) 0.00%
Koryak + Nganassan + Yakut 0.00%

From these results, it doesn’t look like there is any Turkic, Turkish or Balkan ancestry in my past. I was also surprised at the really high Pathan + Sindhi percentage and at how many of the other components came out at zero.

Dodecad Project II

I talked about the Dodecad Project last time. Dienekes also did some cluster analysis using mclust.

When he classified everybody into 48 clusters, I showed up almost all alone in cluster 21. Only one other member, a Bihari Brahmin, had a 50% chance of belonging to my cluster.

With 56 clusters, I am classified with 9 Sindhis (out of a reference population total of 24) and the same Bihari guy (who now has a 99% chance of belonging to this cluster).

It looked like I was an outlier, and when Dienekes tested for outlier samples he found me among them.

With 64 clusters, I am again an outlier, though I am classified with a few Punjabis, 20/24 reference Sindhis and 10/22 reference Pathans. I am likely loosening what would otherwise be a tight cluster.

For the 63-cluster analysis, the outlier status remains and the story is about the same as with 64 clusters.

More interesting was when Dienekes analyzed just South Asians. In his cluster analysis, I was classified with the 3 Punjabis in his project as well as the following reference population samples: 2 out of 25 Singapore Indians, 1 out of 24 Balochi, 18 out of 24 Sindhi, and 9 out of 22 Pathan.

His admixture results for me in this South Asian analysis were:

Pakistan 39.8%
Indian 22.4%
West Asian 16.3%
Dagestan 11.8%
European 2.8%
North Kannadi 2.2%
Southeast Asian 1.9%
Irula 1.8%
Siberian 1.1%

An interesting pattern I have noticed is that my European admixture percentage is generally lower than that of other Punjabis. When the European component is divided into North and South, I have less North European admixture than a typical Sindhi, Punjabi or Pathan but more South European than those groups.

The final analysis from Dodecad is a fun one:

Using Pakistani Punjabis from Xing et al. (2010) and Behar et al. (2010) Egyptians as references requires me to drop the number of markers to ~38k, but the result of the supervised ADMIXTURE analysis is 77.4% Punjabi and 22.6% Egyptian, which seems compatible with what he expected.

Basically, Dienekes used only 25 Punjabis and 12 Egyptians as references and then estimated how much of my ancestry comes from each of these two populations. Of course, the assumption is that these two are my only ancestries. Interestingly, this is very close to what I expected. I plan to do this same analysis with several different reference populations and see what I get.

Dodecad Ancestry Project

I asked Dienekes to include me in his Dodecad Ancestry Project and he gave me the following results:

Ancestral Component Percentage
South Asian 44.9%
West Asian 33.7%
Southwest Asian 5.7%
North European 5.5%
South European 3.7%
East African 3.4%
Northwest African 2.1%
West African 0.6%
East Asian 0.4%
Northeast Asian 0.1%

You can see the results of all the project participants in a spreadsheet. You can also check out the admixture results for the reference samples he used.

Below is a bar chart showing the ancestral population percentages for me (DOD128) along with some other Dodecad participants (those starting with DOD) and some reference populations. I selected those individuals and populations that were somewhat closer to me in their admixture results. Also, as initially sorted, the list goes from most similar to me to least similar from top to bottom.

You can sort the bar chart by the different ancestral components by clicking on the legend on the right.

A word about the ten ancestral components (South Asian, West Asian, Southwest Asian, North European, South European, etc.): the admixture run in this case produced 10 ancestral components. These do not necessarily correspond to “pure” ancestral populations, and they come unlabeled, defined only by their allele frequencies. Dienekes looked at the admixture output for his reference populations and named each component after the region where it is most common. Thus calling an ancestral component “West Asian” just means that it is found at its highest frequencies in the reference populations living in Western Asia nowadays.

I used hierarchical clustering on the Dodecad results to find out which participants are most similar to me. The tree below shows the section including me.

Closest to me are a Punjabi Brahmin and a half-Sindhi half-Balochi guy, then three Punjabi Jatts.

Through all these investigations, some things have cropped up again and again.

One is that I have a minor amount of African admixture (4% East + West African). Most of it seems to be East African, which is why it doesn’t show up in the 23andMe ancestry painting. This is consistent with one-quarter Egyptian ancestry. An average Egyptian reference sample is 14.7% East African and 4.1% West African. A quarter of that would be 3.7% and 1.0% respectively. Compare that to my 3.4% and 0.6%.
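The quarter-Egyptian comparison is simple enough to check directly. The sketch below uses only the figures quoted in this paragraph; the variable names are illustrative.

```python
# Figures quoted above: average Egyptian reference sample vs. my Dodecad results.
egyptian_avg = {"East African": 14.7, "West African": 4.1}
my_result = {"East African": 3.4, "West African": 0.6}

# If a quarter of my ancestry is Egyptian, I should carry a quarter of each share.
expected = {component: pct / 4 for component, pct in egyptian_avg.items()}

for component in egyptian_avg:
    print(
        f"{component}: expected {expected[component]:.1f}%, "
        f"observed {my_result[component]}%"
    )
```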

Also, while I am not very similar to Punjabis, they are the group most similar to me. Since there are no Punjabis in the reference data, Sindhis are the next closest. I am in fact more similar to Gujaratis than I am to Turks or any Central or West Asian groups.