Category Archives: Science and Technology - Page 2

Your Genes, Regulated?

The FDA had a meeting the last two days:

FDA is convening this two-day meeting to seek the Panel’s expert opinion and input on scientific issues concerning Direct to Consumer (DTC) genetic tests that make medical claims.

This meeting is focused specifically on issues regarding clinical genetic tests that are marketed directly to consumers (DTC clinical genetic tests), where a consumer can order tests and receive test results without the involvement of a clinician.

The American Medical Association, of course, wants to limit genetic testing so that you would need a doctor to supervise everything.

We urge the Panel to offer clear findings and recommendations that genetic testing, except under the most limited circumstances, should be carried out under the personal supervision of a qualified health care professional, and provide individuals interested in obtaining genetic testing access to qualified health care professionals for further information.

23andme had two presentations at the meeting which they have posted on their blog.

In our presentations, we take the position that all genetic testing services, whether ordered by a physician or offered through direct access, should adhere to the same standards. We simultaneously request that the FDA consider redefining and establishing regulatory standards, including some fundamental definitions, to accommodate large-scale genetic testing and support innovation of its technologies and applications. We also request that regulation be based upon evidence and not fear of potential harm to individuals which, to date, has not been demonstrated. In fact, growing numbers of participating individuals and independent studies focused on this issue provide preliminary evidence that the vast majority of people understand the information presented and experience no significant negative effects.

Genomics Law Report had an overview of the issues beforehand as well as a Twitter roundup of the meeting. Here are his thoughts after the first day:

First and foremost, I fully expect the MCGP (Molecular and Clinical Genetics Panel) to note, likely more than once, that given the complexity of the questions put to it by the FDA it should be afforded far more time to deliberate and research prior to making any recommendations.

If taking time out for further debate isn’t an option, what is the MCGP likely to recommend? Based on today’s deliberations, I think it’s a safe bet that the MCGP will advise the FDA to (1) demand clear proof of analytical and clinical validity for all genetic tests and (2) require that most, or perhaps even all, genetic tests with demonstrated or potential clinical significance be (to use the FDA’s terminology) “routed through a clinician.”

In other words, I think the odds strongly favor an MCGP recommendation to the FDA that clinical (as defined by the FDA, which is itself a separate issue) direct-to-consumer genetic testing, when offered without a requirement that a clinician participate in the ordering, receipt and interpretation of the test, be removed from the marketplace. At least for the time being.

If you read my blog, you probably know my politics as being quite liberal. I do, however, think that any regulations have to be shown to have actual tangible benefit and prevention of harm. Simple misinterpretation of genetic results by a regular joe causing hypothetical harm is not enough justification.

So what can you do? Razib Khan is already on the task.

1) I am going to release my own 23andMe sequence into the public domain soon. I encourage everyone to download it. I would rather have someone off the street know my own genetic information than be made invisible by the government. That is my right. For now that right is not barred by law. I will exercise it.

2) Spread word of this video via social networking websites and twitter. The media needs to get the word out, but they only will if they know you care. Do you care? I hope you do. This is a power grab, this is not about safety or ethics. If it was, I assume that the “interpretative services” would be provided for free. I doubt they will be.

3) Contact your local representative in congress. I’ve never done this myself, but am going to draft a quick note. They need to be aware that people care, that this isn’t just a minor regulatory issue.

4) The online community needs to get organized. We’re not as powerful as a million doctors and a Leviathan government, but we have right on our side. They’re trying to take from us what is ours.

5) Plan B’s. We need to prepare for the worst. Which nations have the least onerous regulatory regimes? Is genomic tourism going to be necessary? How about DIYgenomics? The cost of the technology to genotype and sequence is going to crash. I know that the Los Angeles DIYbio group has a cheap cast-off sequencer. For those who can’t afford to go abroad soon we’ll be able to get access to our information in our homes. Let’s prepare for that day.

Here are the links to contact your House Representative and your Senators.



Davidski of Eurogenes is also a genome blogger. In his admixture analysis for West, South and Central Asians, I am PKEG1 and my results are as follows:

European 4%
Siberian 1%
Caucasus 32%
Sub-Saharan African 4%
Middle Eastern 9%
East Asian 1%
South Asian 50%

Here’s a chart showing some of the reference samples and Eurogenes participants closest to me. As initially sorted, the list goes from most similar to me to least similar from top to bottom.

You can sort the bar chart by the different ancestral components by clicking on the legend on the right.

As you can see, Pathans and Punjabi Jatts are most similar to me in their admixture results.

Eurogenes also did a supervised admixture analysis by choosing 11 reference populations as the ancestral populations. Here are my results:

Pathan + Sindhi 86.39%
Middle Eastern (Jordanian + Palestinian) 10.78%
Sub-Saharan African (Mandenka + Yoruba) 2.82%
Anatolian + Caucasus (Armenian + Georgian) 0.00%
North Slavic (Polish + Belorussian) 0.00%
Western/Southern European (French) 0.00%
Balochi + Brahui + Makrani 0.00%
Burusho 0.00%
North Kannadi + Sakilli + Selected Gujarati 0.00%
East Asian (Han Chinese) 0.00%
Koryak + Nganassan + Yakut 0.00%

From these results, it doesn’t look like there is any Turkic, Turkish or Balkan ancestry in my past. I was also surprised at the really high Pathan + Sindhi percentage and the lack of so many of the others.


Dodecad Project II

I talked about the Dodecad Project last time. Dienekes also did some cluster analysis using mclust.

When he classified everybody into 48 clusters, I showed up almost all alone in cluster 21. Only one other member, a Bihari Brahmin, had a 50% chance of belonging to my cluster.

With 56 clusters, I am classified with 9 Sindhis (out of a reference population total of 24) and the same Bihari guy (who now has a 99% chance of belonging to this cluster).

It looked like I was an outlier and when Dienekes tested for outlier data samples he found me among them.

With 64 clusters, I am again an outlier, though I am classified with a few Punjabis and 20/24 reference Sindhis and 10/22 reference Pathans. I am likely making their cluster not a good tight fit.

For the 63-cluster analysis, the outlier status remains and the story is about the same as with 64 clusters.

More interesting was when Dienekes analyzed just South Asians. In his cluster analysis, I was classified with the 3 Punjabis in his project as well as the following reference population samples: 2 out of 25 Singapore Indians, 1 out of 24 Balochi, 18 out of 24 Sindhi, and 9 out of 22 Pathan.

His admixture results for me in this South Asian analysis were:

Pakistan 39.8
Indian 22.4
West Asian 16.3
Dagestan 11.8
European 2.8
North Kannadi 2.2
Southeast Asian 1.9
Irula 1.8
Siberian 1.1

An interesting pattern I have noticed is that my European admixture percentage is generally lower than other Punjabis. When the European is divided into North and South, I have less North European admixture than a typical Sindhi, Punjabi or Pathan but more South European than those groups.

The final analysis from Dodecad is a fun one:

Using Pakistani Punjabis from Xing et al. (2010) and Behar et al. (2010) Egyptians as references requires me to drop the number of markers to ~38k, but the result of the supervised ADMIXTURE analysis is 77.4% Punjabi and 22.6% Egyptian, which seems compatible with what he expected.

Basically, Dienekes used only 25 Punjabis and 12 Egyptians as reference and then tried to estimate my proportion of these two populations. Of course, the assumption is that these two are my only ancestries. Interestingly, this is very close to what I expected. I plan to do this same analysis with several different reference populations and see what I get.


Dodecad Ancestry Project

I asked Dienekes to include me in his Dodecad Ancestry Project and he gave me the following results:

Ancestral Component Percentage
South Asian 44.9%
West Asian 33.7%
Southwest Asian 5.7%
North European 5.5%
South European 3.7%
East African 3.4%
Northwest African 2.1%
West African 0.6%
East Asian 0.4%
Northeast Asian 0.1%

You can see the results of all the project participants in a spreadsheet. You can also check out the admixture results for the reference samples he used.

Below is a bar chart showing the ancestral population percentages for me (DOD128) along with some other Dodecad participants (those starting with DOD) and some reference populations. I selected those individuals and populations that were somewhat closer to me in their admixture results. Also, as initially sorted, the list goes from most similar to me to least similar from top to bottom.

You can sort the bar chart by the different ancestral components by clicking on the legend on the right.

A word about the ten ancestral components (South Asian, West Asian, Southwest Asian, North European, South European, etc): Admixture results in this case gave 10 ancestral components. These do not necessarily correspond to “pure” ancestral populations and they are not labeled, only defined by their allele frequencies. Dienekes looked at the admixture output for his reference populations and assigned the 10 components different names based on which region it is most common in. Thus calling an ancestral component “West Asian” just means that it is found at highest frequencies in the reference populations living in Western Asia nowadays.

I used hierarchical clustering on the Dodecad results to find out which participants are most similar to me. A tree below shows the section including me.

Closest to me are a Punjabi Brahmin and a half-Sindhi half-Balochi guy, then three Punjabi Jatts.
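Finding the participants closest to me comes down to measuring a distance between admixture vectors. Here is a minimal sketch of that idea. It uses Euclidean distance and a plain nearest-first ranking, which is simpler than the hierarchical clustering I actually ran. My own row is taken from the table above; the other three rows are made-up placeholders, not real Dodecad participants.

```python
import math

# Order of the ten Dodecad components, as listed in the table above
COMPONENTS = [
    "South Asian", "West Asian", "Southwest Asian", "North European",
    "South European", "East African", "Northwest African", "West African",
    "East Asian", "Northeast Asian",
]

# My Dodecad percentages (DOD128), from the table above
me = [44.9, 33.7, 5.7, 5.5, 3.7, 3.4, 2.1, 0.6, 0.4, 0.1]

# HYPOTHETICAL rows, invented purely for illustration
others = {
    "hypothetical_A": [50.0, 30.0, 4.0, 8.0, 4.0, 0.5, 1.0, 0.0, 1.5, 1.0],
    "hypothetical_B": [70.0, 15.0, 2.0, 5.0, 2.0, 0.0, 0.5, 0.0, 4.0, 1.5],
    "hypothetical_C": [2.0, 40.0, 25.0, 5.0, 20.0, 2.0, 5.0, 0.5, 0.3, 0.2],
}


def euclidean(p, q):
    """Euclidean distance between two admixture vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_by_similarity(target, candidates):
    """Return candidate names sorted from most to least similar to target."""
    return sorted(candidates, key=lambda name: euclidean(target, candidates[name]))


ranking = rank_by_similarity(me, others)
for name in ranking:
    print(name, round(euclidean(me, others[name]), 1))
```

A real hierarchical clustering would go on to merge the closest pairs into a tree, but the pairwise distances are the starting point either way.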

Through all these investigations, some things have cropped up again and again.

One is that I have a minor amount of African admixture (4% East + West African). Most of it seems to be East African, which is why it doesn’t show up in 23andme ancestry painting. This is consistent with a quarter Egyptian ancestry. An average Egyptian reference sample is 14.7% East African and 4.1% West African. A quarter of that would be 3.7% and 1.0% respectively. Compare that to my 3.4% and 0.6%.
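The arithmetic behind that comparison is simple enough to write down. This is only a sketch, and it makes the simplifying assumption that the other three quarters of my ancestry contribute none of these African components.

```python
# Average Dodecad percentages for the Egyptian reference sample (quoted above)
egyptian_reference = {"East African": 14.7, "West African": 4.1}

# My own Dodecad percentages for the same components
observed = {"East African": 3.4, "West African": 0.6}


def expected_share(reference, ancestry_fraction):
    """Scale each reference percentage by the fraction of ancestry it covers."""
    return {
        component: round(percent * ancestry_fraction, 1)
        for component, percent in reference.items()
    }


# A great-grandmother alone would be 1/8; the "quarter" here follows the text above
expected = expected_share(egyptian_reference, 0.25)

for component in expected:
    print(component, "expected", expected[component], "observed", observed[component])
```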

Also, while I am not very similar to Punjabis, they are the group most similar to me. Since there are no Punjabis in the reference data, Sindhis are the next closest. I am in fact more similar to Gujaratis than I am to Turks or any Central or West Asian groups.


McDonald Ancestry Analysis II

When my sister got her 23andme results, we sent them over to Doug McDonald. I was expecting something close to my results, but it was radically different:

This one is different it says 37% Druze, 4% Bushman or Pygmy, the rest North India. It is complicated enough that the program refuses to generate a spot on the map. The chromosome painting looks quite reasonable for that assignment.
I am including several plots .. these show just how odd this is.

Here are the PCA plots that Doug sent. My sister is shown by the crosshairs.

Think of this as two-dimensional projections of a multidimensional space and you’ll notice that my sister is not close to any of the reference groups.

You can see her 3-D position (“Test Person”) in the animation below (or by clicking on animation).

Her chromosome painting, a similar concept to 23andme’s ancestry painting, shows which chromosome segments are most like some population. As you can see, there are a few chromosomes that have almost no “South Asian” segments.

I was very surprised by my sister’s results, especially the 4% Bushman/Pygmy. I expected some East African admixture due to the Egyptian ancestry but no Pygmy. Also, I expected some (10-20%) Middle East contribution but Druze at 37% is just too high. So I asked Doug McDonald to redo my ancestry analysis with the new version of his software.

Here’s what he told me:

It says you are half North India, 3% Bushman or Pygmy, and the rest Iranian, OR 80% Sindhi, 2% Bushman or Pygmy, the rest being Bedouin.

The spot on the map is far SW Pakistan.

The Pygmy is clearly a mistake!

The Pygmy is definitely a mistake. Pygmies are a very distinctive population and because genetic diversity is very high in Africa, the continent of humanity’s origin, sometimes these reference populations can give weird results. These analyses basically try to fit your genetic data to reference populations’ data samples. That’s one reason why you see Sindhi or Pathan as a result for Punjabis because there are no Punjabis in the reference data of HapMap or HGDP.

Here are my PCA plots:

And here is my chromosome painting:


Doug McDonald Ancestry Analysis

As I noted last time, I was in a situation where I needed some help ascertaining my genetic ancestry. Fortunately, there are people willing to do that sort of analysis for you. One of these is Doug McDonald. So I sent him my data and within an hour I had an analysis.

The PCA plot below shows me as a large cross in relation to different reference populations (like Europeans, Africans, East Asians etc).

Doug McDonald Ancestry Plot for me

Here’s what he said:

We also do quantitative tests. These come in three flavors, first without South Asia (represented by Pakistan) and the Mideast, second with South Asia, and finally with all three, as comparison panels.

The typical random error in the data (standard deviation) is 1%, meaning that numbers less than about 2% are not highly significant. There are also systematic errors. In particular, there is cross-coupling of values for Europe, the Mideast and S. Asia. For example, on the middle panel, a pure, northwestern European measures about 9% S. Asian, and on the third panel they typically measure 4.5% Mideastern and 8% S. Asian. Actual people from South Asia or the Mideast always test at least 15% European.

His first panel:

Europe 71.1%
East Asia 12.3%
Africa 8.2%
Oceania 4.7%
America 3.3%

When South Asia is added:

South Asia 48.7%
Europe 36.8%
Africa 5.8%
East Asia 5.0%
Oceania 2.4%
America 1.2%

And finally when Middle East is added to the list:

South Asia 46.9%
Europe 29.0%
Mideast 11.0%
East Asia 5.1%
Africa 4.3%
Oceania 2.3%
America 1.4%

And here is his analysis of these results:

This is basically a person from somewhere in region from say Iraq to Pakistan, with a substantial African contribution. The East Asian is probably not real. The African could be a few percent direct recent admixture, or it could be in input from a previously mixed population like the Makrani of Pakistan. My techniques can’t tell them apart.

The most interesting thing here for me was the African percentage.


Genome Similarity

23andme has a feature where you can find out how similar your genes are to your friends and family (who you are sharing with at the site). The result is a bar list with percentages showing similarity.

The number next to each person in the bar list is a measure of similarity. Specifically, it is the percentage of matching genotypes for all of the SNPs on our chip that are located in the genes or regions of interest.

If person 1 has AA and person 2 has AA at a particular position, they are 100% similar. If person 1 has AA and person 2 has AG, they are 50% similar at that position. If person 1 has AA and person 2 has GG, they are 0% similar at that position. We then average the percent similarity over all positions included in that comparison.

You can also calculate these similarity measures (IBS, Identical by State, distances) using plink if you have the genetic data for someone.
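The per-position rule quoted above is easy to sketch in code. This is a minimal illustration of that rule only, not 23andme's or plink's actual implementation, and the SNP names and genotypes are toy values.

```python
def position_similarity(genotype1, genotype2):
    """Similarity at one SNP: 1.0, 0.5 or 0.0 for unphased genotypes like 'AG'."""
    remaining = list(genotype2)
    shared = 0
    for allele in genotype1:
        if allele in remaining:
            # Each allele in genotype2 can be matched only once
            remaining.remove(allele)
            shared += 1
    return shared / 2


def genome_similarity(person1, person2):
    """Average percent similarity over the SNPs genotyped in both people."""
    common = [snp for snp in person1 if snp in person2]
    if not common:
        raise ValueError("no SNPs in common")
    total = sum(position_similarity(person1[s], person2[s]) for s in common)
    return 100.0 * total / len(common)


# Toy data: hypothetical SNP ids and genotypes
person1 = {"snp1": "AA", "snp2": "AA", "snp3": "AA", "snp4": "AG"}
person2 = {"snp1": "AA", "snp2": "AG", "snp3": "GG", "snp4": "GA"}

print(genome_similarity(person1, person2))
```

Because the genotypes are unphased, "AG" and "GA" are treated as the same call.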

Discussing the expected similarity percentages, I figured that siblings and parents generally have similarity measures around the mid-80s. Usually for South Asians, it seems like their similarity percentage with other unrelated South Asians is close to 74%, especially for similar ethnic or geographic groups. Please note that this specific number 74% is valid for the specific set of SNPs included in the 23andme v2 chip. For v3 chip, people are getting higher numbers.

However, I looked far and wide and shared data with 81 people. My highest similarity percentage came out to be 73.22% with Amber, followed closely by a Bihari guy at 73.2% and a couple of other Punjabis. While most of my top matches are South Asian, with a large number of Punjabis, there is no particular pattern with several South Indians and Biharis matching highly too. My top non-South Asian matches are Iranians.

I expected my similarity percentages to be on the low side, as they turned out to be, because of my quarter non-South Asian ancestry. So that wasn’t a surprise.

I asked my parents and uncles and aunts about my great-grandmother’s ancestry. I knew she was from Egypt, but I found out that her ancestors had arrived in Egypt with Muhammad Ali Pasha. Since Muhammad Ali Pasha was an Albanian who worked for the Ottoman Sultanate, my relatives deduced that we have some Turk and/or Balkan ancestry.

So I asked a number of Turks, Southern Europeans and North Africans to share. I expected my similarity with them to be less than my similarity with South Asians, since 3/4 ancestry is more than 1/4. But I found something strange.

Let’s take any random person I am comparing my genes to who is not South Asian. If I compare how similar that person is to me against how similar he is to any South Asian, it turns out he’s more similar to the South Asian. This turns out to be true for similarity measures between me and all non-South Asians vs similarity measures between that non-South Asian and all South Asians. In brief, all East Africans, North Africans, Southern Europeans, Northern Europeans, Turks and Iranians (that are in my friends list) are more similar to all the South Asians among my friends list than they are to me.

I found this weird. If I had someone in my friends list who belonged to my great grandmother’s ethnicity or a closely related one, then I should be more similar to that friend than any random South Asian, instead of being the least similar like I was in all cases.

Clearly there was a need for better ancestry analysis in my case.


My Biogeographical Ancestry

There are several different ways to figure out your genetic ancestry. One way that 23andme shows your ancestry is by comparison with reference populations of the HGDP (Human Genome Diversity Project) dataset. I have listed how similar I am to various groups in the table below:

Reference Population Similarity Groups Included
Central & South Asians 67.14 Pathan, Makrani, Kalash, Hazara, Balochi, Sindhi, Brahui and Burusho
Northern Europeans 66.94 western Russia, France, Orkney Islands
Southern Europeans 66.93 northern Italy, Tuscany, Sardinia, French Basque
Near Easterners 66.82 Palestinian, Druze, Bedouin
Siberians 66.55 Yakut
Eastern Asians 66.48 Japan, Cambodia, China (Dai, Daur, Han, Hezhen, Lahu, Miaozu, Mongola, Naxi, Oroqen, She, Tu, Tujia, Uygu, Xibo, Yizu)
North Americans 66.47 Pima, Maya
South Americans 66.43 Surui, Karitiana, Piapoco, Curripaco
Oceanians 66.38 Papuans, Melanesians
Northern Africans 66.16 Mozabite
Eastern Africans 64.13 Kenya
Southern Africans 64.04 San, Bantu speaking South Africans
Central Africans 64.01 Biaka, Mbuti Pygmies
Western Africans 63.98 Mandenka, Yoruba

My numbers are not too different from anyone from the northwestern part of the South Asian subcontinent.

One thing to consider here is that you are being compared to a specific set of populations. As you can see, there are no Indian references here. Similarly, Near Easterners are represented only by samples from Israel and North Africans by one Algerian population. I wonder what the case would be if they had Egyptians or Ethiopians etc in their reference.

Another way to look at your genetic ancestry is with a PCA (Principal Component Analysis) plot. With the same reference populations mentioned above, 23andme calculated the two dimensions of largest variation among that data. These two axes don’t completely describe the variation across the samples, but being the two largest components they can be used to project your genetic data in that space. At the world level, I am the green marker in the middle of the Central/South Asian cluster.

In the South Asian PCA plot, I am in the middle of the Pathan cluster and right at the top edge of the Sindhi one.

Now this doesn’t make me a Pathan. For one thing, 23andme’s reference populations do not have any Punjabis. I am sharing with a number of North Indians and Pakistanis, including several Punjabis, and they all lie around me in the plot.

There is another problem with a PCA plot though. We are looking at the two most significant dimensions, but there are other dimensions too and they combined together could account for a lot of the variation among people’s genomes. Also, let’s say we have someone who is a child of a European and an East Asian parent. Now that person, who is 50% East Asian and 50% European, would be placed about midway between the East Asian and European clusters. That’s where the Uygur and Hazara clusters are. So we can’t say that someone is Uygur just because they are placed in the Uygur cluster in a PCA plot.
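The midpoint effect follows from the fact that projecting onto principal components is a linear operation. Here is a toy sketch of that. The allele-dosage vectors, the mean and the two axes are all made up for illustration; they are not HGDP data and not 23andme's actual computation.

```python
def project(genotype, mean, axes):
    """Project a genotype vector onto the given principal axes."""
    centered = [g - m for g, m in zip(genotype, mean)]
    return [sum(c * a for c, a in zip(centered, axis)) for axis in axes]


# HYPOTHETICAL values: four SNPs, dosage of one allele (0, 1 or 2)
mean = [1.0, 1.0, 1.0, 1.0]
axes = [
    [0.5, 0.5, -0.5, -0.5],   # stand-in for the first principal component
    [0.5, -0.5, 0.5, -0.5],   # stand-in for the second principal component
]

european_parent = [2.0, 0.0, 2.0, 0.0]
east_asian_parent = [0.0, 2.0, 2.0, 2.0]

# On average a child carries half of each parent's dosage
child = [(e + a) / 2 for e, a in zip(european_parent, east_asian_parent)]

p_european = project(european_parent, mean, axes)
p_east_asian = project(east_asian_parent, mean, axes)
p_child = project(child, mean, axes)
midpoint = [(x + y) / 2 for x, y in zip(p_european, p_east_asian)]

print(p_child, midpoint)
```

The child lands on the midpoint no matter which axes are used, which is why a recently mixed person and a member of an old, intermediate population can look the same on the plot.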

There are other ways to look at your genetic ancestry and I have been exploring a bunch of them. We’ll talk about them next.


Ancestry Painting

23andme has a feature called ancestry painting which gives you the percentages of different populations you are admixed from. In my case, I got the following:

European 91.22%
Asian 8.69%
African 0.09%

Ignore the precision (to 2 decimal places). I showed that because I wanted to highlight the nonzero African ancestry percentage which I will talk about in more detail some other day.

The ancestry painting also shows you the segments on your chromosomes and which ancestral group you inherited them from. Here’s an image showing mine:

My 23andme ancestry painting

Does this mean I am 91% European and 9% Asian? Not quite! My results are about typical for someone from Punjab.

Also, the results depend on which reference populations were used as the exemplar European, Asian and African populations.

23andMe takes advantage of publicly available data for four populations studied extensively via the International HapMap project. That project obtained the genotypes for 60 individuals of western European descent from Utah, 60 western African individuals from Nigeria, and 90 eastern Asian individuals, 45 from each of Japan and China. Because the two eastern Asian populations are geographically near one another and relatively similar at the genetic level, 23andMe combines these to form a single eastern Asian reference population. For more information on why these regions were used, please see (Why are these three populations used?)

So they are comparing your DNA segments to those of the three populations from the HapMap dataset. Using more reference populations would give you more fine-grained results (which is something I plan to do in my Harappa Ancestry Project).

Using the technique described by Eurogenes, you can check which chromosomal segments are classified as European (C), Asian (A) or African (Y). Here are my results for chromosome 9:

Chromosome Segment Ancestry
9 36587, 97974029 CC
9 97976425, 99363907 AC
9 99367419, 100530260 CC
9 100536329, 104679442 AC
9 104680472, 106598880 CC
9 106602625, 108990980 AC
9 108993234, 133447401 CC
9 133447580, 138437690 AC
9 138443022, 140147760 CC
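To put a number on how much of chromosome 9 is painted Asian, the segment lengths in the table can be added up. This is a rough sketch that assumes the two numbers in each row are start and end base-pair positions and ignores the small gaps between segments.

```python
# (start, end, label) for chromosome 9, copied from the table above
segments = [
    (36587, 97974029, "CC"),
    (97976425, 99363907, "AC"),
    (99367419, 100530260, "CC"),
    (100536329, 104679442, "AC"),
    (104680472, 106598880, "CC"),
    (106602625, 108990980, "AC"),
    (108993234, 133447401, "CC"),
    (133447580, 138437690, "AC"),
    (138443022, 140147760, "CC"),
]


def length_by_label(rows):
    """Sum segment lengths (end - start) for each ancestry label."""
    totals = {}
    for start, end, label in rows:
        totals[label] = totals.get(label, 0) + (end - start)
    return totals


totals = length_by_label(segments)
covered = sum(totals.values())

# Fraction of the painted length where one of the two copies is Asian
ac_fraction = totals["AC"] / covered
# An AC segment is Asian on only one copy, so halve it for the overall share
asian_share = ac_fraction / 2

print(f"AC segments: {ac_fraction:.1%} of painted length")
print(f"Asian share of chromosome 9: {asian_share:.1%}")
```

The AC segments come to roughly 9% of the painted length, which is in line with the 8.69% Asian figure for the genome as a whole if that figure is counted the same way.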

The number of Asian segments on my homozygous chromosome 9 makes me doubt that it comes from my Egyptian great-grandmother. Maybe it’s from my great-grandfather.

The African segments are on chromosome 8:

Chromosome Segment Ancestry
8 154984, 4074371 CY
8 140917074, 142173290 CY

I need to do ancestry painting on my own in more detail.


Harappa Ancestry Project Update

I now have 25 participants in the Harappa Ancestry Project, but we still need more, especially from the Hindi belt.

I have been detailing the datasets I am using:

I have also started admixture analysis of the reference populations and first batch of project participants.
