Journal club of one: ”An expanded view of complex traits: from polygenic to omnigenic”

”An expanded view of complex traits: from polygenic to omnigenic” by Boyle, Li & Pritchard (2017) came out recently in Cell. It has been all over Twitter, and I’m sure it will influence a lot of people’s thinking, rightfully so. It is a good read, pulls in a lot of threads, and has a nice blend of data analysis and reasoning. It’s good. Go read it!

The paper argues that for a lot of quantitative traits — specifically human diseases and height — almost every gene will be associated with every trait. More than that, almost every gene will be causally involved in every trait, most in indirect ways.

It continues with the kind of analysis used in Pickrell (2014) and Finucane & al (2015), among many others, that breaks genome-wide association down by genome annotation. How much variability can we attribute to variants in open chromatin regions? How much to genes annotated as ”protein binding”? And so on.

These analyses point towards gene regulation being important, but not that strongly towards particular annotation terms or pathways. The authors take this to mean that, while genetic mapping, including GWAS, finds causally involved genes, it will not necessarily find ”relevant” genes. That is, not necessarily genes that are the central regulators of the trait. That may be a problem if you want to use genetic mapping to find drug targets, pathways to engineer, or similar.

This observation must speak to anyone who has looked at a list of genes from some mapping effort and thought: ”well, that is mostly genes we know nothing about … and something related to cancer”.

They write:

In summary, for a variety of traits, the largest-effect variants are modestly enriched in specific genes or pathways that may play direct roles in disease. However, the SNPs that contribute the bulk of the heritability tend to be spread across the genome and are not near genes with disease-specific functions. The clearest pattern is that the association signal is broadly enriched in regions that are transcriptionally active or involved in transcriptional regulation in disease-relevant cell types but absent from regions that are transcriptionally inactive in those cell types. For typical traits, huge numbers of variants contribute to heritability, in striking consistency with Fisher’s century-old infinitesimal model.

In summary: it’s universal pleiotropy. I don’t think there is any reason to settle on ”cellular” networks exclusively. After all, cells in a multicellular organism share a common pool of energy and nutrients, and exchange all kinds of signalling molecules. This agrees with classical models and the thinking in evolutionary genetics (see Paaby & Rockman 2013). Or look at this expression QTL and gene network study in aspen (Mähler & al 2017): the genes with eQTL tend to be peripheral, not network hub genes.

It’s a bit like in behaviour genetics, where people are fond of making up these elaborate hypothetical causal stories: if eyesight is heritable, and children with bad eyesight get glasses, and the way you treat a child who wears glasses somehow reinforces certain behaviours, so that children who wear glasses grow up to score a bit better on certain tests — are the eyesight variants also ”intelligence variants”? This is supposed to be a reductio ad absurdum of the idea of calling anything an ”intelligence variant” … But I suspect that this is what genetic causation, when fully laid out, will sometimes look like. It can be messy. It can involve elements that we don’t think of as ”relevant” to the trait.

There are caveats, of course:

One reason that there is a clearer enrichment of variant-level annotation such as open chromatin than in gene-level annotation may be that the resolution is higher. We don’t really know that much about how molecular variation translates to higher level trait variation. And let’s not forget that for most GWAS hits, we don’t know the causative gene.

They suggest defining ”core genes” like this: ”conditional on the genotype and expression levels of all core genes, the genotypes and expression levels of peripheral genes no longer matter”. Core genes are genes that d-separate the peripheral genes from a trait. That makes sense. Some small number of genes may be necessary molecular intermediates for a trait. But as far as I can tell, it doesn’t follow that useful biological information only comes from studying core genes, nor does it follow that we can easily tell if we’ve hit a core or a peripheral gene.

Also, there are quantitative genetics applications of GWAS data that are agnostic of pathways and genes. If we want to use genetics for prediction, for precision medicine etc, we do not really need to know the functions of the causative genes. We need big cohorts, well defined trait measurements, good coverage of genetic variants, and a good idea of environmental risk factors to feed into prediction models.

It’s pretty entertaining to see the popular articles about this paper, and the juxtaposition of quotes like ”that all those big, expensive genome-wide association studies may wind up being little more than a waste of time” (Gizmodo) with researchers taking the opportunity to bring up their favourite hypotheses about missing heritability (even if it’s not the same people saying both things). Because if we want to study rare variants, or complex epistatic interactions, or epigenomics, or what have you, the studies will have to be just as big and expensive, probably even more so.

Just please don’t call it ”omnigenetics”.

Literature

Boyle, Evan A., Yang I. Li, and Jonathan K. Pritchard. ”An Expanded View of Complex Traits: From Polygenic to Omnigenic.” Cell 169.7 (2017): 1177-1186.

From Lisbon, part 2

ESEB 2013 is over. I’ve had a great time, met with a lot of cool people and actually coped reasonably well with the outdoor temperature. As a wimpy Swede, I find anything above 30 degrees Celsius rather unpleasant. There have been too many talks and posters to mention all the good stuff, but here are a few more highlights:

Trudy Mackay’s plenary on epistasis in quantitative traits in D. melanogaster: Starting with the Drosophila Genetic Reference Panel and moving on to the Flyland advanced intercross population, Mackay’s group found what appeared to be extensive epistasis in several quantitative traits. Robert Anholt spoke later the same day about similar results in olfactory behaviour. While most of the genetic variance on the population level is still effectively additive, there seems to be a lot of interaction at the level of gene action, and it hinders QTL detection. The variants that did show up appeared to be involved in common networks. Again, we ask ourselves how big these networks are and how conserved they might be among different species.
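The contrast between interacting gene action and mostly additive variance can be illustrated with a textbook-style toy calculation. The sketch below is not Mackay’s analysis. It assumes a purely hypothetical two-locus model where the genotypic value is the product of the two genotype codes (pure additive-by-additive epistasis), with Hardy-Weinberg and linkage equilibrium and allele frequencies strictly between 0 and 1. The function names `genotype_freqs` and `additive_fraction` are made up for illustration.

```python
from itertools import product


def genotype_freqs(p):
    """Hardy-Weinberg frequencies for genotype codes -1 (aa), 0 (Aa), 1 (AA)."""
    q = 1 - p
    return {-1: q * q, 0: 2 * p * q, 1: p * p}


def additive_fraction(p1, p2):
    """Share of genetic variance that is additive when G = x1 * x2.

    Assumes Hardy-Weinberg and linkage equilibrium, so the two loci are
    independent and each additive effect is a simple regression slope.
    """
    f1, f2 = genotype_freqs(p1), genotype_freqs(p2)
    cells = [(x1, x2, f1[x1] * f2[x2]) for x1, x2 in product(f1, f2)]

    def mean(fn):
        # Expectation of fn over the two-locus genotype distribution.
        return sum(w * fn(x1, x2) for x1, x2, w in cells)

    g_mean = mean(lambda a, b: a * b)
    v_g = mean(lambda a, b: (a * b - g_mean) ** 2)
    v_a = 0.0
    for pick in (lambda a, b: a, lambda a, b: b):
        x_mean = mean(pick)
        x_var = mean(lambda a, b: (pick(a, b) - x_mean) ** 2)
        cov = mean(lambda a, b: (pick(a, b) - x_mean) * (a * b - g_mean))
        # Variance explained by regressing genotypic value on this locus.
        v_a += cov ** 2 / x_var
    return v_a / v_g


for p in (0.1, 0.3, 0.5):
    print(p, round(additive_fraction(p, p), 3))
```

In this toy model, none of the genetic variance is additive when both allele frequencies are 0.5, but close to 90% of it is additive at a frequency of 0.1, even though gene action is entirely interactive. That is one way to see how epistasis at the level of gene action and additive variance at the population level can coexist.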

How did all this epistasis come about then? Mackay’s answer is phenotypic buffering or canalisation (as we say in the Nordic countries: a beloved child has many names). That is, the organism has a certain buffering capacity against mutations, and the effects of many of them are only revealed on a certain genetic background where buffering has been broken. See their paper: Huang et al (2012). Mackay mentioned some examples in answer to a question: potentially damaging exonic mutations travelled together with compensatory mutations that possibly made them less damaging. It would be really fun to see an investigation of the molecular basis of some examples.

(Being a domestication genetics person, this immediately brings me to Belyaev’s hypothesis about domestication. Belyaev started the famous farm fox domestication experiment, selecting foxes for reduced fear of humans. And pretty quickly, the foxes started to become in many respects similar to dogs. Belyaev’s hypothesis is that ‘destabilising selection’ for tameness changed some regulatory system (probably in the hypothalamus–pituitary–adrenal axis) that exposed other kinds of variation. I think it’s essentially a hypothesis about buffering.)

Laurent Excoffier about detecting recent polygenic adaptation in humans. Very impressive! The first part of the talk presented an Fst outlier test applied to whole pathways together instead of individual loci. This seems to me analogous to gene set enrichment tests that calculate some expression statistic on predefined gene sets, instead of calculating the statistic individually and then applying term enrichment tests. In both cases, the point is to detect more subtle changes on the pathway as a whole. As with many other enrichment methods, particularly in humans, it is not that obvious what to do next with the list of annotation terms, even when the list makes good biological sense (really, is there a gene list that wouldn’t seem to make at least a bit of biological sense?). The results do (again) imply epistasis in human immune traits, and that is something that could potentially be tested. Though it would be a heroic amount of work, I hope someone will use this kind of method in some organism where it is actually possible to test the function and compare locally adapted populations.
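The general idea of scoring a whole gene set at once can be sketched as a simple permutation test. This is a generic illustration, not the method from the talk: the per-gene scores stand in for whatever statistic one has (Fst, an expression statistic, and so on). The function name `gene_set_p_value`, the gene names, and the simulated shift of 1.0 are all hypothetical.

```python
import random


def gene_set_p_value(scores, gene_set, n_perm=999, seed=1):
    """One-sided permutation p-value for the summed score of a gene set.

    The null distribution comes from random gene sets of the same size,
    drawn without replacement from all scored genes.
    """
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in gene_set if g in scores]
    observed = sum(scores[g] for g in members)
    as_extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(genes, len(members))
        if sum(scores[g] for g in draw) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (as_extreme + 1) / (n_perm + 1)


# Toy data (assumed): 1000 genes with standard normal scores, and a
# 30-gene pathway where every gene is shifted upwards by a modest amount.
rng = random.Random(42)
scores = {f"gene{i}": rng.gauss(0, 1) for i in range(1000)}
pathway = [f"gene{i}" for i in range(30)]
for g in pathway:
    scores[g] += 1.0
print(gene_set_p_value(scores, pathway))
```

A shift of this size would leave most individual pathway genes well inside the bulk of the genome-wide distribution, so a gene-by-gene outlier scan could easily miss them, while the summed pathway score stands out against random gene sets of the same size.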

Alison Wright’s talk on Z chromosome evolution. She works with Judith Mank, and I’ve heard a bit about it before, but sex chromosomes and the idea that you can trace the ‘strata’ of chromosome evolution are always fascinating. Wright also presented some interesting differences in the male hypermethylated region between birds with different mating systems.

William Jeffery on blind cavefish: I’ve been thinking for ages that I should blog about the blind cavefish (for popular science and in Swedish, that is), because it’s such a beautiful example. The case for eye regression as an adaptive trait rather than just the loss of an unnecessary structure seems pretty convincing! Making an eye regress at the molecular level seems at once rather simple (removal of the lens, by apoptosis in the blind cavefish, seems to be all that is needed) and complex (it’s polygenic and apparently not achieved the same way in all blind cavefish populations).

Virpi Lummaa’s plenary about using parish records from preindustrial Finland to investigate hypotheses about reproduction, longevity and menopause. I heard about the Grandmother hypothesis ages ago, so I knew about it, but I didn’t know to what extent there was empirical support for it. Unfortunately, many of the cases where I’ve heard a nice hypothesis but don’t know the empirical support turn out to be disappointments. Not this time, however! On top of all the good stuff in the talk, Lummaa had very pretty slides with old photos and paintings by Albert Edelfelt. The visual qualities were surpassed only by Rich FitzJohn’s slides.

(Larin Paraske by Albert Edelfelt)

Two things that were not so great:

The poster sessions. Now my poster session on Friday turned out very well for me, but many others weren’t so lucky. I don’t know why half of the posters were hung facing the wall with hardly enough space for people to walk by the poster board, but it was a terrible idea and must have stopped a lot of people from seeing more posters.

The gender balance. According to Julia Schroeder only 27% of invited speakers were women. I don’t know how it worked behind the scenes and what the instructions to symposium organisers were, and there might not be an easy fix, but this urgently needs fixing.

Of course, there were many more good talks and posters than the handful I’ve mentioned. Apart from them, the Twitter feed and tweetup, the social gatherings, and the fact that several interesting people actually came to my poster to chat were highlights for me. I come home with a long list of papers to read and several pages of things to try. Good times!
