Journal club of one: ”An expanded view of complex traits: from polygenic to omnigenic”

”An expanded view of complex traits: from polygenic to omnigenic” by Boyle, Li & Pritchard (2017) came out recently in Cell. It has been all over Twitter, and I’m sure it will influence a lot of people’s thinking, and rightfully so. It is a good read, pulls in a lot of threads, and has a nice blend of data analysis and reasoning. It’s good. Go read it!

The paper argues that for a lot of quantitative traits — specifically human diseases and height — almost every gene will be associated with every trait. More than that, almost every gene will be causally involved in every trait, most in indirect ways.

It continues with the kind of analysis used in Pickrell (2014), Finucane & al (2015) and many others, which break genome-wide association signal down by genome annotation. How much of the heritability can we attribute to variants in open chromatin regions? How much to genes annotated as ”protein binding”? And so on.
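To make the annotation-partitioning idea concrete: the statistic these analyses usually report is an enrichment, roughly the share of heritability attributed to an annotation divided by the share of SNPs that carry it. The sketch below is a toy that assumes per-SNP heritability contributions are already known; real methods such as stratified LD score regression have to estimate them from summary statistics and linkage disequilibrium. The SNP names and numbers are made up.

```python
def enrichment(h2_per_snp, annotated):
    """Share of heritability in an annotation divided by its share of SNPs.

    h2_per_snp: dict mapping SNP id -> heritability contribution (assumed known,
                which real methods cannot assume)
    annotated:  set of SNP ids that carry the annotation (must be non-empty)
    """
    total_h2 = sum(h2_per_snp.values())
    h2_in = sum(h for snp, h in h2_per_snp.items() if snp in annotated)
    n_in = sum(1 for snp in h2_per_snp if snp in annotated)
    return (h2_in / total_h2) / (n_in / len(h2_per_snp))


# Hypothetical example: 10 SNPs, two of them in open chromatin.
h2 = {f"rs{i}": 0.015 for i in range(10)}
h2["rs0"] = h2["rs1"] = 0.04
open_chromatin = {"rs0", "rs1"}

print(enrichment(h2, open_chromatin))  # about 2: 40% of h2 in 20% of SNPs
```

An enrichment of 1 means the annotation carries exactly its ”fair share” of heritability; the broad but modest enrichments the paper describes are values somewhat above 1 spread over many annotations.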

These analyses point towards gene regulation being important, but not that strongly towards particular annotation terms or pathways. The authors take this to mean that, while genetic mapping, including GWAS, finds causally involved genes, it will not necessarily find ”relevant” genes. That is, not necessarily genes that are the central regulators of the trait. That may be a problem if you want to use genetic mapping to find drug targets, pathways to engineer, or similar.

This observation must speak to anyone who has looked at a list of genes from some mapping effort and thought: ”well, that is mostly genes we know nothing about … and something related to cancer”.

They write:

In summary, for a variety of traits, the largest-effect variants are modestly enriched in specific genes or pathways that may play direct roles in disease. However, the SNPs that contribute the bulk of the heritability tend to be spread across the genome and are not near genes with disease-specific functions. The clearest pattern is that the association signal is broadly enriched in regions that are transcriptionally active or involved in transcriptional regulation in disease-relevant cell types but absent from regions that are transcriptionally inactive in those cell types. For typical traits, huge numbers of variants contribute to heritability, in striking consistency with Fisher’s century-old infinitesimal model.

In summary: it’s universal pleiotropy. I don’t think there is any reason to settle on ”cellular” networks exclusively. After all, cells in a multicellular organism share a common pool of energy and nutrients, and exchange all kinds of signalling molecules. This agrees with classical models and the thinking in evolutionary genetics (see Rockman & Paaby 2013). Or look at this expression QTL and gene network study in aspen (Mähler & al 2017): the genes with eQTL tend to be peripheral, not network hub genes.

It’s a bit like in behaviour genetics, where people are fond of making up these elaborate hypothetical causal stories: if eyesight is heritable, and children with bad eyesight get glasses, and the way you treat a child who wears glasses somehow reinforces certain behaviours, so that children who wear glasses grow up to score a bit better on certain tests — are the eyesight variants also ”intelligence variants”? This is supposed to be a reductio ad absurdum of the idea of calling anything an ”intelligence variant” … But I suspect that this is what genetic causation, when fully laid out, will sometimes look like. It can be messy. It can involve elements that we don’t think of as ”relevant” to the trait.

There are caveats, of course:

One reason that enrichment is clearer for variant-level annotation, such as open chromatin, than for gene-level annotation may be that its resolution is higher. We don’t really know that much about how molecular variation translates into higher-level trait variation. And let’s not forget that for most GWAS hits, we don’t know the causative gene.

They suggest defining ”core genes” like this: ”conditional on the genotype and expression levels of all core genes, the genotypes and expression levels of peripheral genes no longer matter”. In other words, core genes are genes that d-separate the peripheral genes from the trait. That makes sense. Some small number of genes may be necessary molecular intermediates for a trait. But as far as I can tell, it doesn’t follow that useful biological information only comes from studying core genes, nor does it follow that we can easily tell whether we’ve hit a core or a peripheral gene.
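The d-separation definition can be illustrated with a toy linear chain: a peripheral genotype affects the expression of a core gene, which in turn affects the trait. This is a hypothetical simulation of my own, not a model from the paper. Marginally, the peripheral genotype is correlated with the trait, but the partial correlation given core gene expression is close to zero, which is what ”no longer matter” amounts to in this linear setting.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly conditioning on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
n = 20000
# Hypothetical chain: peripheral genotype -> core gene expression -> trait.
peripheral = [rng.choice([0, 1, 2]) for _ in range(n)]
core = [0.5 * g + rng.gauss(0, 1) for g in peripheral]
trait = [0.8 * c + rng.gauss(0, 1) for c in core]

print(round(pearson(peripheral, trait), 2))               # clearly nonzero
print(round(partial_corr(peripheral, trait, core), 2))    # close to zero
```

Of course, with real data we neither know which genes sit on the path nor measure their expression without error, which is part of why it is hard to tell core from peripheral.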

Also, there are quantitative genetics applications of GWAS data that are agnostic of pathways and genes. If we want to use genetics for prediction, for precision medicine, etc., we do not really need to know the functions of the causative genes. We need big cohorts, well-defined trait measurements, good coverage of genetic variants, and a good idea of the environmental risk factors to feed into prediction models.
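As an illustration of how little gene function enters into prediction, here is a minimal polygenic score: a weighted sum of allele dosages, with weights taken from some GWAS. The variant ids and effect sizes are hypothetical, and real scores also need choices about which variants to include and how to shrink the effect estimates.

```python
def polygenic_score(genotypes, effects):
    """Sum of allele dosage (0, 1 or 2) times estimated effect size.

    genotypes: dict variant id -> dosage for one individual
    effects:   dict variant id -> effect size (e.g. from a GWAS)
    Variants missing from the effect table are skipped.
    """
    return sum(dose * effects[v] for v, dose in genotypes.items() if v in effects)


# Hypothetical effect sizes and one individual's genotypes.
effects = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.02}
person = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}

score = polygenic_score(person, effects)
print(score)  # 0.15, up to floating-point rounding
```

Nothing in the calculation refers to what the genes near these variants do.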

It’s pretty entertaining to see the popular articles about this paper, and the juxtaposition of quotes like ”that all those big, expensive genome-wide association studies may wind up being little more than a waste of time” (Gizmodo) with researchers taking the opportunity to bring up their favourite hypotheses about missing heritability, even if it’s not the same people saying both things. Because if we want to study rare variants, or complex epistatic interactions, or epigenomics, or what have you, the studies will have to be just as big and expensive, probably even more so.

Just please don’t call it ”omnigenetics”.

Literature

Boyle, Evan A., Yang I. Li, and Jonathan K. Pritchard. ”An Expanded View of Complex Traits: From Polygenic to Omnigenic.” Cell 169.7 (2017): 1177-1186.


Paper: ”Heritable genome-wide variation of gene expression and promoter methylation between wild and domesticated chickens”

Since I love author blog posts about papers, I thought I’d write a little about papers I’ve contributed to. So far there aren’t that many of them, but maybe it can become a habit.

”Heritable genome-wide variation of gene expression and promoter methylation between wild and domesticated chickens” was published in BMC Genomics in 2012. The title says it very well: the paper looks at differential expression and DNA methylation of a subset of genes in the hypothalamus of Red Junglefowl and domestic White Leghorn chickens. My contribution was made during my MSc project in the group. Previously (Lindqvist & al 2007; Nätt & al 2009), Daniel Nätt, Pelle Jensen and others had found a transgenerational effect of unpredictable light stress on domestic chickens. After that, and being interested in chicken domestication, a DNA methylation comparison of wild and domestic birds seemed like a natural thing to do. And it turns out that Red Junglefowl and White Leghorns differ in the expression of a bunch of genes and in the methylation of certain promoters (where promoter is operationally defined as a region around the start of the gene model). When looking at two generations, the contrasts are correlated between parents and offspring: there is some heritable basis to the differences in gene expression and DNA methylation.
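The parent-offspring comparison boils down to asking whether the wild-versus-domestic contrast for each gene in the parents predicts the same contrast in their offspring. Here is a minimal sketch of that kind of comparison, with hypothetical log2 expression ratios; the exact statistics used in the paper may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene contrasts, log2(Red Junglefowl / White Leghorn),
# listed in the same gene order for both generations.
parents = [1.2, -0.8, 0.3, 2.1, -1.5, 0.1]
offspring = [1.0, -0.6, 0.5, 1.7, -1.1, -0.2]

r = pearson(parents, offspring)
print(round(r, 2))
```

A strong positive correlation means the genes that differ between breeds in the parents tend to differ in the same direction in the offspring.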

In Red Junglefowl, ancestor of domestic chickens, gene expression and methylation profiles in thalamus/hypothalamus differed substantially from that of a domesticated egg laying breed. Expression as well as methylation differences were largely maintained in the offspring, demonstrating reliable inheritance of epigenetic variation.

What I did was methylation-sensitive high resolution melting. HRM is a typing method based on real-time PCR. After PCR you often make a melting curve by ramping up the temperature, denaturing the PCR product. The melting characteristics depend on the sequence, so you can use melting to check that you get the expected PCR product, and it turns out that the difference can be big enough to type SNPs. And if you can type SNPs, you can analyse DNA methylation. So we treat the DNA with bisulfite, which deaminates cytosines to uracil unless they are protected by methylation, and get a converted sequence where an unmethylated C is like a C>T SNP. We set up standard curves with a mixture of whole-genome amplified and in vitro methylated DNA and measured the degree of methylation.
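Two pieces of this workflow can be sketched in a few lines: in silico bisulfite conversion (every C becomes T unless it is protected by methylation, here only at positions we mark as methylated), and reading a sample off a standard curve of known methylation levels. The melt-curve ”signal” is a hypothetical summary value, and the standard values are made up; in the actual assay, the standards were mixtures of whole-genome amplified and in vitro methylated DNA.

```python
def cpg_positions(seq):
    """0-based indices of the C in each CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """In silico bisulfite conversion of one strand: every C reads as T
    after PCR unless its 0-based index is in methylated_positions."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def interpolate_methylation(signal, standards):
    """Estimate % methylation by linear interpolation between standards,
    given as (percent_methylated, signal) pairs with distinct signals."""
    pts = sorted(standards, key=lambda p: p[1])
    for (m0, s0), (m1, s1) in zip(pts, pts[1:]):
        if s0 <= signal <= s1:
            return m0 + (m1 - m0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal outside the range of the standard curve")


seq = "ACGTCCGA"
cpgs = cpg_positions(seq)                    # [1, 5]
print(bisulfite_convert(seq, set(cpgs)))     # ACGTTCGA: CpGs fully methylated
print(bisulfite_convert(seq))                # ATGTTTGA: fully unmethylated

# Hypothetical standard curve: 0%, 50% and 100% methylated mixtures.
standards = [(0, 0.10), (50, 0.45), (100, 0.80)]
print(interpolate_methylation(0.275, standards))  # about 25
```

Note that the interpolated value is one number for the whole amplicon, averaged over all molecules and all CpGs in it.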

That measurement averages over the population of DNA molecules in the sample; I’ve been wondering how HRM performs when the CpGs in the amplicon have heterogeneous methylation. We’ve used HRM for genotyping as well, and it works, but we’ve switched to pyrosequencing, which gives cleaner results and where the assay design is much easier to get right the first time. I don’t know whether the same applies to methylation analysis with pyro.


My favourite part of the paper is figure 4b (licence: CC BY 2.0), which shows methylation analysis in the advanced intercross of Red Junglefowl and White Leghorns. As mentioned in the paper, it immediately leads to the thought of DNA methylation QTL mapping.

Literature

Nätt, D., Rubin, C. J., Wright, D., Johnsson, M., Beltéky, J., Andersson, L., & Jensen, P. (2012). Heritable genome-wide variation of gene expression and promoter methylation between wild and domesticated chickens. BMC genomics, 13(1), 59.

Lindqvist C, Janczak AM, Nätt D, Baranowska I, Lindqvist N, et al. (2007) Transmission of Stress-Induced Learning Impairment and Associated Brain Gene Expression from Parents to Offspring in Chickens. PLoS ONE 2(4): e364. doi:10.1371/journal.pone.0000364

Nätt D, Lindqvist N, Stranneheim H, Lundeberg J, Torjesen PA, et al. (2009) Inheritance of Acquired Behaviour Adaptations and Brain Gene Expression in Chickens. PLoS ONE 4(7): e6405. doi:10.1371/journal.pone.0006405

Journal club of one: ”Functionally enigmatic genes: a case study of the brain ignorome”

This recent paper, Pandey & al (2014), made me interested because I’m in the business of finding genes for traits, and have spent quite some time looking at lists of gene names and annotation database output. One is tempted to look for the ”outstanding candidates” that ”make biological sense” (quotes intended as scare quotes), but the truth is probably that no-one knows what genes and functions we should expect to be affected by genetic variation in, for instance, behaviour. This paper tries to make the case for the unknown parts of the brain transcriptome; they use data about gene expression, protein domains, paralogs and literature to argue that the unknown genes are unknown for no good reason and that they might be just as important as genes that happen to be well-known.

They selected genes with a high ratio of expression in the brain to average expression in other tissues of C57BL/6J and DBA/2J mice, and searched PubMed for these genes in combination with neuroscience-related keywords. Some of them have few citations, and these are their selectively expressed but little-studied genes. They then make a series of comparisons between these and well-studied genes. It turns out the only major difference is that the well-studied genes were discovered (entered into GenBank) earlier.
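The selection step can be sketched roughly like this: compute the ratio of brain expression to the mean over other tissues, then keep genes that are brain-selective but rarely mentioned in neuroscience papers. The thresholds, gene names and numbers are hypothetical placeholders; Pandey & al’s actual cut-offs and query are described in the paper.

```python
def enigmatic_genes(brain_expr, other_expr, citations,
                    min_ratio=2.0, max_citations=5):
    """Brain-selective, little-cited genes, sorted by ratio (highest first).

    brain_expr: dict gene -> expression in brain
    other_expr: dict gene -> list of expression values in other tissues
    citations:  dict gene -> number of PubMed hits with neuroscience keywords
    min_ratio, max_citations: hypothetical thresholds
    """
    hits = []
    for gene, brain in brain_expr.items():
        others = other_expr.get(gene)
        if not others:
            continue
        mean_other = sum(others) / len(others)
        if mean_other <= 0:
            continue  # ratio undefined; skip rather than divide by zero
        ratio = brain / mean_other
        if ratio >= min_ratio and citations.get(gene, 0) <= max_citations:
            hits.append((gene, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical data for three genes.
brain = {"GeneA": 10.0, "GeneB": 8.0, "GeneC": 3.0}
other = {"GeneA": [2.0, 3.0], "GeneB": [1.0, 1.0], "GeneC": [3.0, 3.0]}
cites = {"GeneA": 250, "GeneB": 1, "GeneC": 0}

print(enigmatic_genes(brain, other, cites))  # [('GeneB', 8.0)]
```

GeneA is brain-selective but well studied, and GeneC is little studied but not brain-selective, so only GeneB ends up in the ”enigmatic” list.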

Comments:

I don’t know to what extent these results are surprising. I was not surprised by their main conclusion, but then again, maybe my opinion was mostly prejudice. There is a literature on biases in the functional genomics literature, but I don’t know much about it. And apparently neither did the authors, initially, as Robert Williams writes in a comment on the PLOS ONE website:

We did not rediscover the lovely work of Robert Hoffmann (now head of WikiGene) until the paper had been submitted in succession to six higher profile journals … Hoffmann and colleagues showed that social factors account for much of the annotation imbalance for genes.

I love the idea of authors writing an informal comment about the background of the paper like this.

The coexpression network results show that some of the little-known genes are just as connected as known important genes. This suggests that some of the unknown genes might be important too, if we can trust that coexpression hub genes are likely to be important (for various values of ”important”). Maybe this is a scientific opportunity for some neuroscientist. Several people I’ve talked with have imagined future Big Science initiatives to describe the function of unknown genes (”divide them up between labs and characterise them!”), and some initiatives exist, such as the IMPC. On the other hand, how do we know that we really find the most important and interesting functions of a gene? The skeptic in me thinks that going bottom up, from gene to phenotype, will miss the most interesting, surprising phenotypes.
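For a sense of what ”connected” means in a coexpression network, here is one common measure, weighted connectivity in the style of WGCNA: for each gene, the sum over all other genes of the absolute correlation raised to a soft-thresholding power. Whether this is the measure used in the paper is an assumption on my part, and the expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation; fails on constant vectors (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr, beta=6):
    """Weighted connectivity: sum over other genes of |r| ** beta.

    expr: dict gene -> expression values across the same samples
    beta: soft-thresholding power (6 is a hypothetical choice)
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            w = abs(pearson(expr[a], expr[b])) ** beta
            k[a] += w
            k[b] += w
    return k


# Hypothetical expression of four genes across five samples.
expr = {
    "hub": [1, 2, 3, 4, 5],
    "g1": [2, 4, 6, 8, 11],
    "g2": [1, 3, 2, 5, 5],
    "g3": [5, 1, 4, 2, 3],
}
k = connectivity(expr)
print(sorted(k, key=k.get, reverse=True))
```

Raising correlations to a power emphasises strong correlations and suppresses weak ones, so a gene’s connectivity is dominated by its closest coexpression partners.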

I think ”ignorome” is one of those unnecessary bad omics words, which is why I’ve avoided using it.

Their PubMed query was restricted to mouse, human and rat. I wonder why. Maybe there could be something useful from fruit flies or roundworms?

Overall, a fun paper that I recommend reading over a few cups of coffee!

Literature

Pandey AK, Lu L, Wang X, Homayouni R, Williams RW (2014) Functionally Enigmatic Genes: A Case Study of the Brain Ignorome. PLoS ONE 9(2): e88889. doi:10.1371/journal.pone.0088889

Morning coffee: epigenetic inheritance of odour sensitivity


A while ago I wrote a bit about the recent paper on epigenetic inheritance of acetophenone sensitivity and odorant receptor expression. I spent most of the post talking about potential problems, but actually I’m not that negative. Quite a literature is building up about these transgenerational effects, and it is inspiring, if a little overhyped. I for one do not think epigenetic inheritance is particularly outrageous or disruptive to genetics and evolution as we know it. Take this paper: even if it means inheritance of an acquired trait, it is probably not very stable over the generations, and it is nothing like a general Lamarckian transmission mechanism that can work for any trait. It is probably very specific to odorant receptors. It might allow for genetic assimilation of fear of odours, though, which would be cool, but probably not at all easy to demonstrate. But no-one knows how it works, if it does; there are even multiple unknown steps. How does fear conditioning translate into DNA methylation differences in sperm that in turn translate into olfactory receptor expression in the brain of the offspring?

A while after the transgenerational effects paper I saw this one in PNAS: Rare event of histone demethylation can initiate singular gene expression of olfactory receptors (Tan, Zong & Xie 2013). I had no idea olfactory receptor expression was that fascinating! (As is often the case when you scratch the surface of another problem in biology, there turns out to be interesting stuff there …) Mice have lots and lots of odorant receptor genes, but each olfactory neuron only expresses one of them. Apparently the expression is regulated by histone 3 lysine 9 (H3K9) methylation. The genes start out methylated and suppressed, but once one of them is expressed, it keeps all the others down by downregulating a histone demethylase. This is a modeling paper showing that if random demethylation happens slowly enough, and the feedback that shuts down further demethylation is fast enough, these steps are sufficient to explain the specificity of expression. There are some connections between histone methylation and DNA methylation: it seems that methylated DNA binds proteins that bring histone methylases to the gene (review: Cedar & Bergman 2009). Dias & Ressler saw hypomethylation near the olfactory receptor gene in question, Olfr151. Maybe that difference, if it survives through to the developing brain of the offspring, can make demethylation of the locus more likely and give Olfr151 a head start in the race to become the first expressed receptor gene.
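The logic of the race can be played with in a toy simulation. This is a simplified stand-in written for illustration, not Tan, Zong & Xie’s actual model: each silenced receptor gene switches on after an exponential waiting time, the first one to switch on triggers feedback that stops further demethylation after a delay, and any gene that switches on within that delay is co-expressed. Giving one gene a higher rate is a hypothetical stand-in for the Olfr151 head start speculated about above.

```python
import random


def simulate_cell(rates, feedback_delay, rng):
    """Indices of receptor genes expressed in one toy neuron.

    Each gene switches on after an exponential waiting time with its own
    rate. The first gene to switch on stops further switching after
    feedback_delay; genes switching on before then are co-expressed.
    """
    times = [rng.expovariate(r) for r in rates]
    first = min(times)
    return [i for i, t in enumerate(times) if t <= first + feedback_delay]


def summarise(rates, feedback_delay, n_cells=5000, seed=0):
    """Fraction of single-receptor cells, and how often each gene wins alone."""
    rng = random.Random(seed)
    single = 0
    wins = [0] * len(rates)
    for _ in range(n_cells):
        expressed = simulate_cell(rates, feedback_delay, rng)
        if len(expressed) == 1:
            single += 1
            wins[expressed[0]] += 1
    return single / n_cells, wins


# Hypothetical: 100 receptor genes, gene 0 with twice the switching rate.
rates = [0.02] + [0.01] * 99
fast_single, fast_wins = summarise(rates, feedback_delay=0.05)
slow_single, _ = summarise(rates, feedback_delay=2.0)

print(fast_single, slow_single)  # fast feedback: mostly single expression
print(fast_wins[0], sum(fast_wins[1:]) / 99)  # head-start gene wins more often
```

In this toy, the chance that a cell expresses a single receptor is roughly exp(−(total rate of the losing genes) × delay), so what matters is switching being slow relative to the feedback, which is the same qualitative condition as in the paper.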

Literature

Brian G Dias & Kerry J Ressler (2013) Parental olfactory experience influences behavior and neural structure in subsequent generations. Nature Neuroscience doi:10.1038/nn.3594

Longzhi Tan, Chenghang Zong, X. Sunney Xie (2013) Rare event of histone demethylation can initiate singular gene expression of olfactory receptors. PNAS doi:10.1073/pnas.1321511111

Howard Cedar, Yehudit Bergman (2009) Linking DNA methylation and histone modification: patterns and paradigms. Nature Reviews Genetics doi:10.1038/nrg2540

Morning coffee: microarrays


Who still uses gene expression microarrays? I do and lots of other people do. And even though it’s pretty clear that RNA-seq is better, as long as it’s more expensive — and it probably still is for many combinations of microarray and sequencing platforms — the trade-off between the technical variability and sample size should still favour microarrays. But the breaking point probably occurs about right now, and I’m looking forward to seeing lots of sequencing based genetical genomics with splice-eQTL, antisense RNA-eQTL and what not! But then again, the same might happen for RNA-seq in a few years: I hope people stick with current generation massively parallel sequencing long enough to get decent sample sizes instead of jumping to small-N studies with the next technology.
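The trade-off between technical variability and sample size can be made concrete with a back-of-the-envelope calculation: for a fixed budget, cheaper arrays buy more samples, and that can outweigh their higher technical variance. All costs and variances below are hypothetical, and the calculation assumes that biological and technical variance simply add.

```python
import math


def se_of_difference(budget, cost_per_sample, bio_var, tech_var):
    """Standard error of a two-group difference in mean expression when the
    budget is split evenly between the groups. Assumes independent samples
    and additive biological and technical variance."""
    n_per_group = int(budget // (2 * cost_per_sample))
    if n_per_group < 2:
        raise ValueError("budget too small for two samples per group")
    return math.sqrt(2 * (bio_var + tech_var) / n_per_group)


budget = 20000  # hypothetical, arbitrary currency units
array_se = se_of_difference(budget, cost_per_sample=100, bio_var=1.0, tech_var=0.3)
seq_se = se_of_difference(budget, cost_per_sample=400, bio_var=1.0, tech_var=0.1)
print(array_se, seq_se)  # the cheaper, noisier platform wins here

# The breaking point: once sequencing gets cheap enough, it wins instead.
cheap_seq_se = se_of_difference(budget, cost_per_sample=110, bio_var=1.0, tech_var=0.1)
print(cheap_seq_se < array_se)
```

Because biological variance is paid for once per sample regardless of platform, lower technical variance only helps as much as it shrinks the total variance, while sample size enters directly in the denominator.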