Morning coffee: multilevel drift


There is an abstract account of natural selection (Lewontin 1970) which observes that any population of entities, whatever they may be, will evolve through natural selection if there is (1) variation that (2) affects reproductive success and (3) is heritable.

I don’t know how I missed this before, but it recently occurred to me that there must be a similarly abstract account of drift, where a population will evolve through drift if there is (1) variation, (2) that is heritable, and (3) sampling due to finite population size.

Drift may not be negligible, especially since at a higher level of organization, the population size should be smaller, making natural selection relatively less efficient.
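The interplay between the two sets of conditions can be made concrete with a toy simulation. Below is a minimal haploid Wright–Fisher sketch (my own illustrative construction; the population sizes and selection coefficient are arbitrary, not taken from any particular study): selection shifts the expected allele frequency deterministically, and sampling a finite number of offspring adds drift.

```python
import random

def wright_fisher(n, p0, s=0.0, generations=60, seed=1):
    """Allele frequency in a haploid Wright-Fisher population of size n,
    with selection coefficient s favouring the focal allele."""
    rng = random.Random(seed)
    p = p0
    for _ in range(generations):
        # Selection shifts the expected frequency deterministically...
        p_expected = p * (1 + s) / (1 + s * p)
        # ...and binomial sampling of n offspring adds drift.
        count = sum(rng.random() < p_expected for _ in range(n))
        p = count / n
    return p

# The same weak selection (s = 0.01) in two population sizes:
big = [wright_fisher(2000, 0.5, s=0.01, seed=i) for i in range(20)]
small = [wright_fisher(20, 0.5, s=0.01, seed=i) for i in range(20)]
# In the large population the favoured allele rises consistently;
# in the small one, drift swamps selection and replicates scatter,
# many fixing at 0 or 1 regardless of s.
```

Roughly, selection only dominates drift when the product of population size and selection coefficient is well above one, which is the intuition for why selection should be relatively less efficient at levels of organization where populations are small.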

Morning coffee: against validation and optimization


It seems that I’m accumulating pet peeves at an alarming rate. In all probability, I am guilty of most of them myself, but that is no reason not to complain about them on the internet. For example: spend some time in a genetics lab, and you will probably hear talk of ”validation” and ”optimization”. But those things rarely happen in a lab.

According to a dictionary, to ”optimize” means to make something as good as possible. That is almost never possible, and rarely even desirable. What we really do is change things until they work well enough according to some accepted standard. That is not optimization; that is tweaking.

To ”validate” means to confirm that something is true, which is rarely possible. Occasionally, we have something to compare to that we are really sure about, so that if a method agrees with it, we can be pretty certain that it works. But a lot of the time, we don’t know the answer. The best we can do is to gather additional evidence.

Additional evidence, ideally from some other method with very different assumptions, is great. So is adjusting a protocol until it performs sufficiently well. So why not just say what we mean?

”You keep using that word. I do not think that it means what you think it means.”

Morning coffee: reviewing


(It has been a long time since I did one of these posts. I’d better get going!)

One fun thing that happened after I received my PhD is that I started getting requests to review papers, four so far. Four papers (plus re-reviews of revised versions) in about a year probably isn’t that much, but it is strictly greater than zero. I’m sure the entertainment value in reviewing wears off quite fast, but so far it’s been fun, and feels good to pay off some of the sizeable review debt I’ve accumulated while publishing papers from my PhD. Maybe I’m just too naïve and haven’t seen the worst parts of the system yet, but I don’t feel that I’ve had any upsetting revelations from seeing the process from the reviewer’s perspective.

Of course, peer review, like any human endeavour, has components of politics, ego and irrationality. Maybe one could do more to quell those tendencies. I note that different journals have quite different instructions to reviewers. Some provide detailed directions, laying out things that the reviewer should and shouldn’t do, while others just tell you how to use their web form. I’m sure editorial practices also differ.

One thing that did surprise me was when an editor changed the text of a review I wrote. It was nothing major, not a case of removing something inappropriate, but rewording a recommendation to make it stronger. I don’t mind, but I feel that the edit changed the tone of the review. I’ve also heard that this particular kind of comment (when a reviewer states that something is required for a paper to be acceptable for publication) rubs some people the wrong way, because that is up to the editor to decide. In this case, the editor must have felt that a more strongly worded review was the best way to get the author to pay attention, or something like that. I wonder how often this happens. That may be a reason to be even more apprehensive about signing reviews (I did not sign).

So far, I’ve never experienced anything other than single-blind review, but I would be curious to try reviewing double-blind. I doubt the process would differ much: I haven’t reviewed any papers from people I know, and I haven’t spent any time trying to learn more about the authors, except in some cases checking out previous work that they’ve referenced. I don’t expect that I’d feel any urge to undertake search engine detective work to figure out who the authors were.

Sometimes there is a tendency, among scientists and non-scientists alike, to elevate peer review to something more than a couple of colleagues reading your paper and commenting on it. I’m pretty convinced that peer review and editorial comments improve papers, and as such, the fact that a paper has been accepted by an editor after review is some evidence of quality. But peer review cannot be a guarantee of correctness. I’m sure I’ve missed and misunderstood things. Still, I promise that I’ll do my best, and I won’t have the heart to turn down a request for peer review for a long time. So if you need a reviewer for a paper on domestication, genetic mapping, chickens or related topics, keep me in mind.

Morning coffee: cost per genome

I recently heard this thing referred to as ”the most overused slide in genomics” (David Klevebring). It might be: what it shows is some estimate of the cost of sequencing a human genome over time, and how it plummets around 2008. Before that, the curve is Sanger sequencing, and then the costs show second generation sequencing (454, Illumina and SOLiD).


The source is the US National Human Genome Research Institute, and they’ve put some thought into how to estimate costs so that machines, reagents, analysis and people to do the work are included and that the different platforms are somewhat comparable. One must first point out that downstream analysis to make any sense of the data (assembly and variant calling) isn’t included. But the most important thing that this graph hides, even if the estimates of the cost would be perfect, is that to ”sequence a genome” means something completely different in 2001 and 2015. (Well, with third generation sequencers that give long reads coming up, the old meaning might come back.)

For data since January 2008 (representing data generated using ‘second-generation’ sequencing platforms), the ”Cost per Genome” graph reflects projects involving the ‘re-sequencing’ of the human genome, where an available reference human genome sequence is available to serve as a backbone for downstream data analyses.

The human genome project was of course about sequencing and assembling the genome into high-quality sequences. Very few of the millions of human genomes resequenced since are anywhere close. As people in the sequencing loop know, resequencing with short reads doesn’t give you a genome sequence (and neither does trying to assemble a messy eukaryote genome with short reads only). It gives you a list of variants compared to the reference sequence. The usual short-read business has no way of detecting anything but single nucleotide variants and small indels. (And the latter depends … Also, you can detect copy number variants, but large-scale structural variants are mostly off the table.) Of course, you can use these edits to reconstruct a consensus sequence from the reference, but it would be a total lie.

Again, none of this is news for people who deal with sequencing, and I’m not knocking second-generation sequencing. It’s very useful and has made a lot of new things possible. It’s just something I think about every time I see that slide.

Morning coffee: ”epigenetics” is also ambiguous


I believe there is an analogy between the dual meaning of the word ”gene” and two senses of epigenetics, that this distinction is easy to get wrong and that it contributes to the confusion about the meaning of epigenetics. Gene can mean a sequence that has a name and a function, or it can mean a genetic variant. I sometimes, half-jokingly, call this genetics(1) and genetics(2). The order is wrong from a historical perspective, since the study of heritable variation predates the discovery of molecular genes. The first deals with the function of sequences and their products. The second deals with differences between individuals carrying different variants.

The same can be said about epigenetics. On one hand there is epigenetics(1), aiming to understand the normal function of certain molecular features, i.e. gene regulatory states that can be passed on through cell division. On the other hand, epigenetics(2) aims to explain variation between individuals that differ not in their DNA sequence but in other types of heritable states. And the recurring reader knows that I think that, since a lot of genetics(2) makes no assumptions about the molecular nature of the variation it studies, it will mostly work even if some of these states turn out to be epigenetic. In that sense, epigenetics(2) is a part of genetics.

Also: the spectre of epigenetic inheritance

What is it that is so scandalous about epigenetic inheritance? Not much, in my opinion. Some of the points on the spectrum clearly happen in the wild: stable and fluctuating epigenetic inheritance in plants, parental effects in animals, and genomic imprinting in both. Widespread epigenetic inheritance in animals would change a lot of things, of course, but even if epigenetic inheritance turns out to be really important and common, genetics and evolution as we know them will not break. The tools to study and understand them are there.

Looking back at the post from yesterday, there are different flavours of epigenetic inheritance. At the most heritable end of the spectrum, epigenetic variants behave pretty much like genetic variants. Because quantitative genetics is agnostic to the molecular nature of the variants, as long as they behave like an inheritance system, most high-level genetic analysis will work the same. It’s just that on the molecular level, one would have to look to epigenetic marks, not to sequence changes, for the causal variant. Even if a substantial proportion of the genetic variance is caused by epigenetic variants rather than DNA sequence variants, this would not be a revolution that changes genetics or evolution into something incommensurable with previous thought.

The most revolutionary potential lies somewhere in the middle of the scale, in parental effects with really high fidelity of transmission that are potentially responsive to the environment, but in principle these things can still be dealt with by the same theoretical tools. Most people just didn’t think they were that important. How about soft inheritance? It seems dramatic, but all examples deal with specific programmed mechanisms: soft inheritance of the sensitivity to a particular odour or of the DNA methylation and expression state of a particular locus. No-one has yet suggested a generalised Lamarckian mechanism; that is still out of the question. DNA mutations are still unable to pass from somatic cells to gametes. Whatever tricks transgenerational mechanisms use to skip over the soma–germline distinction, they must be pretty exceptional. Discoveries of widespread soft inheritance in nature would be surprising, a cause for rethinking certain things and great fun. But conceptually, it is parental effects writ large. We can understand that. We have the technology.

Morning coffee: the spectrum of epigenetic inheritance


Let us think aloud about the different possible meanings of epigenetic inheritance. I don’t want to contribute to unnecessary proliferation of terminology — people have already coined molar/molecular epigenetics (Crews 2009), intergenerational/transgenerational effects (Heard & Martienssen 2014), and probably several more dichotomies. But I thought it could be instructive to try to think about epigenetic inheritance in terms of the contribution it could make to variance components of a quantitative genetic model. After all, quantitative genetics is mostly agnostic about the molecular nature of the heritable variation.

At one end of the spectrum we find molecular epigenetic marks such as DNA methylation, as they feature in the normal development of the organism. Regardless of how faithfully they are transmitted through mitosis, or even if they pass through meiosis, they only contribute to individual variation if they are perturbed in different ways between individuals. If they do vary between individuals, though, in a fashion that is not passed on to the offspring, they will end up in the environmental variance component.

What about transmissible variation? There are multiple non-genetic ways for information to be passed on for a single generation: maternal or paternal effects need not be epigenetic in the molecular sense. They could be, like genomic imprinting, but they could also be caused by some biomolecule in the sperm, something that passes the blood–placenta barrier, or something deposited by the mother into the egg. Because transgenerational effects of this kind make related individuals more similar, they will affect the genetic variance component unless they are controlled. In the best possible world of experimental design, parental effects can be controlled and modelled, and we can in principle separate out the maternal, paternal and genetic components. Think of effects like those in Weaver & al (2004) that are perpetuated by maternal behaviour. If the behavioural transmission is strong enough, they might form a pretty stable heritable effect that would appear in the genetic variance component if it is not broken up by cross-fostering.

However, if the variation behaves like germ-line variation it will be irreversible by cross-fostering, inseparable from the genetic variance component, and it will have the potential to form a genuine parallel inheritance system. The question is: how stable will it be? Animals seem to be very good at resetting the epigenetic germline each generation. The most provocative suggestion is probably some type of variation that is both faithfully transmitted and sometimes responsive to the environment. Responsiveness means less fidelity of transmission, though, and it seems (Slatkin 2009) like epigenetic variants need to be stable for many generations to make any lasting impact on heritability. Then, at the heritable end of the spectrum, we find epigenetic variants that arise from some type of random mutation event and are transmitted faithfully through the germline. If they exist, they will behave just like any genetic variants and even have a genomic locus.