A year ago in Lund: the panel discussion at Evolution in Sweden 2016

This meeting took place on the 13th and 14th of January 2016 in Lund. It feels a bit odd to write about it now, but my blog is clearly in a state of anachronistic anarchy as well as ett upphöjt tillstånd av språklig förvirring (an elevated state of linguistic confusion), so that’s okay. It was a nice meeting, spanning quite a lot of things, from mosasaurs to retroviruses. It ended with a panel discussion of sorts that made me want to see more panel discussions at meetings.

The panel consisted of Anna-Liisa Laine, Sergey Gavrilets, Per Lundberg, Niklas Wahlberg, and Charlie Cornwallis, and a lot of people joined in with comments. I don’t know how the participants were chosen (Anna-Liisa Laine and Sergey Gavrilets were the invited speakers, so they seem like obvious choices), or how they were briefed; Per Lundberg served as a moderator and asked the other participants about their predictions about the future of the field (if memory serves me right).

I thought some of the points were interesting. One of Sergey Gavrilets’ three anticipated future developments was links between different levels of organisation; he mentioned systems biology and community ecology in the same breath. This sounded interesting to me, who not so secretly dreams of the day when systems biology, quantitative genetics, and population genetics can all be brought to bear on the same phenotypes. (The other two directions of research he brought up were cliodynamics and human evolution.) He himself had, earlier in his talk, provided an example where a model of human behaviour shows the possibility of something interesting: that a kind of cooperation or drive for equality can be favoured without anything like kin or group selection. That is, in some circumstances it pays to protect the weak, and thus make sure that the bullies do not get too far ahead. He said something to the effect that now is the time to apply evolutionary biology to humans. I would disagree with that. On the one hand, if you are interested in studying humans, any time is the time. On the other hand, if the claim is that now, evolutionary biology is mature and solid, so one can go out and apply it to help other disciplines to sort out their problems … I think that would be overly optimistic.

A lot of the discussion was about Mats Björklund’s talk about predicting evolution, or failing to do so. Unfortunately, I think he had already left, and this was the one talk of the conference that I missed (due to dull practical circumstances stemming from a misplaced wallet), so this part of the discussion mostly passed me by.

A commonplace that recurred a few times was jokes about sequencing … this or that will not be solved by sequencing thousands of genomes, or by big data — you know the kind. This is true, of course; massively parallel sequencing is good when you want to 1) make a new reference genome sequence; 2) get lots and lots of genetic markers or 3) quantify sequences in some library. That certainly doesn’t cover all of evolutionary biology, but it is still quite useful. Every time this came up part of me felt like putting my hand up to declare that I do in fact think that sequencing thousands of individuals is a good idea. But I didn’t, so I write it here where even fewer people will read it.

This is (according to my notes) what the whiteboard said at the end of the session:

”It’s complicated …”
”We need more data …”
”Predictions are difficult/impossible”
”We need more models”

Business as usual
Eventually we’ll get there (where?)
Revise assumptions, models, theories, methods, what to measure

Nothing in evolutionary biology makes sense except in the light of ecology phylogeny disease

Everything in evolution makes sense in the light of mangled Dobzhansky quotes.

(Seriously, I get why pastiches of this particular quote are so common: It’s a very good turn of phrase, and one can easily substitute the scientific field and the concept one thinks is particularly important. Nothing in behavioural ecology makes sense except in the light of Zahavi’s handicap principle etc. It is a fun internal joke, but at the same time sounds properly authoritative. Michael Lynch’s version sometimes seems to be quoted in the latter way.)

Linköping–Edinburgh–Uppsala

If you are the kind of person who reads the lists of decisions from Formas, you may already know this. In March, I’m starting a new postdoc position, in collaboration with John Hickey’s AlphaGenes group at the Roslin Institute in Edinburgh and Dirk-Jan de Koning’s group at the Swedish University of Agricultural Sciences in Uppsala, funded by a mobility starting grant for young researchers from the research council Formas. Hurrah!

The project involves using huge datasets from livestock animals to search for genes and variants underlying quantitative traits. In that sense, for me, this is both a new direction (animal breeding research) and a natural continuation (the genetic basis of quantitative traits). So, in the coming years I anticipate, among other things, learning a ton about computational quantitative genetics; meeting and working with great people; travelling more than ever (relative to my relatively low baseline); writing a poem or two about the scenic environs of Edinburgh and the Royal Mounds of Uppsala; figuring out the across-borders relationship thing; discovering new and useful things about quantitative traits; and hopefully picking up a bit of a Scottish tone in my otherwise Swenglish accent.

Linköping has been very good to me, and so have my colleagues in the Wright lab and AVIAN Behavioural Genetics and Physiology group. So, naturally, I’m both happy and sad to leave. Friends in Linköping, we will meet again.

Also, happy new year!

20170101_150010

(Me holding a sign that says (in Swedish): ”Thank you, Formas! I will do my very best.”)

Reviewing, postscript

Later the same day as the post on reviewing was published, I saw the paper by Kovanis and coworkers on the burden of peer review in biomedical literature. It’s silly of me that it didn’t occur to me to look for data on how many papers researchers review. Their first figure shows data on the number of reviews performed in 2015 by Publons users:

kovanis_reviewers_figure

Figure 1B from Kovanis & al (2016) PLOS ONE (cc:by 4.0).

If we take these numbers at face value (but we probably shouldn’t, because Publons users seem likely to be a biased sample of researchers), my 4-6 reviews in a year fall somewhere in the middle: on the one hand, more than half of the researchers review fewer papers, but it’s a lot less than those who review the most.

This paper estimates the supply and demand of reviews in biomedical literature. The conclusion is a lot like the above graph: reviewer effort is unevenly distributed. In their discussion, the authors write:

Besides, some researchers may be willing to contribute but are never invited. An automated method to improve the matching between submitted articles and the most appropriate candidate peer reviewers may be valuable to the scientific publication system. Such a system could track the number of reviews performed by each author to avoid overburdening them.

This seems right to me. There may be free riders who refuse to pull their weight. But there are probably a lot more people like me, who could and would review more if they were asked to. A way for editors to find them (us) more easily would probably be a good thing.

Morning coffee: reviewing


(It has been a long time since I did one of these posts. I’d better get going!)

One fun thing that happened after I received my PhD is that I started getting requests to review papers, four so far. Four papers (plus re-reviews of revised versions) in about a year probably isn’t that much, but it is strictly greater than zero. I’m sure the entertainment value in reviewing wears off quite fast, but so far it’s been fun, and feels good to pay off some of the sizeable review debt I’ve accumulated while publishing papers from my PhD. Maybe I’m just too naïve and haven’t seen the worst parts of the system yet, but I don’t feel that I’ve had any upsetting revelations from seeing the process from the reviewer’s perspective.

Of course, peer review, like any human endeavour, has components of politics, ego and irrationality. Maybe one could do more to quell those tendencies. I note that different journals have quite different instructions to reviewers. Some provide detailed directions, laying out things that the reviewer should and shouldn’t do, while others just tell you how to use their web form. I’m sure editorial practices also differ.

One thing that did surprise me was when an editor changed the text of a review I wrote. It was nothing major, not a case of removing something inappropriate, but rewording a recommendation to make it stronger. I don’t mind, but I feel that the edit changed the tone of the review. I’ve also heard that this particular kind of comment (when a reviewer states that something is required for a paper to be acceptable for publication) rubs some people the wrong way, because that is up to the editor to decide. In this case, the editor must have felt that a more strongly worded review was the best way to get the author to pay attention, or something like that. I wonder how often this happens. That may be a reason to be even more apprehensive about signing reviews (I did not sign).

So far, I’ve never experienced anything other than single-blind review, but I would be curious to review double-blinded. I doubt the process would differ much: I haven’t reviewed any papers from people I know about, and I haven’t spent any time trying to learn more about them, except in some cases checking out previous work that they’ve referenced. I don’t expect that I’d feel any urge to undertake search engine detective work to figure out who the authors were.

Sometimes, there is the tendency among scientists and non-scientists alike to elevate review to something more than a couple of colleagues reading your paper and commenting on it. I’m pretty convinced peer review and editorial comments improve papers. And as such, the fact that a paper has been accepted by an editor after being reviewed is some evidence of quality. But peer review cannot be a guarantee of correctness. I’m sure I’ve missed and misunderstood things. But still, I promise that I’ll do my best, and I will not have the conscience to turn down a request for peer review for a long time. So if you need a reviewer for a paper on domestication, genetic mapping, chickens or related topics, keep me in mind.

Paper: ”Feralisation targets different genomic loci to domestication in the chicken”

It is out: Feralisation targets different genomic loci to domestication in the chicken. This is the second of our papers on the Kauai feral and admixed chicken population, and came out a few days ago.

The Kauai chicken population is kind of famous: you can find them for instance on Flickr, or on YouTube. We’ve previously looked at their plumage, listened to the roosters’ crowings, and sequenced mitochondrial DNA to investigate their origins. Based on this, we concur with the common view that the chickens of Kauai probably are a mixture of feral birds of domestic origin and wild Junglefowl. The Kauai chickens look and sound like a mix of wild and domestic, and we found mitochondrial DNA of two haplogroups, one of which (called D) is typical in ancient chicken DNA from Pacific islands (Gering et al 2015).

In this paper, we looked at the rest of the genome of the same chickens. You didn’t think we sequenced the whole thing just to look at the mitochondrion plus a subset of markers, did you? We turn to population genomics, and a family of methods called selective sweep mapping, to search for regions of their genome that show signs of being affected by natural selection. This lets us: 1) draw pretty rainbow plots such as this one …

kauai2_fig1a

(Figure 1a from the paper in question, Johnsson & al 2016. cc:by. The chromosomes have been laid out on the horizontal axis with different colours, and split into windows of 40 kb. Each dot represents the heterozygosity of that window. For all the details, see the paper.)

… 2) highlight regions of the genome that may have been selected during feralisation on Kauai (these are the icicles in the graph, highlighted by arrows); 3) conclude that the regions that look like they’ve been selected in feralisation overlap very little with the ones that look like they’ve been selected in chicken domestication. Hence the title.

That was the main result, but of course we also look at what genes are highlighted. Mostly we have no idea how they may contribute to feralisation, but a couple of regions overlap with those that we’ve previously found in genetic mapping of comb size and egg laying in our wild-by-domestic intercross. We also compare the potentially selected regions to domestic chicken sequences.

Last year, Ewen Callaway visited Dominic Wright, Eben Gering and Rie Henriksen on the last fieldtrip to Kauai. The article, When chickens go wild, was published in Nature News in January, and it explains a lot of the ideas nicely. This paper was submitted by then, so the samples they gathered on that trip do not feature in it. But, spoiler alert: there is more to come. (I don’t know what role I personally will play, but that is less important.)

As you may have guessed if you looked at the author list, this was a collaboration between quite a lot of people in Linköping, Michigan, London, and Victoria. Thanks to all involved! This was great fun, and for those of you who like this sort of thing, I hope the paper will be an interesting read.

Literature

M. Johnsson, E. Gering, P. Willis, S. Lopez, L. Van Dorp, G. Hellenthal, R. Henriksen, U. Friberg & D. Wright. (2016) Feralisation targets different genomic loci to domestication in the chicken. Nature Communications. doi:10.1038/ncomms12950

Toying with models: The Game of Life with selection

Conway’s Game of life is probably the most famous cellular automaton, consisting of a grid of cells developing according to simple rules. Today, we’re going to add mutation and selection to the game, and let patterns evolve.

The fate of a cell depends on the number of cells that live in the neighbouring positions. A cell with fewer than two neighbours dies from starvation. A cell with more than three neighbours dies from overpopulation. A cell with two or three neighbours lives on. If a position is empty and has three neighbours, it will be filled by a cell. These rules lead to some interesting patterns, such as still lives that never change, oscillators that alternate between states, patterns that eventually die out but take a long time to do so, patterns that keep generating new cells, and so forth.

oscillators still_life

When I played with the Game of life as a child, I liked one pattern called ”virus”, that looked a bit like this. On its own, a grid of four-by-four blocks is a still life, but add one cell (the virus), and the whole pattern breaks. This is a version on a 30 x 30 cell board. It unfolds rather slowly, but in the end, a glider collides with a block, and you are left with some oscillators.

blocks virus

There are probably other interesting ways that evolution could be added to the game of life. We will take a hierarchical approach where the game is taken to describe development, and the unit of selection is the pattern. Each generation, we will create a variable population of patterns, allow them to develop and pick the fittest. So, here the term ”development” refers to what happens to a pattern when applying the rules of life, and the term ”evolution” refers to how the population of patterns changes over the generations. This differs slightly from Game of life terminology, where ”evolution” and ”generation” usually refer to the development of a pattern, but it is consistent with how biologists use the words: development takes place during the life of an organism, and evolution happens over the generations as organisms reproduce and pass on their genes to offspring. I don’t think there’s any deep analogy here, but we can think of the initial state of the board as the heritable material that is being passed on and occasionally mutated. We let the pattern develop, and at some point, we apply selection.

First, we need an implementation of the game of life in R. We will represent the board as a matrix of ones (live cells) and zeroes (empty positions). Here is a function that develops the board one tick in time. After dealing with the corners and edges, it’s very short, but also slow as molasses. The next function does this for a given number of ticks.

## Develop one tick. Return new board matrix.
develop <- function(board_matrix) {
  padded <- rbind(matrix(0, nrow = 1, ncol = ncol(board_matrix) + 2),
                  cbind(matrix(0, ncol = 1, nrow = nrow(board_matrix)), 
                        board_matrix,
                        matrix(0, ncol = 1, nrow = nrow(board_matrix))),
                  matrix(0, nrow = 1, ncol = ncol(board_matrix) + 2))
  new_board <- padded
  for (i in 2:(nrow(padded) - 1)) {
    for (j in 2:(ncol(padded) - 1)) {
      neighbours <- sum(padded[(i-1):(i+1), (j-1):(j+1)]) - padded[i, j]
      if (neighbours < 2 | neighbours > 3) {
        new_board[i, j] <- 0
      }
      if (neighbours == 3) {
        new_board[i, j] <- 1
      }
    }
  }
  new_board[2:(nrow(padded) - 1), 2:(ncol(padded) - 1)]
}

## Develop a board a given number of ticks.
tick <- function(board_matrix, ticks) {
  if (ticks > 0) {
    for (i in 1:ticks) {
      board_matrix <- develop(board_matrix) 
    }
  }
  board_matrix
}

We introduce random mutations to the board. We will use a mutation rate of 0.0011 per cell, which gives us a mean of about one mutation for a 30 x 30 board.

## Mutate a board
mutate <- function(board_matrix, mutation_rate) {
  mutated <- as.vector(board_matrix)
  outcomes <- rbinom(n = length(mutated), size = 1, prob = mutation_rate)
  for (i in 1:length(outcomes)) {
    if (outcomes[i] == 1)
      mutated[i] <- ifelse(mutated[i] == 0, 1, 0)
  }
  matrix(mutated, ncol = ncol(board_matrix), nrow = nrow(board_matrix))
}
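
As a quick check of the arithmetic: the expected number of mutations per board is the number of cells times the per-cell rate, 900 × 0.0011 = 0.99. The sketch below verifies this by simulation, and also shows one way the loop could be replaced by a vectorised flip. The name mutate_vectorised is an assumption made for this example and is not part of the original script.

```r
## Expected number of mutations per board: cells times per-cell rate.
n_cells <- 30 * 30
mutation_rate <- 0.0011
expected_mutations <- n_cells * mutation_rate

## Vectorised alternative to the loop in mutate() (hypothetical name).
## Each cell is flipped independently with probability mutation_rate.
mutate_vectorised <- function(board_matrix, mutation_rate) {
  flips <- rbinom(n = length(board_matrix), size = 1, prob = mutation_rate)
  ## abs(x - 1) turns 0 into 1 and 1 into 0 where a flip occurred.
  mutated <- ifelse(flips == 1, abs(board_matrix - 1), board_matrix)
  matrix(mutated, nrow = nrow(board_matrix), ncol = ncol(board_matrix))
}

## Usage: on an empty board, the number of live cells after mutation is
## the number of mutations, so its average should be near expected_mutations.
set.seed(1)
empty_board <- matrix(0, nrow = 30, ncol = 30)
flipped <- replicate(2000, sum(mutate_vectorised(empty_board, mutation_rate)))
mean(flipped)
```

Most boards therefore receive zero, one or two mutations per generation, which keeps offspring close to their parents.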

I was interested in the virus pattern, so I decided to apply a simple directional selection scheme for number of cells at tick 80, which is a while after the virus pattern has stabilized itself into oscillators. We will count the number of cells at tick 80 and call that ”fitness”, even if it actually isn’t (it is a trait that affects fitness by virtue of the fact that we select on it). We will allow the top half of the population to produce two offspring each, thus keeping the population size constant at 100 individuals.

## The pipe operator %>% comes from the magrittr package
library(magrittr)

## Calculates the fitness of an individual at a given time
get_fitness <- function(board_matrix, time) {
  board_matrix %>% tick(time) %>% sum
}

## Develop a generation and calculate fitness
grow <- function(generation) {
  generation$fitness <- sapply(generation$board, get_fitness, time = 80)
  generation
}

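## Per-cell mutation rate used by next_generation() and evolve();
## 0.0011 is the value given in the text.
mu <- 0.0011

## Select a generation based on fitness, and create the next generation,
## adding mutation.
next_generation <- function(generation) {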
  keep <- order(generation$fitness, decreasing = TRUE)[1:50]
  new_generation <- list(board = vector(mode = "list", length = 100),
                         fitness = numeric(100))
  ix <- rep(keep, each = 2)
  for (i in 1:100) new_generation$board[[i]] <- generation$board[[ix[i]]]
  new_generation$board <- lapply(new_generation$board, mutate, mutation_rate = mu)
  new_generation
}

## Evolve a board, with mutation and selection for a number of generations.
evolve <- function(board, n_gen = 10) { 
  generations <- vector(mode = "list", length = n_gen)

  generations[[1]] <- list(board = vector(mode = "list", length = 100),
                           fitness = numeric(100))
  for (i in 1:100) generations[[1]]$board[[i]] <- board
  generations[[1]]$board <- lapply(generations[[1]]$board, mutate, mutation_rate = mu)

  for (i in seq_len(n_gen - 1)) {
    generations[[i]] <- grow(generations[[i]])
    generations[[i + 1]] <- next_generation(generations[[i]])
  }
  generations[[n_gen]] <- grow(generations[[n_gen]])
  generations
}
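
To see what the selection step does in isolation, here is a minimal sketch on a toy population of six individuals instead of 100. The fitness values are made up for illustration, and the variable names are assumptions for this example rather than part of the original script.

```r
## Toy population of six individuals with made-up fitness values
## (illustrative numbers, not results from the simulation).
toy_fitness <- c(12, 40, 36, 8, 51, 30)
population_size <- length(toy_fitness)

## Indices of the top half, best first, as in next_generation().
keep <- order(toy_fitness, decreasing = TRUE)[1:(population_size / 2)]

## Each kept individual becomes the parent of two offspring,
## so the population size stays the same.
parents <- rep(keep, each = 2)

keep
parents
```

Here individuals 5, 2 and 3 are kept, and each appears twice in the list of parents. In the full simulation the offspring boards are then passed through mutate(), so the two copies of a parent usually start to differ after a few generations.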

Let me now tell you that I was almost completely wrong about what happens with this pattern once you apply selection. I thought that the initial pattern of nine stable blocks (36 cells) was pretty good, and that it would be preserved for long, and that virus-like patterns (like the first animation above) would mostly have degenerated around 80. As this plot of the evolution of the number of cells in one replicate shows, I grossly underestimated this pattern. The y-axis is number of cells at time 80, and the x-axis individuals, the vertical lines separating generations. Already by generation five, most individuals do better than 36 cells in this case:

blocks_trajectory_plot

As one example, here is the starting position and the state at time 80 for a couple of individuals from generation 10 of one of my replicates:

blocks_g10_1 blocks_g10_80

blocks_g10_1b blocks_g10_80b

Here is how the average cell number at time 80 evolves in five replicates. Clearly, things are still going on at generation 10, not only in the replicate shown above.

mean_fitness_blocks

Here is the same plot for the virus pattern I showed above, i.e. the blocks but with one single added cell, fixed in the starting population. Prior genetic architecture matters. Even if the virus pattern has fewer cells than the blocks pattern at time 80, it is apparently a better starting point to quickly evolve more cells:

mean_fitness_virus

And finally, out of curiosity, what happens if we start with an empty 30 x 30 board?

mean_fitness_blank

Not much. The simple still life block evolves a lot. But in my replicate three, this creature emerged. ”Life, uh, finds a way.”

blank_denovo

Unfortunately, many of the selected patterns extended to the edges of the board, making them play not precisely the game of life, but the game of life with edge effects. I’d like to use a much bigger board and see how far patterns extend. It would also be fun to follow them longer. To do that, I would need to implement a more efficient way to update the board (this is very possible, but I was lazy). It would also be fun to select for something more complex, with multiple fitness components, potentially in conflict, e.g. favouring patterns that grow large at a later time while being as small as possible at an earlier time.
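
One possible way to speed up the update, sketched here under the assumption that the board is still a matrix of ones and zeroes with empty cells outside the edges, is to count neighbours for all cells at once by adding up eight shifted copies of the padded board. The names count_neighbours and develop_fast are made up for this sketch, and I have not benchmarked it against the loop version.

```r
## Count live neighbours for every cell at once by summing eight shifted
## copies of a zero-padded board. Positions outside the board count as empty.
count_neighbours <- function(board_matrix) {
  n_row <- nrow(board_matrix)
  n_col <- ncol(board_matrix)
  padded <- matrix(0, nrow = n_row + 2, ncol = n_col + 2)
  padded[2:(n_row + 1), 2:(n_col + 1)] <- board_matrix
  neighbours <- matrix(0, nrow = n_row, ncol = n_col)
  for (row_shift in -1:1) {
    for (col_shift in -1:1) {
      ## Skip the unshifted copy, which is the cell itself.
      if (row_shift != 0 || col_shift != 0) {
        neighbours <- neighbours +
          padded[(2:(n_row + 1)) + row_shift,
                 (2:(n_col + 1)) + col_shift,
                 drop = FALSE]
      }
    }
  }
  neighbours
}

## Develop one tick without looping over individual cells.
develop_fast <- function(board_matrix) {
  neighbours <- count_neighbours(board_matrix)
  survives <- board_matrix == 1 & (neighbours == 2 | neighbours == 3)
  born <- board_matrix == 0 & neighbours == 3
  matrix(as.numeric(survives | born),
         nrow = nrow(board_matrix), ncol = ncol(board_matrix))
}

## Usage: a blinker flips between a horizontal and a vertical bar.
blinker <- matrix(0, nrow = 5, ncol = 5)
blinker[3, 2:4] <- 1
develop_fast(blinker)
```

The loop now runs over the eight neighbour directions rather than over every cell, so the cost per tick no longer grows with an R-level loop over the board. The edge effects remain exactly as before.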

Code is on github, including functions to display and animate boards with the animation package and ImageMagick, and code for the plots. Again, the blocks_selection.R script is slow, so leave it running and go do something else.

Toying with models: The Luria–Delbrück fluctuation test

I hope that Genetics will continue running expository papers about their old classics, like this one by Philip Meneely about Luria & Delbrück (1943). Luria & Delbrück performed an experiment on bacteriophage resistance in Escherichia coli, growing bacterial cultures, exposing them to a phage, and then plating and counting the survivors, which have become resistant to the phage. They considered two hypotheses: either resistance occurs adaptively, in response to the phage, or it occurs by mutation some time during the growth of the culture but before the phages are added. They found the latter to be the case, and this is an example of how mutations happen irrespective of their effects on fitness, in a sense at random. Their analysis is based on a model of bacterial growth and mutation, and the aim of this exercise is to explore this model by simulating some data.

First, we assume that mutation happens with a fixed mutation rate \mu = 2 \cdot 10^{-8} , which is quite close to their estimated value, and that the mutation can’t reverse. We also assume that the bacteria grow by doubling each generation up to 30 generations. We start a culture from a single susceptible bacterium, and let it grow for a number of generations before the phage is added. (We’re going to use discrete generations, while Luria & Delbrück use a continuous function.) Then:

n_{susceptible,i+1}= 2 (n_{susceptible,i} - n_{mutants,i})

n_{resistant,i+1} = 2 (n_{resistant,i} + n_{mutants,i})

That is, every generation i, the mutants that occur move from the susceptible to the resistant category. The number of mutants that happen among the susceptible is binomially distributed:

n_{mutants,i} \sim Binomial(n_{susceptible,i}, \mu) .

This is an R function to simulate a culture:

culture <- function(generations, mu) {
  n_susceptible <- numeric(generations)
  n_resistant <- numeric(generations)
  n_mutants <- numeric(generations)
  n_susceptible[1] <- 1
  for (i in 1:(generations - 1)) {
    n_mutants[i] <- rbinom(n = 1, size = n_susceptible[i], prob = mu)
    n_susceptible[i + 1] <- 2 * (n_susceptible[i] - n_mutants[i])
    n_resistant[i + 1] <- 2 * (n_resistant[i] + n_mutants[i])
  }
  data.frame(generation = 1:generations,
             n_susceptible,
             n_resistant,
             n_mutants)
}
cultures <- replicate(1000, culture(30, 2e-8), simplify = FALSE)

We run a few replicate cultures and plot the number of resistant bacteria. This graph shows the point pretty well: Because of random mutation and exponential growth, the cultures where mutations happen to arise relatively early will give rise to a lot more resistant bacteria than the ones where the first mutations are late. Therefore, there will be a lot of variation between the cultures because of their different histories.

resistant

library(ggplot2)

combined <- do.call(rbind, cultures)
combined$culture <- rep(1:1000, each = 30)

resistant_plot <- qplot(x = generation, y = n_resistant, group = culture,
      data = combined, geom = "line", alpha = I(1/10), size = I(1)) + theme_bw()

We compare this to what happens under the alternative hypothesis where resistance arises as a consequence of introduction of the phage with some resistance rate (this is not the same as the mutation rate above, even though we’re using the same value). Then the number of resistant cells in a culture will be: n_{acquired} \sim Binomial(2^{29}, \mu_{acquired}) .

resistant <- unlist(lapply(cultures, function(x) max(x$n_resistant)))

acquired_resistant <- rbinom(n = 1000, size = 2^29, prob = 2e-8)

resistant_combined <- rbind(transform(data.frame(resistant = acquired_resistant), model = "acquired"),
                            transform(data.frame(resistant = resistant), model = "mutation"))

resistant_histograms <- qplot(x = resistant, data = resistant_combined, bins = 10) +
  facet_wrap(~ model, scales = "free_x")

histograms

Here are two histograms side by side to compare the cases. The important thing is the shape. If the acquired resistance hypothesis holds, the number of resistant bacteria in replicate cultures approximately follows a Poisson distribution, because that is what a binomial count approaches when the number of trials is large and the probability of success per trial is small. The interesting thing about the Poisson distribution in this case is that its mean is equal to its variance. However, under the mutation model (as we’ve already illustrated), there is a lot of variation between cultures. These fluctuations make the variance much larger than the mean, which is also what Luria and Delbrück found in their data. Therefore, the results are inconsistent with acquired resistance, and hence the experiment is called the Luria–Delbrück fluctuation test.

mean(resistant)
var(resistant)
mean(acquired_resistant)
var(acquired_resistant)
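
The comparison can be boiled down to one number per model, the ratio of variance to mean, which should be close to one for Poisson-like counts. The following is a self-contained sketch that uses the same recursion and parameter values as above but only keeps the final count of each culture. The helper names final_resistant and dispersion are made up for this example, and the seed is arbitrary.

```r
## Final number of resistant cells in one culture under the mutation model,
## using the same discrete-generation recursion as culture() above.
final_resistant <- function(generations, mu) {
  n_susceptible <- 1
  n_resistant <- 0
  for (i in seq_len(generations - 1)) {
    n_mutants <- rbinom(n = 1, size = n_susceptible, prob = mu)
    n_susceptible <- 2 * (n_susceptible - n_mutants)
    n_resistant <- 2 * (n_resistant + n_mutants)
  }
  n_resistant
}

## Variance-to-mean ratio: close to 1 for Poisson-like counts.
dispersion <- function(x) var(x) / mean(x)

set.seed(1943)
mutation_counts <- replicate(1000, final_resistant(30, 2e-8))
acquired_counts <- rbinom(n = 1000, size = 2^29, prob = 2e-8)

## Expected count under acquired resistance: 2^29 * 2e-8, about 10.7.
expected_acquired <- 2^29 * 2e-8

dispersion(acquired_counts)
dispersion(mutation_counts)
```

Under the acquired resistance model the ratio should land near one, while under the mutation model the occasional culture with an early mutation pushes it far above one.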

Literature

Luria, S. E., & Delbrück, M. (1943). Mutations of bacteria from virus sensitivity to virus resistance. Genetics, 28(6), 491.

Meneely, P. M. (2016). Pick Your Poisson: An Educational Primer for Luria and Delbrück’s Classic Paper. Genetics, 202(2), 371-375.

Code on github.