There is grandeur in this view of life

martins bioblogg

More Haskell: a bootstrap

So my playing around with Haskell goes on. You can follow the progress of the little bootstrap exercise on GitHub. It has now reached the point where it actually computes a bootstrap confidence interval for the mean of a sample. Consider the following R script:

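n <- 100
# n draws from a Poisson distribution with mean 10, all in group 1
fake.data <- data.frame(group=rep(1, n), data=rpois(n, 10))
write.table(fake.data, quote=FALSE, row.names=FALSE, col.names=FALSE,
            sep=",", file="fake_data.csv")

library(plyr)
# 100 bootstrap replicates, each n values resampled with replacement
# (the 100 empty list elements only set the number of replicates)
bootstrap.replicates <- llply(vector("list", 100),
                              sample, x=fake.data$data,
                              replace=TRUE, size=n)
bootstrap.means <- unlist(llply(bootstrap.replicates, mean))
print(mean(fake.data$data))
# 95% percentile interval from the bootstrap means
print(quantile(bootstrap.means, c(0.025, 0.975)))
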
[1] 10.31
    2.5%    97.5% 
 9.72475 10.85200

So that was a simple bootstrap in R: we draw 100 values from a Poisson distribution with mean 10, resample the data with replacement 100 times, take the mean of each replicate, and use the 2.5% and 97.5% quantiles of those means as a 95% interval. Here is my Haskell version running in GHCi:

*Main> main
"boot"
"will eventually bootstrap, if martin knows his stuff"
fake_data.csv
[8,6,11,16,5,11,12,12,7,9,13,13,12,7,13,7,7,11,9,14,14,13,10,14,17,12,8,
10,15,12,13,13,7,10,9,6,7,8,10,12,10,10,10,12,11,8,16,12,13,13,12,15,7,
7,8,9,5,7,13,10,12,11,8,6,12,14,12,14,6,9,10,9,10,6,9,7,6,12,13,7,11,7,
13,15,10,10,9,12,12,6,10,6,8,10,13,8,9,13,12,13]
10.31
(9.8,10.83)
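For comparison, here is a minimal sketch of the same percentile bootstrap for the mean in Haskell, with explicit type declarations. This is not the code in the repository; it is an illustration that assumes only `base`, so the resampling uses a small hand-rolled linear congruential generator (the hypothetical `lcg` and `randomIndices` helpers, with the well-known Numerical Recipes constants) instead of the `random` package. The interval uses nearest-rank quantiles, whereas R's `quantile()` interpolates by default, so the numbers will not match the R output exactly. The file is meant to be loaded in GHCi.

```haskell
import Data.List (sort, unfoldr)

-- Hypothetical helper: one step of a linear congruential generator with the
-- Numerical Recipes constants, modulus 2^32 (assumes a 64-bit Int).
-- Good enough for a demo, not for serious simulation work.
lcg :: Int -> Int
lcg s = (1664525 * s + 1013904223) `mod` 4294967296

-- Hypothetical helper: an infinite stream of pseudo-random indices in [0, n).
-- The low bits of a power-of-two LCG are poor, so the high 16 bits are used
-- (assumes n is much smaller than 65536).
randomIndices :: Int -> Int -> [Int]
randomIndices n seed = map (\s -> (s `div` 65536) `mod` n) (tail (iterate lcg seed))

-- Split a (possibly infinite) list into consecutive chunks of length k.
chunksOf :: Int -> [a] -> [[a]]
chunksOf k = unfoldr (\ys -> if null ys then Nothing else Just (splitAt k ys))

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Means of b bootstrap replicates; each replicate is (length xs) values
-- drawn from xs with replacement.
bootstrapMeans :: Int -> Int -> [Double] -> [Double]
bootstrapMeans b seed xs =
  take b [ mean (map (xs !!) idx) | idx <- chunksOf n (randomIndices n seed) ]
  where
    n = length xs

-- Nearest-rank percentile interval with coverage 1 - alpha.
percentileInterval :: Double -> [Double] -> (Double, Double)
percentileInterval alpha ms = (pick (alpha / 2), pick (1 - alpha / 2))
  where
    sorted = sort ms
    b = length sorted
    pick p = sorted !! min (b - 1) (max 0 (ceiling (p * fromIntegral b) - 1))

-- Bootstrap interval for the mean: b replicates generated from the given seed.
bootstrapCI :: Int -> Int -> Double -> [Double] -> (Double, Double)
bootstrapCI b seed alpha xs = percentileInterval alpha (bootstrapMeans b seed xs)
```

For example, `bootstrapCI 100 2013 0.05 xs` draws 100 replicates from seed 2013 and returns a 95% interval. A different seed gives a slightly different interval, just as rerunning the R script does. Passing the seed explicitly keeps every function pure, at the cost of threading it through by hand.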

It’s certainly not the prettiest thing in the world (for one thing, it will crash if there is an extra line break at the end of the file). Next stop: type declarations! Haskell will infer the types for me, but it is probably a good idea to declare the intended types, or at least to know how to do so.

After that, the plan is to make use of the first column in the data file, i.e. the group each observation belongs to, so that I can add a second sample and compare the two means. Then it’s pretty much done, and maybe I’ll move on to something more useful. I’m thinking that implementing least squares linear models might be a decent exercise.
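The trailing-line-break crash and the plan to use the group column can be sketched together. The following is a minimal, hypothetical parser (not the repository's code), written with explicit type declarations, that skips blank lines instead of failing on them, reports the first malformed line, groups values by the first column, and computes a difference in group means. It assumes the file has exactly two comma-separated columns, group then value, as written by the R script above, and it uses only `base`; the names `parseData`, `groupValues` and `meanDifference` are illustrative.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (dropWhileEnd, groupBy, sortOn)

-- Strip leading and trailing whitespace (including a stray '\r').
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Hypothetical helper: split on commas. No quoting support, which is enough
-- for the two-column file written by write.table(..., quote=FALSE, sep=",").
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Parse one "group,value" line; Nothing if it does not have that shape.
parseLine :: String -> Maybe (Int, Double)
parseLine line = case map trim (splitOnComma line) of
  [g, v] -> case (reads g, reads v) of
    ([(gid, "")], [(val, "")]) -> Just (gid, val)
    _                          -> Nothing
  _ -> Nothing

-- Parse a whole file, skipping blank lines (such as the empty last line left
-- by an extra line break) and naming the first malformed line instead of crashing.
parseData :: String -> Either String [(Int, Double)]
parseData contents = mapM parseNumbered nonBlank
  where
    nonBlank = filter (not . all isSpace . snd) (zip [(1 :: Int) ..] (lines contents))
    parseNumbered (n, l) =
      maybe (Left ("line " ++ show n ++ ": cannot parse " ++ show l)) Right (parseLine l)

-- Collect the values by group id (first column), in ascending group order.
groupValues :: [(Int, Double)] -> [(Int, [Double])]
groupValues rows =
  [ (gid, map snd grp) | grp@((gid, _) : _) <- groupBy ((==) `on` fst) (sortOn fst rows) ]

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Difference between the means of groups a and b, if both are present.
meanDifference :: Int -> Int -> [(Int, [Double])] -> Maybe Double
meanDifference a b groups = do
  xs <- lookup a groups
  ys <- lookup b groups
  return (mean xs - mean ys)
```

In GHCi, applying `parseData` to the result of `readFile "fake_data.csv"` gives either the rows or a message naming the bad line. Once a second group is added to the file, `meanDifference 2 1` on the grouped rows would be a natural statistic to bootstrap.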

Written by mrtnj

16 February 2013 at 10:55

Posted in data analysis, english
