There is grandeur in this view of life

martins bioblogg

Using R: writing a table with odd lines (GFF track headers)

leave a comment »

The other day, I wanted to add track lines to a GFF file, so that I could view different features as separate custom tracks in a genome browser. The need to shuffle genome coordinates between different file formats seems to occur all the time when you deal with some kind of bioinformatic data. It’s usually just text files; one just has to keep track of whether the positions should start on 0 or 1 and whether the end should include the last base or not . . .

> head(gff)

  seqname         source        feature     start       end score strand
1       5 protein_coding           mRNA 169010747 169031776     .      +
2       5 protein_coding        protein 169015421 169021641     .      +
3       5 protein_coding five_prime_UTR 169010747 169010893     .      +
4       5 protein_coding five_prime_UTR 169015398 169015420     .      +
5       5 protein_coding            CDS 169015421 169015579     .      +
6       5 protein_coding            CDS 169018052 169018228     .      +
  frame                                                     group
1     . ID=ENST00000504258;Name=CCDC99-005;Parent=ENSG00000040275
2     . ID=ENSP00000421249;Name=CCDC99-005;Parent=ENST00000504258
3     .                                    Parent=ENST00000504258
4     .                                    Parent=ENST00000504258
5     0                    Name=CDS:CCDC99;Parent=ENST00000504258
6     0                    Name=CDS:CCDC99;Parent=ENST00000504258

The above example consists of a few lines from the Ensembl human database, not the actual tracks I was interested in. Anyway, this is what I did: instead of using write.table() directly, explicitly open a file for writing, first write some track line, then write the relevant subset, and repeat.

tracks <- unique(gff$feature)
connection <- file("separate_tracks.gff", "w")
for (k in 1:length(tracks)) {
  writeLines(paste("track name=", tracks[k], sep=""), connection)
  write.table(subset(gff, feature==tracks[k]),
              sep="\t", row.names=F, col.names=F,
              quote=F, file=connection)
}
close(connection)
About these ads

Written by mrtnj

28 januari, 2013 den 21:00

Publicerat i med mera

Taggad med ,

Kommentera

Fyll i dina uppgifter nedan eller klicka på en ikon för att logga in:

WordPress.com Logo

Du kommenterar med ditt WordPress.com-konto. Logga ut / Ändra )

Twitter-bild

Du kommenterar med ditt Twitter-konto. Logga ut / Ändra )

Facebook-foto

Du kommenterar med ditt Facebook-konto. Logga ut / Ändra )

Google+ photo

Du kommenterar med ditt Google+-konto. Logga ut / Ändra )

Ansluter till %s

Följ

Få meddelanden om nya inlägg via e-post.

Gör sällskap med 1 118 andra följare

%d bloggers like this: