Using R: Don’t save your workspace

To everyone learning R: Don’t save your workspace.

When you exit an R session, you’re faced with the question of whether or not to save your workspace. You should almost never answer yes. Saving your workspace creates an image of your current variables and functions, and saves them to a file called “.RData”. When you re-open R from that working directory, the workspace will be loaded, and all these things will be available to you again. But you don’t want that, so don’t save your workspace.

Loading a saved workspace turns your R script from a program, where everything happens logically according to the plan that is the code, to something akin to a cardboard box taken down from the attic, full of assorted pages and notebooks that may or may not be what they seem to be. You end up having to put an inordinate trust in your old self. I don’t know about your old selves, dear reader, but if they are anything like mine, don’t save your workspace.

What should one do instead? One should source the script often, ideally from freshly minted R sessions, to make sure to always be working with a script that runs and does what it’s supposed to. Storing a data frame in the workspace can seem comforting, but what happens the day I overwrite it by mistake? Don’t save your workspace.
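As a minimal sketch of what sourcing from a fresh session can look like, the snippet below writes a tiny stand-in script to a temporary file and runs it in a new R process with `--vanilla`, which means no saved workspace is restored on startup and none is saved on exit. The script contents and file name are made up for illustration; in practice you would point this at your own script.

```r
# A hypothetical two-line script, standing in for your real analysis file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "print(mean(x))"
), script)

# Locate the Rscript executable that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script in a brand new R process. --vanilla implies --no-save and
# --no-restore, so the script only sees what it creates itself.
output <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
print(output)
```

When working interactively, starting R with `R --no-save --no-restore-data` or quitting with `q(save = "no")` should give much the same effect.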

Yes, I’m exaggerating. When using any modern computer system, we rely on saved information and saved state all the time. And yes, whenever a computation takes too much time to reproduce, one should write the result to a file and load it when needed. But I think that should be a deliberate choice, worthy of its own save() and load() calls, and certainly not something one does with simple stuff that can be reproduced in the blink of an eye. Put more trust in your script than in your memory, and don’t save your workspace.
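Here is one way such a deliberate choice might look, as a rough sketch: the slow result is saved to a named file the first time, and loaded from it afterwards. The file name `slow_result.RData` and the function `slow_computation()` are placeholders I made up; the point is only that the script itself says what is cached and where.

```r
# Hypothetical cache location; a real project would use a path in the project.
cache_file <- file.path(tempdir(), "slow_result.RData")

# Stand-in for a computation that is too slow to rerun every time.
slow_computation <- function() {
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
}

if (file.exists(cache_file)) {
  # load() restores the object under the name it was saved with: `result`.
  load(cache_file)
} else {
  result <- slow_computation()
  save(result, file = cache_file)
}

print(result)
```

To force a recomputation, delete the cache file and source the script again. If you would rather choose the object’s name when reading it back, `saveRDS()` and `readRDS()` are an alternative to `save()` and `load()`.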


7 thoughts on “Using R: Don’t save your workspace”

  1. Get the best of both worlds: use R Markdown files that keep your script up to date, and the knitr option `knitr::opts_chunk$set(cache = TRUE)` to save the state of each chunk as cache files. You want to get back to where you were? Just knit your R Markdown file in a fresh session, and it’ll put everything back in place in a breeze (without all the clutter that you may have added outside of your script).

  2. I respectfully disagree 100%. Yes, it is neater and cleaner to store data you’ve generated in dedicated .rdata files. No, that doesn’t always happen. Further, I do not want to have to run source() and library() on a few dozen tools every time I start up.
    Frankly, you seem to be suffering from MATLAB-osis, where all too often people run clr, clf, clear all at the top of every script because they’re scared to death MATLAB will crash on them (as it often does).
