Using R: Don’t save your workspace

To everyone learning R: Don’t save your workspace.

When you exit an R session, you’re faced with the question of whether or not to save your workspace. You should almost never answer yes. Saving your workspace creates an image of your current variables and functions, and saves them to a file called “.RData”. When you re-open R from that working directory, the workspace will be loaded, and all these things will be available to you again. But you don’t want that, so don’t save your workspace.
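To see what such an image holds, here is a minimal sketch. It assumes a stand-in environment (the hypothetical `ws`) in place of the real workspace and a temporary file in place of “.RData”, so it touches nothing real. As far as I know, `save.image()` does essentially the same thing for the global environment.

```r
# A stand-in for the global workspace (`ws` is a hypothetical name).
ws <- new.env()
assign("heights", c(1.62, 1.75, 1.80), envir = ws)
assign("double_it", function(v) v * 2, envir = ws)

# Write every object in `ws` to an image file, much as save.image() does
# for the global environment. A temporary file stands in for ".RData".
image_file <- tempfile(fileext = ".RData")
save(list = ls(envir = ws), envir = ws, file = image_file)

# Loading the image into an empty environment brings everything back,
# variables and functions alike, with no record of how they were made.
restored <- new.env()
restored_names <- load(image_file, envir = restored)
print(sort(restored_names))
```

The image stores the objects, not the code that produced them, which is exactly why it cannot replace the script.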

Loading a saved workspace turns your R script from a program, where everything happens logically according to the plan that is the code, to something akin to a cardboard box taken down from the attic, full of assorted pages and notebooks that may or may not be what they seem to be. You end up having to put an inordinate trust in your old self. I don’t know about your old selves, dear reader, but if they are anything like mine, don’t save your workspace.

What should one do instead? One should source the script often, ideally from freshly minted R sessions, to make sure to always be working with a script that runs and does what it’s supposed to. Storing a data frame in the workspace can seem comforting, but what happens the day I overwrite it by mistake? Don’t save your workspace.
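One way to do that without leaving your current session is to hand the script to a separate R process. This is only a sketch: the script written here is a made-up stand-in for your own, and `--vanilla` is used so that the new process neither restores nor saves a workspace.

```r
# Write a tiny stand-in script to a temporary file (hypothetical content;
# in practice this would be your own analysis script).
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2, 4, 6)",
  "print(mean(x))"
), script_file)

# Run it in a brand new R process. --vanilla means no saved workspace is
# restored, none is saved on exit, and no startup files are read.
rscript <- file.path(R.home("bin"), "Rscript")
output <- system2(rscript, c("--vanilla", shQuote(script_file)), stdout = TRUE)
print(output)
```

If the script only works because of something left over in your interactive session, this is where it fails, which is the point.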

Yes, I’m exaggerating. When using any modern computer system, we rely on saved information and saved state all the time. And yes, whenever a computation takes too much time to reproduce, one should write the result to a file and load it from there. But that should be a deliberate choice, worthy of its own save() and load() calls, and certainly not something one does with simple stuff that can be reproduced in the blink of an eye. Put more trust in your script than in your memory, and don’t save your workspace.
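A deliberate save might look something like the sketch below. The file name, the object name `slow_result` and the function `slow_computation()` are all made up for illustration, and a temporary directory keeps the example self-contained.

```r
# Hypothetical cache location; a temporary directory keeps the sketch tidy.
cache_file <- file.path(tempdir(), "slow_result.RData")

# Stand-in for a computation that is too slow to repeat on every run.
slow_computation <- function() {
  Sys.sleep(0.5)
  cumsum(1:10)
}

if (file.exists(cache_file)) {
  # load() restores the object under the name it was saved with.
  load(cache_file)
} else {
  slow_result <- slow_computation()
  # Naming the object means only it is written, not the whole workspace.
  save(slow_result, file = cache_file)
}

print(slow_result)
```

Delete the cache file to force the computation to run again. `saveRDS()` and `readRDS()` are an alternative for a single object, and they let you pick the variable name when reading it back.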

9 responses to “Using R: Don’t save your workspace”

  1. Get the best of both worlds: use R Markdown files that keep your script up to date, and the knitr option `knitr::opts_chunk$set(cache = TRUE)` to save the state of each chunk as cache files. You want to get back to where you were? Just knit your R Markdown file in a fresh session; it’ll put everything back in place in a breeze (without all the clutter that you may have added outside of your script).

  2. Second this. Too bad R defaults to saving when quitting. Using `alias R='R --no-save'` made my scripts so much more reproducible.

  3. I respectfully disagree 100%. Yes, it is neater and cleaner to store data you’ve generated in dedicated .rdata files. No, that doesn’t always happen. Further, I do not want to have to run source() and library() on a few dozen tools every time I start up.
    Frankly, you seem to be suffering from MATLAB-osis, where all too often people run clr, clf, clear all at the top of every script because they’re scared to death MATLAB will crash on them (as it often does).

    • I’ve never had a problem with R crashing a lot, nor do I suffer the misfortune of having to use Matlab. But to each their own.

  4. Hi! This was very informative to me!! Thanks! I figured out how to save the script separately, but how do I save my data (only the data)? I am doing a lot of manipulation on it and would like to save this new data. But every time I try to save an .rda file, it saves the entire workspace again.

Comments are closed.