I’ve been developing a package where I needed a function to take different actions (different mutations) depending on the values of several variables within each row of a dataframe. I started off by using a series of nested dplyr::if_else functions inside a dplyr::mutate call. I ended up with a bit of a mess, perhaps a dozen or so if_else calls… that’s when I got some abuse from my colleague following a GitHub pull request.
There must be a better way? A quick Google got me here. A neat idea, but I noticed the suggestion of the new-ish dplyr::case_when function in the comments section of this excellent blog post. I’ve taken up the baton, by exploring this function in this post.
The example in ?case_when is quite informative; I give a visual interpretation (see the code comments for details). However, we should explore it further and apply it to a dataframe to expand our understanding.
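Here is a minimal sketch of that comparison. The first part follows the spirit of the help-page example on a plain vector; the second applies the same idea to mtcars. The `model` and `size` columns and the cylinder cut-offs are my own illustrative choices, not anything prescribed by dplyr.

```r
library(dplyr)

# A vector example in the spirit of ?case_when.
# Each line reads "condition ~ value"; the first TRUE condition wins.
x <- 1:50
fizz <- case_when(
  x %% 35 == 0 ~ "fizz buzz",     # divisible by both 5 and 7
  x %% 5 == 0  ~ "fizz",          # divisible by 5
  x %% 7 == 0  ~ "buzz",          # divisible by 7
  TRUE         ~ as.character(x)  # everything else keeps its own value
)

# Now on a dataframe. `model` is a helper column added for illustration.
cars <- mtcars
cars$model <- rownames(mtcars)

# Nested if_else: each extra rule adds another level of nesting.
nested <- cars %>%
  mutate(size = if_else(cyl == 4, "small",
                if_else(cyl == 6, "medium",
                if_else(cyl == 8, "large", "unknown"))))

# case_when: one rule per line, read from top to bottom.
flat <- cars %>%
  mutate(size = case_when(
    cyl == 4 ~ "small",
    cyl == 6 ~ "medium",
    cyl == 8 ~ "large",
    TRUE     ~ "unknown"
  ))

# Both approaches produce the same column.
identical(nested$size, flat$size)
```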
The same outcome, but which did you find more readable? (Imagine you were quality assuring this code.)
As pointed out in the case_when help examples, ordering is important: you want to go from most specific to least specific. In the example below we wanted the Mazda RX4 Wag to be labelled as a Mazda Wagon in the newly created brand variable. This failed due to our ordering, because case_when uses the first left-hand side (LHS) condition that is TRUE for each row; to succeed we should move this condition before the first argument. Notice how the right-hand side (RHS) of the ~ provides the replacement value. Try replacing the "Wow" with a numeric 50. What happens when you run the code?
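A sketch of the ordering problem, assuming mtcars with the row names copied into a `model` column. The conditions (including the `hp > 250` rule that yields "Wow") are illustrative stand-ins rather than the exact rules from my package.

```r
library(dplyr)

cars <- mtcars
cars$model <- rownames(mtcars)  # helper column, added for illustration

# General rule first: the wagon is claimed by "Mazda" before the
# more specific rule on the next line is ever considered.
general_first <- cars %>%
  mutate(brand = case_when(
    grepl("Mazda", model)    ~ "Mazda",
    model == "Mazda RX4 Wag" ~ "Mazda Wagon",  # never reached
    hp > 250                 ~ "Wow",
    TRUE                     ~ "Other"
  ))

# Specific rule first: the wagon is labelled as intended.
specific_first <- cars %>%
  mutate(brand = case_when(
    model == "Mazda RX4 Wag" ~ "Mazda Wagon",
    grepl("Mazda", model)    ~ "Mazda",
    hp > 250                 ~ "Wow",
    TRUE                     ~ "Other"
  ))

general_first$brand[cars$model == "Mazda RX4 Wag"]
specific_first$brand[cars$model == "Mazda RX4 Wag"]
```

As a hint for the exercise: every right-hand side here is a character string, and case_when expects the replacement values to share a compatible type.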
case_when is still somewhat new and experimental. Although it is a bit easier to read, for now I may stick with nested if_else statements until it is more stable and works well within mutate. If you play around with this demo code it’s quite easy to break; this may be due in part to some useful non-standard evaluation intrinsic to mutate. For example, replacing the & with && causes it to error. Try it with your own data and keep your eyes peeled for further developments! In Hadley we trust.
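A small sketch of the & versus && point, again on mtcars with illustrative conditions of my own. The & operator compares element by element, which is what a per-row condition needs, whereas && is meant for a single TRUE or FALSE. What the && version does depends on your R and dplyr versions, so it is wrapped in tryCatch here and no particular outcome is assumed.

```r
library(dplyr)

# `&` is vectorised: one TRUE/FALSE per row, as case_when expects.
vectorised <- mtcars %>%
  mutate(label = case_when(
    cyl == 4 & mpg > 30 ~ "small and frugal",
    TRUE                ~ "other"
  ))

# `&&` is designed for length-one conditions. Depending on the versions
# installed this may error, warn, or quietly use only the first row,
# so any condition raised is caught and returned instead of stopping.
scalar_attempt <- tryCatch(
  mtcars %>%
    mutate(label = case_when(
      cyl == 4 && mpg > 30 ~ "small and frugal",
      TRUE                 ~ "other"
    )),
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)

table(vectorised$label)
```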
A relatively new offering from the tidyverse for making nested if_else statements more readable.