Authors: Andy Nicholls, Richard Pugh, Aimee Gott
Audience: Budding data scientists
Reviewer: Mike James
R is important so learning it is important.
Of course you can't learn it in 24 hours no matter what the book says - more like 24 months. R, data and statistics make a tough subject. Not impossible, but there is a lot of it.
The biggest question any book on R has to settle is whether it is going to teach you statistics as well as programming. This particular book mostly, but not entirely, leaves the stats to others. This is mostly about how to program in R and how to work with data. Any statistician will tell you that most of the work in actually doing an analysis is in getting the data into the right form and this is where R, and a knowledge of how to use it to transform and patch up data, is valuable.
The book is divided into 24 sections, each of which is supposed to take an hour. They don't take an hour. They don't even take the same amount of time, as each one covers a different amount of ground. So don't worry if you take longer over one than another - it's not you, it's the book.
The first hour is a nice history of R. Unlike most first chapters imparting background knowledge, this one is interesting and more important than usual. From here we move on to the normal introductory topics - getting started with R and using an IDE. In Hour 2 we first encounter the idea of R objects. R isn't really an object-oriented language and if you have a background in programming you might be very confused by what R calls objects. Don't read too much into the fact that something is called an object.
The next two chapters are about data structures and R has some complicated types and ways of working with them. This is where statistical data processing is different from other types of data processing so spending two whole sections on the topic makes good sense. The next hour continues the data theme with a look at how to work with dates and times and factors - basically categorical variables used to organize data.
Hour 6 marks a move into more traditional programming topics, starting off with using R functions. Next we have two hours on writing your own R functions. This is probably too early in the book, so skipping it and coming back later is a good idea - but note that this is where if/else conditionals are introduced. Hour 9 completes the look at flow of control with loops. This isn't really enough to teach you to program in four hours but it isn't bad. There is so much you can do with R without knowing how to program that it doesn't matter as much if you don't get all of this part of the book immediately.
At this point the book goes back to the all-important data. Hour 10 is about importing data - Excel and SQL. If only all our data came from these simple-to-use sources. In practice it comes from a much wider range of devices and programs and this is why we need to learn how to work with data in ad-hoc ways.
Hour 11 continues to look at data with topics such as restructuring data, merging, aggregation and so on. Hour 12 deals with efficiency and it covers dplyr, which is unusually advanced for an introductory book.
The next three hours are a course on graphics - statistical graphics that is - with charts of all types as well as basic raw graphics elements.
Hours 16 and 17 deal with statistics - linear models to be exact. The first of these chapters covers simple regression, multiple regression and ANOVA. If you didn't know much about the methods before you read this you won't know much afterwards. The emphasis is on how to use R to compute these models rather than the statistical aspects. A final section deals with R and object-orientation - it might as well be left out and why it is in with models is a mystery. In the second hour we move on to time series and generalized linear models.
The final few hours are spent on miscellaneous topics - code efficiency, package building, classes, dynamic reporting and building web sites using Shiny.
Overall this is a good book. The most important thing to know is that it won't teach you statistics and it is about programming in R. The only real problem is that the presentation isn't from a programming point of view. If you are a programmer and want to learn R then you will find it hard going because it hardly makes use of the deeper ideas of programming to make things seem easier and more logical. From this point of view everything is a special case and you just have to learn how it works. If you aren't a programmer then you are likely to find the approach good as long as you take your time and come back to the book when you need to clarify some aspect of using R.
Recommended as long as you know stats and don't want to become a programmer in general - just an R programmer.
Related Reviews And Articles
Beginning R: The Statistical Programming Language
Data Mashups in R
Getting Started with R
R in Action
The Art of R Programming
A Programmer's Guide to R - Data and Objects