Author: Robert Kabacoff Publisher: Manning Pages: 608 ISBN: 9781617291388 Print: 1617291382 Audience: Statistical practitioners Rating: 4.7 Reviewer: Janet Swift
With everyone wanting to be a data scientist, now is the time to learn R. Is R in Action the way to do it?
This is the second edition of a book that we thought a lot of when it was first reviewed back in 2011. It is still good!
This book could just as easily be reviewed in the Mathematics books section because it is very heavy on statistical practice. This isn't a bad thing, but if you are a programmer looking for a primer on R as a language you might need to look elsewhere.
This is not to say that the book is light on details of the R language, but the account is angled towards the practitioner wanting to get to grips with actually doing some statistics as quickly as possible. It really does answer the question of "how do I" and especially so if you are familiar with any other stats package.
The book is divided into four sections (Getting Started, Basic Methods, Intermediate Methods and Advanced Methods), titles that don't tell you much about what they contain!
Getting Started goes over the basics of working with R. At its end you will know what R is all about, how to install some data and plot some charts. The chapter on creating a dataset takes you quickly through the data structures that R supports, but from a user's, rather than programmer's, point of view. There are some nice asides aimed at the programmer such as telling you that R makes use of terminology in ways that you might not be used to. The final two chapters in the section expand on this to include details such as transforming and generally manipulating the data, which is a big part of any statistical analysis.
The book has a slight tendency to assume that you will remember anything that has been introduced earlier and doesn't often bother to make a reference either forward or back to where you can find out more about something. It's not a big problem, however, because you can always go hunting for yourself.
Once you get out of the first part the emphasis is very much on the statistics. Part II describes how to create basic charts and perform basic statistical tests: t-tests and equivalent nonparametric tests. My guess is that this is the section of most use to the stats beginner who is trying to solve a homework problem using R.
Part III of the book begins to get more advanced and shows you how to perform regression and an Anova using R. This is to be welcomed because R doesn't do these things in the same way as a package like SPSS, say. The final part of the section deals with power analysis (i.e. how big a sample do I need) and resampling, both of which are unusual as "intermediate" topics, but it is good to see them introduced so early. There is also a chapter on intermediate graphs.
If regression and Anova are intermediate topics what can be waiting for the reader in an advanced section?
The answer is the generalized linear model and principal components/factor analysis, which are indeed advanced statistics even if they are basic tools in some disciplines.
Chapter 15 is all new and covers a topic that was ignored in the first edition: time series. This starts simple with getting time series data into R and moves up through smoothing and seasonal decomposition, finishing up with ARIMA models.
Also new are the chapters on clustering and classifying. They cover the basic techniques that you should know (hierarchical clustering, k-means, logistic regression, decision trees and support vector machines) but not classical discriminant analysis.
The final two chapters are also interesting: advanced missing values and advanced graphics. The treatment of missing values is worth buying the book for on its own; it is a topic that is all too often ignored.
Don't buy this book if you really don't have a clue about statistics; it isn't a statistics primer.
What it does is assume that you know much of the theory, fill in some gaps for you, and then show you how to do the job using R. Generally the description of how to achieve some task or other includes enough asides and comments to make you think about the task and perhaps even invent your own way of doing the job.
It also isn't a good book to buy if you are looking for something to solve your programming problems. It isn't good on the specifics of importing data from a particular application, for example, but there are cookbooks for this and you can always search the web for a preprogrammed solution.
If you want to discover how to do statistics using R then this is a really good place to start. Highly recommended if you know stats and not R.