Author: Norman Matloff
Publisher: No Starch Press
Audience: Stats programmers
Reviewer: Mike James
Billed as a tour of statistical software design, does this really mean "no statistical knowledge required"?
The big problem with providing an introduction to a programming language like R that is associated with a particular task - i.e. statistics - is that there is a tendency to teach the stats rather than the language.
Of course, you can't explain the language without giving some examples that involve statistics, and you will find quite a few in this book. The focus, however, is on programming using R.
The first chapter explains how to get started with R and then gives an overview of the key ideas - many of which are explained in much more detail later on. The chapter ends with an example of using R to perform a regression analysis.
Chapters 2 through 6 deal with the main R data structures: Vectors; Matrices and arrays; Lists; Data frames; Factors and tables. There is no doubt that R's data structures are what cause the most problems when first getting to know the language because they tend not to be the "pure" data structures you find in the rest of computing, but instead are customized with practical facilities to make handling stats data easier. Here you discover not only what the data structures are, but also how to get data in and out and how they behave in simple operations.
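To give a flavor of what "customized" means here, the following is a minimal sketch in R. It is not taken from the book, and the variable names and values are made up purely for illustration. It shows a vector being recycled in arithmetic, a factor being tabulated, and a data frame being summarized by group.

```r
# Vectors are the basic unit: arithmetic is element-wise and
# a shorter operand is recycled to match the longer one.
x <- c(1, 2, 3, 4)
y <- x * c(10, 100)      # c(10, 100) is recycled: 10, 200, 30, 400

# A factor stores categorical data as integer codes plus labels.
grp <- factor(c("a", "b", "a", "b"))
counts <- table(grp)     # frequency table of the factor levels

# A data frame is a list of equal-length columns, one per variable.
df <- data.frame(group = grp, value = x)
means <- tapply(df$value, df$group, mean)  # mean of value within each group
```

Behavior such as silent recycling is convenient for stats work, but it is also the sort of thing that surprises programmers coming from other languages.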
Chapter 7 is where we get to grips with the control structures of R - loops, if and so on. We also learn about other language issues such as functions and scope. One of the interesting things about R is that in many cases you don't need to master the control structures because what you want to do can be achieved by loading a data structure and calling a function. However, if you are going to do more with R than standard analysis you do have to program in the wider sense.
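The contrast between "calling a function" and "programming in the wider sense" can be sketched as follows. This is an illustrative example under my own assumptions rather than code from the book: the `heights` data and the `standardize` function are hypothetical.

```r
# Hypothetical data: heights in centimetres.
heights <- c(172, 165, 180, 158, 191)

# Procedural version: an explicit loop with an accumulator.
total <- 0
for (h in heights) {
  total <- total + h
}
loop_mean <- total / length(heights)

# Idiomatic version: hand the whole vector to a function.
vec_mean <- mean(heights)

# A user-defined function, for work that goes beyond the built-ins.
# Variables assigned inside the body are local to the call.
standardize <- function(v) {
  centred <- v - mean(v)
  centred / sd(v)
}
z <- standardize(heights)
```

Both versions compute the same mean; the loop only becomes necessary when no existing function does what you need, which is where writing your own functions and understanding scope start to matter.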
We now move into more specialized topics. Chapter 8 is on doing maths - mostly linear algebra - and implementing simulations. Chapter 9 is on object-oriented programming. I'm not convinced by R's object-oriented facilities and I think that for a lot of work they are best avoided. Chapter 10 deals with more sophisticated I/O, including internet-oriented I/O using sockets. Chapter 11 is an extended essay on string manipulation, which is more sophisticated in R than you might expect. Chapter 12 is on drawing graphs and charts.
From here the book focuses on language and programming topics. Chapter 13 is on debugging using the limited but usable R debugging tools. Next we have a chapter on performance optimization, which tackles the main question that most programmers have about R: is functional code faster than procedural? Chapter 15 is about using R with other languages - C and Python to be exact. Finally, Chapter 16 goes into parallel R and making the best use of multiple cores.
This is a good book if you want to learn the important facts of life about programming in R. It isn't a book that will teach you statistics, however, nor should it be. If you want a stats primer, find another book. This one explains the features that make R different and takes you through all you need to know to perform standard analysis and implement custom procedures. There are plenty of examples, but another important caveat is that this isn't a cookbook. You have to read, understand and then implement your own solutions to the problems that arise in everyday data analysis.
If you aren't already an R expert, read this book and you will be well on your way to becoming one. Recommended.