|Programming Skills for Data Science|
|Written by Mike James|
Authors: Michael Freeman and Joel Ross
As I've said before, the big problem with writing a book on R is whether it should concentrate on the programming language or the statistical procedures via the use of the language. This particular book is more toward the programming language with some simple statistical procedures - mostly with graphics acting as examples.
It starts off, in Part I Chapter 1, with setting up your computer and very sensibly covers IDEs including RStudio, which is the obvious one to use. Don't try using just a text editor to program in R - it will cost you a lot of time unless you are already an expert, and even then a good IDE will save you from mistakes. It also covers setting up GitHub, which plays a moderately central role in the rest of the book. Collaboration is common in statistical work, but it still isn't clear to me that Git or GitHub is a key component - it certainly makes things more complicated at first.
Chapter 2 shows you how to use R from the command line, including navigating the file system. Useful stuff, but I think it should be in an appendix.
Part II is about using Git, but mainly via GitHub, to manage your projects. If you don't plan to use GitHub, skip on to Part III. Chapter 4, also in this part, is about Markdown as a way of creating simple documentation, another useful skill.
Part III gets to grips with R. It goes through the basics of variables, functions, conditionals and lists. Personally I think it should cover Data Frames as the ultimate R data structure, but this is postponed until Part IV. All of the descriptions are good and easy to read. There is a lot of intelligent writing in this part of the book - in fact there is a lot of intelligent writing in most of the book. This isn't a dummies book and you need to read it carefully.
Part IV moves towards statistics, but it is still mostly about using the R language to manipulate data. After a brief look at the generalities of data, the book moves on to Data Frames, and then on to manipulating data, mostly using dplyr and tidyr functions. Chapter 13 is a short introduction to accessing a SQL database. Chapter 14 covers REST and accessing web data, including JSON.
Part V is much more about stats, but only simple graphs and charts. Here you learn to plot with ggplot2, plotly, rbokeh and leaflet. Part VI returns to the programming aspects of using R. Chapter 18 deals with dynamic reports using Markdown, Chapter 19 is about websites using Shiny and Chapter 20 returns to the idea of using GitHub for collaboration. The final chapter provides some guidance on learning statistics, other languages and so on.
This book will not teach you much about statistics apart from some very basic ideas about data. It will teach you quite a lot about R. For my tastes, not quite enough about R, but it does a better job than other books I have reviewed. The writing style is, as I said earlier, "intelligent". There are plenty of comments and asides to set the scene and it is all easy to read.
Highly recommended as an introduction to R and the programming practices that surround it. You will still need to teach yourself statistics, but that is another, and much bigger, problem.
|Last Updated ( Tuesday, 30 July 2019 )|