Programming Skills for Data Science
Written by Mike James   

Authors: Michael Freeman and Joel Ross
Publisher: Addison-Wesley
Date: December 2018
Pages: 384
ISBN: 978-0135133101
Print: 0135133106
Kindle: B07KMDCHT2
Audience: Would-be data scientists
Rating: 5
Reviewer: Mike James
If you are looking for a programmer's guide to R, this might be it.

As I've said before, the big problem with writing a book on R is deciding whether it should concentrate on the programming language or on the statistical procedures carried out with it. This particular book leans toward the programming language, with some simple statistical procedures, mostly graphics, acting as examples.

It starts off, in Part I Chapter 1, with setting up your computer and very sensibly covers IDEs, including RStudio, which is the obvious one to use. Don't try using just a text editor to program in R - it will cost you a lot of time unless you are already an expert, and even then a good IDE will save you from mistakes. It also covers setting up GitHub, which plays a moderately central role in the rest of the book. Collaboration is common in statistical work, but it still isn't clear to me that Git or GitHub is a key component - it certainly makes things more complicated at first.

Chapter 2 shows you how to use R from the command line, including navigating the file system. Useful stuff, but I think it should be in an appendix.

Part II is about using Git, mainly via GitHub, to manage your projects. If you don't plan to use GitHub, skip on to Part III. Chapter 4, also in this part, covers Markdown as a way of creating simple documentation, another useful skill.

Part III gets to grips with R. It goes through the basics of variables, functions, conditionals and lists. Personally, I think it should cover Data Frames here, as the ultimate R data structure, but they are postponed until Part IV. All of the descriptions are good and easy to read. There is a lot of intelligent writing in this part of the book - in fact there is a lot of intelligent writing in most of the book. This isn't a dummies book and you need to read it carefully.

Part IV moves towards statistics, but it is still mostly about using the R language to manipulate data. After a brief look at the generalities of data, the book moves on to Data Frames and then to manipulating data, mostly using the dplyr and tidyr packages. Chapter 13 is a short introduction to accessing a SQL database, and Chapter 14 covers REST and accessing web data, including JSON.

Part V is much more about statistics, but only in the form of simple graphs and charts. Here you learn to plot with ggplot2, plotly, rbokeh and leaflet. Part VI returns to the programming aspects of using R. Chapter 18 deals with dynamic reports using Markdown, Chapter 19 with interactive web applications built with Shiny, and Chapter 20 returns to the idea of using GitHub for collaboration. The final chapter provides some guidance on learning statistics, other languages and so on.

This book will not teach you much about statistics apart from some very basic ideas about data. It will teach you quite a lot about R: for my taste not quite enough, but it does a better job than other books I have reviewed. The writing style is, as I said earlier, "intelligent". There are plenty of comments and asides to set the scene and it is all easy to read.

Highly recommended as an introduction to R and the programming practices that surround it. You will still need to teach yourself statistics, but that is another, and much bigger, problem.

Last Updated ( Tuesday, 30 July 2019 )