SQL Server 2017 Machine Learning Services with R

Authors: Tomaz Kastrun and Julie Koesmarno
Publisher: Packt Publishing
Pages: 338
ISBN: 978-1787283572
Print: 1787283577
Kindle: B077Z9PV4F
Audience: DBAs and data scientists
Rating: 3.0
Reviewer: Ian Stirk 

This book, subtitled Data exploration, modeling, and advanced analytics, aims to introduce you to using R with SQL Server. How does it fare?

R is a popular tool for Machine Learning and statistical analysis, so it makes sense to investigate the recent incorporation of R within SQL Server. This book covers the whole Software Lifecycle usage of R with SQL Server (installation, development, deployment, maintenance etc), providing practical examples throughout. It is targeted primarily at data professionals, DBAs, and data scientists. It assumes you have little or no experience of R, and only a little understanding of SQL Server. 

This book mainly focuses on SQL Server 2017, but much of it is applicable to SQL Server 2016. Similarly, much of the background information is applicable to SQL Server 2017’s other Machine Learning language, Python.

Below is a chapter-by-chapter exploration of the topics covered. 

Chapter 1 Introduction to R and SQL Server

The book opens with a look at how R was used prior to SQL Server 2016. In use since the 1980s, R had the advantages of being open source, easy to install, and extensible, and it faced little initial competition from other free statistical software.

While Microsoft provided statistical and predictive functionality with its SQL Server Analysis Services (SSAS), there was demand for additional functionality. Various workarounds to get R functionality applied to SQL Server data were tried, producing useful results, but they generally had limitations (e.g. security, memory, performance). Microsoft’s move towards becoming open source friendly included the incorporation of R into SQL Server 2016 (and Power BI). The importance of being able to access a single shared source of data, and its impact on performance and on making business decisions, is noted.

The chapter ends with a useful discussion on how R can be used by DBAs. For example, instead of just monitoring the current situation, R can be used to make predictions (e.g. Capacity Planning).

This chapter is well written, easy to read, with informative discussions, a good flow between the topics, and useful links for further material. These traits generally apply to the whole of the book.

Chapter 2 Overview of Microsoft Machine Learning Server and SQL Server

Incorporating R into IT departments highlighted various barriers, including: lack of knowledge, complex architecture, and siloed workers – each of these is discussed with their potential solutions. There’s a short discussion on how Microsoft is embracing the R language. It’s noted that basic R functionality is included in all editions of SQL Server 2017, but advanced functionality (e.g. full parallelism) is only available in the Enterprise edition.

Next, the various types of Microsoft R offering are briefly examined, from the community Microsoft R Open to Microsoft Machine Learning R Server. The latter can process large datasets in parallel, distributed across various nodes, on various platforms (e.g. Windows, Hadoop etc). The renaming of the various R offerings to Machine Learning Services in SQL Server 2017 is noted - this was done so Python could be included under the Machine Learning Services umbrella.

The various products in Microsoft’s R platform are briefly examined, namely: 

  • Microsoft R Open (MRO) – free, backward compatible, memory bound

  • Microsoft R Client – like MRO but includes parallelism and multi-threading

  • Microsoft Machine Learning R Server – standalone server for heavy duty R processing

  • Microsoft SQL Server Machine Learning R Services – integrated into the database. This will be this book’s focus

  • R Tools for Visual Studio (RTVS) – add-in R editor for Visual Studio

The chapter ends with an architectural look at how various components (R IDE, SQL Server, and R Engine) communicate when fulfilling an R query from within SQL Server. There’s a brief look at some of R’s limitations (e.g. performance, memory, security), and how Microsoft’s RevoScaleR package and database roles can address these.

This chapter provides a useful overview of the various R products; however, they could have been introduced more clearly. Some code that uses ‘sp_execute_external_script’ is introduced before the reader is told that the server configuration option ‘external scripts enabled’ must be turned on, which is only described in chapter 3. Code is also introduced without explanation, i.e. it assumes you are already familiar with R, which is at odds with the introduction, which states ‘This book is for data analysts... with some or no experience in R…’
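For readers who have not met these two pieces before, the ordering matters: external script execution has to be enabled before any call to ‘sp_execute_external_script’ will work. The following is a minimal sketch of that sequence, not code taken from the book. It assumes Machine Learning Services (R) is already installed on the instance and that you have permission to change server configuration; the column name hello is purely illustrative.

```sql
-- Step 1: allow external scripts at the server level.
-- On SQL Server 2016/2017 the SQL Server service may need a restart
-- before the new value takes effect.
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO

-- Step 2: a minimal round trip. The query in @input_data_1 is passed to R
-- as the data frame InputDataSet; whatever the R script assigns to
-- OutputDataSet is returned to SQL Server as a result set.
EXEC sp_execute_external_script
    @language     = N'R',
    @script       = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS ((hello INT NOT NULL));
GO
```

If the second statement fails with a message that external scripts are not enabled, the first step (or the service restart) has most likely been missed, which is exactly the trap a reader following chapter 2 in order could fall into.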

Last Updated ( Tuesday, 02 July 2019 )