Author: Conrad Carlberg
Audience: Excel users needing to analyze data
Reviewer: Kay Ewbank
Predictive analytics combines stats and quantitative analysis. Is this book a good way into an emerging field?
Microsoft Excel is increasingly being touted as a tool for analyzing data in a structured way. It’s the application Microsoft expects business users to rely on for Business Intelligence work, with tools such as PowerPivot providing ways to view and work with data drawn from large data sets. This book looks at another area of data analysis with Excel: predictive analytics.
As Carlberg points out, this is a term that started as a way to marry stats and quantitative analysis, and has now elbowed its way into the jargon. So predictive analytics, so far as this book is concerned, is the analysis of data such as web traffic, and finding ways to forecast what will happen based on current trends.
Carlberg takes a different analytical technique in each of the chapters in the book, using an Excel workbook that you can download from the Que website to show you what he’s talking about. The book kicks off with a chapter on how to gather data from sites you don’t own (such as Amazon) for your own analysis. The data gathering relies on some VBA routines that are included in the samples, and Carlberg’s code puts together a useful data gathering tool, with good advice on how to plan the structure of your data analysis workbook.
Having grabbed your data, Carlberg works through a number of possible analysis techniques, starting with linear regression. A topic title like that could well take you straight back to lectures where you spent your time doodling in your notebook and not really following what was going on. Carlberg makes the subject more relevant with good examples and, as elsewhere in the book, shows how Excel and its various add-ins can be used; in this case, the regression tool from the Data Analysis add-in.
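For readers who want to see what a simple linear regression actually computes, here is a minimal sketch in Python rather than Excel. It is not taken from the book; the function name `fit_line` and the sample figures are made up for illustration. It fits a straight line by ordinary least squares, which is the standard approach for a single predictor.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Illustrative helper (not from the book): xs and ys are equal-length
    lists of numbers, and xs must contain at least two distinct values.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs must not all be the same value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on the line y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The slope and intercept could then be used to forecast a value for a new x as `intercept + slope * x`.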
Chapter 3 looks at forecasting with moving averages, and shows how the simple idea of a moving average forms the basis of more sophisticated analyses. Chapter 4 looks at the use of smoothing and tracking, how to use Excel’s exponential smoothing tool, how to set up the smoothing, and the technique called Holt’s linear exponential smoothing. It doesn’t sound like a riveting read but, as in the rest of this book, Carlberg manages to keep the coverage understandable with good examples and clear explanations.
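To give a flavor of the two basic ideas in these chapters, here is a minimal sketch in Python rather than Excel. It is not the book's code; the function names, the smoothing constant `alpha` and the sample numbers are assumptions chosen for illustration. It shows a trailing moving average and simple exponential smoothing, not the Holt's linear variant, which adds a separate trend term.

```python
def moving_average(series, window):
    """Trailing moving average: each result averages the last `window` values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    The first smoothed value is seeded with the first observation (one common
    convention, assumed here); after that each value is
    alpha * observation + (1 - alpha) * previous smoothed value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up sample data, for illustration only.
print(moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_smoothing([10, 20, 30], alpha=0.5))
```

A larger `alpha` makes the smoothed series track recent observations more closely, while a smaller one damps out noise at the cost of reacting more slowly to real changes.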
Chapter 5 covers forecasting a time series using linear regression and auto-regression, and throws in ARIMA Box-Jenkins techniques. By Chapter 6, Carlberg has moved on to logistic regression, and having taken you through the whys and wherefores that make this useful, follows on in Chapter 7 with an example showing how to use it to predict purchasing behavior. He then moves on to compare using Excel for logistic regression with doing the same thing using the freeware statistics package R. Chapter 9 looks at principal components analysis, explaining why it had to be invented (to arrive at a manageable set of variables for analysis) and what the difference is between principal components analysis and factor analysis. The final two chapters cover ARIMA Box-Jenkins in more detail, and Varimax factor rotation.
Just what you’ll make of this book depends on your background. If you know statistics, you’ll probably write it off as suitable for use by students. Experts in stats aren’t the target for this book, though. It’s aimed at people who need to analyze data in Excel, and don’t really know what the different techniques can do and when they should be used. For this audience, Carlberg’s clear descriptions and good examples will fit the bill very well.