Author: Conrad Carlberg
Publisher: Que, 2011
Aimed at: Students, researchers, professionals who want to use Excel for statistical analysis
Pros: A well-presented introduction to statistics with lots of useful Excel examples
Cons: Restricted to classical linear parametric stats.
Reviewed by: Janet Swift
Excel has made both number crunching and data analysis more accessible. But with statistics it's important you know how and why as well as churning out results.
The introduction to this suggests it is suitable for those with little background in statistics. It also reveals that it can be used with any edition of Excel from 97 onwards - but while you don't need to be an Excel expert it doesn't teach you the basics of Excel - but there are plenty of other books for that plus online help on the Microsoft site.
Supporting material for this book is maintained on its own web page. On the Update tab you'll find an Errata list. It also acknowledges the fact that the figures are not as easy to read as the author would have liked. He does however have a solution:
There is a workbook for each chapter, and in each workbook you'll find a worksheet associated with each of that chapter's figures. I think you'll find it easier to see what's going on in the Excel worksheets than in the book's figures, and you'll also find it easier to follow the discussion by examining the contents of different worksheet cells on your own computer.
So even if you prefer to work through examples unaided, there's good reason to click on the links in the Download tab.
As a course in Statistics this book starts from the basics and goes quite a long way.
Chapter 1 has the title "About Variables and Values" and in the context of statistics a "variable" is an attribute or property that describes a person or a thing, such as surname, and points out that database jargon refers to variables as fields. It goes on to look at scales of measurement - category scales (also known as nominal scales); and numerical scales - and then the problems in distinguishing text values from interval values. Then comes a quick section on charting numeric variable and by the end of the chapter we are introduced to frequency distributions.
Chapter 2 is about measures of central tendency and uses Excel to calculate the mean, the median and the mode of a group of observations. Chapter 3, with the title "Variability: How Values Disperse" is where we meet standard deviation and variance with a discussion that looks at concepts as well as looking at how to calculate them before introducing Excel's functions. There also a look at bias in the standard deviation, a topic that is discussed again later in the book and the concept of degrees of freedom is introduced.
Chapter 4 is on correlation - first the concept and how to calculate it and then we get to Excel and to the Analysis ToolPak referred to by its Excel 2010 name, Data Analysis add-in. Discussion continues to be interspersed with practical examples with the section "Correlations isn't Causation" being an important example. The latter part of the chapter covers multiple regression.
Chapter 5 returns to looking at nominal variables and moves very quickly to an Excel topic - pivot tables - before getting on to statistical tests and the null hypothesis. It goes into issues of interpretation of results and setting decision rules. A boxout tells us that the author's approach is frequentist, rather than Bayesian, but notes that this controversy is "well beyond the scope of this book". Next comes a good discussion of two fundamental assumptions that have to be met - random selection and independent selection. Shortly afterwards there a good summary of the steps of hypothesis testing. The second paret of te chapter is on two-way pivot tables and the Chi-Square test and distribution and the relevant Excel functions.
Chapter 6 starts with a complaint about the way that the Data Analysis add-in makes things difficult for the statistician by returning results as vales rather than as formulas that could be explored to show what is going on. A main topic of the chapter is "internal validity"; the idea that experimental rescuers needs to be designed so that we can have confidence in its results. Carlberg goes through the factors that can compromise internal validity. After this he gives us a through exploration of the F-test and the tool provided for it by the Excel add-in, pointing out the shortcomings in the latter.
Chapter 7 returns to the Normal Distribution and introduces "skewness".and confidence intervals. The relevant Excel functions are introduced as well as thorough discussion of the concepts. The chapter concludes with the Central Limit Theorem.
The next three chapters are on Testing Differences Between Means. Chapter 8, which covers z-tests and t-tests opens by looking at the reasons for testing the differences between means. This chapter makes good use of charts to illustrate the points it makes, explaining:
You can teach yourself quite a bit about both the nature of a statistical test and about the data that plays into that test, by charting the data.
Chapter 9 is a detailed and discursive chapter that looks at three t-test approaches to investigating differences between means provided by Excel and the considerations associated with each of them. Then Chapter 10 looks at Analysis of Variance (ANOVA) and the F test as an alternative to t-tests. You'll notice that this is pitched at a level that is pretty advanced - but the theoretical discussion continues to be readable and authoritative. There's a worksheet for the Scheffé Procedure and the chapter concludes with a discussion of planned orthogonal contrasts.
Chapter 11 goes into further issues concerning analysis of variance starting with Factorial ANOVA including the Data Analysis add-in Two Factor with Replication tool. Next comes a discussion on interaction - the way factors operate jointly, then we look at the problem of unequal group size and considers the Two Factor without Replication tool. It concludes with a section on the limitations on the ANOVA tools in Excel, suggesting that in some circumstances you need different software.
The final four chapters comprise two on Multiple Regressional Analysis and two on Analysis of Covariance and in each case there's a chapter on the basics followed by one on Further Issues. These have the same balance of theory and practice using Excel established in earlier chapters and build on the ANOVA section.
Overall this is a readable and authoritative introduction to statistical analysis that makes highly intelligent, and where necessary critical, use of Excel. Like the author, I wish Excel had been available while I was engaged in research and now that it is this book will help researchers and business analysts in both the theory and practice of statistics.