Charles Wheelan’s Naked Statistics is an insightful book laced with college-style humor, as indicated by the soft porn book cover. Read this book if you want to understand the concepts behind statistics without having to mine a textbook. The book is a quick read at only 250 pages, much of it skimmable. It is especially valuable for digital analytics professionals and marketing executives who want to understand more about data science predictions, which are essentially statistically based “guesstimates”.
Readability: 4 out of 5 stars. The text is very often at the level of Time Magazine, which is to say about 7th grade. Lots of stories, straightforward sentences, very few footnotes or reference notes per chapter, but a nice section at the end of each chapter which dives into more mathematical detail, should you desire it. There are a few chapters where the book slogs through excess detail and overused examples.
Impact: 4 out of 5 stars. If you read this book, it will change the way you hear and interpret statistics in the world-at-large as well as your specific industry. You will also feel smarter (what a bonus!). The potential impact is medium-high.
Speed read pattern: You really have to read it from start to finish because each chapter builds on the last. However, once you get the point, I recommend skimming over the examples, which cuts down the reading time. Skip the conclusion. It does not summarize the book but instead provides feel-good examples about making the world a better place with statistics. Do not skip the statistical tools review. This is the clearest tool comparison I’ve seen.
The book has 13 chapters. My summary notes are included below:
1 – What’s the Point – This chapter introduces statistics as a way to summarize or simplify the data around us. Example: stock market index. He compares data to a crime scene where we might want to “capture everything”, and statistics to the detective work that arrives at a meaningful answer.
2 – Descriptive Stats – Statistical vocabulary is introduced here (mean, median, standard deviation, quartiles, relative and absolute numbers). If you always wanted to know what those funny symbols meant, this chapter will tell you.
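As a quick illustration of that vocabulary (my own sketch, not an example from the book), the snippet below computes the mean, median, standard deviation and quartiles with Python's standard library. The order values are made up, and the single large outlier is there on purpose to show how far the mean can drift from the median.

```python
import statistics

# Hypothetical daily order values, with one large outlier at the end.
orders = [20, 22, 25, 27, 30, 31, 35, 250]

mean = statistics.mean(orders)      # pulled upward by the outlier
median = statistics.median(orders)  # middle value, barely affected by it
stdev = statistics.pstdev(orders)   # population standard deviation
q1, q2, q3 = statistics.quantiles(orders, n=4)  # quartile cut points

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
print(f"quartiles: {q1}, {q2}, {q3}")
```

The mean lands well above the median here, which is exactly the kind of gap that makes "average" a slippery word.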
3 – Deceptive Description – Examples of how statistics go awry are included here. There is a specific callout on scorecards. When scorecard success carries financial incentives, be on the lookout for manipulation.
4 – Correlation – The correlation coefficient is easily understood in this chapter (-1 to 0 to 1). Zero is no correlation (e.g. what you ate and the movement of the stock market). -1 is a perfect negative correlation, where one value always falls as the other rises. +1 is a perfect positive correlation, where the two always rise together; what you ate and how much you weigh is an example of a positive, though not perfect, correlation.
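For readers who want to see the -1 to 0 to 1 scale in action, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name and the tiny data sets are my own illustrative assumptions, not taken from the book.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-movements around the means, then the two spreads.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
rises = [2 * h + 1 for h in hours]   # moves in lockstep with hours
falls = [10 - 3 * h for h in hours]  # moves exactly opposite to hours
v_shape = [2, 1, 0, 1, 2]            # no linear relationship with hours

print(pearson_r(hours, rises))    # close to +1
print(pearson_r(hours, falls))    # close to -1
print(pearson_r(hours, v_shape))  # close to 0
```

Note that the last series is clearly related to hours (it forms a V), yet the coefficient is about zero. Correlation only measures straight-line association.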
5 – Basic Probability – The infamous coin flipping begins here, though the password examples are compelling. This also contains a good financial reference about how to calculate the payout (or expected value) of a change. If you do any web testing, this is very helpful to communicate impact. The law of large numbers (a.k.a. “why casinos always win”) is also introduced here.
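Expected value and the law of large numbers are easy to try out yourself. The sketch below uses a made-up dice bet (pay $1 to roll, win $5 on a six), which is my assumption and not an example from the book. It computes the expected value directly, then simulates more and more plays to show the average payoff settling toward it.

```python
import random

# Hypothetical bet: pay $1 to roll a die; a six pays $5, anything else pays nothing.
# Each entry is (probability, net payoff to the player).
outcomes = [(1 / 6, 5.0 - 1.0), (5 / 6, -1.0)]
expected_value = sum(p * payoff for p, payoff in outcomes)

def average_payoff(n_plays, rng):
    """Average net payoff per play over n_plays simulated rolls."""
    total = 0.0
    for _ in range(n_plays):
        total += 4.0 if rng.randint(1, 6) == 6 else -1.0
    return total / n_plays

rng = random.Random(42)  # fixed seed so the run is repeatable
print("expected value per play:", round(expected_value, 4))
for n in (10, 1_000, 100_000):
    print(n, "plays ->", round(average_payoff(n, rng), 4))
```

A short run can land almost anywhere, but the long run hugs the expected value, which is negative for the player. That is the casino's business model in three lines of arithmetic.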
5 1/2 – The Monty Hall Problem – BEST CHAPTER IN THE BOOK. Highly entertaining game show illustration about whether it’s better to keep your original guess or switch when conditions change.
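If the answer still feels wrong after reading the chapter, a simulation can help. This is a minimal sketch under the usual assumptions of the puzzle (three doors, one car, and a host who always opens a losing door you did not pick); the function names are mine.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Switching should win roughly two times out of three, and staying roughly one time out of three.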
6 – Problems with Probability – This crosses over with Nate Silver’s The Signal and the Noise a bit. The discussion covers the financial crisis and the use of “Value at Risk.” It’s a good reminder to lift your head up and see the broader picture. For web analysts, this means visiting the website once in a while. 🙂 Independent and dependent variables as well as clusters are covered. A fascinating discussion about reversion to the mean ends the chapter. This theory illustrates why companies that are featured in BusinessWeek or teams featured in Sports Illustrated routinely tank afterward.
7 – The Importance of Data – Here we learn how sampling 1000 people can project actual patterns of 1 million. This chapter also lays out five types of sampling bias that could affect the results, but the end result is a firmer belief in the statistical soundness of sampling.
8 – The Central Limit Theorem – At this point the learning curve gets steeper. There are more equations used to explain concepts that rely on previous concepts. This is where standard error and its interpretation are covered. The larger the samples, the more closely the distribution of their means will approximate a normal distribution centered on the population mean. This allows us to make inferences about the data in general (e.g. which candidate will win an election).
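Here is a small sketch of the idea, assuming a deliberately lopsided population (an exponential distribution with mean 1 and standard deviation 1, my choice rather than the book's). It draws many samples, takes the mean of each, and compares the spread of those means with the standard error formula, population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

def sample_means(sample_size, rng, n_samples=2000):
    """Means of n_samples random samples drawn from a skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

rng = random.Random(7)  # fixed seed so the run is repeatable
for size in (5, 25, 100):
    means = sample_means(size, rng)
    observed = statistics.stdev(means)   # spread of the sample means
    predicted = 1.0 / math.sqrt(size)    # standard error: sigma / sqrt(n)
    print(f"n={size:3d} center={statistics.fmean(means):.3f} "
          f"spread={observed:.3f} predicted={predicted:.3f}")
```

Even though the underlying population is heavily skewed, the sample means cluster around the true mean, and the cluster tightens as the samples get larger.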
9 – Inference – An unlikely pattern in the data is just that, until supported with more evidence. If you have ever been suspected of cheating by a professor, you will want to read this chapter. The importance of forming and rejecting a hypothesis is covered. There is no standard for statistical significance. It is a flexible target often found between .05 (95% confidence) and .01 (99% confidence). This is a good chapter to read if people ever ask you if they can trust your numbers.
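To make the .05 and .01 thresholds concrete, here is a minimal sketch of a significance test on coin flips. The null hypothesis is that the coin is fair, and the function (a name I made up) returns the exact probability of seeing a result at least as lopsided as the one observed. The flip counts are hypothetical.

```python
from math import comb

def two_sided_p_value(heads, flips):
    """Chance, with a fair coin, of a result at least as far from flips/2 as observed."""
    distance = abs(heads - flips / 2)
    # Count every head-count that is at least as extreme, in either direction.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips

for heads in (55, 60, 65):
    p = two_sided_p_value(heads, 100)
    verdict = "reject fair coin" if p < 0.05 else "cannot reject fair coin"
    print(f"{heads} heads in 100 flips: p={p:.4f} -> {verdict} at the .05 level")
```

Moving the threshold from .05 to .01 changes which results count as "significant", which is why the chapter treats the cutoff as a judgment call rather than a law.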
10 – Polling – Polls are a twist on the central limit theorem because we are now looking for a percentage, not the mean and deviations from it. The importance of response rate to any poll is covered.
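The familiar "plus or minus 3 points" on a poll comes from the standard error of a proportion. Below is a rough sketch using the usual textbook approximation, assuming a simple random sample and a 95% confidence level (the 1.96 multiplier). The function name and the poll numbers are illustrative assumptions.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a polled share (z=1.96 for ~95% confidence)."""
    standard_error = math.sqrt(share * (1 - share) / sample_size)
    return z * standard_error

# Hypothetical poll: 52% support among respondents.
for n in (100, 1_000, 10_000):
    moe = margin_of_error(0.52, n)
    print(f"n={n:6d}: 52% plus or minus {moe * 100:.1f} points")
```

Notice the diminishing returns: ten times as many respondents shrinks the margin only by a factor of about three. None of this helps if the response rate is poor and the respondents are not representative.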
11 – Regression Analysis – Regression analysis is the best tool for finding meaning in complex data sets. It’s basically the line you see in a scatter plot chart that shows the association between the X and Y axes. This is the SECOND BEST CHAPTER IN THE BOOK and a MUST for digital marketers. The chapter is a bit mathematical, so I recommend reading the earlier chapters first. Also be sure to catch the t-distribution discussion in the appendix.
12 – Common Regression Mistakes – Finding a connection in the data does not rule out other risks. An example is the mass prescription of estrogen to older women in the 1990s, where a later clinical trial discovered significant health risks. The New York Times Magazine estimated tens of thousands of women died prematurely. Regression analysis is a powerful – and potentially dangerous – statistical tool. This chapter outlines seven common mistakes from non-linear relationships to too many variables.
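That line through the scatter plot can be computed by hand. The following is a minimal sketch of ordinary least squares with a single explanatory variable; the ad spend and sales figures are invented for illustration, and real analyses would normally use a statistics package with more than one variable.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y move together, divided by how much x moves on its own.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (in $1,000s) and sales (in units).
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 20, 24]

slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
print("predicted sales at spend 6:", round(intercept + slope * 6, 1))
```

The slope is the number marketers usually care about: the estimated change in sales associated with each extra unit of spend. It describes an association in this data, not proof that the spend caused the sales.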
13 – Program Evaluation – This chapter covers how a good test is constructed to determine cause and effect. It leaves you with a very good appreciation of how HARD it is to cleanly isolate these two. Some elegant examples are covered including how natural events can make good experimental models.
Conclusion – Five examples of how statistics is making the world a better place.
Appendix – Statistical software – Excel, SAS, R, and SPSS are reviewed.