**Show Transcript**

Allison Hartsoe: 00:01 This is the Customer Equity Accelerator. If you are a marketing executive who wants to deliver bottom-line impact by identifying and connecting with revenue-generating customers, then this is the show for you. I'm your host, Allison Hartsoe, CEO of Ambition Data. Each week I bring you the leaders behind the customer-centric revolution who share their expert advice. Are you ready to accelerate? Then let's go! Welcome everybody. Today's show is part of a summer reading series and one of my most popular blog posts. It is a summary of the book Naked Statistics by Charles Wheelan, and at the end of this podcast what I want you to feel is either that you've read the book and don't need to spend the time reading it, or that you want to dive into a very specific part of the book. So either way, time-wise, it's a win-win.

Allison Hartsoe: 00:58 Now it's true that when you put the word naked in front of just about anything, it becomes very exciting, and Charles Wheelan's Naked Statistics is an insightful book. It's laced with a lot of college-style humor, as indicated by its, I'll call it a soft porn book cover. It's just a very interesting histogram, put it that way. So when you read this book, it's for the purpose of understanding the concepts behind statistics without having to mine what might be a very dry textbook, and the book is a quick read. It's only about 250 pages. Much of it is very skimmable. I think it's especially valuable for people who started as digital analytics professionals or marketing executives who want to know more about the data science predictions, maybe the right tools that are being used to create those data science predictions, and who just want to be a little bit more informed and ask more targeted, more intelligent questions about the information they're receiving.

Allison Hartsoe: 01:57 Now, the readability here, I'd say it's about four out of five stars, and that means the text is typically at the level of Time magazine, which is to say about seventh grade. There are lots of stories, which makes it very palatable. There are very straightforward sentences. There are only a few footnotes and references per chapter, but there's a nice section at the end of each chapter which dives into more of the mathematical detail if you really want to go there. There are a few chapters where the book slogs along through perhaps some inescapable excessive detail and maybe some examples that just keep coming back again and again, but that does help you kind of follow what he's doing with the math, so you don't have to get used to a new example each time. My rating for impact is about four out of five stars, so if you read this book, it will change the way you hear and interpret statistics in the world at large as well as in your specific industry.

Allison Hartsoe: 02:53 You'll probably feel smarter, but as for the long-term impact, is it going to change the world? Is it going to change your whole outlook if you read this book? It's going to make you more informed about perhaps things that you do in your everyday job. What I also include in my book reviews is a speed read pattern. That's because I think that if you do read a book, it's helpful, if someone else has gone there before, to know the best way to consume the content in the book. So the speed read pattern I recommend for this one is actually more start to finish, because each chapter builds on the last. However, once you get the point, I recommend just skimming over the examples, which cuts the reading time down. So if you don't need to go into exhaustive detail about height versus weight and you understand the basics of what he's saying, just move along.

Allison Hartsoe: 03:40 The conclusion does not summarize the book, but it provides a bunch of feel-good examples about how you can make the world a better place with statistics. That's okay to skip, but I would not skip the tool comparison at the very end. I think that is helpful. So let's go into the chapters. The book is divided into 13 chapters. We won't go in depth into every chapter, and there aren't any major chunks within the book. But in general, he starts at the beginning with the fundamentals of statistics and some basic statistical vocabulary. So in the very beginning, the chapter introduces statistics as a way to summarize or simplify the data around us. He talks about the stock market index. He compares data to a crime scene where we might want to capture everything, and the statistics are the detective work that helps us come to a more meaningful answer.

Allison Hartsoe: 04:34 Similarly, in the second chapter, the statistical vocabulary is introduced, such as mean, median, standard deviation, quartiles, relative and absolute numbers, and if you always wanted to know what those funny symbols meant, then this chapter will tell you. In the third chapter, he starts picking up steam, and he talks about deceptive description, which gives many examples of how statistics go awry, and there's a specific callout here on scorecards that I think is worth your attention. He starts with an example of Senator Joe McCarthy in the 50s, and I'll read you this passage. He says during a speech in Wheeling, West Virginia, McCarthy waved in the air a piece of paper and declared, I have in my hand a list of 205, a list of names that were made known to the Secretary of State as being members of the Communist Party who nonetheless were still working and shaping policy in the State Department.

Allison Hartsoe: 05:31 Well, it turns out that the paper actually had no names on it at all, but the specificity of the charge gave it credibility despite the fact that it was a bald-faced lie. So in the chapter on deceptive description, Wheelan's point is that when people are more precise, sometimes it is easier to believe what they are saying, even if what they say is incorrect. So he cautions us to be careful with statistics. And from this point, he goes on into scorecards, which are an excellent building block to that point. Wheelan makes the point that people respond to incentives, and scorecards tend to surface the things that drive those incentives. And he says, but you better be darn certain that the folks being evaluated can't make themselves look better statistically in ways that are not consistent with the goal at hand. And to illustrate this, he talks about the state of New York, which introduced scorecards to evaluate the mortality rates of patients of cardiologists performing a specific type of surgery related to heart disease.

Allison Hartsoe: 06:38 So what happened is the state started putting all the doctors on a scorecard and saying, well, here's your mortality rate, and are you a good doctor or a bad doctor based on your mortality rate? I think it's safe to assume that no doctor is trying to kill people, that they're doing their best to do their job. But as a result, this scorecard provided an incentive for doctors not to operate on or review the sickest cases, the sickest patients. So this is a great example of a scorecard which seems to be driving a positive outcome actually not driving a positive outcome, because people who were higher-risk cases were less attractive to doctors once they knew their mortality rate was being scored and judged. Another example is the U.S. News & World Report rankings, which are so famous and popular. This is, of course, a massive index of colleges.

Allison Hartsoe: 07:36 They do rank many things, but the one he refers to in the book is the college rankings, with 16 indicators at the time representing various weights on the total. The interesting point about these rankings is they judge a lot of aspects: how competitive is it to get in, what were the SAT and ACT scores, and did students get a job at the end? Those are good things to judge, but what it doesn't help us learn is whether the education they actually received during those four years improved their talents or enriched their knowledge. And people love easy answers. He quotes a president of one college, and scorecards often require simplification or aggregation of complex behaviors. Even though that is an easy answer and it gives us a nice number to look at, oftentimes it's a dangerous thing to do, because scorecards, especially when they contain financial incentives, are easy to manipulate, and they may be driving second- and third-order effects that we did not count on.

Allison Hartsoe: 08:39 So high caution on scorecards. The next chapters start going into more detail with statistics. Chapter four talks about correlation. The correlation coefficient is really easily understood in the chapter. Basically, zero is no correlation, for example, what you ate and the movement of the stock market. Negative one is a perfect negative correlation, and plus one is a perfect positive correlation, such as what you ate and how much you weigh. Then he moves on to basic probability, and this is where the infamous coin flipping examples begin. This chapter also contains good financial references about how to calculate the payout or expected value of a change, and if you do A/B testing, it's a very helpful place to learn to communicate impact. The law of large numbers, or why casinos always win, is also introduced here. In chapter five and a half, which is probably one of the most entertaining chapters in the book,

Allison Hartsoe: 09:37 He talks about the Monty Hall problem. You can also catch this scene in the movie 21 with Kevin Spacey, which came out years ago. It's basically the idea that if you've got three doors on a game show and somebody asks you to pick a door, and then they reveal what's behind one of the other doors, do you switch your guess or do you stay? And it's perfectly illustrated with statistics, because once what's revealed behind one of the other doors is a goat instead of the car, the statistical likelihood that your original guess is correct is still one in three, while the likelihood that the car is behind the remaining door has gone up to two in three. So this is the Monty Hall problem, but people tend to say, oh, it's fifty-fifty now, and have a tendency to stay. So that's obviously a fun chapter. And from there he moves on to the problems with probability. And this actually crosses over with Nate Silver's book, The Signal and the Noise, which I also have in the blog posts.
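
For readers who want to check the Monty Hall result rather than take it on faith, here is a minimal simulation sketch. It assumes the standard game-show rules (three doors, one car, and a host who always opens a goat door that is not the contestant's pick). The function names `play` and `win_rate` are illustrative and do not come from the book.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumed rule: the host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is still closed and was not picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the share of rounds won under a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay wins:   {stay_rate:.3f}")
print(f"switch wins: {switch_rate:.3f}")
```

Under those assumptions, staying should win about one round in three and switching about two rounds in three.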

Allison Hartsoe: 10:36 And I'll cross-link that in the show notes. But in this particular section, I love how he covers the financial crisis of years ago and the use of value at risk. It's just a really, really good reminder to lift your head up and see the broader picture. For web analysts, this means you kind of look at the website once in a while. He starts covering independent and dependent variables here, as well as clusters, often great tools that we use in analysis. There's a brief discussion here as well about reversion to the mean, which is a theory that illustrates why companies that are featured in BusinessWeek or teams featured in Sports Illustrated tend to routinely tank afterwards, and this is a concept that he pushes through the book, which is to say that basically if there's a blip in the data, suddenly the data is much higher than it was before.

Allison Hartsoe: 11:29 This team is really hot, this company's really hot, but it might be just a temporary blip in the data. Hence the reversion to the mean. I think that's an interesting idea and something that we never take into account when we're looking at success. We want to say, oh, everything we did caused the success. It's hard to understand that it might just be a fluke, but hey, that's the reason we do testing, right? Chapters seven and eight start to go into deeper examples. The importance of data is chapter seven, and here is where he talks about how sampling a thousand people in a project can actually predict the patterns of a million. And that, of course, works really well with the next chapter, which is all about the central limit theorem. The central limit theorem basically says that in repeated samples, the sample means, and the difference between two of those means, will be distributed roughly as a normal distribution.

Allison Hartsoe: 12:21 So the idea is that if I do a poll and I do another poll, the difference between the means of those two polls should fall into a normal distribution. So that helps us understand that even if we have a small sample for a large volume of people, we're still in the ballpark, that our answers are still generally correct. This is basically what allows polling and all the political theory to stand. Although from the last election, I'm not sure how valid that is. I'm sure the theorem is still valid, but the methods by which they collect the data could probably be improved. I think there was a whole blog post about the validity of polling techniques over at FiveThirtyEight, which is Nate Silver's site, after the last election. So anyway, in chapter nine he starts moving to inference, which is the confidence interval that you see in your data. It all interlocks very nicely, and he makes the point that there is no standard for statistical significance.
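
The polling idea above can be sketched with a small simulation. This is an illustration under stated assumptions, not anything taken from the book: the true level of support (`TRUE_SUPPORT`), the sample size, and the number of repeated polls are all made-up values, and the population is treated as large enough that each respondent is an independent draw.

```python
import math
import random
import statistics

TRUE_SUPPORT = 0.52  # hypothetical share of the population answering "yes"
SAMPLE_SIZE = 1_000  # people asked in each poll
NUM_POLLS = 2_000    # how many times the poll is repeated


def run_poll(rng, p=TRUE_SUPPORT, n=SAMPLE_SIZE):
    """Return the share of 'yes' answers in one random sample of n people."""
    yes = sum(rng.random() < p for _ in range(n))
    return yes / n


rng = random.Random(42)
results = [run_poll(rng) for _ in range(NUM_POLLS)]

# Standard error that theory predicts for a sample proportion.
standard_error = math.sqrt(TRUE_SUPPORT * (1 - TRUE_SUPPORT) / SAMPLE_SIZE)

mean_of_polls = statistics.mean(results)
spread_of_polls = statistics.stdev(results)
# Share of polls landing within 1.96 standard errors of the true value.
within_margin = sum(
    abs(r - TRUE_SUPPORT) <= 1.96 * standard_error for r in results
) / NUM_POLLS

print(f"average poll result:     {mean_of_polls:.4f}")
print(f"spread across polls:     {spread_of_polls:.4f}")
print(f"predicted standard error: {standard_error:.4f}")
print(f"polls within the margin: {within_margin:.3f}")
```

If the central limit theorem holds here, the poll results should cluster around the true value, their spread should be close to the predicted standard error, and roughly 95% of polls should land within the usual margin of error, even though each poll only asks a thousand people.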

Allison Hartsoe: 13:22 And I think this is interesting because we often treat 0.05, which is 95% confidence, or 0.01, which is 99% confidence, as a standard. And it's true there really is no gold standard, although we tend to use those often. Next, he moves on to polling, and polls are obviously a twist on the previous theorem because we're now looking for a percentage, not the means and the deviations from them. The importance of the response rate to any poll is also covered here. And then he gets into the really awesome stuff. So in chapter 11, he starts talking about regression analysis, and this is just such a great tool for finding meaning in complex data sets. It's an awesome tool. It's like a nuclear bomb: if you use it incorrectly, it's powerful and dangerous.

Allison Hartsoe: 14:18 So let's talk a little bit about regression analysis. This is basically the line you see on a scatter plot chart that shows the association between the x and y axes. It is subject to the fact that the data has to be representable by a linear relationship and that what you pick for the x and y axes are actually related. And this chapter indeed gets a bit mathematical. So unless you've got the grounding, you probably should read the previous chapters first before you get into this chapter. But the chapter is really well illustrated. For example, he starts with, can stress on the job kill you? What a great hook. Who wouldn't want to know that? The answer is yes, but it relates to the degree of control or autonomy in the job, and so through the examples, he unpacks different types of autonomy and different ways that regression analysis helps researchers understand the connection between types of stress and types of jobs.

Allison Hartsoe: 15:19 He says, quote, regression analysis has the amazing capacity to isolate a statistical relationship that we care about, such as that between job control and heart disease, while taking into account other factors that might confuse the relationship. At its core, regression analysis seeks to find the best fit for a linear relationship between two variables, and I'm going to take you back to your math days here, but that's where ordinary least squares gives us the best description of that linear relationship between two variables. I won't get too mathy here, but it's basically y equals a plus bx, where b is what is called the regression coefficient. That one number is packed with a whole bunch of information. If it's positive or negative, then it tells you the direction of the association. So for example, if you run an analysis of height and weight and the height ends up with a plus in front of it, then we can say taller people tend to weigh more.

Allison Hartsoe: 16:13 Or if it was a negative, then we would say, oddly, that taller people tend to weigh less. It can also tell you about magnitude, because you can then deduce that for every one inch in height, an increase of 4.5 pounds is seen. And it can help you understand significance, because you can compare the regression coefficient from the observed set of data to other sets of data in order to determine if you're in the right ballpark. So there's a high proportion of all important research, especially in the social sciences, that has been done based on regression analysis. Now interestingly, and I digress here for a little bit, Sandy Pentland over at MIT did the Social Physics book, which takes all of that research up to another level. That is a fantastic book I would like to review as part of my summer series, and I'll just plan to do that.
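
To make the y equals a plus bx idea concrete, here is a minimal ordinary least squares sketch written from the textbook formulas. The height and weight numbers are invented for illustration (they are not the book's data set), and `ols_fit` is a hypothetical helper name.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: forces the line through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up adult heights (inches) and weights (pounds), for illustration only.
heights_in = [60, 62, 64, 66, 68, 70, 72, 74]
weights_lb = [134, 146, 151, 163, 172, 179, 190, 198]

a, b = ols_fit(heights_in, weights_lb)
print(f"weight = {a:.1f} + {b:.2f} * height")

# Asking the same line about a 21-inch newborn, far outside the data range.
newborn_prediction = a + b * 21
print(f"predicted weight at 21 inches: {newborn_prediction:.1f} lb")
```

With this invented data the coefficient `b` comes out positive and close to the 4.5 pounds per inch mentioned above. The same line gives a negative weight for a 21-inch newborn, which is a reminder that a fitted line only describes the range of data it was fitted on.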

Allison Hartsoe: 17:04 But for the past half-century, basic regression analysis is what has been holding up a lot of our social science assumptions. Now the next chapter talks about what could possibly go wrong. Well, I alluded to a couple of things at the very beginning, but basically finding a connection in the data does not mean that there are not other risks. An example he uses in the chapter is the mass prescription of estrogen to older women in the 1990s. A later clinical trial discovered significant health risks based on the way the model was put together. And as a result, The New York Times Magazine estimated that tens of thousands of women died prematurely. So what a powerful tool and what a dangerous tool. This chapter outlines a bunch of mistakes that can happen when we look at regression analysis. I'm gonna run through these because they're really important. And as analysts, we need to keep this in mind when we're making judgments about what is in the data and what people should do with the data.

Allison Hartsoe: 18:08 So number one, using regression to analyze a nonlinear relationship. U-shaped curves are an example. Another example is where there's no consistent relationship: I spend money on golf lessons, and my average score is all over the place. Number two, correlation does not equal causation. We talk about this all the time, and there's a fun site called Spurious Correlations, but basically, this is the idea that because autism is rising and the rates of vaccination are rising, does one actually cause the other? That has been a huge debate. Number three is reverse causality. Will taking golf lessons improve my golf score? My golf score doesn't improve, but is it actually the lessons? It may be that playing badly is what leads people to take lessons in the first place. So that's an example where the causality may run in reverse. Number four is omitted variable bias. So, for example, golfers are prone to heart disease, cancer, and arthritis. Well, if you add the age variable, then you discover that it's not golf killing people.

Allison Hartsoe: 19:09 It's actually age. Number five is highly correlated explanatory variables. That means that two or more explanatory variables are intertwined with each other and they're highly correlated, and it's not easy to unpack. So for example, if you were trying to understand the effect of cocaine and heroin use on SAT scores, but the majority of people in your sample were using the different products at different times, then that would be very difficult to unpack. Was it an effect of cocaine? Was it an effect of heroin? They're basically on top of each other. Obviously, he's playing on the college humor. I dunno how much humor there is in cocaine and heroin versus SAT scores, but the author does do a good job of picking creative examples. I'll give him that. Number six is extrapolating beyond the data. For example, height and weight analysis cannot predict the weight of a 21-inch newborn by the data.

Allison Hartsoe: 20:05 She should weigh a negative 19.6 pounds, but she actually weighs 8.5 pounds. So the conclusions that we make should be tight and specific, to constrain under what circumstances they apply. So, for example, height and weight for people age two and higher. Number seven, and the last caution, is data mining too many variables, and this is the idea that if you throw enough junk in the hopper, one variable is bound to meet the threshold for statistical significance just by chance. I think that really gets back to Spurious Correlations, and what makes that site so funny is they keep throwing a whole bunch of junk in there and coming up with interesting correlations. Hey, it's statistically significant, but this is where common sense comes into play. You just have to use your common sense. Does this actually make sense? A perfect example of this is what caused the financial crisis, and this was the value at risk index, where people kept going higher and higher and higher with the value at risk and no one really stopped to say, well, in order to get that money to work in the market, we're actually creating loans that we give to people who really can't afford them, who are, common-sense-wise, not good risks.
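
The "throw enough junk in the hopper" caution can be checked with a little arithmetic. The sketch below assumes independent tests where nothing real is going on, in which case each p-value is uniformly distributed between 0 and 1. The function names and the choice of 20 variables are illustrative assumptions.

```python
import random


def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests null tests looks significant."""
    return 1 - (1 - alpha) ** num_tests


def simulate(num_tests, alpha=0.05, trials=20_000, seed=1):
    """Estimate the same probability by drawing p-values at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # With no real effect, each p-value is a uniform draw on [0, 1].
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


predicted = chance_of_false_positive(20)
simulated = simulate(20)
print(f"predicted chance with 20 junk variables: {predicted:.3f}")
print(f"simulated chance with 20 junk variables: {simulated:.3f}")
```

Under these assumptions, testing 20 unrelated variables at the 0.05 level gives roughly a 64% chance that at least one of them clears the bar by luck alone.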

Allison Hartsoe: 21:23 So that's an example where your own common sense, your own human judgment, still has to override what our statistics tell us. And finally, at the end, he talks about program evaluation, with some very good examples of when it can be hard to cleanly isolate different variables. Two elegant examples are covered, talking about how to make good experimental models. And in the conclusion, he gives five examples of how statistics are making the world a better place, and then he covers statistical software packages such as Excel, SAS, R, and SPSS and talks about the pros and cons of the different tools. Now, that was a long way to talk about statistics, so hopefully, you found that useful. I know I found it useful just to go back and say, oh yes, this is something else that we should be thinking about. But if you want to talk more about the subject or perhaps develop great hypotheses to test, then you can give me a call or reach out to alison@ambitiondata.com or @ahartsoe on Twitter, or you can also find me on LinkedIn.

Allison Hartsoe: 22:26 Being able to run tests is one of the keys to staying in tune with your customer base. It's what the fast-moving companies are so good at, and it is highly valuable as the speed of business continues to climb. So give us a shout if you are struggling in this area and you're looking for some support. As always, links to everything I have discussed are at ambitiondata.com/podcast. Thank you for joining me today, and remember, when you use your data effectively, you can build customer equity. It is not magic. It's just a very specific journey that you can follow to get results.

Allison Hartsoe: 23:02 Thank you for joining today's show. This is your host, Allison Hartsoe, and I have two gifts for you. First, I've written a guide for the customer-centric CMO, which contains some of the best ideas from this podcast, and you can receive it right now. Simply text ambitiondata, one word, to 31996, and after you get that white paper, you'll have the option for the second gift, which is to receive The Signal. Once a month, I put together a list of three to five things I've seen that represent customer equity signal, not noise, and believe me, there's a lot of noise out there. Things I include could be smart tools I've run across, articles I've shared, cool statistics, or people and companies I think are making amazing progress as they build customer equity. I hope you enjoy the CMO guide and The Signal. See you next week on the Customer Equity Accelerator.

**Key Concepts:** Customer Lifetime Value, Marketing, Digital Data, Customer Centricity, Long-Term Customer Value, Marketing Leaders, Analytics, Creativity, Product Development, Audience Research

**Who Should Listen:** CAOs, CCOs, CSOs, CDOs, Digital Marketers, Business Analysts, C-suite professionals, Entrepreneurs, eCommerce, Data Scientists, Analysts, CMOs, Customer Insights Leaders, CX Analysts, Data Services Leaders, Data Insights Leaders, SVPs or VPs of Marketing or Digital Marketing, SVPs or VPs of Customer Success, Customer Advocates, Product Managers, Product Developers