Welcome to my first book review.

I want to give credit to Keith DeCandido of Tor’s Star Trek Rewatch for my inspiration on how to organize and break down my reviews. For fiction books, the breakdown will be as follows:

  • Info Dump: The plot and synopsis. Short author profile.
  • Spine Shivering: The moments that transcended the page.
  • Key Players: Who are the important characters, not just main characters.
  • Ebb and Flow: Weaknesses and strengths of the story. How did it all work together?
  • Recommendation: Do I recommend this book?

When I read a non-fiction book, the breakdown will depend on the book:

  • Premise: What does this book try to say?
  • Historical Significance: A short profile of our author.
  • Breakdown: Detailed analysis of the book.
  • Recommendation: Should you read this?

This week, I had been intending to review The Signal and the Noise [S&N], a book on statistical analysis by Nate Silver. Because S&N turned out to be vastly more complicated than I originally understood, I am splitting the review in two. Part one will examine Chapters 01-07; part two will cover Chapter 08 through the end of the book.

Premise: While I can’t speak for the latter half of the book yet, I understand what the first half is aiming for. At the end of Chapter 07, Nate Silver says, “The first half of this book has largely been concerned with where [approximations] have been serving us well and where they’ve been failing us.” This is the half that I am examining.

Silver is attempting to look rationally at data collection and analysis without human bias. Failing that, as he assumes we will, he wants us to at least acknowledge this shortcoming and factor it into our understanding of the data.

Historical Significance: Nate Silver is the founder of the FiveThirtyEight blog. As you might suspect, 538 is the number of electors in the United States Electoral College. FiveThirtyEight is dedicated to taking all available information, from polling to demographics to historical results, to project the outcome of elections across the country. Silver also developed PECOTA, a sabermetric system designed to predict the performance of MLB players.

His work was brought to my attention during the 2008 election but quickly forgotten once Obama won. My second encounter with his work was his appearance on the now-defunct baseball podcast Baseball Today, hosted by baseball writer and scout Keith Law. Based on my great respect for Klaw’s work, I looked into Silver’s book.

So, how’s that book looking?

Breakdown:

  • Introduction: The introduction gives the reader the basic premise of the book. Here, Silver explains that information, and thus data, has always existed, but our ability to find it was limited for most of human existence. Silver states that this changed with the invention of the printing press, which set off a rapid increase in data consumption. The problem, Silver states, is that this increase in data is often mistaken for accuracy. In fact, the increased reproduction and sharing of data also exponentially increased the error associated with it. Anyone who has looked at Wikipedia or college papers can attest to this. The introduction itself reads like a well-thought-out, well-reasoned thesis paper.

This is a solid introduction. I like it. By laying the basic foundation for the premise, Silver quietly sets up the book. Every point he raises here is explored in greater detail later. Each chapter follows a specific theme, examining a different kind of error in how we understand humanity’s collected data.

  • A Catastrophic Failure of Prediction: Chapter 01 uses the housing market bubble and the financial collapse of 2008 as its example. Here, Silver asserts, the amount of money being spent in the market was unsustainable. The rating system for lenders, which spurred the market on, was flawed, based on an inaccurate understanding of statistics and perceived value. For instance, when adjusted for inflation, a house purchased in 1890 for $10,000 is worth, in 2010, merely $10,600. That’s an increase in value of 6% over 120 years. Meanwhile, most people assume their house appreciates at a rate closer to 13% annually.

The housing market and, by extension, the economy, Silver argues, are built on poor predictions. We assume what’s going to happen in the future based on past events. The problem is that we have not properly understood those past events or their context, which magnifies the flaws inherent in the prediction.

Misconceptions about data and antiquated notions of economics fueled the housing market. The main one, in my opinion, is that the assessment of risk was based on looking at the data that told investors what they wanted to see (i.e. that risk was relatively small instead of gargantuan). The housing market is something that many of us understand and relate to. Because its very nature is tightly woven into our lives, this is an effective starting place. It is simultaneously familiar and eye-opening, as Silver disabuses us of the comfort that old ideas are safe. In this case, what you don’t know, or don’t want to know, can hurt you.
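To put the two housing figures side by side, here is a minimal Python sketch of my own (it is not from the book). It assumes simple annual compounding, uses the $10,000 and $10,600 values quoted above, and the function names are made up for illustration.

```python
def annualized_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def compound(start_value, annual_rate, years):
    """Value after growing at annual_rate, compounded once per year."""
    return start_value * (1 + annual_rate) ** years


# Inflation-adjusted figures quoted in the review: $10,000 in 1890, $10,600 in 2010.
real_rate = annualized_rate(10_000, 10_600, 120)

# What the same house would be worth after ten years at the assumed 13% a year.
assumed_value = compound(10_000, 0.13, 10)

print(f"Real growth per year: {real_rate:.3%}")
print(f"Value after 10 years at 13%: ${assumed_value:,.0f}")
```

A 6% gain spread over 120 years works out to roughly 0.05% a year, while 13% a year would more than triple the price in a single decade. That gap is the whole point of the example.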

  • Are You Smarter than a Television Pundit?: Chapter 02 delves into whether or not TV analysts are actually as incompetent as they appear. Silver finds that television pundits, whether Republican, Democratic, or working for ESPN, all have about the same chance of being right as anyone else: roughly 50/50.

Silver breaks thinkers into two categories, unrelated to political or sports affiliation: hedgehogs and foxes. I enjoyed these, as they seem to apply to everyone I know. Hedgehogs have ideological stances. They are more likely to stand by those stances, even as facts change, and to be found espousing their theories loudly (on television). Foxes try to take in all the information and reach a rational conclusion, no matter their personal opinion. You’ll rarely find them on television.

This section is great for putting information you hear from the media in perspective. The loudest, hardest opinions are easiest to hear.

  • All I Care About Is W’s and L’s: Chapter 03 looks into baseball and the evolution of its statistics and sabermetrics. The greatest misunderstanding about sabermetrics is that it is attempting to remove baseball from baseball. In reality, it’s trying to gain a more accurate view of the game. RBI and ERA don’t give an accurate view of a player’s performance. RBI tells you how often a player bats with runners on base, not how well he hits. ERA fails to account for the defense behind a pitcher. Even something as simple as errors depends on a player being good enough to get close enough to attempt the play.

Silver uses Dustin Pedroia as an example, pointing out that the older metrics and data analysis aren’t the best and we

This chapter strives to convey that you can have all the classifications and statistics in the world, but they are useless if they aren’t giving you an accurate picture of events.

  • For Years You’ve Been Telling Us that Rain is Green: Chapter 04 discusses weather. This chapter is great because the weather is a complex system often reduced to oversimplification. It turns out that yes, Weather.com and Accuweather.com are in fact lying to you, but mostly in your own interest. Better to forecast an 85% chance of rain when the data says 70%; they modify their reports based on a long history of public response.

In other words, our expectations for the data play a large role. Weather.gov is the actual homepage of NOAA’s National Weather Service. It’s almost useless to the average person because it’s raw data, which the average consumer doesn’t want to look at. The NWS attempts to be accurate and precise, not tailored to consumer expectations, which can admittedly be unfair.
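One way to see what padding a forecast does is to check its calibration: how far the stated chance of rain sits from how often it actually rains. The sketch below is my own illustration with simulated days, not real forecast data. It borrows the 85% and 70% figures from above and assumes every day is an independent coin flip; the function name is hypothetical.

```python
import random


def calibration_gap(stated_prob, true_prob, days=10_000, seed=1):
    """Stated chance of rain minus the share of simulated days it rained.

    Each day is an independent draw that rains with probability true_prob
    (an assumption made purely for this illustration).
    """
    rng = random.Random(seed)
    rainy_days = sum(rng.random() < true_prob for _ in range(days))
    observed = rainy_days / days
    return stated_prob - observed


# A forecaster who reports what the data says versus one who pads it.
honest_gap = calibration_gap(0.70, 0.70)
padded_gap = calibration_gap(0.85, 0.70)

print(f"Honest forecast gap: {honest_gap:+.3f}")
print(f"Padded forecast gap: {padded_gap:+.3f}")
```

The honest forecast lands near zero, while the padded one stays about fifteen points too wet no matter how many days you watch.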

  • Desperately Seeking Signal: Chapter 05 uses earthquakes to demonstrate how too much data can cause its own problems. There are theories that attempt to explain the data, and then there are theories built simply to match the data. The first is what any reasonable science project attempts to answer: “How do I get x because of t, z, and r?” The second is when the hypothesis and answer match too specifically (what statisticians call overfitting). In other words, the theories only work under specific criteria. This often occurs when too little is known or, as the book’s title has it, there is too much noise to determine what is actual signal. Correlation is not causation.

Earthquakes are a prime example, argues Silver. We can’t get accurate data on when they will occur because we can’t drill deep enough into the crust. We know how they happen, but we can’t accurately predict when. One of the problems, as I remember from my own geology classes, is that earthquakes, fault lines, and severity play out on a geologic timescale. That means they operate on frequencies of hundreds and thousands of years. Humans aren’t made for that. Oh well.

The other issue is, again, the misunderstanding of statistics. I wonder if that’s a recurring theme. When an earthquake is expected to happen once every 30 years, most people assume, without understanding the statistic, that the quake will happen every 30 years on the dot, and that if it doesn’t, something is wrong or they are due. Realistically, there might be 45 or 60 years between earthquakes, because that can still average out to one every 30 years. It’s a great reminder that an average rate, like an 80% chance of rain, isn’t a promise of exactly 8 out of every 10.
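Here is a rough Python sketch of the “once every 30 years” point. It is my own illustration, not Silver’s method, and it leans on a big assumption: that quakes arrive at random with a constant average rate (a Poisson process), so the waiting time between them is exponentially distributed. The names below are hypothetical.

```python
import math
import random

MEAN_GAP_YEARS = 30  # "once every 30 years" read as a long-run average


def prob_gap_exceeds(years, mean_gap=MEAN_GAP_YEARS):
    """Chance that the wait between two quakes is longer than `years`,
    assuming exponentially distributed waiting times."""
    return math.exp(-years / mean_gap)


def simulate_gaps(n, mean_gap=MEAN_GAP_YEARS, seed=0):
    """Draw n simulated waiting times (in years) between quakes."""
    rng = random.Random(seed)
    return [rng.expovariate(1 / mean_gap) for _ in range(n)]


gaps = simulate_gaps(10_000)
average_gap = sum(gaps) / len(gaps)
share_over_45 = sum(g > 45 for g in gaps) / len(gaps)

print(f"Average simulated gap: {average_gap:.1f} years")
print(f"Share of gaps over 45 years: {share_over_45:.1%}")
print(f"Model chance of a gap over 60 years: {prob_gap_exceeds(60):.1%}")
```

Under that assumption the simulated gaps still average about 30 years, yet a sizable share of them run past 45. A long quiet stretch doesn’t mean the numbers are wrong.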

  • How to Drown in Three Feet of Water: Chapter 06 stresses how important it is to convey uncertainty in predictions. One of the scarier aspects of statistics is how they are used. Silver makes the case that very often we ignore the fact that our projections aren’t solid numbers but guesses. There is a certain amount of instability inherent in them. When the NWS predicts an 80% chance of rain, what they are really saying is that in 80% of simulations it thundered and in 20% it was completely sunny.

Using the economy as his example, Silver says that when unemployment is at 9.2% and predicted to rise or fall, people should be aware that it can go beyond those estimates. He warns that politicians in the White House are notoriously bad at predicting results, regardless of political affiliation, but that the results they do predict almost always favor their White House.

Short version? He says beware that there is always error in projections. They aren’t guaranteed.

  • Role Models: Chapter 07 checks out flu epidemics. I’ll keep this short as I’ve already examined instability, but this chapter makes the point that panic and inaccurate models affect results. Humans try to make connections that aren’t there. That’s what they do. When we don’t pay attention to the circumstances of outbreaks, as they pertain to the uniqueness of their environment, horrible miscalculations and expectations result. Sometimes that means deaths, other times political embarrassment.

Recommendation: Not sure yet. Wait till I finish it next Tuesday. So far though, I love every page. It’s well thought out, well planned. There are plenty of analogies to help his case, many examples that different people can relate to, all working together cohesively. Honestly, the first half of his book makes very compelling arguments about the dangers of misusing and misunderstanding data, as well as human error and our desire to see patterns where they don’t exist. How willing someone is to accept this idea will likely determine how they receive the book.

Silver says that the second half will examine how separating the signal from the noise may be refined. I’m curious to see what he has to say.

See you next week.
