*EDIT- an earlier version of this article had the wrong link to get to the Nautilus article. It’s fixed now. Thanks to my friends at CauseScience for catching that! Go check them out. They post awesome science stuff AND fix broken links.

Today’s Wednesday ORF comes to us, once again, from Nautilus magazine, and homes in on the topic of data presentation. (I swear, I swear, I’m not a shill for Nautilus; it’s just actually that good of a magazine. Plus, it’s a reliable source of ORF material when I’m in post-conference recovery mode. I wasn’t presenting my research this year – I was mostly at the Society for Neuroscience conference for networking and science tourism – but it’s still pretty exhausting. Stay tuned for a couple of Neuroscience run-down posts later this week.)

The context.
Over the course of her training, one of the single most important skills that a scientist learns is how to interpret data. Now, half the time, we’re interpreting our own data. I might ask, What’s the signal-to-noise ratio of a particular reading? Are the data too variable to glean any useful information from them? Are the data significant, in the scientific, statistical sense (meaning that, if there were no real effect, a result at least this striking would turn up by chance less than 5% of the time)? And more importantly – even if the data are statistically significant, are they real? Ideally, with each new experiment we perform, we subject our own work to rigorous skepticism. Ideally. It doesn’t always happen, because we’re human and imperfect and sometimes we’d really like to believe that our pet theories are borne out by data that just… aren’t quite there yet.

A happy side effect of learning to interpret one’s own data is learning how to see the data produced by other people, and to approach their data with similar skepticism. If a journal article claims a dramatic increase in Factor X, do the data support that claim? If a presenter claims that a particular procedure increases fluorescence in a cell, does the photographic evidence agree with the interpretation? And importantly – does the presentation of the data differ from the actual numbers? Once one learns to ask these questions, one starts to observe that, worryingly, data outside of scientific journals – and sometimes even within them – can be presented in confusing, even misleading, ways. There are spurious correlations (e.g., if X goes up and Y also goes up, then they must be related – see Tyler Vigen’s site Spurious Correlations for great examples). There are graphs with misleading, mislabeled, and even absent axes. There are alarmist reports of increased percentage risk (for a particular type of cancer, say) that never refer back to the original, baseline risk… I could go on. And at another time, I will. At length. The point is, data in the news, in science, and in popular culture can be confusing, and can lead us to make assumptions that may not be wholly accurate.
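
To make the percentage-risk point concrete, here is a quick back-of-the-envelope sketch in Python. The numbers are entirely made up for illustration (a baseline risk of 1 in 10,000 and a headline "50% increase"), and the function name is my own; the only point is that the same change sounds very different depending on whether the original risk is reported alongside it.

```python
def risk_after_increase(baseline_risk, relative_increase):
    """Return the new absolute risk after a relative (percentage) increase."""
    return baseline_risk * (1 + relative_increase)


# Hypothetical numbers, chosen only for illustration.
baseline = 1 / 10_000       # assumed baseline risk: 1 person in 10,000
relative_increase = 0.50    # assumed headline: "risk goes up by 50%!"

new_risk = risk_after_increase(baseline, relative_increase)
absolute_change = new_risk - baseline

print(f"Headline (relative) increase: {relative_increase:.0%}")
print(f"Baseline risk: {baseline * 10_000:.1f} in 10,000")
print(f"New risk:      {new_risk * 10_000:.1f} in 10,000")
print(f"Absolute change: {absolute_change * 100:.3f} percentage points")
```

With these invented figures, the scary-sounding relative increase corresponds to a very small absolute change, which is exactly the information a report leaves out when it never mentions the baseline.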

The article.
So how can we demystify data? Becca Cudmore, writing at Nautilus on November 6, has some ideas. Her article – “Five Ways to Lie with Charts” – gives a great overview of the various ways that graphical data can be spun to promote an interpretation favorable to the presenter. Now, I think the title itself is a little overdramatic; technically speaking, none of these charts (presumably) is actually presenting false data – i.e., lying – for the purpose of the example. And in fact, even scientists will try to present their data – which is often messy! – in the best light, and there’s nothing wrong with that. What these graphs do, however, is present data in a very visually leading way, and if one is unused to looking at data in this format, one might easily be swayed.

As I said above – I plan to write much more about this topic in a future (non-ORF) post. But if you’re looking for a take-home, I’d recommend going into any discussion of data with two questions in mind:

  1. What do the numbers actually say? For example, a graph might show a 30% increase in the height of a bar… but if the axis doesn’t start at zero, that representation might be over-emphasizing a small change. Maybe that amount of change is relevant for this field. Maybe it’s not. Always check.
  2. How do I feel about this graph? What’s my first impression? What am I perceiving? Our brains are great at pattern recognition – that’s why we can “see” constellations! – but they are also easily fooled by visual illusions. Check your gut feeling, and see whether your initial perception of the data leads you to believe something that may not be right.
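
For the first question, the bar-height example can be checked with a little arithmetic. The sketch below uses invented values (a measurement going from 100 to 103, plotted on an axis that starts at 90) and a helper function of my own naming; it compares how much the bar appears to grow with how much the number actually changed.

```python
def actual_increase(old, new):
    """Fractional change in the underlying value."""
    return (new - old) / old


def apparent_increase(old, new, axis_min):
    """Fractional change in drawn bar height when the axis starts at axis_min."""
    old_height = old - axis_min
    new_height = new - axis_min
    return (new_height - old_height) / old_height


# Hypothetical numbers, chosen only for illustration.
old_value, new_value = 100, 103
truncated_axis_start = 90

print(f"Actual change in the value:  {actual_increase(old_value, new_value):.0%}")
print(f"Bar growth, axis from zero:  {apparent_increase(old_value, new_value, 0):.0%}")
print(f"Bar growth, axis from {truncated_axis_start}:    "
      f"{apparent_increase(old_value, new_value, truncated_axis_start):.0%}")
```

When the axis starts at zero, the bar grows by the same fraction as the value. When the axis starts at 90, the same small change is drawn as a much taller bar. Whether that small change matters is still a question for the field, but at least the numbers and the picture can be compared directly.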

More on this to come. Go be skeptical, dudes and dudettes.