Thursday, August 25, 2011

Creating the context for successful analyses

Context is essential.
If you just jump right into analysis of a complex data set and subsequent visualization and story telling without first establishing a proper context, you stand a good chance of misdirecting your focus, time, and energy.

If you are a strong analyst, visualizer, and storyteller, you can still end up with an interesting discovery, a good story, and exciting graphics to share. But you may also have missed hidden secrets that would have provided even greater understanding and value to your audience.

Forcing your audience to work harder. Worst of all, if you don't supply the context, you force your audience (every reader, viewer, and listener who wants to interact with and learn from your visualization) to work harder, with a lot of guesswork and uncertainty, to put your findings into a useful perspective.

Templates shut down thinking. It is common to find examples of data analysis and visual reporting online with almost no context at all. For recurring analyses and reporting (such as various government monthly reports), it's also common to see the same basic template and boilerplate re-used verbatim month after month, with no sign of any fresh thinking about how the context might have changed, what was learned in previous months, or how best to present that month's results for maximum clarity and ease of understanding.

Set the stage for discovery. What foundational context is it essential to establish in order to set the stage for the most successful exploratory analysis and discovery of new and surprising and useful domain trends, patterns, and exceptions?

Here are some ingredients that can help create a strong contextual foundation in a given data domain. These are especially important for recurring situations such as analyzing and reporting on employment/unemployment.

What's the central question? One of the best ways to supply context is to list up front the key questions that you hoped to answer as you started your analysis. Then in the storytelling and reporting that you create, make sure you establish a link back to these questions with any answers you have found, any surprises you discovered along the way, and any new questions you are keen to explore during the next round of analysis. In other words, show your thinking and link it back to the context your questions established.

Link to the mission. If the analysis you are doing is in support of an important mission, including a description of that mission and the vision and core values that the mission supports will add power and depth to the context of your work.

Think in advance about the full set of key metrics. One other thing to note, which we will return to in a future post, is that there seems to be a connection between the analyses that visualize the fewest domain metrics and the analyses that begin with the weakest contextual foundation.

What principles do you use to establish context for your data analysis/visualization/storytelling?

Showing a key metric with multiple views: a nice example

Bill McBride's Calculated Risk blog has some crisp charts showing the latest new unemployment claims. The main chart shows this key metric since January 2000.



A second chart shows the same metric going all the way back to January 1971.



Both charts use a 4-week moving average to smooth out the more erratic week-to-week behavior. Bill's dual-chart approach presents a much more complete picture of this important metric and puts recent behavior in context. Of course, even his "short" period is almost 11 years long, so it doesn't suffer from the common weakness of plotting too few data points.
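
For readers who want to try this kind of smoothing themselves, here is a minimal sketch of a trailing 4-week simple moving average. It is only an illustration: the function name `moving_average` and the weekly numbers are made up for this example and are not actual claims data, and this is not necessarily how the charts above were produced.

```python
def moving_average(values, window=4):
    """Trailing simple moving average.

    Returns one average per full window, so the result has
    len(values) - window + 1 entries (or none if there is too little data).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Made-up weekly values, for illustration only (NOT real claims data).
weekly = [400, 420, 380, 410, 430, 390]
smoothed = moving_average(weekly, window=4)
print(smoothed)
```

Each smoothed point averages the current week with the three weeks before it, which is why the first three weeks produce no output. A longer window smooths more but reacts more slowly to real turning points.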

Additional employment-related charts showing other metrics and other views can be found in the Employment tab of Calculated Risk's Graph Gallery. Bill is prolific and posts some of the best-looking, most distinctive charts related to economics and finance. Check out his gallery for yourself. You won't be disappointed.

Despite these two excellent charts, one weakness I see in Calculated Risk's presentation of this important unemployment metric is that the verbal storytelling is weak. Bill's charts have potential explanatory power with important stories to tell, especially combined with the other charts in the Employment tab of the gallery, but these stories are left mostly as an exercise for the viewer.

In the blog post, the "story" told is mostly quotes from the dull boilerplate in the Department of Labor's UNEMPLOYMENT INSURANCE WEEKLY CLAIMS REPORT. This text discusses this metric with a very short term focus of only the preceding 4 weeks.

A second weakness is that the reporting (like almost all other reporting on the subject) only talks about and shows charts for this one Headline Initial Claims metric from the report while other complementary metrics are shunted aside. For example, some key missing metrics that are mentioned in the DOL report and whose short and long term time series charts could help us better understand the unemployment situation include:
  • insured unemployment rate - the percentage of "covered" workers collecting regular state benefits
  • insured unemployment - the number of people currently collecting regular state benefits
  • total persons claiming benefits in all programs
Some other metrics from other sources might also be added to the mix for fuller understanding such as:
  • total persons unemployed
  • percentage of total unemployed who are collecting benefits in all programs
  • total unemployed who are NOT collecting benefits
Note that Calculated Risk's Employment tab does include the following useful and complementary metrics in easy-to-digest graphic form, but a story line to tie all of them together remains a challenge for another day:
  • headline unemployment percentage
  • employment population ratio
  • participation rate
  • number of workers who are part time for economic reasons
  • number unemployed for over 26 weeks
  • number unemployed for over 26 weeks as percentage of civilian labor force
What other employment related metrics would you like to see?

Do you know of others posting on the initial claims number who are crafting more complete stories than the standard laid down by the DOL report?


Monday, August 22, 2011

Reviewing the Situation

It's been a while since we formulated the Trend Visualization Principles displayed in our blog's right hand panel. So we thought it was a good time to revisit them and update them to reflect both on what we have learned and on the rapid pace of change in the intersecting domains of data gathering, analysis, visualization, story telling, sharing, and collaboration.

Our revised seven principles are:
  1. Context comes first
  2. Create a history
  3. Look at ALL the data
  4. Share ALL your data
  5. Explain your calculations
  6. Show your thinking
  7. Ensure readability
Please check out the full statement in the right hand panel and let us know what you think.

These principles when put into practice can serve as antidotes to some of the weaknesses highlighted in the most recent post. (Common Weaknesses in Online Visualization & Storytelling)

We plan on posting some longer explanations of our thinking on these principles in coming days, starting with two that are oft neglected in practice (self included). Our experience has been that these two can reap substantial rewards when we take the time to remember to put them into action:
  • Look at ALL the data
  • Show your thinking
What are the most useful principles that you keep in mind while gathering, analyzing, visualizing, storytelling, sharing, and collaborating on important trend data?

Friday, August 19, 2011

Common Weaknesses in Online Visualization & Storytelling

Looking around the web we notice a growing number of examples of online visualization and storytelling.

Some of these are brilliant and incisive and easily accessible and digestible by their intended audience.

Too many, however (including, I am sure, many of my own), exemplify one or more of a common set of weaknesses that make them harder to understand and that diminish their usefulness and value.

Here is a list of the shortcomings that appear most regularly. Once you can recognize them, all of these are correctable, often with only a modest effort that will pay big dividends. We've talked about many of these in previous posts and will no doubt return to them again to describe the particular details.

Can you think of any others? Which ones do you think are most important to correct?

Please share your thinking in the comments. Thanks.

Online Visualization and Storytelling Weaknesses and Shortcomings
  1. Too short a time period shown
  2. Too few metrics shown (sometimes only one or two out of thousands) and often only a single independent view of the key story telling metrics
  3. No story presented - figuring out the story is left as an exercise for the audience/reader/viewer. This often goes hand in hand with visualizations that require a substantial time investment by the audience in order to discover messages that are not obvious at first glance. Or worse, the audience spends the time and still cannot figure out why that particular graphic was chosen from among all the choices available to the analyst/storyteller.
  4. Data set used to create the graphics is not readily available for further analysis by interested audience members
  5. The larger data set used by the analyst/visualizer/storyteller is not available and not even defined or listed. Consequently the viewer has no idea of how much effort the storyteller put into the analysis before deciding to display a particular choice of graphical elements.
  6. A standard template is re-used without any new or fresh thinking and without any sign of building on what's already been learned from previous analyses
  7. Presenting only a single point in time for many metrics that change over time without providing the relevant time line view
  8. Comparing just the most recent and the previous value of a particular metric without taking earlier values into account. This goes hand in hand with over use of graphs and tables showing month over month change.
  9. When showing month over month change, failing to normalize the values to yearly percentages
  10. Explaining time series behavior in dense text that is hard to parse and understand even for expert data analysts when a simple time series graphic would have done the job in seconds
  11. Limited opportunities for further collaboration between the audience and those who created the visualizations and story line.
  12. Too few data points in the time series
  13. Use of large unsorted lists where some simple sorts and application of some variant of the 80/20 rule would have conveyed much more meaning in a much shorter time
  14. Too many metrics all mushed together into a single indecipherable graphic. Such charts typically are ones that have no story line associated with them. What does the chart mean? You go figure it out!
  15. Burying the lead (the potentially most interesting story element) so only audience members who invest significant time will ever have a chance to stumble across it. Everyone focuses on some headline number while the action is just a little bit below the surface and eager to see the light of day
  16. Absence of comparisons of the result to useful baseline values
  17. Working exclusively with the raw metrics as they arrive from their providers and missing out on opportunities to combine metrics to create calculated values that enhance the storytelling potential
  18. Heavy emphasis on working with aggregated metrics (e.g., headline numbers) and not showing whether the same patterns hold up under a variety of disaggregation approaches
  19. Using widely varying raw metric values when a carefully selected simple moving average would have revealed greater insight
  20. Overly tiny graphics that fail to take advantage of the full screen real estate available and make key elements more difficult to read and understand
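
As a small illustration of item 9, here is one way a month-over-month change might be restated as a yearly percentage. This is a sketch under an assumption: it uses compounding (the monthly growth ratio raised to the 12th power), which is one common convention; simply multiplying the monthly percentage by 12 is another. The function name `annualized_pct_change` and the sample values are hypothetical.

```python
def annualized_pct_change(previous, current, periods_per_year=12):
    """Restate a one-period change as an equivalent compounded yearly percentage.

    Assumes the same growth ratio repeats every period for a full year.
    """
    if previous <= 0:
        raise ValueError("previous must be positive")
    growth = current / previous
    return (growth ** periods_per_year - 1.0) * 100.0


# Illustrative values only: a 1% rise in one month.
monthly_rate = annualized_pct_change(100.0, 101.0)
print(round(monthly_rate, 2))
```

A 1% monthly rise compounds to roughly 12.7% per year, which makes it directly comparable to year-over-year figures shown elsewhere in a report.
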
What weaknesses would you love to see corrected?

Monday, August 1, 2011

Telling a Story with Time Series Data

Once again, Barry Ritholtz over at The Big Picture shows his skills as a visual storyteller in his recent post (Our Problem in Pictures) on the debt and deficit negotiations -- weaving together a series of charts produced using the powerful time series charting capabilities provided by the Federal Reserve Bank of St. Louis (FRED).