Tuesday, December 26, 2006

Readily Reusable Trend Data

One of the goals of this blog is to demonstrate approaches that lower the cost and time of working with trend data. Our intention is to make important trend data in any given area of interest available in an understandable form to a wider range of citizens, including non-experts.

Trends matter. They matter a lot. Historically, only those with substantial resources of time, money, and expertise have had ready access to trend data. Despite advances in technology, these impediments to more widespread access remain firmly in place, with a few prominent exceptions (e.g., access to stock market trend data).

Our thesis is that there already is enough technology in place to accomplish the goal of wider availability. On today's internet, massive storehouses of trend data are publicly available from a growing range of important areas of interest. We have discussed several of these internet web sites in previous entries (most recently focusing on the FRED data available from the Federal Reserve Bank of St. Louis).

Many of these data storehouses also include their own specialized software that makes it possible for ordinary citizens to examine trend behavior, once they learn the unique rules for creating charts at that web site.

This approach has several drawbacks: each site is different; the range of trend views available is limited to what the designer of the site thought was important; and the data is not readily combinable with data from other sources without applying a substantial amount of time and expertise.

This brings us to the topic of today's entry: the idea of Readily Reusable Trend Data. If the storehouse of data from each interesting web site were all converted to a common, straightforward format that was easy to understand and reuse, then it would be possible to use a single, powerful, generic trend data visualization tool to look at and analyze and play with and understand any combination of available trend data from any source.

There are many possible candidates for such a universal readily reusable trend data format. A format that has proven itself in the trend analysis work I have been doing in recent years is a Comma-Separated Values (CSV) file that obeys a few additional rules that make it effective with trend data.

DEFINITION: a comma-separated values (CSV) file is simply a text file representing a two-dimensional table of rows and columns. The text file consists of a series of rows of data. The values in each row of the table are separated from each other by a comma. The Nth value represents the entry for the Nth column.
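To make the definition concrete, here is a minimal sketch in Python of turning CSV text into a table of rows and columns. The sample text and variable names are made up for illustration; only the standard csv module is used.

```python
import csv
import io

# Hypothetical sample text: three rows, three values per row.
text = "a,b,c\n1,2,3\n4,5,6\n"

# Each line of text becomes one row; each comma-separated value becomes one cell.
rows = list(csv.reader(io.StringIO(text)))

# The Nth value of every row belongs to the Nth column,
# so a column is just the same position taken from each row.
second_column = [row[1] for row in rows]
```

Note that every value comes back as text; converting to numbers is left to whatever tool reads the table.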

Existing table-oriented software applications such as spreadsheets and relational databases can accept CSV files as input.

DEFINITION: a readily reusable trend data file is a CSV file that obeys the following set of additional rules:

  1. the file consists of a header row and a series of data rows
  2. the header row contains the names of each trend data factor
  3. each data row represents one time interval
  4. for each time interval there is only a single row
  5. there is one column (typically the first) that shows the time for the interval represented by each row
  6. each column represents one and only one trend data factor - the value in the nth position in each row corresponds to the nth trend data factor named in the header row
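The rules above are simple enough to check mechanically. What follows is only a sketch of such a check, not a definitive validator: the function name check_trend_csv is hypothetical, the time column is assumed to be the first one unless told otherwise, and treating duplicate header names as a violation is my own reading of rules 2 and 6.

```python
import csv
import io


def check_trend_csv(text, time_column=0):
    """Return a list of rule violations found in CSV text (empty if none)."""
    rows = [
        [cell.strip() for cell in row]
        for row in csv.reader(io.StringIO(text))
        if row  # ignore completely blank lines
    ]
    if len(rows) < 2:
        return ["rule 1: need a header row and at least one data row"]

    problems = []
    header, data = rows[0], rows[1:]

    # Assumption: factor names should be non-empty and not repeated.
    if "" in header or len(set(header)) != len(header):
        problems.append("rule 2: header names must be non-empty and unique")

    seen_times = set()
    # Row numbers count non-blank rows, with the header as row 1.
    for row_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"rule 6: row {row_no} has {len(row)} values, "
                f"expected {len(header)}"
            )
            continue
        time_value = row[time_column]
        if time_value == "":
            problems.append(f"rule 5: row {row_no} has no time value")
        elif time_value in seen_times:
            problems.append(f"rule 4: time {time_value!r} appears more than once")
        seen_times.add(time_value)
    return problems
```

A file that obeys the rules yields an empty list. The sketch does not check that the time values are in order or evenly spaced, since the rules do not require either.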

Here's an example of a readily reusable trend data file:

Time, Checking account balance, Saving balance, Total assets
Jan-06, 473, 1322, 1795
Feb-06, 841, 938, 1779
Mar-06, 143, 1222, 1365
...

It's easy to see how data in this format is readily reusable. For example, you could use Microsoft Excel to open this file and it would automatically arrange the data into exactly the desired two-dimensional table. You could then quickly create a variety of different trend charts using Excel's powerful generic charting capabilities. For example, you could plot any individual factor, or all three factors together. In addition, you could add features such as moving averages, or perform further calculations to create a new trend series, such as the percent of total assets represented by savings.
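The same reuse is possible outside a spreadsheet. As a rough sketch, assuming the three example rows above and using only Python's standard library, the two derived series just mentioned could be computed like this. The function names are hypothetical, and the moving average here is a simple trailing average over a fixed window.

```python
import csv
import io

# The three data rows from the example file (the elided rows are omitted).
SAMPLE = """Time, Checking account balance, Saving balance, Total assets
Jan-06, 473, 1322, 1795
Feb-06, 841, 938, 1779
Mar-06, 143, 1222, 1365
"""


def load_trend_csv(text):
    """Read trend CSV text into a list of dicts keyed by header name."""
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


def savings_share(rows):
    """New trend series: savings as a percent of total assets, per interval."""
    return [
        (
            row["Time"],
            round(100 * float(row["Saving balance"]) / float(row["Total assets"]), 1),
        )
        for row in rows
    ]


def moving_average(values, window):
    """Trailing moving average; the first window - 1 intervals have no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


rows = load_trend_csv(SAMPLE)
share = savings_share(rows)
checking = [float(row["Checking account balance"]) for row in rows]
checking_avg = moving_average(checking, window=2)
```

Because the header row names each factor, the code can refer to columns by name rather than by position, which is part of what makes the format easy to reuse.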

Ready reusability may seem simple and obvious and maybe not worth much thought. Our experience counters that impression. It turns out to be really important in helping move towards the goal of lowering the cost and time of working with trend data and making that data understandable to an ever wider range of citizens.

In an upcoming series of posts we will shift our attention to a demonstration of how the availability of readily reusable trend data combined with a powerful generic trend visualization tool (TLViz - the TimeLine Visualizer) can move us much closer to our goals.