Our blog about technology
and civic issues

DataIO

A few months ago, while a solution for the MTA budget shortfall was being debated by the New York State Senate,  The Open Planning Project helped parse MTA budget data into a machine searchable format. The MTA originally published the budget as a PDF. To extract the data I used a utility called pdftohtml to convert it into an XML document. I then used the python library lxml to convert the document into a set of csv files. The results of this labor can be seen on TOPP’s data site.

Soon after I published this data I was told by a number of people that the data would be more useful if presented in another format. At first I just started creating a bunch of command line python scripts that would suck in these csv files and spit them out in different formats. I quickly realized that I could accumulate these scripts and create a quick and dirty web application.

Over a few train rides I created an application called DataIO, and this week I finally got a chance to upload it to Google App Engine.  Specifically I received three requests for data in different formats.  I’ll give examples using the data set containing the MTA’s annual labor expenses.

  • JSON for Flot:
    Flot is a great javascript graphing library, but it’s not that easy to convert a CSV file into a Flot friendly format. After creating a data set in DataIO, you can request the data back in a JSON dictionary that can be plugged directly into Flot. For our Labor Expenses example this means simply constructing a URL such as this one:

    http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0

    The “base_column” query string parameter represents the column in the CSV file that will used for the legend of the graph. The “base_row” represents the row in the CSV file that contains the values for the x-axis of the graph.

    It’s not obvious how that JSON will display, so DataIO allows you to preview the graph by adding a “preview” query string argument:

    http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0&preview=true

  • Google Charts:
    Flot is great, but it’s not always the right solution. For example, if I wanted to add the Flot graph generated above into this blog post, I would have to load three javascript files onto this webpage. Google Chart offers a better solution for this use case: it creates this chart as an image, which can be included into a blog post without having to use Javascript. To construct the Google chart for our Labor expenses example, we can send DataIO the following request:

    http://www.dataio.org/data/Wfb?format=gchart_line&base_column=0&base_row=0

    which returns the URL for the following image:

    http://chart.apis.google.com/chart?cht=lc&chs=800x300&chd=e:yf1K2t4H5P6C,KdKjKWKbKiKq,MDM3NkOVPGP7,H8InI9JUJwKO,O1PRP-QLQSQw,KBKZKfKrK1K6,BgBEA6A-BAA8&chtt=Labor%20Expensed%20by%20Category&chdl=Payroll|Overtime|Health%20%26%20Welfare|OPEB%20Current%20Payment|Pensions|Other-Fringe%20Benefits|Reimbursable%20Overhead&chco=DC143C,912CEE,0000FF,00EE00,FFFF00,FF8C00,000000&chxt=x,y&chxl=0:|2007|2008|2009|2010|2011|2012|1:|-400.0|950.0|2300.0|3650.0|5000.0

  • Data multiplied by a factor:
    The MTA publishes all of their financial data in millions of dollars. Often it is useful to see the data in other units, such as dollars:

    http://www.dataio.org/data/Wfb?format=html&multiplication_factor=1000000&multiplication_start_row=1

    or in millions of Euros:

    http://www.dataio.org/data/Wfb?format=html&multiplication_factor=0.734&multiplication_start_row=1

    The number to multiply by is sent in via the multiplication_factor argument and the multiplication_start_row tells DataIO not to multiply the first row by the factor.

A complete list of query string arguments that can be used to interact with DataIO are located on its front page.  The code for this application is hosted at bitbucket.

This entry was posted in open government. Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>