A few months ago, while a solution for the MTA budget shortfall was being debated by the New York State Senate, The Open Planning Project helped parse MTA budget data into a machine searchable format. The MTA originally published the budget as a PDF. To extract the data I used a utility called pdftohtml to convert it into an XML document. I then used the python library lxml to convert the document into a set of csv files. The results of this labor can be seen on TOPP’s data site.
Soon after I published this data I was told by a number of people that the data would be more useful if presented in another format. At first I just started creating a bunch of command line python scripts that would suck in these csv files and spit them out in different formats. I quickly realized that I could accumulate these scripts and create a quick and dirty web application.
Over a few train rides I created an application called DataIO, and this week I finally got a chance to upload it to Google App Engine. Specifically I received three requests for data in different formats. I’ll give examples using the data set containing the MTA’s annual labor expenses.
- JSON for Flot:
Flot is a great javascript graphing library, but it’s not that easy to convert a CSV file into a Flot friendly format. After creating a data set in DataIO, you can request the data back in a JSON dictionary that can be plugged directly into Flot. For our Labor Expenses example this means simply constructing a URL such as this one:http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0
The “base_column” query string parameter represents the column in the CSV file that will used for the legend of the graph. The “base_row” represents the row in the CSV file that contains the values for the x-axis of the graph.
It’s not obvious how that JSON will display, so DataIO allows you to preview the graph by adding a “preview” query string argument:
http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0&preview=true - Google Charts:
Flot is great, but it’s not always the right solution. For example, if I wanted to add the Flot graph generated above into this blog post, I would have to load three javascript files onto this webpage. Google Chart offers a better solution for this use case: it creates this chart as an image, which can be included into a blog post without having to use Javascript. To construct the Google chart for our Labor expenses example, we can send DataIO the following request:http://www.dataio.org/data/Wfb?format=gchart_line&base_column=0&base_row=0which returns the URL for the following image:
- Data multiplied by a factor:
The MTA publishes all of their financial data in millions of dollars. Often it is useful to see the data in other units, such as dollars:http://www.dataio.org/data/Wfb?format=html&multiplication_factor=1000000&multiplication_start_row=1or in millions of Euros:
http://www.dataio.org/data/Wfb?format=html&multiplication_factor=0.734&multiplication_start_row=1The number to multiply by is sent in via the multiplication_factor argument and the multiplication_start_row tells DataIO not to multiply the first row by the factor.
A complete list of query string arguments that can be used to interact with DataIO are located on its front page. The code for this application is hosted at bitbucket.