DataIO
A few months ago, while the New York State Senate was debating a solution to the MTA budget shortfall, The Open Planning Project (TOPP) helped parse MTA budget data into a machine-searchable format. The MTA originally published the budget as a PDF. To extract the data, I used a utility called pdftohtml to convert the PDF into an XML document. I then used the Python library lxml to convert that document into a set of CSV files. The results of this work can be seen on TOPP’s data site.
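The original conversion code isn’t shown here, but the XML-to-CSV step can be sketched roughly as follows. It assumes the XML that `pdftohtml -xml` emits, where each `<page>` holds `<text>` elements carrying `top` and `left` pixel coordinates, and it assumes that cells in the same table row share exactly the same `top` value (a real PDF may need a small tolerance). It uses the standard library’s `xml.etree.ElementTree`, whose API `lxml.etree` largely mirrors, so it runs without extra packages. `pdfxml_to_rows` and `rows_to_csv` are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def pdfxml_to_rows(xml_text):
    """Group <text> elements from pdftohtml XML output into table rows.

    Assumption: cells of one table row share the same 'top' coordinate,
    and cells within a row are ordered by their 'left' coordinate.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for page in root.iter("page"):
        by_top = defaultdict(list)
        for t in page.iter("text"):
            # itertext() also picks up text nested in <b> or <i> tags.
            value = "".join(t.itertext()).strip()
            if value:
                by_top[int(t.get("top"))].append((int(t.get("left")), value))
        # Emit rows top to bottom, cells left to right.
        for top in sorted(by_top):
            rows.append([value for _, value in sorted(by_top[top])])
    return rows


def rows_to_csv(rows):
    """Serialize a list of rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Grouping on coordinates rather than on document order is the key design choice: pdftohtml writes text in roughly the order it appears in the PDF, which is not guaranteed to follow table rows.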
Soon after I published this data, a number of people told me it would be more useful in other formats. At first I just wrote a bunch of command-line Python scripts that would suck in these CSV files and spit them out in different formats. I quickly realized that I could gather these scripts together into a quick-and-dirty web application.
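One way to fold such scripts into a single web application is sketched below, assuming each old script becomes a converter function registered under a `format` name and one entry point picks a converter from the query string. The registry, `converter`, `render`, and the two toy converters are hypothetical, not DataIO’s actual code, and the web framework plumbing (App Engine handlers, routing) is left out.

```python
import html
import json
from urllib.parse import parse_qs

# Hypothetical registry: each former command-line script becomes one entry.
CONVERTERS = {}


def converter(name):
    """Decorator that registers a function as the handler for format=name."""
    def register(func):
        CONVERTERS[name] = func
        return func
    return register


@converter("json")
def to_json(rows, params):
    return json.dumps(rows)


@converter("html")
def to_html(rows, params):
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


def render(rows, query_string):
    """Pick a converter from the 'format' query argument and run it."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    fmt = params.get("format", "json")
    if fmt not in CONVERTERS:
        raise ValueError(f"unsupported format: {fmt}")
    # Remaining arguments (base_column, etc.) are passed through to the converter.
    return CONVERTERS[fmt](rows, params)
```

With this shape, adding a new output format means writing one more decorated function; the URL handling stays the same.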
Over a few train rides I created an application called DataIO, and this week I finally got a chance to upload it to Google App Engine. I had received three specific requests for data in different formats, and I’ll walk through each one using the data set containing the MTA’s annual labor expenses.
Flot is a great JavaScript graphing library, but it’s not that easy to convert a CSV file into a Flot-friendly format. After creating a data set in DataIO, you can request the data back as a JSON dictionary that can be plugged directly into Flot. For our Labor Expenses example, this simply means constructing a URL such as this one:
http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0
The “base_column” query string parameter is the column in the CSV file that will be used for the legend of the graph. The “base_row” parameter is the row in the CSV file that contains the values for the x-axis of the graph.
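A sketch of that conversion under one reading of the two parameters: every row other than `base_row` becomes a Flot series labeled by its `base_column` cell, and each point pairs the x-axis value from `base_row` with that row’s value in the same column. Flot’s `$.plot` accepts a list of series objects with `label` and `data` keys; whether DataIO wraps them in an outer dictionary isn’t stated, so `to_flot` (a hypothetical name) simply returns the list.

```python
import json


def to_flot(rows, base_column=0, base_row=0):
    """Convert CSV rows into a JSON list of Flot series.

    Assumption: row `base_row` holds the x-axis values and column
    `base_column` holds each series' legend label; all other cells
    are numeric.
    """
    x_values = rows[base_row]
    series = []
    for r, row in enumerate(rows):
        if r == base_row:
            continue  # the x-axis row is not itself a series
        points = [
            [float(x_values[c]), float(cell)]
            for c, cell in enumerate(row)
            if c != base_column  # skip the label column
        ]
        series.append({"label": row[base_column], "data": points})
    return json.dumps(series)
```

On the page, the resulting JSON would be handed to Flot with something like `$.plot($("#placeholder"), series)`.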
It’s not obvious from the JSON alone how the graph will look, so DataIO lets you preview it by adding a “preview” query string argument:
http://www.dataio.org/data/Wfb?format=flot&base_column=0&base_row=0&preview=true
Flot is great, but it’s not always the right solution. For example, if I wanted to add the Flot graph generated above to this blog post, I would have to load three JavaScript files onto this web page. The Google Chart API offers a better solution for this use case: it renders the chart as an image, which can be included in a blog post without any JavaScript. To construct the Google chart for our Labor Expenses example, we can send DataIO the following request:
http://www.dataio.org/data/Wfb?format=gchart_line&base_column=0&base_row=0
which returns the URL of the chart image.
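The same rows can be turned into a chart image URL. A sketch, assuming the line-chart parameters of Google’s Image Charts API: `cht=lc` for a line chart, `chd=t:` text-encoded data with series separated by `|`, `chds` for the data scale, `chdl` for legend labels, and `chxt`/`chxl`/`chxr` for axis labels and range. The function names and the default chart size are hypothetical, and Google has since deprecated this API, so treat it as an illustration of the technique rather than DataIO’s actual output.

```python
from urllib.parse import urlencode


def gchart_params(rows, base_column=0, base_row=0, size="500x250"):
    """Build Image Charts line-chart parameters from CSV rows.

    Assumption: same row/column layout as the Flot conversion, with
    numeric cells everywhere except the label column and x-axis row.
    """
    x_labels = [v for c, v in enumerate(rows[base_row]) if c != base_column]
    labels, series = [], []
    for r, row in enumerate(rows):
        if r == base_row:
            continue
        labels.append(row[base_column])
        series.append([float(v) for c, v in enumerate(row) if c != base_column])
    values = [v for s in series for v in s]
    low, high = min(values), max(values)
    return {
        "cht": "lc",                      # line chart
        "chs": size,                      # width x height in pixels
        "chd": "t:" + "|".join(",".join(format(v, "g") for v in s) for s in series),
        "chds": f"{low:g},{high:g}",      # scale text data to this range
        "chdl": "|".join(labels),         # legend, one label per series
        "chxt": "x,y",                    # show x and y axes
        "chxl": "0:|" + "|".join(x_labels),
        "chxr": f"1,{low:g},{high:g}",    # y-axis range
    }


def gchart_line_url(rows, **kwargs):
    return "https://chart.googleapis.com/chart?" + urlencode(gchart_params(rows, **kwargs))
```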
The MTA publishes all of its financial data in millions of dollars. Often it is useful to see the data in other units, such as dollars:
http://www.dataio.org/data/Wfb?format=html&multiplication_factor=1000000&multiplication_start_row=1
or millions of euros:
http://www.dataio.org/data/Wfb?format=html&multiplication_factor=0.734&multiplication_start_row=1
The number to multiply by is passed in the multiplication_factor argument (here, 0.734 euros per dollar), and multiplication_start_row=1 tells DataIO to start multiplying at row 1, so the first row (row 0) is left unchanged.
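The scaling itself is simple arithmetic: each numeric cell from `multiplication_start_row` onward is multiplied by `multiplication_factor`, so a value of 100 (million dollars) becomes 100,000,000 dollars with a factor of 1000000, or 73.4 million euros with a factor of 0.734. A sketch, assuming non-numeric cells such as legend labels are left as they are; `multiply` is a hypothetical name.

```python
def multiply(rows, multiplication_factor, multiplication_start_row=0):
    """Scale numeric cells in rows at or after multiplication_start_row.

    Assumption: cells that don't parse as numbers (such as legend labels)
    are copied through unchanged.
    """
    result = []
    for r, row in enumerate(rows):
        if r < multiplication_start_row:
            result.append(list(row))  # e.g. the x-axis row stays as-is
            continue
        new_row = []
        for cell in row:
            try:
                new_row.append(float(cell) * multiplication_factor)
            except ValueError:
                new_row.append(cell)  # non-numeric label cell
        result.append(new_row)
    return result
```

Skipping row 0 matters here: if that row holds x-axis values such as years, multiplying it would turn them into meaningless numbers.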
A complete list of the query string arguments DataIO accepts is on its front page. The code for this application is hosted on Bitbucket.