It’s long overdue, but here’s a quick recap of the Hacks and Hackers Hackday I attended on the 11th March 2011 at the Atrium, Cardiff. It was my first hackday: I’d been itching to try one for a while, but they don’t seem to happen in Cardiff that often. I wasn’t disappointed; the day was really enjoyable.
The hackday was sponsored and organised by Scraperwiki, an open-source effort which aims to make web scraping tools, and the data they harvest, easily available to and modifiable by the general public. In particular, the event attempted to bring journalists (hacks) and developers (hackers) together so that, by combining our strengths, we might teach one another how to obtain and interpret the masses of public data that exist on the internet.
I went along with Carey and Warren from Box UK, and we formed a team with three ‘hacks’ – Steve Fossey, Eva Tallaksen from Intrafish and Gareth Morlais from BBC Cymru. We named ourselves Co-Ordnance, though I can’t remember why – think it was decided while I was busy hacking!
Inspired by some ideas Eva had, we initially decided to write an application to model stock market data, specifically a tool to collate stock market updates from floated companies. With a little more brainstorming we decided to make this a sub-feature of a larger tool that would plot companies on a UK map according to their registered addresses. The user would be able to filter by the type of business and the region in which it was registered, and to choose a specific point in time. One could then make observations about the behaviour of businesses, such as the popularity of certain sectors in certain areas, the impact the recession had on business growth or collapse, which regions are ripe for investment, and so on.
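To make the filtering idea concrete, here is a minimal Python sketch of how a sector, region and point-in-time filter could work. It is not the code written on the day: the `Company` record, its field names and the sample companies are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Company:
    # Every field name here is hypothetical, chosen for illustration.
    name: str
    sector: str
    region: str
    incorporated: date
    dissolved: Optional[date] = None  # None means the company still exists


def filter_companies(
    companies: Iterable[Company],
    sector: Optional[str] = None,
    region: Optional[str] = None,
    on_date: Optional[date] = None,
) -> List[Company]:
    """Return the companies that match every filter that was supplied."""
    matches = []
    for company in companies:
        if sector is not None and company.sector != sector:
            continue
        if region is not None and company.region != region:
            continue
        if on_date is not None:
            # Keep only companies that existed on the chosen date.
            if company.incorporated > on_date:
                continue
            if company.dissolved is not None and company.dissolved <= on_date:
                continue
        matches.append(company)
    return matches


# Made-up sample data, not real companies.
SAMPLE = [
    Company("Example Fisheries Ltd", "fishing", "Wales", date(2005, 3, 1)),
    Company("Example Trawlers Ltd", "fishing", "Wales", date(2006, 6, 1), date(2009, 2, 1)),
    Company("Example Software Ltd", "software", "Wales", date(2010, 1, 15)),
]

fishing_in_2008 = filter_companies(SAMPLE, sector="fishing", on_date=date(2008, 1, 1))
```

Running the same filter for a date before and a date after the recession would give the kind of comparison described above, such as how many businesses in a sector had disappeared from a region.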
While Warren and Eva focused on writing a Scraperwiki script to collect the stock market data, the rest of us planned the application. The first thing we considered was how to obtain geolocation data for the companies we would be referencing, and that led us into trying to scrape data from the Companies House website.
This proved to be difficult, as URLs within the site feature hashed components, possibly in an attempt to prevent spiders from navigating the site. This meant we were never going to come up with a scraper solution in the time that we had available, but thankfully a member of the Scraperwiki team told us about OpenCorporates, which aims to “have a URL for every company in the world”. Many of those URLs provide addresses for companies, which we could geocode (using Google’s geocoder) to obtain co-ordinates for each company. That let us plot companies per region on a map.
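For anyone wanting to try the geocoding step, here is a rough Python sketch against Google's Geocoding web service as it is documented today, which is an assumption: it requires an API key and may well differ from what was available in 2011. The function names are made up for illustration, and this is not the code from the day.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Google's Geocoding web service endpoint (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build the request URL for one address (hypothetical helper)."""
    # region=uk biases results towards the United Kingdom.
    params = {"address": address, "region": "uk", "key": api_key}
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def parse_geocode_response(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) for the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])


def geocode(address: str, api_key: str) -> Optional[Tuple[float, float]]:
    """Look up one address over the network (needs a valid API key)."""
    with urlopen(build_geocode_url(address, api_key)) as response:
        return parse_geocode_response(json.load(response))
```

Keeping the URL building and response parsing separate from the network call means both can be checked without spending any API quota, which is handy when there are a few thousand addresses to get through.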
Once we’d downloaded the data for a few thousand companies (enough for a sample dataset) and indexed it by location, I needed to write the frontend to plot the companies on a Google Map. By this point we had a little under three hours left, so it was a bit of a push, especially since I hadn’t done any Google Maps development in about three years and the API had completely changed during that period. Thankfully a generous supply of sweets, Coke and beer kept me energetic, and we just about made it. We didn’t get the chance to integrate Warren’s work into the app, which was a shame, but we still managed to achieve two solutions, which I thought impressive considering the time we had available to us (about seven hours, once we’d got started).
In the reckoning we came third, which we were chuffed with. In addition, Warren won an individual prize for making the best use of Scraperwiki, which was totally deserved. It would perhaps have been nice if the judges had explained why they voted as they had, as I, and others I spoke to, were confused as to what the marking criteria were. Nevertheless it was a really good day, which challenged me in ways I’d never been challenged before, and forced me to work within a totally new team, under pressure, on a topic that I knew nothing about. I’d recommend the Scraperwiki tools, and the next time I need to harvest some data I’ll definitely be making use of them.
The code I wrote on the day is available on my GitHub page, and I’m hoping to get the app into a more usable state and hosted on this site in the near future.