
Visualising data on Cambridgeshire council wards with Carto

“Data is the new oil…” is a phrase often used by those in the industry to describe the vast, valuable and extremely powerful resource available to us. We generate an astronomical amount of data: according to IBM, 90% of the data in the world today has been created in the last two years alone. However, there is a real skills gap when it comes to working with data – people often perceive it as too complicated or intimidating, while those who can extract, process and refine it will reap the rewards.

Data is readily available if you know where to look for it. The UK Government’s data.gov.uk site opens up datasets on the environment, health and education, and often these datasets come cleansed without the need to process them too much (though I would always advise cleansing and scrutinising data regardless of source!). As a Cambridgeshire resident, I am very lucky to have access to Cambridgeshire Insight Open Data, a repository of open data relating to local matters that is managed by Cambridgeshire County Council and run by a local partnership. I therefore wanted to have a go at playing around with the data and using the numerous free tools available to explore and visualise it, with the hope of making it more accessible – let’s face it, there’s more to life than staring at a spreadsheet!

My first project will be looking at crime figures in the county and projecting them on to a map, by council ward, using an online tool called Carto. The data will be sourced from the Cambridgeshire Crime Rates and Counts 2015/16 data set on the Cambridgeshire Insight Open Data site and I will be focusing on the data related to the rate of crimes per 1,000 population at ward and district level. The data I’m using can be found in column “CU” in the Excel sheet.

The benefit of looking at the rate instead of raw numbers is that the stats aren’t skewed by heavily populated areas. I would expect the number of crimes in heavily populated areas to be higher than in remote rural communities, but because the rate takes into account the population within the area, it provides a fairer basis for comparison.
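To make the arithmetic concrete, here is a minimal Python sketch of the rate calculation. The ward figures are made up for illustration and are not taken from the Cambridgeshire data set, and the function name is my own.

```python
def rate_per_1000(crime_count: int, population: int) -> float:
    """Return the number of crimes per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * 1000 / population


# Hypothetical figures: a busy urban ward and a small rural one.
urban = rate_per_1000(crime_count=900, population=12000)
rural = rate_per_1000(crime_count=150, population=1500)

print(f"Urban ward: {urban:.1f} crimes per 1,000 people")
print(f"Rural ward: {rural:.1f} crimes per 1,000 people")
```

In this made-up example the urban ward has six times as many crimes, yet the rural ward has the higher rate, which is exactly the kind of difference raw counts would hide.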

The mapping processes behind Carto aren’t sophisticated enough to recognise a council ward name in data (yet), but Carto does allow you to overlay KML data, meaning that the wards can be “drawn” on to the map in a new layer. Where does this KML data come from? Surely we don’t have to create it ourselves? Heh, don’t be silly, someone has already done all of the hard work for us (phew!). Thank you to Alasdair Rae, a Lecturer in the Department of Town and Regional Planning at the University of Sheffield, who created it for an article in the Guardian. A spreadsheet containing the KML shapes of all council wards in England can be downloaded via a Google Fusion table.

I then matched up the wards, by name, between the two sheets and pulled out the KML code for each one; the resulting sheet can be found here. From here, everything can be done from within Carto, so create yourself a free account and prepare to be dazzled!
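The matching step could also be scripted rather than done by hand. Below is a minimal Python sketch, assuming the crime rates and the KML shapes have each been loaded into a dictionary keyed by ward name. The ward names, rates and KML strings are hypothetical placeholders, and this is not necessarily how the sheet linked above was produced.

```python
def normalise(name: str) -> str:
    """Lower-case a ward name, expand '&' and collapse extra spaces."""
    return " ".join(name.lower().replace("&", "and").split())


def join_by_ward(rates: dict[str, float], shapes: dict[str, str]):
    """Attach a KML shape to each ward rate, matching on normalised name."""
    shape_lookup = {normalise(name): kml for name, kml in shapes.items()}
    matched, unmatched = [], []
    for ward, rate in rates.items():
        kml = shape_lookup.get(normalise(ward))
        if kml is None:
            unmatched.append(ward)  # needs checking by hand
        else:
            matched.append({"ward": ward, "rate": rate, "kml": kml})
    return matched, unmatched


# Hypothetical data standing in for the two spreadsheets.
rates = {"Northfield": 80.0, "Mill & Meadow": 40.0, "Old Town": 55.0}
shapes = {
    "northfield ": "<Polygon>...</Polygon>",
    "Mill and Meadow": "<Polygon>...</Polygon>",
}

matched, unmatched = join_by_ward(rates, shapes)
print(len(matched), "wards matched; still to check:", unmatched)
```

Keeping a list of unmatched wards matters because names rarely agree perfectly between two sources, and a ward that silently drops out would leave a hole in the map.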

There are plenty of Carto tutorials but there are essentially two sides to the tool: the dataset side (where the data can be uploaded) and the mapping side (where the maps connect to your existing datasets). The great thing about Carto is that your maps can be embedded in your own website so hey presto, here’s the resulting map…

Total crime rate per 1,000 people by Cambridgeshire ward (2015/2016)

Source: Cambridgeshire Insight Open Data :: Full page view of the map

I hope to make this a regular series of small tutorials about data visualisation and the use of open data so check back in a few months to see what else I have been up to!

Disclaimer: I am employed by Cambridgeshire County Council, though I have no links with the Insight Open Data team.


Week 4: Interactive graphic based on US unemployment stats

Our goal this week was to think about what kind of interactive graphic we could create based on the data used in the Guardian’s piece about unemployment in the US -> http://www.guardian.co.uk/news/datablog/interactive/2011/sep/08/us-unemployment-obama-jobs-speech-state-map

There is a lot of data used behind the scenes of this graphic, which is great but also slightly frustrating. For example, if you click on a particular state, you get a wealth of additional information – but it doesn’t allow you to easily compare it to other states. The same goes for the drop-down at the top of the graphic – it’s great that you can view the unemployment rate at a particular point in time, but it’s really hard to compare unless you are focussing on a particular state. I do however like the range of comparisons that have been made with the data, especially the ability to visualise the percentage point difference from the national figure – I shall have to remember that one in future 🙂
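The percentage point comparison is simple to reproduce. Here is a minimal Python sketch, using made-up state names and rates rather than the Guardian’s figures:

```python
def point_difference(state_rate: float, national_rate: float) -> float:
    """Difference from the national rate, in percentage points."""
    return round(state_rate - national_rate, 1)


# Hypothetical unemployment rates (in %), for illustration only.
national = 9.0
states = {"State A": 11.5, "State B": 6.5, "State C": 9.0}

differences = {
    name: point_difference(rate, national) for name, rate in states.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff:+.1f} percentage points")
```

Note that this is a subtraction, not a ratio: a state at 11.5% against a national 9.0% is 2.5 percentage points higher, which is roughly 28% higher in relative terms.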

And so, I jotted down some thoughts about what I would like to see on an interactive graphic like this and came up with the following list:

  • The Guardian piece focuses on the unemployment rate in the US since Obama came to power…what about further back?
  • Is state level in-depth enough? What about within the state – how does the unemployment rate differ within the states themselves?
  • In the accompanying course material, we were told not to add more than 6 colours to a choropleth map (which makes total sense for comparison) but what about viewing a small list of those counties with the very lowest & highest unemployment rates that would normally be enclosed in ranges?
  • Based on feedback from last week’s assignment – I wanted to focus more on type, colour and “interactiveness” of the graphic – this is definitely where I need more practice.
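The idea of capping the map at six colours while still listing the extreme counties can be sketched in a few lines of Python. This assumes equal-interval classes (the course material did not say which classing method to use), and the county names and rates are made up.

```python
def equal_interval_class(value: float, lo: float, hi: float, classes: int = 6) -> int:
    """Return the 0-based colour class for a value between lo and hi."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / classes
    # The maximum value would fall just outside the last class, so clamp it.
    return min(int((value - lo) / width), classes - 1)


def extremes(rates: dict[str, float], n: int = 2):
    """Return the n lowest and n highest entries, each as (name, rate) pairs."""
    ranked = sorted(rates.items(), key=lambda item: item[1])
    return ranked[:n], ranked[-n:][::-1]


# Hypothetical county unemployment rates (in %).
county_rates = {
    "County A": 2.0, "County B": 5.0, "County C": 8.0, "County D": 11.0,
    "County E": 14.0, "County F": 17.0, "County G": 20.0,
}

lo, hi = min(county_rates.values()), max(county_rates.values())
colour_class = {
    name: equal_interval_class(rate, lo, hi) for name, rate in county_rates.items()
}
lowest, highest = extremes(county_rates)

print(colour_class)
print("Lowest:", lowest)
print("Highest:", highest)
```

The map keeps to six classes for easy comparison, while the two short lists surface the individual counties that would otherwise be hidden inside the top and bottom ranges.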

And so with all of this in mind, I scribbled down possible graph/map/info ideas and arranged them on the table (see last week’s post for an idea of how it looked!) and I came up with this:

Unemployment Rates in the US (PDF)

Unemployment Rates in the US – with notes (PDF)

Notes about the graphic

  • The user is able to scroll back in time to see how the unemployment data differs on the map of the US. I added a line graph so that it was clear to see years when the unemployment rate was particularly high/low. I did think about adding an overlay to show the years that a new President came into power – incidentally there does seem to be a trend of the unemployment rate dropping in the year this happens – but I did not progress along this line of investigation for this project. Maybe another time 🙂
  • The map at the top is interactive and allows the user to click on a particular county to see detailed information about it as well as the state to which it belongs. The small bar chart on the left would become active when a county is selected.
  • The user also has the ability to tick the boxes and add lines to the graph showing the county and state unemployment rates and compare them to the national figures.
  • I have taken on board comments from last week about colour, type and making it appear more interactive. It was VERY hard being so restrained with colour (I’m not used to this!) but I actually found working with Colorbrewer for the map colours gave me a base to start from and I didn’t stray from there.

I am really happy with this graphic and I didn’t rush as much as I did last week. I took my time, didn’t faff around with Illustrator too much and so had more time to concentrate on what I wanted to do and actually what I’d want to see on an interactive visualisation like this.

22 days left on my Illustrator trial…will I be adding it to my Christmas list (as well as Alberto Cairo’s book and Andy Kirk’s too)? YES!

Week 3: Sketch an interactive graphic

The goal for this week was to think about how an interactive graphic would look if it were based on a report by Publish What You Fund that was also covered in a Guardian blog. The data relates to how transparent major donor organisations are with their own data: each organisation has been rated against a distinct set of criteria created by Publish What You Fund, producing an overall transparency index.

This assignment has really stretched me this week and made me take full advantage of the sketching/note-taking apps on my tablet as I found I was coming up with ideas in random places and needed to get them down for exploration.

My first task was to find out what the heck “transparency” actually meant and how it was measured, and I was thankful that the data originated from a very well organised website. I then looked at both source websites and noted down what I thought was missing and how I would like to play with the data myself. This took about three or four days – and this is where a lot of sketching and brainstorming came in; thinking of the “what ifs…” and “oooh how about I just change this…” scenarios.

I toyed with the data in Excel to see if I could find any interesting correlations, such as splitting the data right down to individual indicators, or looking at the annual resources and budget of each donor and in turn where the money goes, but what I was really missing was information about the donor itself. I was very pleased to see the UK’s Department for International Development at the top of the list but, in all honesty, I really knew nothing about them and so I wanted to build that into the graphic.

And so I started by jotting down potential graphs/data to include in my final interactive graphic and started arranging the sketches until I had something that I thought could work. Incidentally, I find jotting things down on paper like this so helpful as you invest very little time in it and it allows easy rearranging of elements – paper prototypes FTW!

From there, I installed the trial version of Illustrator CS6 and started playing around. To cut a long story short (it really was a long story as I battled with Illustrator’s graphs – I won in the end though!) I came up with the following design:

Aid Transparency Graphic (PDF)

Aid Transparency Graphic + Notes (PDF)

Notes about the graphic

  • The bar chart that can be seen at the top of the graphic can be manipulated by the buttons on the right hand side and the user can select to show the results of individual aid information levels or all of them (the total).
  • The user can also select to show particular countries instead of having everything on the graph which I found really hard to read in the Guardian blog.
  • If a user clicks on a donor’s name or the bar associated with that donor, the panel at the bottom will display additional information about the organisation. I added a space for some text about the organisation to add a bit of context and also a timeline to chart their major accomplishments so that users would be able to relate to an organisation’s particular focus. Both pieces of information could be scraped from donors’ websites and annual reports.
  • I have tried to minimise the use of the word “transparency” and instead used “openness” where possible, as I personally wasn’t very clear about what “transparency” meant at first.

I am personally really pleased with this, as the work involved way more than playing around with a few graphs. I had to think about what I wanted to say and how to represent it in a prototype that would communicate how an interactive version would work. But I’m doing something that I love and time did indeed fly when I was tinkering all weekend!