Author Archives: Adele

Visualising data on Cambridgeshire council wards with Carto

“Data is the new oil…” is a phrase often used by those in the industry to describe the vast,  valuable and extremely powerful resource available to us. We generate an astronomical amount of data, in fact 90% of the data in the world today has been created in the last two years alone according to IBM. However, there is a real skills gap when it comes to working with the data – people often perceive it as being too complicated or intimidating to work with, while those who can extract it, process it and refine it will reap the rewards.

Data is readily available if you know where to look for it. The UK Government’s data.gov.uk site opens up datasets on the environment, health and education and often these datasets come cleansed without the need to process it too much (though I would always advise the cleansing and scrutiny of data regardless of source!). As a Cambridgeshire resident, I am very lucky to have access to Cambridgeshire Insight Open Data, which is a repository of open data relating to local matters and is managed by Cambridgeshire County Council and run by a local partnership. I therefore wanted to have a go at playing around with the data and utilising the numerous free tools available to explore and visualise it with the hope of making it more accessible – let’s face it, there’s more to life than staring at a spreadsheet!

My first project will be looking at crime figures in the county and projecting them on to a map, by council ward, using an online tool called Carto. The data will be sourced from the Cambridgeshire Crime Rates and Counts 2015/16 data set on the Cambridgeshire Insight Open Data site and I will be focusing on the data related to the rate of crimes per 1,000 population at ward and district level. The data I’m using can be found in column “CU” in the Excel sheet.

The benefit of looking at the rate instead of raw numbers is that it means the stats aren’t skewed by heavily populated areas e.g. I would expect the number of crimes in heavily populated areas to be higher than those in remote rural communities, but as the rate takes in to account the population within the area, it provides a more accurate measure.

The mapping processes behind Carto aren’t sophisticated enough to recognise a council ward name in data (yet), but it does allow you to overlay KML data meaning that the wards can be “drawn” on to the map in a new layer. Where does this KML data come from? Surely we don’t have to create it ourselves? Heh, don’t be silly, someone has already done all of the hard work for us (phew!) – thank you to Alasdair Rae, a Lecturer in the Department of Town and Regional Planning at the University of Sheffield who created it for an article in the Guardian and a spreadsheet containing all of the KML shapes of all council wards in England can be downloaded via a Google Fusion table.

I then matched up the wards, by name, between the two sheets and pulled out the KML code for each one and the resulting sheet can be found here. From here, everything can be done from within Carto so create yourself a free account and prepare to be dazzled!

There are plenty of Carto tutorials but there are essentially two sides to the tool: the dataset side (where the data can be uploaded) and the mapping side (where the maps connect to your existing datasets). The great thing about Carto is that your maps can be embedded in your own website so hey presto, here’s the resulting map…

Total crime rate per 1,000 people by Cambridgeshire ward (2015/2016)

Source: Cambridgeshire Insight Open Data :: Full page view of the map

I hope to make this a regular series of small tutorials about data visualisation and the use of open data so check back in a few months to see what else I have been up to!

Disclaimer: I am employed by Cambridgeshire County Council, though I have no links with the Insight Open Data team.

Quick links

Data or it didn’t happen!

I attended the Visualized.io conference in November and amongst many brilliant talks was one by @JazzPazz about lifelogging where I first heard the phrase “Data or it didn’t happen”. Much like the phrase I’ve heard many times before, “Photo or it didn’t happen…”, we are now moving towards an age where we have to prove we have seen something, been somewhere, walked that epic walk. I think I welcome this to a point; whenever I see a sweeping statement made in the paper I want to see the numbers to prove it (though stats can of course be spun to say what you want) and anything that makes us question the drivel we are fed in the papers can only be a good thing.

What I don’t welcome is when the stats let us down!

I have started a fitness regime (HA, yes me!) and am on the Couch to 5K running program. I’ve got the podcasts on my phone, I’m signed up to Endomondo and so far it’s been great. Wow, the stats I can get on each C25K running session are awesome and I love being able to see how I’ve improved via the fastest mile, average pace or 12 minute test, I love it. UNTIL TODAY. Today, my GPS let me down and copped out over half way through the run – technology let me down. I had such a great run too; it was cold outside, the grass was crispy and my route took me along an old railway line where birds were darting through the trees and I smashed it. I was able to run each minute session in full and repeated it eight times, I’m recovering quicker between each run and I felt I went faster…but I have no proof!

Data or it didn’t happen.

I know it happened, I know how awesome it felt and that’s something I need to focus on more. Why do I log my runs and share them anyway? It’s certainly not to prove that Usain Bolt has anything to worry about! But I guess it’s for pride in myself, for motivation purposes, a badge of honour (gamification of life…) and to play with all the graphs. In a few months time this will just be a minor hiccup on the graph, but it was interesting to see how much of an effect it had on me. Next run I’m going to do even better to get my stats back on track…unless the GPS lets me down again and then it’s war!

Raspberry Pi (or rather Raspberry Spi – setting up a live webcam from scratch)

While @elpollouk has been in San Francisco for GDC, I’ve been having *issues* with our letting agents thinking they can have 24/7 access to the flat without letting me know in advance – not only is it against the law, it’s bloody freaky coming home to find things moved around when you’re on your own. So, I decided to do something about it and gave myself a little project today – setup a Raspberry Pi from scratch and have it streaming a webcam image so I can keep an eye on my flat when I’m at work and also have it alert me if/when it detects motion.

Here’s my kit:

1 x Raspberry Pi (model B)
1 x Playstation 3 Eye Toy camera
1 x Pimoroni Rainbow PiBow Case
1 x 16GB Class 10 SDHC SD Card (preloaded with NOOBS)
1 x Edimax EW-7811UN 150Mbps Wireless Nano USB Adapter
1 x Pimoroni PiHub 4 Port Raspberry Pi Hub (you must use a powered USB hub if connecting a webcam as it will draw too much power from the Pi itself – this one was listed on the eLinux wiki as being specifically designed for the Raspberry Pi and as a novice I enjoyed having that reassurance especially with a product that I’m not too familiar with).

NOTE: The Rainbow PiBow case is beautiful, but please remember to peel off all the white plastic from each piece of acrylic before constructing it. Yes, I was that person. Yes, I had to take it all apart and start again.

This won’t be an exhaustive guide on how I set it all up, because there are some fabulous resources on the web and I think it’s within the spirit of the Pi to have a bit of a play and tinker. However I’ll note down the blogs I found really useful along with any additions to the instructions that I found when installing the kit.

  1. Set up the wifi (not specifically for Edimax but worked nonetheless): http://richteel.blogspot.co.uk/2012/11/configuring-wifi-connection.html
  2. Once on the network and connected to the Internet, run the following commands to get your Pi up-to-date:
    sudo apt-get update 
    sudo apt-get dist-upgrade
  3. Set up a remote connection, I’m connecting to it from a Mac and so found the instructions on this site really helpful – http://ic3n1.blogspot.co.uk/2013/01/raspberry-pi-rdp-remote-desktop-from-mac.html (from point 12 onwards).
  4. Once I had got the remote connection up and running, I moved the Raspberry Pi away from the TV, unplugged the keyboard and mouse and placed it on shelf – pride of place. When I was connected to it via remote desktop, I couldn’t find a way in the GUI to restart/shut down the device and so a quick Google helped me to discovered that it could be done via command line with one of these commands:
    sudo reboot 
    or
    sudo shutdown now
  5. To set up the webcam, I followed this guide (http://pingbin.com/2012/12/raspberry-pi-web-cam-server-motion/) aaaaaand it didn’t work. A quick Google suggested I try popping the URL to the camera in Safari instead of Chrome and stone the crows it only bloody worked! I have no idea why Safari works, but it did!

So I currently have a Raspberry Pi set up on the wireless network that I can connect to remotely and view a web cam connected to it via a powered USB hub. Not bad for the first day of playing, I really look forward to developing this further and seeing how I can turn it in to a surveillance device so I can catch those pesky letting agents in action!

Mapping Country Links in Tableau

Map I wanted!  

I’ve been looking to do this sort of visualisation with map data for a while; to show links between countries and to have the chunkiness of the lines correspond to particular amounts. I am still very, very new to Tableau but thought their tutorials may be a great place to start. The Advanced Mapping Techniques tutorial had exactly what I wanted, though bizarrely their training workbooks below the video don’t contain that specific example. However, by scrolling through the video and working it out myself – thank GOODNESS there was a shot of the data behind the scenes around the 1 minute mark – I managed to put together my own version, and it works! Woo hoo!

So, how did I get there? Here’s how to do it step-by-step:

1. Create a spreadsheet with the following data (you can copy & paste mine from below).

IMPORTANT – each line is a one-way path but in order for it work, there must also be a return path. For example, the data in the first row goes from the UK to US, but there also needs to be a return line back from the US to the UK (row 4). If you don’t do this, all the points in the first three rows will show in the UK unless they have return points – which is a bit pointless (fnarr!)

Country Path Total
UK UK-US 200
UK UK-Brazil 450
UK UK-China 700
US UK-US 200
Brazil UK-Brazil 450
China UK-China 700

2. Create your workbook in Tableau and import your data (I connect live so I can tweak my data behind the scenes).

3. Tableau will automatically generate the Longitude and Latitude for you, but make sure you spell the country names correctly! Then set-up the following dimensions:

Columns: Longitude
Rows: Latitude
Color: Path
Size: SUM(Total)

Here’s a screenshot to help! Oh and make sure it’s set to a “Line” graph.

Tableau setup

Hooray you now have your map set-up!

You can now do funky things like creating (sort of) arrows instead of fat lines by altering the data like this…

Country Path Total
UK UK-US 200
UK UK-Brazil 450
UK UK-China 700
US UK-US
Brazil UK-Brazil
China UK-China

This means the line width starts from nothing at the point of origin and grows as it reaches its destination. Your map should now look like this:

Map pointy lines

See, that was fairly painless, though I still want to continue having a play around with it, I’m sure there is a lot more you can do. I’d like to find a definitive list of country names that can be used in Tableau, can initials work too I wonder? Who knows, that’s for another time…

Grand Designs

I’ve had a bit of a rubbish start to the year – bad news and sad news abound and being incredibly busy at work has left me a little drained. But it’s times like this when the mind really helps you to focus on the important things in life and makes you realise that there’s a wonderful great big world out there and so many fantastic opportunities to take advantage of.

I have found myself seeking a little bit of solace during lunchtimes in  The National Gallery (which is right next to work and I’m ashamed to say has taken me 7 years to explore…silly girl!) and stare at some gorgeous paintings. Something to take my mind of things for a few minutes; something different to focus on. It’s been fantastic – I even found myself wandering in to a local art shop and buying a sketch pad and pencils and furiously trying to emulate the head of Medusa from Perseus turning Phineas and his Followers to Stone.

*Come on Adele get to the point*

To bring this back to the whole data viz thang, I realised that while I have the skills for playing with data and I have the skills to manipulate the data/graphics on the PC, I have little/no knowledge about DESIGN itself. Sure, I know what looks good and what doesn’t but I have no basis, no grass-roots knowledge about Graphic Design itself. With this in mind, on a whim I signed up for an 8 week evening course at Central Saint Martins College in London in Graphic Design and I’m so excited! It sounds like it strips everything back to basics; no computers just theory, small projects and basic materials and it’s exactly what I need. Something new to focus on, something that I know I will love and something that will kick me in the right direction (and also give me a little kick up the butt too!).

I have also started to read The Design of Everyday Things – Donald Norman, it was recommended on a blog I was reading about design. I’m really, really enjoying it and would advise you to read it if you are interested in UX design and good practice, it has made me laugh numerous times on the train in to work. I couldn’t help but chuckle when I was faced with the following scenario in the toilets at the local cinema last weekend, I think it’s something Donald Norman would enjoy. It was the only tap; how am I supposed to wash my hands, knowing I’ll burn myself if I do?!?

2014-02-08 17.31.19

GCSE Results

There has been much chatter this week about GCSE results, which not only serves as a means by which to make me feel old (16 years ago people, 16 YEARS!) but it has also been a fabulous opportunity to dive in to the data. News outlets predictably report on the performance gap between girls & boys and the increase/decrease in the uptake of particular subjects…some things never change! I personally wanted to get my hands on the data so that I could whack it in to Tableau and play with it myself (thus also learning how to use Tableau too – double bonus!)

I obtained all of my data from a comprehensive Guardian article and wrangled it a bit get it in to the format that I needed (BIG shout out for the fabulous Tableau add-on for reshaping data in Excel!)

NOTE: As this blog is hosted by WordPress, I can’t embed the graphics due to Javascript restrictions. So if you want to see the graphics in all of their beauty and filter by different criteria, please click the link below each graphic).

GCSE Subject Breakdown

Subjects taken by all candidates in 2013 – interactive graphic allows you to filter by gender and year.

GCSE Subject Treemap

Link to interactive graphic

Percentage of grades by gender

Split by gender and grade – you can see that girls’ entries achieve higher grades than boys’. As widely reported, 8.3% of entries from girls achieved an A* compared to 5.3% of entries from boys. The interactive graphic allows you to filter by year.

Grades for entries by gender

Link to interactive graphic

Change in the percentage of entries achieving each grade

This graphic demonstrates that the percentage of entries achieving top grades has fallen between 2012 and 2013. The interactive graphic allows you to filter by gender.

Change in grades for entries

Link to interactive graphic

Results per subject

For each subject, we have a breakdown of the percentage of entries achieving each grade – for example 16.6% of Chemistry entries achieved an A* compared to 9.8% of French entries. The interactive graphic allows you to filter by gender.

Results per subject

Link to interactive graphic

This was a fantastic opportunity to for me to brush up on my Tableau skills and play with interesting and insightful data. Happy to hear comments on how I can improve my work or further areas I can explore.

Daily Mail pie chart fail, a crime against data visualisation

I have a love/hate relationship with pie charts – when they are used well, they are a brilliant way of showing proportions (x is bigger than y, which is bigger than z) and seeing where a particular slice fits in as part of the whole (mmm pie). I’m certainly not the first person to wax lyrical about pie charts, I’m aware that Nathan Yau has demonstrated a good use of pie charts and I totally agree with him.

However, as I was browsing the web this morning, a story caught my eye in The Mail (don’t judge me) about Britain’s crime hotspots and how Stratford in East London is awarded the country’s worst crime hotspot (I was a Games Maker at the 2012 Olympics and Stratford is very much of interest to me). And then I saw THIS MONSTROSITY…

Daily Mail Pie Fail

Daily Mail Pie Fail
Source: Daily Mail

WHOAAAHHHHH THERE – I’ll give you a couple of minutes to digest that beauty.

This is a perfect example of why I also hate pie charts…I think it should be reported to the data viz police.

Here’s a few of the problems:

  • It contains far too many slices – it’s information overload and it’s really hard to compare categories, it’s just a sea of labels.
  • It’s 3D which really isn’t the best way to project proportions as they are liable to misinterpretation; I can’t really say it any better than Drew Skau on the wonderful Visual.ly blog.
  • It’s not in any order, again making it harder to read (I personally prefer a pie chart in descending order with highest proportion first).

In fact, I think it breaks every rule in this Eager Eyes piece.

So I thought I’d see if I could improve it by turning it in to a bar chart instead…

Bar Chart - click to enlarge

Gosh, that looks much better! It’s in descending order and so is easier to compare categories and to see the categories containing the most/least offences. It’s probably not the prettiest chart but it’s what I could muster up in Excel in 10 mins.

It’s really interesting data and it’s definitely a set I’d like to explore more – for example what would fall under “Other theft”? I’d also be interested to compare these stats to a time when an average 130,000 people weren’t visiting the area every day for four weeks (the amount of people visiting for the Olympics & Paralympics have to skew the figures right?). But that’s for another time. My goal was to make the Mail’s pie chart easier to read and to allow proper dissemination of the data and I think my solution is definitely a step in the right direction!

Using a UK postcode to find Lat and Long and other useful information

So I have a list of UK postcodes and I want to find out the latitude and longitude of that postcode to plot it on to a map. There are numerous ways to do this, including:

1. Visiting Google Maps

2. Entering the postcode in to the search box

3. Right-clicking on the little point on the map and selecting “What’s here?”

4. Copying and pasting the lat and long that magically appears in the search box.

That’s nice, quick and easy BUT what if I’ve got hundreds or thousands of postcodes to search? I can’t spend the whole sunny weekend doing the same procedure can I?

No.

I’m self-learning Tableau at the moment and it seems to have great location support…if you provide it with US addresses (gnrrr), but I wanted to find a way of plotting a series of points on a map of the UK derived using their addresses. A bit of Google searching led me to UK Postcodes (set up by Stuart Harrison and based on Ordnance Survey OpenData) , a site that lets you enter a UK postcode and returns lots of information about that location (e.g. long, lat, county and right down to district and ward) and what made me excited was that the site had an API allowing you to pass it a postcode via URL (e.g. http://www.uk-postcodes.com/postcode/NW18TQ.json) and it would output the meta data it in either XML, CSV, JSON or RDF for you. PERFECT!

After a further read around the site, I found that Edd Robinson had created a library acting as a wrapper for the API which I could import in to my own Python project. And so, without further ado here is my Python code:

[code language=”Python”]
from postcodes import PostCoder
import csv

f = open(‘Extracted_Data_from_Postcodes.csv’, ‘w’)
i = 0
pc = PostCoder()

loc_lat = ""
loc_long = ""

with open(‘Postcodes_to_Extract.csv’, ‘rU’) as g:
reader = csv.reader(g)

for row in reader:
#Col 0 = ID :: Col 1 = Postcode
result = pc.get(str(row[1]))

if str(result) != "None":
location = result[‘geo’]

for key, value in location.items():
if (key == "lat"):
loc_lat = value
if (key == "lng"):
loc_long = value

#ID, Postcode, Lat, Long
write_to_file = str(row[0]) + ", " + str(row[1]) + ", " + str(loc_lat) + ", " + str(loc_long) + "n"

#Add the iteration count to output to screen to see how far we are up to.
print str(i + 1) + ", " + write_to_file
f.write(write_to_file)
else:
#If there is a problem translating the postcode, output "BROKEN" to file to manually check.
write_to_file = "BROKEN, " + str(result) + ", " + str(row[0]) + ", " + str(row[1]) + "n"

print str(i + 1) + ", BROKEN, " + str(row[0]) + ", " + str(row[1])
f.write(write_to_file)
i = i + 1

f.close()

[/code]

My input file looked like this:

[code]
109484, SG5 4PF
109486, MK44 2DB
109487, LU4 9UJ
109488, LU6 1RE
109489, MK43 8DY
109490, MK45 5JH
109491, MK44 3QD
109492, MK45 3BX
109493, MK17 9QL
109494, MK43 9JT
[/code]

And my screen output looked like this:

[code]
1, BROKEN, 109484, SG5 4PF
2, 109486, MK44 2DB, 52.214741, -0.461977
3, 109487, LU4 9UJ, 51.927696, -0.500824
4, 109488, LU6 1RE, 51.879322, -0.563452
5, 109489, MK43 8DY, 52.164539, -0.623209
6, 109490, MK45 5JH, 51.982376, -0.495111
7, 109491, MK44 3QD, 52.137440, -0.377085
8, 109492, MK45 3BX, 52.080341, -0.446214
9, 109493, MK17 9QL, 51.989906, -0.619803
10, 109494, MK43 9JT, 52.095955, -0.528752
[/code]

Afterwards, I was then able to manually check out why SG5 4PF failed, actually, I’m not sure why it failed but I was able to correct the lat & long via a Google Map search. Exactly what I needed with minimal effort and a chance to flex my muscles with Python again. I then imported the CSV in to Tableau, but that’s the subject of a future blog post…

Setting up Python and Easy_Install on Windows 7

This weekend, I needed to set up Python on a Windows 7 virtual machine that I had set up on my Mac. I don’t know why, but I always seem to have a problem installing Python and setting up the environment and so I’m going to document the process here so that in X months time when I need to do it again, I just need to follow my own instructions!

As part of the process, I also install easy_install to install libraries and so I’ll also document that here.

1. Download the Python Windows installer from here -> http://www.python.org/download/ (I downloaded Python 2.7.5) and run the executable on your machine.

2. Run the installer – I normally leave all the defaults as they are, so end up installing it in C:Python27

3. Once installed, SET UP YOUR ENVIRONMENT VARIABLES, I can not express how important this is as you’ll hit so many problems without do it (try opening the command prompt and typing python to see Windows shrug its shoulders and say “huh…dunno…”)

  • If you haven’t played with environment variables before, just stick to following these instructions as you can set them up through the Windows GUI.
  • Right click on “My Computer”, select “Properties” > “Advanced system settings” and click on the “Environment Variables” button
  • In the System Variables box, find the variable called “path” and click on the “Edit…” button
  • In the “Variable value” box, at the end of the entry, add the following text: ;C:Python27;C:Python27Scripts (note the leading semi colon, also note that I’m using the default installation folder – if you have installed Python elsewhere, just change the path structure).
  • Click “OK” a couple of times and hey presto, your environment variables are set up.
  • To test it has worked, open a new command window and type in python and if you see this, you’re on to a winner…

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:Usersadelegilpin>python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type “help”, “copyright”, “credits” or “license” for more information.
>>>

4. The next step in the process is to set up easy_install and so we need to go to the setuptools page (links to version 0.8) and download the ez_setup.py script (I normally put it straight in to my Python27 folder).

5. Open a command prompt and type python ez_setup.py install – you’ll see a load of code whizz by which will hopefully end as follows;

Processing dependencies for setuptools==0.8
Finished processing dependencies for setuptools==0.8

C:Python27>

6. easy_install has now been set up and you can test to see if it is there, by typing easy_install in to a command prompt, which will throw an error about no URLs, you know that the tool has been set up successfully

Done!

To use easy_install to get new libraries, just use the following syntax: easy_install <library name> (e.g. easy_install postcodes)

Displaying English districts in Google Earth

The other day at work, I was asked to find out how many X* were in a particular district in England. Given we seem to have SO MANY ways that we slice & dice the country up with Ceremonial Counties, Parliamentary Constituencies, Local Government Districts (which can be boroughs, cities or Royal Boroughs) you can see why I started to have a small panic! But then I remembered you could search for districts in Google and it would outline them with a little red boundary line, so I gave it go.

“Woking District”

Woking Outline

Ta dahhhh!

Great…except I then wanted to overlay the data I already had mapped in a Google Fusion table with longitudes & latitudes. FAIL. I could find no way of either exporting this boundary data, or overlaying the longitude & latitude points I had setup in a fusion table. So I Googled around the subject and found this wonderful post on Stack Overflow – so I needed to head on over the OpenStreetMap, download the boundary data for my district, additionally convert the gpx file to a KML and then whack it in to Google Earth…so I did that!

1) Find the district you want in OpenStreetMap

2)  Download the gpx file and then convert it to a KML on Gpx2KML.com

3) Load your resulting KML in to Google Earth and hey presto…

Google Earth - Woking

I then exported my data with the longitude & latitude points in it from my Google Fusion table as a KML file and then imported it in to Google Earth and all the little points appear on a layer over my district boundary.

Woking Schools - Google Earth

Hooray! Fortunately, there were only four points within the boundary – I need to think about what would happen if we were talking hundreds or thousands – how would I manage that? Is there a way to filter in Google Earth to say give me all the X in boundary Y? I’m not sure but I’ll be darn sure to find out!

* Trying to maintain a bit of anonymity here!