Category Archives: UK Map Viz

Visualising data on Cambridgeshire council wards with Carto

“Data is the new oil…” is a phrase often used by those in the industry to describe the vast, valuable and extremely powerful resource available to us. We generate an astronomical amount of data; in fact, according to IBM, 90% of the data in the world today was created in the last two years alone. However, there is a real skills gap when it comes to working with data: people often perceive it as too complicated or intimidating to work with, while those who can extract it, process it and refine it will reap the rewards.

Data is readily available if you know where to look for it. The UK Government’s data.gov.uk site opens up datasets on the environment, health and education, and these datasets often come cleansed, without the need for much processing (though I would always advise cleansing and scrutinising data regardless of its source!). As a Cambridgeshire resident, I am very lucky to have access to Cambridgeshire Insight Open Data, a repository of open data relating to local matters, managed by Cambridgeshire County Council and run by a local partnership. I therefore wanted to have a go at playing around with the data, using the numerous free tools available to explore and visualise it, in the hope of making it more accessible – let’s face it, there’s more to life than staring at a spreadsheet!

My first project looks at crime figures in the county, projecting them onto a map, by council ward, using an online tool called Carto. The data is sourced from the Cambridgeshire Crime Rates and Counts 2015/16 dataset on the Cambridgeshire Insight Open Data site, and I will be focusing on the rate of crimes per 1,000 population at ward and district level. The figures I’m using can be found in column “CU” of the Excel sheet.
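
If you fancy pulling that column out programmatically rather than hunting for it by eye, here’s a minimal Python sketch using openpyxl – the filename is a placeholder, and it assumes the workbook is saved as .xlsx with the data on the active sheet:

[code language="Python"]
from openpyxl import load_workbook

#Open the downloaded workbook (filename is made up - point it at your own copy)
wb = load_workbook('crime_rates_2015_16.xlsx')
ws = wb.active

#ws['CU'] returns every cell in column CU, top to bottom
for cell in ws['CU']:
    print cell.value
[/code]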

The benefit of looking at the rate instead of raw counts is that the stats aren’t skewed by heavily populated areas: I would expect the number of crimes in a busy urban area to be higher than in a remote rural community, but because the rate takes into account the population within the area, it provides a fairer comparison. For example, 500 crimes in a ward of 10,000 people is a rate of 50 per 1,000, while the same 500 crimes in a ward of 50,000 people is a rate of just 10 per 1,000.

The mapping processes behind Carto aren’t sophisticated enough to recognise a council ward name in the data (yet), but it does allow you to overlay KML data, meaning that the wards can be “drawn” onto the map in a new layer. Where does this KML data come from? Surely we don’t have to create it ourselves? Heh, don’t be silly, someone has already done all of the hard work for us (phew!) – thank you to Alasdair Rae, a Lecturer in the Department of Town and Regional Planning at the University of Sheffield, who created the shapes for an article in the Guardian. A spreadsheet containing the KML shapes of all council wards in England can be downloaded via a Google Fusion table.

I then matched up the wards, by name, between the two sheets and pulled out the KML code for each one; the resulting sheet can be found here. From there, everything can be done from within Carto, so create yourself a free account and prepare to be dazzled!
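
If you’d rather script that matching step than do it by hand, here’s a rough Python sketch of the idea. It assumes both sources have been exported to CSV with the ward name in the first column – the filenames and column positions are made up for illustration:

[code language="Python"]
import csv

#Build a lookup of ward name -> KML shape from the Fusion table export
kml_lookup = {}
with open('ward_kml.csv', 'rU') as f:
    for row in csv.reader(f):
        kml_lookup[row[0].strip().lower()] = row[1]

#Attach the KML shape to each crime-rate row where the ward names match
with open('crime_rates.csv', 'rU') as f_in:
    with open('merged.csv', 'w') as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            shape = kml_lookup.get(row[0].strip().lower())
            if shape:
                writer.writerow([row[0], row[1], shape])
            else:
                #Flag non-matching ward names for manual checking
                print "No match for: " + row[0]
[/code]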

There are plenty of Carto tutorials out there, but there are essentially two sides to the tool: the dataset side (where the data is uploaded) and the mapping side (where maps connect to your existing datasets). The great thing about Carto is that your maps can be embedded in your own website, so hey presto, here’s the resulting map…

Total crime rate per 1,000 people by Cambridgeshire ward (2015/2016)

Source: Cambridgeshire Insight Open Data :: Full page view of the map

I hope to make this a regular series of small tutorials about data visualisation and the use of open data so check back in a few months to see what else I have been up to!

Disclaimer: I am employed by Cambridgeshire County Council, though I have no links with the Insight Open Data team.

Mapping Country Links in Tableau

Map I wanted!  

I’ve been looking to do this sort of visualisation with map data for a while: showing links between countries, with the chunkiness of the lines corresponding to particular amounts. I am still very, very new to Tableau but thought their tutorials might be a great place to start. The Advanced Mapping Techniques tutorial had exactly what I wanted, though bizarrely the training workbooks below the video don’t contain that specific example. However, by scrolling through the video and working it out myself – thank GOODNESS there was a shot of the data behind the scenes around the 1 minute mark – I managed to put together my own version, and it works! Woo hoo!

So, how did I get there? Here’s how to do it step-by-step:

1. Create a spreadsheet with the following data (you can copy & paste mine from below).

IMPORTANT – each line is a one-way path, but in order for it to work, there must also be a return path. For example, the data in the first row goes from the UK to the US, but there also needs to be a return line from the US back to the UK (row 4). If you don’t add the return rows, all the points in the first three rows will simply be plotted in the UK – which is a bit pointless (fnarr!). A little script for generating the return rows automatically appears after the table.

Country   Path        Total
UK        UK-US       200
UK        UK-Brazil   450
UK        UK-China    700
US        UK-US       200
Brazil    UK-Brazil   450
China     UK-China    700
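
If typing out the return rows by hand gets tedious, here’s a throwaway Python sketch that generates them for you. It assumes the destination can be read from the second half of the path name – the filename and the data are just the example above:

[code language="Python"]
import csv

#One row per one-way path, as in the example table above
one_way = [
    ('UK', 'UK-US', 200),
    ('UK', 'UK-Brazil', 450),
    ('UK', 'UK-China', 700),
]

with open('paths.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['Country', 'Path', 'Total'])
    for origin, path, total in one_way:
        writer.writerow([origin, path, total])
        #The return row: the destination is the second half of the path name
        destination = path.split('-')[1]
        writer.writerow([destination, path, total])
[/code]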

2. Create your workbook in Tableau and import your data (I connect live so I can tweak my data behind the scenes).

3. Tableau will automatically generate the Longitude and Latitude for you, but make sure you spell the country names correctly! Then set up the following:

Columns: Longitude
Rows: Latitude
Color: Path
Size: SUM(Total)

Here’s a screenshot to help! Oh, and make sure the mark type is set to “Line”.

Tableau setup

Hooray, you now have your map set up!

You can now do funky things like creating (sort of) arrows instead of fat lines by altering the data like this…

Country   Path        Total
UK        UK-US       200
UK        UK-Brazil   450
UK        UK-China    700
US        UK-US
Brazil    UK-Brazil
China     UK-China

This means the line width starts from nothing at the point of origin and grows as it reaches its destination. Your map should now look like this:

Map pointy lines

See, that was fairly painless, though I still want to continue having a play around with it – I’m sure there is a lot more you can do. I’d like to find a definitive list of country names that can be used in Tableau; can initials work too, I wonder? Who knows, that’s for another time…

Using a UK postcode to find Lat and Long and other useful information

So I have a list of UK postcodes and I want to find out the latitude and longitude of each postcode to plot it onto a map. There are numerous ways to do this, including the manual route:

1. Visiting Google Maps

2. Entering the postcode into the search box

3. Right-clicking on the little point on the map and selecting “What’s here?”

4. Copying and pasting the lat and long that magically appears in the search box.

That’s nice, quick and easy BUT what if I’ve got hundreds or thousands of postcodes to search? I can’t spend the whole sunny weekend doing the same procedure can I?

No.

I’m self-learning Tableau at the moment and it seems to have great location support…if you provide it with US addresses (gnrrr), but I wanted to find a way of plotting a series of points on a map of the UK derived from their addresses. A bit of Google searching led me to UK Postcodes (set up by Stuart Harrison and based on Ordnance Survey OpenData), a site that lets you enter a UK postcode and get back lots of information about that location (e.g. longitude, latitude and county, right down to district and ward). What made me excited was that the site has an API allowing you to pass it a postcode via the URL (e.g. http://www.uk-postcodes.com/postcode/NW18TQ.json) and have it output the metadata in XML, CSV, JSON or RDF. PERFECT!
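
Before scripting the whole batch, it’s worth eyeballing what the API returns for a single postcode. Here’s a quick sketch – the JSON shape (a “geo” block containing “lat” and “lng”) is inferred from what the full script below consumes:

[code language="Python"]
import urllib2
import json

#Fetch the JSON for one postcode straight from the API
url = 'http://www.uk-postcodes.com/postcode/NW18TQ.json'
data = json.load(urllib2.urlopen(url))

#The 'geo' block holds the coordinates we're after
print data['geo']['lat'], data['geo']['lng']
[/code]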

After a further read around the site, I found that Edd Robinson had created a library acting as a wrapper for the API, which I could import into my own Python project. And so, without further ado, here is my Python code:

[code language="Python"]
from postcodes import PostCoder
import csv

f = open('Extracted_Data_from_Postcodes.csv', 'w')
i = 0
pc = PostCoder()

loc_lat = ""
loc_long = ""

with open('Postcodes_to_Extract.csv', 'rU') as g:
    reader = csv.reader(g)

    for row in reader:
        #Col 0 = ID :: Col 1 = Postcode
        result = pc.get(str(row[1]))

        if str(result) != "None":
            location = result['geo']

            for key, value in location.items():
                if key == "lat":
                    loc_lat = value
                if key == "lng":
                    loc_long = value

            #ID, Postcode, Lat, Long
            write_to_file = str(row[0]) + ", " + str(row[1]) + ", " + str(loc_lat) + ", " + str(loc_long) + "\n"

            #Add the iteration count to the screen output to see how far along we are.
            print str(i + 1) + ", " + write_to_file
            f.write(write_to_file)
        else:
            #If there is a problem translating the postcode, output "BROKEN" to the file to check manually.
            write_to_file = "BROKEN, " + str(result) + ", " + str(row[0]) + ", " + str(row[1]) + "\n"

            print str(i + 1) + ", BROKEN, " + str(row[0]) + ", " + str(row[1])
            f.write(write_to_file)
        i = i + 1

f.close()
[/code]

My input file looked like this:

[code]
109484, SG5 4PF
109486, MK44 2DB
109487, LU4 9UJ
109488, LU6 1RE
109489, MK43 8DY
109490, MK45 5JH
109491, MK44 3QD
109492, MK45 3BX
109493, MK17 9QL
109494, MK43 9JT
[/code]

And my screen output looked like this:

[code]
1, BROKEN, 109484, SG5 4PF
2, 109486, MK44 2DB, 52.214741, -0.461977
3, 109487, LU4 9UJ, 51.927696, -0.500824
4, 109488, LU6 1RE, 51.879322, -0.563452
5, 109489, MK43 8DY, 52.164539, -0.623209
6, 109490, MK45 5JH, 51.982376, -0.495111
7, 109491, MK44 3QD, 52.137440, -0.377085
8, 109492, MK45 3BX, 52.080341, -0.446214
9, 109493, MK17 9QL, 51.989906, -0.619803
10, 109494, MK43 9JT, 52.095955, -0.528752
[/code]

Afterwards, I was able to manually check why SG5 4PF failed – actually, I’m still not sure why it failed, but I was able to correct the lat & long via a Google Maps search. Exactly what I needed with minimal effort, and a chance to flex my muscles with Python again. I then imported the CSV into Tableau, but that’s the subject of a future blog post…

Displaying English districts in Google Earth

The other day at work, I was asked to find out how many X* were in a particular district in England. Given we seem to have SO MANY ways of slicing & dicing the country up – Ceremonial Counties, Parliamentary Constituencies, Local Government Districts (which can be boroughs, cities or Royal Boroughs) – you can see why I started to have a small panic! But then I remembered that you can search for a district in Google and it will outline it with a little red boundary line, so I gave it a go.

“Woking District”

Woking Outline

Ta dahhhh!

Great…except I then wanted to overlay the data I already had mapped in a Google Fusion table with longitudes & latitudes. FAIL. I could find no way of either exporting this boundary data or overlaying the longitude & latitude points I had set up in a Fusion table. So I Googled around the subject and found a wonderful post on Stack Overflow: I needed to head over to OpenStreetMap, download the boundary data for my district, convert the gpx file to a KML and then whack it into Google Earth…so I did that!

1) Find the district you want in OpenStreetMap

2) Download the gpx file and then convert it to a KML on Gpx2KML.com (or script the conversion yourself – see the sketch below)

3) Load your resulting KML into Google Earth and hey presto…

Google Earth - Woking
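
As an aside, if the online converter ever disappears, the gpx-to-KML step can be scripted too. Here’s a minimal Python sketch, assuming a GPX 1.1 file containing track points – the filenames are placeholders:

[code language="Python"]
import xml.etree.ElementTree as ET

GPX_NS = '{http://www.topografix.com/GPX/1/1}'

#Pull every track point out of the GPX file
tree = ET.parse('boundary.gpx')
coords = []
for trkpt in tree.getroot().iter(GPX_NS + 'trkpt'):
    #KML wants "lon,lat" while GPX stores lat/lon as attributes
    coords.append(trkpt.get('lon') + ',' + trkpt.get('lat'))

#Wrap the coordinates in a bare-bones KML LineString
kml = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>'
       '<LineString><coordinates>' + ' '.join(coords) + '</coordinates>'
       '</LineString></Placemark></Document></kml>')

with open('boundary.kml', 'w') as f:
    f.write(kml)
[/code]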

I then exported my data with the longitude & latitude points in it from my Google Fusion table as a KML file, imported that into Google Earth too, and all the little points appeared on a layer over my district boundary.

Woking Schools - Google Earth

Hooray! Fortunately, there were only four points within the boundary – but I need to think about what would happen if we were talking hundreds or thousands. How would I manage that? Is there a way to filter in Google Earth to say “give me all the X in boundary Y”? I’m not sure, but I’ll be darn sure to find out!
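
One answer, outside Google Earth entirely, would be to test each point against the boundary polygon in code. Here’s a sketch of a standard ray-casting point-in-polygon check – the boundary below is a made-up rectangle; real coordinates would come from the KML file:

[code language="Python"]
def point_in_polygon(lon, lat, polygon):
    #polygon is a list of (lon, lat) vertices; returns True if the point is inside
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        #Count how many edges a horizontal ray from the point would cross
        if (yi > lat) != (yj > lat):
            if lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
                inside = not inside
        j = i
    return inside

#A made-up rectangular "boundary" and a test point that sits inside it
boundary = [(-0.60, 51.30), (-0.50, 51.30), (-0.50, 51.35), (-0.60, 51.35)]
print point_in_polygon(-0.55, 51.32, boundary)  #True
[/code]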

* Trying to maintain a bit of anonymity here!

Using Python & BeautifulSoup to scrape a Wikipedia table

Well, it was only a couple of weeks ago that I set myself a challenge to complete the Python course on Codecademy, and I did it – I completed the Python track and it was fantastic! I was given the opportunity to put my newly found Python skills into action this week, as I needed to scrape some data from a Wikipedia page. I have a table of addresses and need to compare the County in the list that has been provided against the one it really should be. This page on Wikipedia contains the data I need: for each Postcode District there is a Postal County, and I can use this data as a comparison. It’s formatted in an HTML table like this:

WikiCapture

Normally, I’d just copy & paste the table into Excel for use later on, BUT it’s not as easy as that (oh no!), as there can be multiple Postcode Districts within a row, which is slightly annoying! To be of any use to me, the data needs to be formatted so that there is a row for each Postcode District, like so (I don’t necessarily need the Postcode Area & Town but I’ll keep them anyway – I don’t like throwing away data!):

Postcode Area   Postcode District   Post Town   Former Postal County
AB              AB10                ABERDEEN    Aberdeenshire
AB              AB11                ABERDEEN    Aberdeenshire
AB              AB13                ABERDEEN    Aberdeenshire
AB              AB15                ABERDEEN    Aberdeenshire

And so I thought this would be the perfect project for me to undertake in Python and to familiarise myself with friend-of-the-screen-scrapers, BeautifulSoup. I won’t jabber on too much about BeautifulSoup as I’m not fully up to speed on it myself yet, but from reading around the subject I gather it’s a great way to grab elements from web pages for further processing.

Step One: Wikipedia doesn’t like you…

Wikipedia doesn’t like this code:

[code language="Python" highlight="7"]
from bs4 import BeautifulSoup
import urllib2
wiki = "http://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom"
page = urllib2.urlopen(wiki)
soup = BeautifulSoup(page)
print soup
#urllib2.HTTPError: HTTP Error 403: Forbidden
[/code]

Wikipedia only allows access to recognised user agents in order to stop bots retrieving bulk content. I am not a bot, I just want to practise my Python, and so to get around this you just need to add a browser-like User-Agent string to the request header (thanks to Stack Overflow for coming to the rescue).

Step Two: Hunt the table

If you look at the code behind the Wikipedia article, you’ll see that there are multiple tables, but only one (thankfully the one we want) uses the “wikitable sortable” class – this is great, as we can ask BeautifulSoup for the table with the “wikitable sortable” class and know that we will only get this table.

[code language="Python"]
from bs4 import BeautifulSoup
import urllib2
wiki = "http://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia
req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

area = ""
district = ""
town = ""
county = ""
table = soup.find("table", {"class": "wikitable sortable"})
print table
[/code]

Output looks like this:

TableOutput

Great! This means that we just have the HTML table stored in our variable. Now, it’s just a case of iterating through the rows and columns…easy…*ahem*

Step Three: For your iteration pleasure

We need to do the iteration in two stages – the first stage is to iterate through each row (tr element) and assign each cell (td element) within it to a variable. At this stage, we will grab everything in the Postcode Districts column and store it in a list for further iteration later. To do this, I used the following code:

[code language="Python" firstline="19"]
for row in table.findAll("tr"):
    cells = row.findAll("td")
    #For each "tr", assign each "td" to a variable.
    if len(cells) == 4:
        area = cells[0].find(text=True)
        district = cells[1].findAll(text=True)
        town = cells[2].find(text=True)
        county = cells[3].find(text=True)
[/code]

BeautifulSoup’s .findAll function returns a list, and so on line 20 we obtain a list containing four elements, one for each of the columns in the table. This means they can be accessed via the cells[n].find(text=True) syntax. You’ll notice that I’ve used .findAll for the Postcode Districts column; this is because I want a list of the items within that cell for iteration purposes later!

After this code executes, I have a value for the area, a list of districts, a town and a county. Now for the second part of my iteration:

[code language="Python" firstline="28"]
#district can be a list of lists, so we want to iterate through the top-level lists first…
for x in range(len(district)):
    #For each list, split the string
    postcode_list = district[x].split(",")
    #For each item in the split list…
    for i in range(len(postcode_list)):
        #Check it's a postcode and not other text
        if (len(postcode_list[i]) > 2) and (len(postcode_list[i]) <= 5):
            #Strip out the "\n" that seems to be at the start of some postcodes
            write_to_file = area + "," + postcode_list[i].lstrip('\n').strip() + "," + town + "," + county + "\n"
            print write_to_file
[/code]

I found that, instead of district being a standard list of postcodes, in some cases it was a list of lists (oh joy!). I was expecting it to look like this:

[u'AB10, AB11, AB12, AB15, AB16, \nAB21, AB22, AB23, AB24, AB25, \nAB99, non-geo'] *

*Ignore the \n escapes and the non-geo text – we’ll deal with them later!

I got this…

[u'AB10, AB11, AB12, AB15, AB16,', u'\nAB21, AB22, AB23, AB24, AB25,', u'\nAB99', u'non-geo']

And so I needed an additional layer of iteration: one for the whole list and then another for the items in the individual lists. Simple.

For each item in the list, Python’s .split(",") function allowed me to split the comma-separated string of postcodes into a list that could be iterated over. For each item in that list, we just check that it’s a postcode (a check on string length sufficed nicely this time!) and then build up our output string. To deal with the \n that was prepended to some of the postcodes, I just left-stripped the string to remove the \n characters, and hey presto, it worked!

I flushed the output to a CSV file as well as to the screen and it worked beautifully!

Here is the full code:

[code language="Python"]
from bs4 import BeautifulSoup
import urllib2

wiki = "http://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom"
header = {'User-Agent': 'Mozilla/5.0'} #Needed to prevent 403 error on Wikipedia
req = urllib2.Request(wiki, headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

area = ""
district = ""
town = ""
county = ""

table = soup.find("table", {"class": "wikitable sortable"})

f = open('output.csv', 'w')

for row in table.findAll("tr"):
    cells = row.findAll("td")
    #For each "tr", assign each "td" to a variable.
    if len(cells) == 4:
        area = cells[0].find(text=True)
        district = cells[1].findAll(text=True)
        town = cells[2].find(text=True)
        county = cells[3].find(text=True)

        #district can be a list of lists, so we want to iterate through the top-level lists first…
        for x in range(len(district)):
            #For each list, split the string
            postcode_list = district[x].split(",")
            #For each item in the split list…
            for i in range(len(postcode_list)):
                #Check it's a postcode and not other text
                if (len(postcode_list[i]) > 2) and (len(postcode_list[i]) <= 5):
                    #Strip out the "\n" that seems to be at the start of some postcodes
                    write_to_file = area + "," + postcode_list[i].lstrip('\n').strip() + "," + town + "," + county + "\n"
                    print write_to_file
                    f.write(write_to_file)

f.close()
[/code]

Disclaimer(ish)

This code has no additional error checking or handling and was merely written to solve a small problem I had, and to put into practice everything I’d learned so far. It also only works for this particular table on the Wikipedia page, although it could be adapted for use on other tables. But it was great fun to put the learning into action and work on a real-life problem. Here’s to more exercises like this!