Category Archives: Data Visualisation

Visualising data on Cambridgeshire council wards with Carto

“Data is the new oil…” is a phrase often used by those in the industry to describe the vast, valuable and extremely powerful resource available to us. We generate an astronomical amount of data – according to IBM, 90% of the data in the world today has been created in the last two years alone. However, there is a real skills gap when it comes to working with data: people often perceive it as too complicated or intimidating to work with, while those who can extract, process and refine it will reap the rewards.

Data is readily available if you know where to look for it. The UK Government’s site opens up datasets on the environment, health and education, and these datasets often come cleansed, with little further processing needed (though I would always advise the cleansing and scrutiny of data regardless of its source!). As a Cambridgeshire resident, I am very lucky to have access to Cambridgeshire Insight Open Data, a repository of open data relating to local matters, managed by Cambridgeshire County Council and run by a local partnership. I therefore wanted to have a go at playing around with the data, utilising the numerous free tools available to explore and visualise it in the hope of making it more accessible – let’s face it, there’s more to life than staring at a spreadsheet!

My first project will be looking at crime figures in the county and projecting them on to a map, by council ward, using an online tool called Carto. The data will be sourced from the Cambridgeshire Crime Rates and Counts 2015/16 data set on the Cambridgeshire Insight Open Data site and I will be focusing on the data related to the rate of crimes per 1,000 population at ward and district level. The data I’m using can be found in column “CU” in the Excel sheet.

The benefit of looking at the rate instead of raw numbers is that the stats aren’t skewed by heavily populated areas. I would expect the number of crimes in heavily populated areas to be higher than in remote rural communities, but as the rate takes into account the population within the area, it provides a fairer comparison.
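As a quick Python sketch of that calculation (the counts and populations below are made up for illustration, not taken from the actual dataset):

```python
def crime_rate_per_1000(crime_count, population):
    """Crimes per 1,000 residents, so wards of very different sizes are comparable."""
    return crime_count / population * 1000

# A busy ward and a small rural ward can have very different counts but the same rate:
print(crime_rate_per_1000(450, 15000))  # 30.0
print(crime_rate_per_1000(90, 3000))    # 30.0
```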

The mapping processes behind Carto aren’t sophisticated enough to recognise a council ward name in data (yet), but it does allow you to overlay KML data, meaning that the wards can be “drawn” on to the map in a new layer. Where does this KML data come from? Surely we don’t have to create it ourselves? Heh, don’t be silly, someone has already done all of the hard work for us (phew!). Thank you to Alasdair Rae, a Lecturer in the Department of Town and Regional Planning at the University of Sheffield, who created it for an article in the Guardian – a spreadsheet containing the KML shapes of all council wards in England can be downloaded via a Google Fusion table.

I then matched up the wards, by name, between the two sheets and pulled out the KML code for each one; the resulting sheet can be found here. From here, everything can be done from within Carto, so create yourself a free account and prepare to be dazzled!
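If you'd rather script that matching step than do it by hand, here's a rough standard-library sketch of the idea – the ward names, rates, KML snippets and file name are all illustrative, not the real spreadsheet contents:

```python
import csv

# Hypothetical ward name -> crime rate per 1,000 (from the crime sheet)...
rates = {"Abbey": 95.1, "Arbury": 60.3}
# ...and ward name -> KML geometry (from the boundary sheet).
shapes = {"Abbey": "<Polygon>...</Polygon>",
          "Arbury": "<Polygon>...</Polygon>"}

# Match the two sheets on the ward name and write one combined CSV for Carto.
with open("wards_with_kml.csv", "w") as out:
    writer = csv.writer(out)
    writer.writerow(["Ward", "RatePer1000", "KML"])
    for ward in sorted(rates):
        writer.writerow([ward, rates[ward], shapes.get(ward, "")])
```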

There are plenty of Carto tutorials, but there are essentially two sides to the tool: the dataset side (where the data can be uploaded) and the mapping side (where the maps connect to your existing datasets). The great thing about Carto is that your maps can be embedded in your own website, so hey presto, here’s the resulting map…

Total crime rate per 1,000 people by Cambridgeshire ward (2015/2016)

Source: Cambridgeshire Insight Open Data :: Full page view of the map

I hope to make this a regular series of small tutorials about data visualisation and the use of open data so check back in a few months to see what else I have been up to!

Disclaimer: I am employed by Cambridgeshire County Council, though I have no links with the Insight Open Data team.

Quick links

Mapping Country Links in Tableau

Map I wanted!  

I’ve been looking to do this sort of visualisation with map data for a while; to show links between countries and to have the chunkiness of the lines correspond to particular amounts. I am still very, very new to Tableau but thought their tutorials may be a great place to start. The Advanced Mapping Techniques tutorial had exactly what I wanted, though bizarrely their training workbooks below the video don’t contain that specific example. However, by scrolling through the video and working it out myself – thank GOODNESS there was a shot of the data behind the scenes around the 1 minute mark – I managed to put together my own version, and it works! Woo hoo!

So, how did I get there? Here’s how to do it step-by-step:

1. Create a spreadsheet with the following data (you can copy & paste mine from below).

IMPORTANT – each line is a one-way path, but in order for it to work, there must also be a return path. For example, the data in the first row goes from the UK to the US, but there also needs to be a return line back from the US to the UK (row 4). If you don’t do this, all the points in the first three rows will show in the UK – which is a bit pointless (fnarr!)

Country Path Total
UK UK-US 200
UK UK-Brazil 450
UK UK-China 700
US UK-US 200
Brazil UK-Brazil 450
China UK-China 700
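If typing out the return rows by hand gets tedious, a few lines of Python can generate them from the one-way links. This is just a sketch – the link data matches the table above, but the output file name is my own invention:

```python
import csv

# The one-way links out of the UK; the matching return rows are generated
# automatically so that every path has a point at both ends.
links = [("UK", "US", 200), ("UK", "Brazil", 450), ("UK", "China", 700)]

rows = [("Country", "Path", "Total")]
for origin, dest, total in links:
    rows.append((origin, origin + "-" + dest, total))   # outbound point
for origin, dest, total in links:
    rows.append((dest, origin + "-" + dest, total))     # return point

# Write it out for Tableau to consume.
with open("country_links.csv", "w") as f:
    csv.writer(f).writerows(rows)
```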

2. Create your workbook in Tableau and import your data (I connect live so I can tweak my data behind the scenes).

3. Tableau will automatically generate the Longitude and Latitude for you, but make sure you spell the country names correctly! Then set up the view as follows:

Columns: Longitude
Rows: Latitude
Color: Path
Size: SUM(Total)

Here’s a screenshot to help! Oh, and make sure the mark type is set to “Line”.

Tableau setup

Hooray – you now have your map set up!

You can now do funky things like creating (sort of) arrows instead of fat lines by altering the data like this…

Country Path Total
UK UK-US 200
UK UK-Brazil 450
UK UK-China 700
Brazil UK-Brazil
China UK-China

This means the line width starts from nothing at the point of origin and grows as it reaches its destination. Your map should now look like this:

Map pointy lines

See, that was fairly painless, though I still want to continue playing around with it – I’m sure there is a lot more you can do. I’d like to find a definitive list of country names that can be used in Tableau; I wonder if initials work too? Who knows, that’s for another time…

GCSE Results

There has been much chatter this week about GCSE results, which not only serves as a means to make me feel old (16 years ago people, 16 YEARS!) but has also been a fabulous opportunity to dive into the data. News outlets predictably report on the performance gap between girls & boys and the increase/decrease in the uptake of particular subjects…some things never change! I personally wanted to get my hands on the data so that I could whack it into Tableau and play with it myself (thus also learning how to use Tableau – double bonus!)

I obtained all of my data from a comprehensive Guardian article and wrangled it a bit to get it into the format that I needed (BIG shout out to the fabulous Tableau add-on for reshaping data in Excel!)
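For anyone without the Excel add-on, the same wide-to-long reshape can be sketched in plain Python. The A* percentages below are the ones quoted later in this post; the “A” figures are invented purely to make the example run:

```python
# A wide table like the Guardian's, one column per grade.
wide = [
    {"Subject": "Chemistry", "A*": 16.6, "A": 23.0},
    {"Subject": "French",    "A*": 9.8,  "A": 20.1},
]

# Reshape to the long form Tableau prefers: one row per Subject/Grade pair.
long_rows = []
for record in wide:
    for grade in ("A*", "A"):
        long_rows.append({"Subject": record["Subject"],
                          "Grade": grade,
                          "Percent": record[grade]})
```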

NOTE: As this blog is hosted by WordPress, I can’t embed the graphics due to Javascript restrictions. So if you want to see the graphics in all of their beauty and filter by different criteria, please click the link below each graphic.

GCSE Subject Breakdown

Subjects taken by all candidates in 2013 – interactive graphic allows you to filter by gender and year.

GCSE Subject Treemap

Link to interactive graphic

Percentage of grades by gender

Split by gender and grade – you can see that girls’ entries achieve higher grades than boys’. As widely reported, 8.3% of entries from girls achieved an A* compared to 5.3% of entries from boys. The interactive graphic allows you to filter by year.

Grades for entries by gender

Link to interactive graphic

Change in the percentage of entries achieving each grade

This graphic demonstrates that the percentage of entries achieving top grades has fallen between 2012 and 2013. The interactive graphic allows you to filter by gender.

Change in grades for entries

Link to interactive graphic

Results per subject

For each subject, we have a breakdown of the percentage of entries achieving each grade – for example 16.6% of Chemistry entries achieved an A* compared to 9.8% of French entries. The interactive graphic allows you to filter by gender.

Results per subject

Link to interactive graphic

This was a fantastic opportunity for me to brush up on my Tableau skills and play with interesting and insightful data. I’m happy to hear comments on how I can improve my work or further areas I could explore.

Daily Mail pie chart fail, a crime against data visualisation

I have a love/hate relationship with pie charts – when they are used well, they are a brilliant way of showing proportions (x is bigger than y, which is bigger than z) and seeing where a particular slice fits in as part of the whole (mmm pie). I’m certainly not the first person to wax lyrical about pie charts, I’m aware that Nathan Yau has demonstrated a good use of pie charts and I totally agree with him.

However, as I was browsing the web this morning, a story caught my eye in The Mail (don’t judge me) about Britain’s crime hotspots and how Stratford in East London has been named the country’s worst crime hotspot (I was a Games Maker at the 2012 Olympics and Stratford is very much of interest to me). And then I saw THIS MONSTROSITY…

Daily Mail Pie Fail
Source: Daily Mail

WHOAAAHHHHH THERE – I’ll give you a couple of minutes to digest that beauty.

This is a perfect example of why I also hate pie charts…I think it should be reported to the data viz police.

Here’s a few of the problems:

  • It contains far too many slices – it’s information overload, and it’s really hard to compare categories; it’s just a sea of labels.
  • It’s 3D, which really isn’t the best way to present proportions as they are liable to misinterpretation; I can’t really say it any better than Drew Skau on the wonderful blog.
  • It’s not in any order, again making it harder to read (I personally prefer a pie chart in descending order with highest proportion first).

In fact, I think it breaks every rule in this Eager Eyes piece.

So I thought I’d see if I could improve it by turning it into a bar chart instead…

Bar Chart - click to enlarge

Gosh, that looks much better! It’s in descending order and so is easier to compare categories and to see the categories containing the most/least offences. It’s probably not the prettiest chart but it’s what I could muster up in Excel in 10 mins.

It’s really interesting data and it’s definitely a set I’d like to explore more – for example, what would fall under “Other theft”? I’d also be interested to compare these stats to a time when an average of 130,000 people weren’t visiting the area every day for four weeks (the number of people visiting for the Olympics & Paralympics has to skew the figures, right?). But that’s for another time. My goal was to make the Mail’s pie chart easier to read and to allow proper dissemination of the data, and I think my solution is definitely a step in the right direction!

Using a UK postcode to find Lat and Long and other useful information

So I have a list of UK postcodes and I want to find out the latitude and longitude of that postcode to plot it on to a map. There are numerous ways to do this, including:

1. Visiting Google Maps

2. Entering the postcode into the search box

3. Right-clicking on the little point on the map and selecting “What’s here?”

4. Copying and pasting the lat and long that magically appears in the search box.

That’s nice, quick and easy, BUT what if I’ve got hundreds or thousands of postcodes to search? I can’t spend a whole sunny weekend repeating the same procedure, can I?


I’m self-learning Tableau at the moment and it seems to have great location support…if you provide it with US addresses (gnrrr), but I wanted to find a way of plotting a series of points on a map of the UK derived from their addresses. A bit of Google searching led me to UK Postcodes (set up by Stuart Harrison and based on Ordnance Survey OpenData), a site that lets you enter a UK postcode and returns lots of information about that location (e.g. long, lat and county, right down to district and ward). What made me excited was that the site has an API allowing you to pass it a postcode via the URL, outputting the metadata in either XML, CSV, JSON or RDF. PERFECT!

After a further read around the site, I found that Edd Robinson had created a library acting as a wrapper for the API, which I could import into my own Python project. And so, without further ado, here is my Python code:

[code language="Python"]
from postcodes import PostCoder
import csv

f = open('Extracted_Data_from_Postcodes.csv', 'w')
i = 0
pc = PostCoder()

loc_lat = ""
loc_long = ""

with open('Postcodes_to_Extract.csv', 'rU') as g:
    reader = csv.reader(g)

    for row in reader:
        # Col 0 = ID :: Col 1 = Postcode
        result = pc.get(str(row[1]))

        if result is not None:
            location = result['geo']

            for key, value in location.items():
                if key == "lat":
                    loc_lat = value
                if key == "lng":
                    loc_long = value

            # ID, Postcode, Lat, Long
            write_to_file = str(row[0]) + ", " + str(row[1]) + ", " + str(loc_lat) + ", " + str(loc_long) + "\n"

            # Add the iteration count to the screen output to see how far we are up to.
            print str(i + 1) + ", " + write_to_file
        else:
            # If there is a problem translating the postcode, output "BROKEN" to file to check manually.
            write_to_file = "BROKEN, " + str(result) + ", " + str(row[0]) + ", " + str(row[1]) + "\n"

            print str(i + 1) + ", BROKEN, " + str(row[0]) + ", " + str(row[1])

        f.write(write_to_file)
        i = i + 1

f.close()
[/code]



My input file looked like this:

109484, SG5 4PF
109486, MK44 2DB
109487, LU4 9UJ
109488, LU6 1RE
109489, MK43 8DY
109490, MK45 5JH
109491, MK44 3QD
109492, MK45 3BX
109493, MK17 9QL
109494, MK43 9JT

And my screen output looked like this:

1, BROKEN, 109484, SG5 4PF
2, 109486, MK44 2DB, 52.214741, -0.461977
3, 109487, LU4 9UJ, 51.927696, -0.500824
4, 109488, LU6 1RE, 51.879322, -0.563452
5, 109489, MK43 8DY, 52.164539, -0.623209
6, 109490, MK45 5JH, 51.982376, -0.495111
7, 109491, MK44 3QD, 52.137440, -0.377085
8, 109492, MK45 3BX, 52.080341, -0.446214
9, 109493, MK17 9QL, 51.989906, -0.619803
10, 109494, MK43 9JT, 52.095955, -0.528752

Afterwards, I was able to manually check why SG5 4PF failed – actually, I’m still not sure why it failed, but I was able to correct the lat & long via a Google Maps search. Exactly what I needed with minimal effort, and a chance to flex my muscles with Python again. I then imported the CSV into Tableau, but that’s the subject of a future blog post…

Displaying English districts in Google Earth

The other day at work, I was asked to find out how many X* were in a particular district in England. Given we seem to have SO MANY ways of slicing & dicing the country up – Ceremonial Counties, Parliamentary Constituencies, Local Government Districts (which can be boroughs, cities or Royal Boroughs) – you can see why I started to have a small panic! But then I remembered you could search for districts in Google and it would outline them with a little red boundary line, so I gave it a go.

“Woking District”

Woking Outline

Ta dahhhh!

Great…except I then wanted to overlay the data I already had mapped in a Google Fusion table with longitudes & latitudes. FAIL. I could find no way of either exporting this boundary data or overlaying the longitude & latitude points I had set up in a Fusion table. So I Googled around the subject and found a wonderful post on Stack Overflow – I needed to head on over to OpenStreetMap, download the boundary data for my district, convert the gpx file to a KML and then whack it into Google Earth…so I did that!

1) Find the district you want in OpenStreetMap

2) Download the gpx file and then convert it to a KML using an online converter

3) Load your resulting KML into Google Earth and hey presto…
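The gpx-to-KML conversion in step 2 isn’t magic, by the way. Here’s a minimal standard-library sketch of the idea – this is not the converter I actually used, and it only handles simple track/route points, but it shows what the transformation boils down to (note that KML wants coordinates in lon,lat order):

```python
import xml.etree.ElementTree as ET

KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><LineString><coordinates>{coords}</coordinates></LineString></Placemark>
</kml>"""

def gpx_to_kml(gpx_text):
    """Collect every track/route point from a gpx document and wrap the
    coordinates in a minimal KML LineString."""
    root = ET.fromstring(gpx_text)
    points = [el for el in root.iter()
              if el.tag.endswith("trkpt") or el.tag.endswith("rtept")]
    coords = " ".join("{0},{1},0".format(el.get("lon"), el.get("lat"))
                      for el in points)
    return KML_TEMPLATE.format(coords=coords)
```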

Google Earth - Woking

I then exported my data with the longitude & latitude points in it from my Google Fusion table as a KML file, imported it into Google Earth, and all the little points appeared on a layer over my district boundary.

Woking Schools - Google Earth

Hooray! Fortunately, there were only four points within the boundary – I need to think about what would happen if we were talking hundreds or thousands. How would I manage that? Is there a way to filter in Google Earth to say “give me all the X in boundary Y”? I’m not sure, but I’ll be darn sure to find out!

* Trying to maintain a bit of anonymity here!

Week 5 & 6 – A topic of our own

For the final week of the MOOC, we have been given the task of producing an infographic of our own – this means choosing a topic, gathering the information and presenting an idea to show the information in graphic form.

As my previous sketches have been for interactive infographics, I wanted to give a static graphic a go. Having so much freedom was pretty tough – there is a wealth of information and data out there, and choosing which story to go for and what angle to take was going to be hard! It was lucky, then, that I got a tweet from the team behind the BBC iPlayer pointing me to the latest performance report, and that is when inspiration struck.

The BBC produce these performance reports every month and I read them with interest – I am a stats geek and love stuff like this. The report gives stats such as the viewing figures for content on iPlayer, popular programmes, usage by device type and the gender/age group of users – a wealth of information that I find fascinating. But I also love it because it’s about the iPlayer, something I use for at least two hours a day and have a certain affection for. For non-UK residents: the iPlayer is a service that the BBC officially launched at the end of 2007, allowing viewers/listeners of BBC TV programmes/radio shows to replay missed content and to watch shows live via the internet. It is available on PCs, tablets, mobile phones, via Smart TVs and via cable operators. In essence, it’s brilliant.

I am fairly certain that the report released by the BBC is not aimed at the typical iPlayer user – it feels more for those in the media or those with a specific interest in audience figures – and so my goal for the infographic was to produce something that everyone could appreciate. Luckily for me, October was a record month for iPlayer usage, with 213 million requests for TV or radio content – breaking the 200 million request barrier for the first time – so I had a nice little slant for my infographic. It also meant that the story had been picked up by the press too:

BBC iPlayer tops 200 million monthly requests for first time – Digital Spy

iPlayer passes 200 million monthly requests for the first time – Digital TV Europe

Merlin and Jimmy Savile documentary help BBC iPlayer to record month – The Telegraph

BBC enjoys record iPlayer requests in October –

…but no-one had produced an infographic, and so I felt it was my duty to produce one to celebrate!

My goals for the infographic were as follows:

  • Produce something for everyone – using the stats from the October performance report but make them easier to read and emphasise their relevance.
  • What were the most popular shows in October? Why did it break the 200 million request barrier in October and not, say, during the Olympics?
  • Who and what is using the iPlayer service? What proportion of requests are coming from tablets?
  • Make a static graphic that could serve as a template for every performance report, so that non-industry readers could glean the key information more easily on one page, as opposed to trawling through the report.

And so with all of this in mind (and not a lot of time to complete the task – despite two weeks to work on it, December is a crazy busy time at work!), here is what I have come up with…

October 2012: A record iPlayer month for the BBC (PDF)

Notes about the graphic

  • This is a static graphic which uses the figures from the October 2012 iPlayer Performance report but could be used as a template for other monthly reports.
  • I extracted the information that I thought would be interesting such as iPlayer requests since 2009 (as far back as the report goes), the gender breakdown of users, the devices used to access the service and the popular TV and radio shows in October. I have also put a few stats in the blurb at the top.
  • The graphic style is largely similar to my last task, with minimal use of colour – I stuck to pink as that is the predominant colour in the iPlayer branding.
  • If I had more time, I would have liked to explore the peaks and troughs around the end of 2010 and beginning of 2011. Do peaks relate to the release of iPlayer apps on mobile and tablet devices for example?
  • This graphic could be made interactive and this is a project I would like to work on in the future – especially to see the variation in the share of the device types – so watch this space! 🙂

I am pretty happy with this graphic but feel there are plenty more angles to explore with this data – which is good, as it gives me something to tinker with over the Christmas holidays. Now, do you think I’ve been good enough for Santa to bring me a copy of Adobe Illustrator?

Week 4: Interactive graphic based on US unemployment stats

Our goal this week was to think about what kind of interactive graphic we could create based on the data used in the Guardian’s piece about unemployment in the US.

There is a lot of data behind the scenes of this graphic, which is great but also slightly frustrating. For example, if you click on a particular state, you get a wealth of additional information – but it doesn’t allow you to easily compare it to other states. The same goes for the drop-down at the top of the graphic – it’s great that you can view the unemployment rate at a particular point in time, but it’s really hard to compare unless you are focussing on a particular state. I do, however, like the range of comparisons that have been made with the data, especially the ability to visualise the percentage point difference from the national figure – I shall have to remember that one in future 🙂

And so, I jotted down some thoughts about what I would like to see on an interactive graphic like this and came up with the following list:

  • The Guardian piece focuses on the unemployment rate in the US since Obama came to power…what about further back?
  • Is state level in-depth enough? What about within the state – how does the unemployment rate differ within the states themselves?
  • In the accompanying course material, we were told not to use more than 6 colours in a choropleth map (which makes total sense for comparison), but what about viewing a small list of the counties with the very lowest & highest unemployment rates that would normally be hidden inside those ranges?
  • Based on feedback from last week’s assignment – I wanted to focus more on type, colour and “interactiveness” of the graphic – this is definitely where I need more practice.
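The choropleth-class idea from the list above can be sketched in a few lines of Python. The county names, rates and bin edges below are entirely made up for illustration:

```python
# Hypothetical county unemployment rates (%).
rates = {"County A": 2.1, "County B": 4.8, "County C": 7.2,
         "County D": 9.9, "County E": 13.5, "County F": 19.8}

edges = [4, 6, 8, 10, 12]  # five upper edges -> six classes in total

def bin_index(rate):
    """Map a rate to one of at most six choropleth classes."""
    for i, edge in enumerate(edges):
        if rate < edge:
            return i
    return len(edges)  # the open-ended top class

# The extremes that a six-class map hides can be listed separately:
ranked = sorted(rates.items(), key=lambda kv: kv[1])
lowest, highest = ranked[:2], ranked[-2:]
```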

And so with all of this in mind, I scribbled down possible graph/map/info ideas and arranged them on the table (see last week’s post for an idea of how it looked!) and I came up with this:

Unemployment Rates in the US (PDF)

Unemployment Rates in the US – with notes (PDF)

Notes about the graphic

  • The user is able to scroll back in time to see how the unemployment data differs on the map of the US. I added a line graph so that it is clear to see years when the unemployment rate was particularly high/low. I did think about adding an overlay to show the years that a new President came into power – incidentally, there does seem to be a trend of the unemployment rate dropping in the year this happens – but I did not progress along this line of investigation for this project. Maybe another time 🙂
  • The map at the top is interactive and allows the user to click on a particular county to see detailed information about it as well as the state to which it belongs. The small bar chart on the left would become active when a county is selected.
  • The user also has the ability to tick the boxes and add lines to the graph showing the county and state unemployment rates and compare them to the national figures.
  • I have taken on board comments from last week about colour, type and making it appear more interactive. It was VERY hard being so restrained with colour (I’m not used to this!) but I actually found working with Colorbrewer for the map colours gave me a base to start from and I didn’t stray from there.

I am really happy with this graphic and I didn’t rush as much as I did last week. I took my time, didn’t faff around with Illustrator too much and so had more time to concentrate on what I wanted to do and actually what I’d want to see on an interactive visualisation like this.

22 days left on my Illustrator trial…will I be adding it to my Christmas list (as well as Alberto Cairo’s book and Andy Kirk’s too)? YES!

Week 3: Sketch an interactive graphic

The goal for this week was to think about how an interactive graphic based on a particular report by Publish What You Fund, and also published in a Guardian blog, would look. The data in question relates to how transparent major donor organisations are with their own data and so each organisation has been rated using a distinct set of criteria created by Publish What You Fund, therefore producing an overall transparency index.

This assignment has really stretched me this week and made me take full advantage of the sketching/note-taking apps on my tablet as I found I was coming up with ideas in random places and needed to get them down for exploration.

My first task was to find out what the heck “transparency” actually meant and how it was measured, and I was thankful that the data originated from a very well organised website. I then looked at both source websites and noted down what I thought was missing and how I would like to play with the data myself. This took about three or four days – and this is where a lot of sketching and brainstorming came in: thinking of the “what ifs…” and “oooh, how about I just change this…” scenarios.

I toyed with the data in Excel to see if I could find any interesting correlations, such as splitting the data right down to individual indicators and looking at the annual resources and budget of each donor and, in turn, where the money goes – but what I was really missing was information about the donors themselves. I was very pleased to see the UK’s Department for International Development at the top of the list, but in all honesty I really knew nothing about them, and so I wanted to build that into the graphic.

And so I started by jotting down potential graphs/data to include in my final interactive graphic and started arranging the sketches until I had something that I thought could work. Incidentally, I find jotting things down on paper like this so helpful as you invest very little time in it and it allows easy rearranging of elements – paper prototypes FTW!



From there, I installed the trial version of Illustrator CS6 and started playing around. To cut a long story short (it really was a long story as I battled with Illustrator’s graphs – I won in the end though!) I came up with the following design;

Aid Transparency Graphic (PDF)

Aid Transparency Graphic + Notes (PDF)

Notes about the graphic

  • The bar chart that can be seen at the top of the graphic can be manipulated by the buttons on the right hand side and the user can select to show the results of individual aid information levels or all of them (the total).
  • The user can also select to show particular countries instead of having everything on the graph which I found really hard to read in the Guardian blog.
  • If a user clicks on a donor’s name or the bar associated with that donor, the panel at the bottom will display additional information about the organisation. I added a space for some text about the organisation to add a bit of context and also a timeline to chart their major accomplishments so that users would be able to relate to an organisation’s particular focus. Both pieces of information could be scraped from donors’ websites and annual reports.
  • I have tried to minimise the use of the word “transparency” and instead used “openness” where possible as I personally wasn’t very clear about what this meant at first.

I am personally really pleased with this, as the work involved way more than playing around with a few graphs. I had to think about what I wanted to say and how I was going to represent it in a prototype form that would communicate how an interactive version would work. But I’m doing something that I love, and time did indeed fly when I was tinkering all weekend!

Week 2: A critique of the “Convention Word Counts” visualisation in the NYT

Source material:

A comparison of how often speakers at the two presidential nominating conventions used different words and phrases, based on an analysis of transcripts from the Federal News Service.

Although I very much like the look of this graphic at first glance, I feel it includes too much information and too many layers of abstraction, hiding the beauty of what is quite a high-impact piece.

The main graphic serves two purposes:

1. It acts like a word cloud and illustrates the frequency of words by resizing bubbles accordingly.

2. It shows how the usage of words is split between the two parties.

Therefore it presents the reader with the ability to see that both parties have used the words ‘Tax’, ‘Energy’ and ‘Families’ in equal measure, but the Democrats have used the word ‘Health’ more than the Republicans, who themselves have used the word ‘Leadership’ more. The reader is clearly able to see this by comparing the size of the bubbles and identifying the split in usage between the two parties.

However, I do feel that it presents the reader with too much information. I don’t think it is necessary for the numbers to be present in the bubbles as they serve as a distraction – the blue/red split in the bubble itself should be enough to allow someone to see the proportion of the word’s usage. The numbers themselves are also per 25,000 words, which bombard the reader with unnecessary information. Is the average reader really interested to know that the word ‘Health’ is used 38 times per 25,000 words by the Democrats vs. 9 times per 25,000 words by the Republicans? I’d hazard a guess and say “no”, but I think they are more interested in seeing that the Democrats used it more than the Republicans overall. But I do think the numbers are interesting and so maybe they should only be displayed when a bubble has been clicked on.

I feel the text describing the words, which is placed below the bubbles, does not need to be present the whole time. It’s taking up room, and in actual fact I didn’t even bother to read it when I was playing around. Also, while I love the ability to add your own words to the collection, it does allow you to add words (e.g. “UK”) that have no mentions on either side – I personally didn’t find this very interesting and found that the zero-mention words cluttered up the visualisation.

I think the visualisation would benefit from altering the shade of blue/red depending on where the bubble is located. If, for example, we take the word ‘Forward’, which is far over on the Democrats’ side, I think the blue should be a lot darker than for a word such as ‘Success’, which sits more prominently on the Republicans’ side. This would help to reinforce the fact that there are two extremes to the viz and a middle ground shared by both parties.
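To make that shading idea concrete, here’s a rough sketch of how a share-dependent colour could be computed. The anchor colours and fade strength are arbitrary choices of mine, not anything from the NYT piece:

```python
def shade(share_democrat):
    """Blend from a strong red (share 0.0, all Republican) to a strong blue
    (share 1.0, all Democrat), washing the colour out near the 50/50 middle.
    Returns a hex colour string."""
    red, blue = (180, 30, 40), (30, 60, 180)
    t = share_democrat
    rgb = [int(r + (b - r) * t) for r, b in zip(red, blue)]
    fade = 1.0 - abs(t - 0.5) * 2            # 1.0 at the centre, 0.0 at the extremes
    rgb = [int(c + (255 - c) * fade * 0.6) for c in rgb]
    return "#%02x%02x%02x" % tuple(rgb)
```

So a word used only by Republicans gets the full red, a word used only by Democrats gets the full blue, and words near the shared middle ground come out pale.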

Another level of abstraction I would love to see would be the ability to see who said the word and how many times. This data appears below the bubbles, but it’s not really used effectively. How fascinating would it be to see how many times Mitt Romney said Obama (and vice versa) without having to count it up yourself?!? Then, if you clicked on a bar in this chart, it would take you down to that person’s section to show a breakdown of the paragraphs.

And finally, while it’s not a criticism of the visualisation itself, I am an avid follower of the Guardian Data Blog and am used to seeing a link where I can browse and download the raw data. I’m not sure what the NYT’s policy is about this, but I think the visualisation would benefit from a link at the bottom so that data geeks like us can tinker with the data ourselves.

And so here is my rough sketch to show how I would tidy it up with the main changes listed below:

BEFORE                                                                                      AFTER


  • I have removed the figures inside the bubbles that displayed the share per 25,000 words to give the graphic a cleaner finish.
  • I have removed the descriptive text that relates to particular bubbles.
  • I have added the percentage share of the word used by each party at the top.
  • I have added two graphs below the main graphic that will be displayed when a bubble has been clicked on. They will show who in each party has used the word and their proportional usage as a whole per party.
  • If you were to click on the person’s name or a particular bar on the graph, it would take you straight down to that person’s section in the blurb below.