How to make a pretty choropleth (coloured map) with Python

Posted on May 10, 2020

A choropleth is a shaded or coloured map like the one below. This guide will show you how to create one in under five minutes. This particular map is coloured by the number of characters in each country’s government website domain name, e.g. gov.uk is 6 characters. Darker countries have longer domains and lighter countries have shorter ones.

[Image: world choropleth coloured by government website domain length]

First, three Python packages will make our lives much easier: plotly, pandas and pycountry. We’ll install them with pip.

pip install plotly pandas pycountry

Our input data for the map will be in CSV format. Each row will contain the country name and associated government website URL separated by a comma. Here’s a sample of what the data looks like:

Afghanistan,https://president.gov.af/
Albania,http://www.parlament.al/
...
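Before loading the file, it can be worth checking that every row really has the two fields described above. The following is a minimal sketch using Python's built-in csv module. The helper name check_rows is hypothetical, and the example reads from an in-memory string built from the sample rows (plus one deliberately broken row) rather than from the real file.

```python
import csv
import io

def check_rows(lines):
    """Return the 1-based numbers of rows that do not have exactly two fields.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    check_rows is a hypothetical helper, not part of the original script.
    """
    bad = []
    for number, fields in enumerate(csv.reader(lines), start=1):
        if len(fields) != 2:
            bad.append(number)
    return bad

# The two sample rows from above, plus one row that is missing its URL.
sample = io.StringIO(
    "Afghanistan,https://president.gov.af/\n"
    "Albania,http://www.parlament.al/\n"
    "Atlantis\n"
)
print(check_rows(sample))  # the third row is reported as malformed
```

An empty list means every row has a country and a URL. Country names that contain a comma would need to be quoted in the CSV for this check (and for pandas) to read them as one field.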

We will write a function plot_choropleth that takes one input: the path to a CSV containing our map data. The first step in this function is to read the CSV into a dataframe using pandas and set the column names, in this case country and url.

import pandas as pd
import plotly.express as px
import pycountry

def plot_choropleth(csv_path):
	data = pd.read_csv(csv_path, names=['country', 'url'])

Next we will create three new columns and set placeholder values for these:

  • country_iso - the three-letter ISO country code
  • length - the length of the shortened URL in characters
  • short_domain - a shortened version of the URL without http://, https://, www. or any /

	data['country_iso'] = 'XXX'
	data['length'] = 0
	data['short_domain'] = 'XXX'

Now we begin the meat of the function. We will iterate over each row (one per country) in our data, match the country name to a three-letter ISO country code, strip the URL down to its domain, measure its length and write this information back into the pandas dataframe.

	for index, row in data.iterrows():

		try:
			# Search for three-letter country code
			country_search = pycountry.countries.search_fuzzy(row['country'])
			data.at[index,'country_iso'] = country_search[0].alpha_3
		except LookupError:
			# No match in pycountry: keep the 'XXX' placeholder
			pass

		temp_url = row['url']
		if 'http://' in temp_url:
			temp_url = temp_url.replace('http://', '')
		if 'https://' in temp_url:
			temp_url = temp_url.replace('https://', '')
		if 'www.' in temp_url:
			temp_url = temp_url.replace('www.', '')
		if '/' in temp_url:
			temp_url = temp_url.replace('/', '')

		temp_length = len(temp_url)

		data.at[index,'length'] = temp_length
		data.at[index,'short_domain'] = temp_url
		 

There are a few things going on here, so let’s break it down. We use pycountry's fuzzy search to look up the full country record for each country name we already have. We wrap this in a try/except because we cannot guarantee that every country name in our CSV has a match in pycountry; unmatched rows keep the XXX placeholder. country_search is a ranked list of matches, so we take the top one and read its three-letter ISO country code (.alpha_3).

		try:
			# Search for three-letter country code
			country_search = pycountry.countries.search_fuzzy(row['country'])
			# Choose the top match and set the country_iso for this row
			data.at[index,'country_iso'] = country_search[0].alpha_3
		except LookupError:
			# No match in pycountry: keep the 'XXX' placeholder
			pass
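The lookup step can also be pulled out into a small helper, which makes the fallback behaviour easier to see. This is a sketch rather than the original code: to_alpha3 and its search parameter are hypothetical names, and the search function is passed in so the helper can be tried without pycountry installed. It assumes the search function raises LookupError when nothing matches, which is how pycountry's search_fuzzy behaves.

```python
from collections import namedtuple

def to_alpha3(name, search, default='XXX'):
    """Return the alpha_3 code of the top match, or `default` if none is found.

    `search` is any callable that returns a ranked list of matches and
    raises LookupError when there are none (hypothetical parameter).
    """
    try:
        matches = search(name)
    except LookupError:
        return default
    return matches[0].alpha_3

# Stand-in for pycountry.countries.search_fuzzy, for illustration only.
Country = namedtuple('Country', ['name', 'alpha_3'])
KNOWN = {'albania': Country('Albania', 'ALB')}

def fake_search(name):
    key = name.strip().lower()
    if key not in KNOWN:
        raise LookupError(name)
    return [KNOWN[key]]

print(to_alpha3('Albania', fake_search))   # ALB
print(to_alpha3('Atlantis', fake_search))  # XXX
```

With pycountry installed, pycountry.countries.search_fuzzy can be passed as the search argument in place of fake_search.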

We then strip http://, https:// and www. from the URL, along with any / characters, before calculating its length. This assumes each URL is a bare domain with no path, as in the sample rows above.

		temp_url = row['url']
		if 'http://' in temp_url:
			temp_url = temp_url.replace('http://', '')
		if 'https://' in temp_url:
			temp_url = temp_url.replace('https://', '')
		if 'www.' in temp_url:
			temp_url = temp_url.replace('www.', '')
		if '/' in temp_url:
			temp_url = temp_url.replace('/', '')

		data.at[index,'length'] = len(temp_url)
		data.at[index,'short_domain'] = temp_url
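The chained replace calls work for bare domains, but a URL with a path (say https://www.gov.uk/help, a made-up example) would keep the path text and lose its slashes. One alternative, sketched here with the standard library's urllib.parse, is to parse the URL and keep only the host name. short_domain is a hypothetical helper, not part of the original script.

```python
from urllib.parse import urlsplit

def short_domain(url):
    """Return the host name of `url` without a leading 'www.'.

    short_domain is a hypothetical helper used for illustration.
    """
    url = url.strip()
    # urlsplit only recognises the host when a scheme or '//' is present,
    # so add '//' to inputs such as 'www.gov.uk'.
    if '://' not in url:
        url = '//' + url
    host = urlsplit(url).hostname or ''
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host

for url in ['https://president.gov.af/',
            'http://www.parlament.al/',
            'https://www.gov.uk/help']:
    domain = short_domain(url)
    print(domain, len(domain))
```

Inside the loop, the result could be assigned to temp_url in place of the four replace calls. Note that hostname also lowercases the host and drops any port number, which the replace-based version does not do.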

Now we’ve processed our data, and we should have a dataframe containing everything we need to create our choropleth. This is where plotly express makes our lives much easier: we can create the map with a single call to plotly.express.choropleth.

	fig = px.choropleth(data, locations="country_iso", # use our three-letter ISO codes as the locations
						color="length", # color the map by length
						hover_name="country",
						hover_data=['length', 'short_domain'], # add the character length and short_domain as hover information
						labels={'length': 'Characters'}, # change the display name for length to Characters
						color_continuous_scale=px.colors.sequential.Brwnyl) # We'll use a brown-yellow colour palette

Add a title and legend title:

	fig.update_layout(
		title_text='Number of characters in government website domain name',
		legend_title_text='Characters'
	)

	fig.show()

And we’re done! Below is how everything looks when it’s put together.

import pandas as pd
import plotly.express as px
import pycountry

def plot_choropleth(csv_path):
	data = pd.read_csv(csv_path, names=['country', 'url'])

	data['country_iso'] = 'XXX'
	data['length'] = 0
	data['short_domain'] = 'XXX'

	for index, row in data.iterrows():

		try:
			# Search for three-letter country code
			country_search = pycountry.countries.search_fuzzy(row['country'])
			data.at[index,'country_iso'] = country_search[0].alpha_3
		except LookupError:
			# No match in pycountry: keep the 'XXX' placeholder
			pass

		temp_url = row['url']
		if 'http://' in temp_url:
			temp_url = temp_url.replace('http://', '')
		if 'https://' in temp_url:
			temp_url = temp_url.replace('https://', '')
		if 'www.' in temp_url:
			temp_url = temp_url.replace('www.', '')
		if '/' in temp_url:
			temp_url = temp_url.replace('/', '')

		data.at[index,'length'] = len(temp_url)
		data.at[index,'short_domain'] = temp_url

	fig = px.choropleth(data, locations="country_iso", # use our three-letter ISO codes as the locations
					color="length", # color the map by length
					hover_name="country",
					hover_data=['length', 'short_domain'], # add the character length and short_domain as hover information
					labels={'length': 'Characters'}, # change the display name for length to Characters
					color_continuous_scale=px.colors.sequential.Brwnyl) # We'll use a brown-yellow colour palette

	fig.update_layout(
		title_text='Number of characters in government website domain name',
		legend_title_text='Characters'
	)

	fig.show()

plot_choropleth('country-websites.csv')

You can find the code above (named plot.py) and the CSV data (country-websites.csv) at this gist link:

https://gist.github.com/arshamg/33fe4283d174d0b8a6202074f042024f