Combining Demographic Data Points
This ‘how to’ guide will follow the process for creating our ‘Does poverty drive obesity?‘ showcase in which we observed a positive correlation between obesity and deprivation in Leeds. In it we will focus on:
- Extracting the data from our hosting platform
- Joining with our spatial data to create a map
- Creating the graphs
Extracting the data
Our data can be extracted in various ways depending on your use case. In this scenario, we’ll be pulling the desired data from our hosting platform via a simple SQL query. What’s not so simple is deciding which geographical areas to use as Obesity is recorded at the output area (OA) level and deprivation at the lower super output area (LSOA) level.
OAs fit within LSOAs and both levels are comparable as they hold a roughly consistent number of people. So which one should we use? Usually, you would want to use the most granular level possible as it allows a more detailed comparison. Below is a visualisation of how OAs fit within LSOAs. A full guide on OAs and LSOAs can be found here.
In this example, we are measuring deprivation by the Indexes of multiple deprivation (IMD) rank. As this is not an absolute number of people in poverty for the given area we can give each OA in an LSOA the same IMD rank and plot our map at OA level, removing the need to scale up to LSOA.
As we will be mapping the IMD onto OA polygons we will need to join IMD ranks onto OA codes. To allow scaling up or down geography levels we have a ref_spine table that contains all the geography levels you could possibly need. We need to join the IMD rank to the ref_spine by their LSOA column, then join the obesity data by the OA column. The resulting SQL query is written below.
WITH IMD_rank AS (SELECT B.oa_2011, A.imd_rank FROM doordastats_v2_snapshot.doordastats_v2_snapshot.england_deprivation_index A JOIN (SELECT DISTINCT oa_2011, lsoa_2011 FROM doordastats_v2_snapshot.doordastats_v2_snapshot.ref_spine WHERE local_authority_district_description = 'Leeds') B ON A.lsoa = B.lsoa_2011 ), Obesity AS (SELECT oa, ((female_18_to_24_obese + male_18_to_24_obese +female_25_to_29_obese +male_25_to_29_obese +female_30_to_34_obese +male_30_to_34_obese +female_35_to_39_obese +male_35_to_39_obese +female_40_to_44_obese +male_40_to_44_obese +female_45_to_49_obese +male_45_to_49_obese +female_50_to_54_obese +male_50_to_54_obese +female_55_to_59_obese +male_55_to_59_obese +female_60_to_64_obese +male_60_to_64_obese +female_65_to_69_obese +male_65_to_69_obese +female_70_to_74_obese +male_70_to_74_obese +female_75_to_79_obese +male_75_to_79_obese +female_80_to_84_obese +male_80_to_84_obese +female_85_plus_obese +male_85_plus_obese)/28)*100 AS Obesity_Propensity FROM insight_health_snapshot.insight_health_snapshot.obesity_propensity) SELECT Obs.oa, Obs.Obesity_Propensity, IMD.imd_rank FROM IMD_rank IMD JOIN Obesity Obs ON Obs.oa = IMD.oa_2011
There are a few points to note here. The third SELECT statement is used to dismiss any duplicate values. The local_authority_district is used to filter the results to just Leeds as it is usually closely related to the UK average for many socio-economic factors and the local authority is a good-sized area that includes urban and rural areas.
Obesity propensity is split by gender and age group. While this allows us to make a much more detailed analysis, for this article we just want the average propensity so we take an average. We also multiplied this by 100 to turn it into a percentage.
Here is a visual representation of the joins that took place.
To create the map we will also need OA polygon data which can be found in our spatial product. Our polygon data comes in WKT format to allow row-level filtering. We will be creating the map in Carto so we will need to convert this to a Shapefile. This can be done using online software or a python programme which we can provide on request.
Creating the map
Once we have extracted our demographic and polygon data it is time to upload them to your favourite mapping software, we will be using Carto. You can create a free account with Carto here.
Firstly, you will need to upload your two datasets then join the the_geom column in the Shapefile to the demographic data. In Carto, you do this via a SQL query. Once your data is joined you can create a dataset from the query and click ‘create map’.
You can then create 2 layers for obesity and deprivation to allow comparison between the two.
Top tip: When choosing the colouring for your map, remember that a low IMD rank means an area is more deprived.
Creating the graphs
Because our data is historical and can be easily joined, there were many graphs we could have created. We could have looked at obesity over time in the most impoverished areas or which areas of the UK are the most obese.
For this article, we wanted to compare the IMD rank in the most and least obese areas in Leeds. We found these by a simple SQL query then uploaded the data CSV into our favourite data visualisation software, Tableau. It is then a matter of filtering to the selected OAs and creating a dual-axis to show IMD rank and obesity propensity.
Having thousands of joined data points means we can easily compare obesity to other measures of poverty such as income and as we have many OAs to compare, a scatter plot is a good choice. Household income can be found in the same table as IMD rank so it is easy to extract and create the graph.
Top tip: You can get a statistical analysis of the correlation by describing the trendline.
Hopefully this tutorial helped you see how quick and easy it is to create fascinating insights from our data. If you have any questions or would like to arrange a demo please feel free to contact us and we’ll be happy to help.