Definition of Open Data
Ultimate Definition of Open Data
There is a lot of confusion around what open data is: some lat/longs are open, some aren’t; some addresses are, some aren’t. So what is the actual definition of open data, and how does it differ from data being made available in the public domain?
Open data is often misconstrued as being anything in the public domain. However, there are a few very important factors you need to consider before leaping to this conclusion. As a rule of thumb, open data is normally released by the public sector and does not include personal data. Public information, characterised as open data, is normally released to allow citizens to hold Government to account, as in the case of Spend Data. Commercial organisations are also encouraged to reuse data to drive innovation. One example is the property transaction data released by Land Registry, which is heavily used by online portals such as Zoopla and Rightmove.
This article will attempt to clarify what open data is. However, as there isn’t any legal definition, there will always be differences of opinion about what it is and what it isn’t. Let’s start with the basics of what is commonly accepted as open data, along with the type of data we are referring to.
What is the accepted definition of Open Data?
For our purposes, open data can be summarised as:
Data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute.
To clarify what we mean by this, we need to go a little deeper. Any open data release should meet the following criteria:
- Availability and Access: the data must be available in a machine-readable, open format (e.g. CSV), preferably by downloading over the internet.
- Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution. Ideally, in the UK, open data will be released under an Open Government Licence.
- Universal Participation: everyone must be able to use, re-use and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
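As a rough illustration, the three criteria above could be turned into a simple checklist. This is only a sketch: the Release fields, the list of open formats and the wording of the failure messages are assumptions made for the example, not part of any official definition.

```python
from dataclasses import dataclass

# Formats treated as machine-readable and open for this sketch.
# This set is an assumption, not an official list.
OPEN_FORMATS = {"csv", "json", "xml"}


@dataclass
class Release:
    """Hypothetical metadata describing a data release."""
    file_format: str              # e.g. "csv" or "pdf"
    allows_redistribution: bool   # do the terms permit re-use and redistribution?
    use_restrictions: tuple = ()  # e.g. ("non-commercial",) or ("education-only",)


def open_data_failures(release):
    """Return the criteria the release fails; an empty list means none failed."""
    failures = []
    if release.file_format.lower() not in OPEN_FORMATS:
        failures.append("availability and access")
    if not release.allows_redistribution:
        failures.append("re-use and redistribution")
    if release.use_restrictions:
        # Any restriction on who may use the data, or for what purpose,
        # breaks universal participation.
        failures.append("universal participation")
    return failures


if __name__ == "__main__":
    print(open_data_failures(Release("csv", True)))
    print(open_data_failures(Release("pdf", True, ("non-commercial",))))
```

A release that returns an empty list passes this simplified check, but it can still be contaminated by third-party rights, which a checklist like this cannot detect.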
If you’re wondering why it is so important to be clear about what open data means and why this definition is used, there’s a simple answer: “commercial reuse and contaminated data”.
When is open data not open data?
You will come across many publishers who describe data as open even when it isn’t. This would normally be defined as contaminated data, i.e. it contains third-party data which the publisher does not own the licence to. This confusion is especially pertinent when it comes to data containing addresses and exact lat/long.
Historically, each Local Authority has been tasked with maintaining a database of taxable addresses and locations within its boundaries. Central government uses this data to set business rates and council tax levels. However, the creation of a full address is shared between Royal Mail and the Local Authority. The Local Authority is responsible for providing the road name and number, whilst Royal Mail provides the postcode. To add further confusion, Ordnance Survey provides the exact latitude and longitude for the address.
Combined, this means there are potentially three licence holders: the Local Authority, Royal Mail and Ordnance Survey. The Local Authority can, and regularly does, release data with associated addresses under the Open Government Licence, but this licence explicitly states that it cannot grant commercial reuse of data it does not own the licence to.
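One way to picture the problem is a quick check of which potential licence holders a dataset's columns point to. The sketch below is a heuristic only: the column names are hypothetical and the mapping simply restates the split of responsibilities described above, so it is no substitute for checking with the publisher.

```python
# Assumed mapping from (hypothetical) column names to the organisation
# described above as responsible for that part of an address record.
RIGHTS_HOLDERS = {
    "street_address": "Local Authority",
    "postcode": "Royal Mail",
    "latitude": "Ordnance Survey",
    "longitude": "Ordnance Survey",
}


def possible_licence_holders(columns):
    """Return a sorted list of potential licence holders implied by the columns."""
    holders = {RIGHTS_HOLDERS[name] for name in columns if name in RIGHTS_HOLDERS}
    return sorted(holders)


if __name__ == "__main__":
    spend_columns = ["supplier", "amount", "postcode", "latitude", "longitude"]
    print(possible_licence_holders(spend_columns))
```

If the result names anyone other than the publisher, that is a prompt to ask about licensing before any commercial reuse.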
Exceptions to the rule
The exception to this rule is the public sector, which has access to special licensing allowing it to reuse the Royal Mail and Ordnance Survey data free of charge. If you are a commercial organisation, you will need to be licensed by Royal Mail to reuse the addresses, and by Ordnance Survey if you want to use exact lat/long.
There are some exceptions to the above. Land Registry has agreed with Royal Mail that its property transaction data can be used when presenting information on property prices. This means Zoopla and Rightmove, amongst others, do not need a licence to reuse the data even if they are gaining commercial benefit. Ordnance Survey is a government agency, so it releases a lot of data under Open Government licences to encourage reuse. One of these datasets includes postcode centroids, so you could map an address to a postcode location instead of using the more expensive exact lat/long.
In some cases, such as Companies House, addresses are manually entered when submitting data. In these cases, Royal Mail isn’t involved in the process as it is not the data provider, so you would not need a licence to reuse this data. However, as the data isn’t validated, there tend to be a lot more typos, and some of the addresses may not exist.
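The postcode centroid approach could look something like the sketch below. It assumes you have already loaded an open postcode centroid dataset into a dictionary keyed by postcode. The function names are hypothetical, and the postcode and coordinates in the example are placeholders rather than real values.

```python
def normalise_postcode(raw):
    """Upper-case a UK postcode and put a single space before the final three characters."""
    compact = "".join(raw.split()).upper()
    return compact[:-3] + " " + compact[-3:]


def locate(postcode, centroids):
    """Return the (latitude, longitude) centroid for a postcode, or None if it is not in the table."""
    return centroids.get(normalise_postcode(postcode))


if __name__ == "__main__":
    # Placeholder lookup table; a real one would be built from an open
    # postcode centroid dataset.
    centroids = {"ZZ99 9ZZ": (51.0, -1.0)}
    print(locate(" zz999zz ", centroids))
    print(locate("AB1 2CD", centroids))
```

Every address in the same postcode ends up at the same point, which is less precise than an exact lat/long but is often enough for mapping at neighbourhood level.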
In Summary
On the surface, Open Data released under an OGL gives the impression you can reuse it as you like. As we now know, this isn’t always the case. The best rule of thumb is to check with the data publisher before you reuse data for commercial purposes.