What is Open Data?

17 NOVEMBER 2023
Ultimate Definition of Open Data

There is a lot of confusion about what open data is. The confusion is particularly common around latitude/longitude (lat/long) coordinates, because some location datasets are open data and some aren’t. So what is the actual definition of open data, and how does it differ from data that has simply been made available in the public domain?

Open data is often misconstrued as being anything in the public domain. However, there are a few very important factors to consider before leaping to this conclusion. As a rule of thumb, open data is normally released by the public sector and does not include personal data. Public information characterised as open data is normally released to allow citizens to hold government to account, as in the case of Spend Data. Commercial organisations are also encouraged to reuse data to drive innovation. One example is the property transaction data released by Land Registry, which is heavily used by online portals such as Zoopla and Rightmove.

This article attempts to clarify what open data is. However, as there isn’t a legal definition, there will always be differences of opinion about what it is and what it isn’t. Let’s start with the basics of what is commonly accepted as open data, along with the type of data we are referring to.

Knowing what open data is also helps businesses and individuals recognise the opportunities it presents.

What is the accepted definition of Open Data?

A clear definition helps users understand the scope of what can be accessed and reused, and accessibility is central to it. The commonly accepted definition is:

Data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute.

To clarify what we mean by this, we need to go a little deeper. Any open data release should satisfy the following:

  • Availability and Access: the data must be available in a machine-readable, open format (e.g. CSV), preferably by downloading over the internet.
  • Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution. In the UK, open data is ideally released under an Open Government Licence.
  • Universal Participation: everyone must be able to use, re-use and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
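
The three conditions above can be expressed as a simple checklist. The following is a minimal Python sketch, not an official test: the `DatasetRelease` class, its field names, and the sets of accepted formats and licences are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Illustrative assumptions: these sets are examples, not an official list.
OPEN_FORMATS = {"csv", "json", "xml"}
OPEN_LICENCES = {"OGL"}  # Open Government Licence


@dataclass
class DatasetRelease:
    """Hypothetical metadata describing a published dataset."""
    file_format: str
    licence: str
    downloadable: bool = True
    # Restrictions such as "non-commercial" or "education-only".
    use_restrictions: list[str] = field(default_factory=list)


def open_data_issues(release: DatasetRelease) -> list[str]:
    """Return the conditions a release fails (an empty list if none)."""
    issues = []
    if release.file_format.lower() not in OPEN_FORMATS or not release.downloadable:
        issues.append("availability and access")
    if release.licence not in OPEN_LICENCES:
        issues.append("re-use and redistribution")
    if release.use_restrictions:
        issues.append("universal participation")
    return issues


spend = DatasetRelease(file_format="CSV", licence="OGL")
restricted = DatasetRelease("pdf", "OGL", use_restrictions=["non-commercial"])
print(open_data_issues(spend))
print(open_data_issues(restricted))
```

A release with no issues under this sketch still needs the third-party licence checks described later in this article.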

If you’re wondering why it is so important to be clear about what open data means and why this definition is used, there’s a simple answer: commercial reuse and contaminated data.

When is open data not open data?

Recognising what open data is, and what it is not, matters because it determines how data can be applied across different sectors.

You will come across many publishers who describe data as open even when it isn’t. This is normally referred to as contaminated data, i.e. it contains third-party content that the publisher does not hold the licence to. This confusion is especially pertinent when it comes to data containing addresses and exact lat/long.

In the context of public resources, open data signifies transparency and accountability.

Historically, each Local Authority has been tasked with maintaining a database of taxable addresses and locations within its boundaries. Central government uses this data to set business rates and council tax levels. However, the creation of a full address is shared between Royal Mail and the Local Authority. The Local Authority is responsible for providing the road name and number, whilst Royal Mail provides the postcode. To add further confusion, Ordnance Survey provides the exact latitude and longitude for the address.

Combined, this means there are potentially three licence holders: the Local Authority, Royal Mail and Ordnance Survey. The Local Authority can, and regularly does, release data with associated addresses under the Open Government Licence, but this licence explicitly states that they cannot grant commercial reuse of data they do not hold the licence to.
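
One way to picture the three potential licence holders is to tag each column of a dataset with the organisation that the passage above describes as responsible for it. This is a rough sketch only: the column names and the `RIGHTS_HOLDERS` mapping are hypothetical, and a real dataset would still need checking with the publisher.

```python
# Hypothetical column names, mapped to the organisation described above as
# contributing that part of an address record.
RIGHTS_HOLDERS = {
    "street_name": "Local Authority",
    "building_number": "Local Authority",
    "postcode": "Royal Mail",
    "latitude": "Ordnance Survey",
    "longitude": "Ordnance Survey",
}


def third_party_holders(columns: list[str], publisher: str) -> set[str]:
    """Return rights holders, other than the publisher, implied by the columns."""
    holders = {RIGHTS_HOLDERS[c] for c in columns if c in RIGHTS_HOLDERS}
    return holders - {publisher}


columns = ["supplier", "amount", "street_name", "postcode", "latitude", "longitude"]
print(sorted(third_party_holders(columns, publisher="Local Authority")))
```

A non-empty result suggests the release may be contaminated for commercial reuse, even if it was published under the Open Government Licence.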

Exceptions to the rule

The exception to this rule is the public sector, which has access to special licensing allowing it to reuse the Royal Mail and Ordnance Survey data free of charge. If you are a commercial organisation, you will need to be licensed by Royal Mail to reuse the addresses, and by Ordnance Survey if you want to use exact lat/long.

There are some exceptions to the above. Land Registry has agreed with Royal Mail that its property transaction data can be used when presenting information on property prices. This means Zoopla and Rightmove, amongst others, do not need a licence to reuse the data even if they are gaining commercial benefit. Ordnance Survey is a government agency, so it releases a lot of data under Open Government Licences to encourage reuse. One of these datasets includes postcode centroids, so you could map an address to a postcode location instead of using the more expensive exact lat/long.
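
As a rough illustration of the postcode centroid approach, the sketch below looks up an approximate location for an address from its postcode alone. The lookup table, its coordinates, and the function names are invented for the example. A real lookup would be loaded from an openly licensed postcode dataset, whose column layout may differ.

```python
from typing import Optional

# Made-up centroid table for illustration: postcode -> (latitude, longitude).
# The postcodes and coordinates are placeholders, not real survey data.
CENTROIDS = {
    "AB1 2CD": (57.10, -2.10),
    "XY9 8ZW": (51.50, -0.12),
}


def normalise_postcode(raw: str) -> str:
    """Upper-case and put a single space before the final three characters."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def approximate_location(postcode: str) -> Optional[tuple[float, float]]:
    """Return the postcode centroid, or None if the postcode is not in the table."""
    return CENTROIDS.get(normalise_postcode(postcode))


print(approximate_location("ab12cd"))
print(approximate_location("ZZ1 1ZZ"))
```

The trade-off is precision: every address in a postcode maps to the same point, which is often enough for area-level maps but not for locating individual buildings.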

In some cases, such as Companies House, addresses are manually entered when data is submitted. Royal Mail isn’t involved in the process, as it is not the data provider, so you would not need a licence to reuse this data. However, as the data isn’t validated, there tend to be a lot more typos and some of the addresses may not exist.

In Summary

On the surface, open data released under an OGL gives the impression that you can reuse it as you like. As we now know, this isn’t always the case. The best rule of thumb is to check with the data publisher before you reuse data for commercial purposes.

Frequently Asked Questions

What is open data?

Open data is data that can be freely used, re-used, and redistributed by anyone, subject at most to requirements like attribution.

What are the main criteria for data to be considered open data?

To qualify as open data, the dataset should be (1) available and accessible in a machine-readable open format (e.g. CSV) via download over the internet, (2) licensed to allow re-use and redistribution (ideally under something like the Open Government Licence), and (3) usable by anyone without discrimination (no restrictions by field of endeavour or type of user).

When is data not open data?

Data is not open when it contains personal data, when permissions or licence terms limit commercial re-use, or when there are third-party rights the provider does not control. For example, address data may be partially owned or licensed by third parties (Royal Mail, Ordnance Survey), which can constrain how it’s released or reused.

Why is it important to be clear about what open data means?

Because misunderstanding the definition can lead to “contaminated” datasets – that is, data which appear “public” but include components under restrictive licences. Without clarity, users may inadvertently violate licence terms or face legal or ethical issues when re-using or redistributing data.

What are some examples of open data in practice?

Examples include government spend data, property transaction data released by Land Registry, or datasets published under the Open Government Licence. These allow commercial innovation (e.g. property portals like Zoopla or Rightmove) and enable citizens to hold governments accountable.
