IN OPEN DATA WE TRUST?

4 minutes · 22 November 2023

Data scientists, in their pursuit of precise knowledge, face a conundrum when it comes to Open Data published by official bodies: it must be included, yet it is often not fit for purpose.

It is often claimed that poor data quality leads to poor decisions. That may hold for a company’s own internal data, but it over-simplifies a complex situation, especially where data is obtained from outside sources.

Data governance frameworks and data quality processes need to include valuable Open Data resources to give the complete picture needed for good business decisions. In reality, though, that data is often closer to unrefined ore than to sparkling, ready-to-use gems.

Open Data – by which I mean public data sets around non-personal information related to market trends, demographics, companies or properties – can be freely used, re-used and redistributed by anyone subject only to the requirements to attribute and share.

The quality of Open Data from an official source will rarely be 100%. This is a consequence of how Open Data is collected, even for a national reference system such as Companies House. But even when the data is flawed there is often no better source and, despite its imperfections, value can still be obtained.

Making the most of Open Data

To derive value, we need to be able to objectively trust external sources by augmenting data governance processes associated with the curation of the data. Indeed, for the US Department of Commerce, “structuring the data and tracing the source are just two of many important aspects of data governance that are carefully considered.”

Ideally, a data governance framework will be able to judge open datasets by understanding whether there is a problem with the data and quantifying the extent of that problem, while also identifying the source of the data. Importantly, the concept of the data source includes both the provenance and the dates when the data was created or updated, and when it was harvested.
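As a rough illustration of what recording the source might look like, the sketch below keeps provenance and timing metadata next to a dataset. The class name `SourceRecord`, its fields and the example dates are all assumptions made for illustration, not part of any published standard or official schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical provenance and timing metadata for one open dataset."""
    publisher: str           # official body that published the data
    dataset: str             # name of the dataset
    created: date            # when the publisher created the data
    updated: Optional[date]  # last publisher update, if known
    harvested: date          # when the data was downloaded

    def last_changed(self) -> date:
        # Fall back to the creation date when no update date is known.
        return self.updated or self.created

    def staleness_days(self, as_of: date) -> int:
        # Days between the last known change and the point of use.
        return (as_of - self.last_changed()).days

    def harvest_lag_days(self) -> int:
        # Days between the last known change and the harvest.
        return (self.harvested - self.last_changed()).days


# Illustrative values only; the dataset name and dates are invented.
record = SourceRecord(
    publisher="Companies House",
    dataset="example company register extract",
    created=date(2023, 1, 10),
    updated=date(2023, 6, 1),
    harvested=date(2023, 6, 15),
)
print(record.staleness_days(as_of=date(2023, 11, 22)))  # 174
print(record.harvest_lag_days())                        # 14
```

Keeping the created, updated and harvested dates as separate fields lets the age of the data and the delay in collecting it be measured independently.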

At the point of use, the data must be assessed with respect to its intended use. How it is utilised can then be adjusted – by giving it a lower weighting in a predictive model, for example, or by altering the algorithm.
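One way to picture a lower weighting is to turn quality and freshness into a single trust weight and use it when combining estimates. The sketch below is an assumption-laden example: the function names, the half-life decay and the default of 365 days are illustrative choices, not a recommended formula.

```python
from typing import Sequence


def trust_weight(completeness: float, staleness_days: int,
                 half_life_days: float = 365.0) -> float:
    """Hypothetical weight: completeness, halved for every half-life of age."""
    if not 0.0 <= completeness <= 1.0:
        raise ValueError("completeness must be between 0 and 1")
    if staleness_days < 0 or half_life_days <= 0:
        raise ValueError("staleness must be >= 0 and half-life must be > 0")
    return completeness * 0.5 ** (staleness_days / half_life_days)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Combine estimates from several sources according to their weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Two invented estimates of the same quantity: a fresh, complete source
# and an older, patchier one.
weights = [trust_weight(1.0, 0), trust_weight(0.8, 365)]
print(weights)                               # [1.0, 0.4]
print(weighted_mean([100.0, 130.0], weights))
```

The same weight could be passed as a sample weight to a model-fitting routine, so that doubtful records still contribute but count for less.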

Essentially, gaining a critical understanding of Open Data, and developing a data framework accordingly, involves the following three crucial areas:

Data provenance: Ensuring datasets are obtained directly from an official source or data publisher, and have not been filled in, corrected or altered in any way.
Freshness of data capture: A significant amount of information can be gained from the metadata of a data source, for example whether a business rates dataset relates to 2023 or 2022.
Data quality: The ability to quantify the missing or invalid values, quantify the missing records, and identify inconsistent values and links to other data.
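To make the data quality point concrete, the sketch below counts missing and invalid values for a single field. It is a minimal example under stated assumptions: the function `profile_field`, the sample rows and the simplified postcode pattern are all illustrative, and real postcode validation has more rules than this pattern covers.

```python
import re
from typing import Callable, Dict, Iterable, Mapping

# Simplified pattern for illustration only; not a full UK postcode check.
POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")


def profile_field(rows: Iterable[Mapping[str, object]], field: str,
                  is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Count missing and invalid values for one field across all rows."""
    total = missing = invalid = 0
    for row in rows:
        total += 1
        value = row.get(field)
        if value is None or str(value).strip() == "":
            missing += 1          # absent key, None or blank string
        elif not is_valid(str(value).strip()):
            invalid += 1          # present but fails the check
    usable = total - missing - invalid
    return {
        "total": total,
        "missing": missing,
        "invalid": invalid,
        "usable_ratio": usable / total if total else 0.0,
    }


# Invented rows standing in for a harvested dataset.
rows = [
    {"name": "A LTD", "postcode": "AB1 2CD"},
    {"name": "B LTD", "postcode": ""},
    {"name": "C LTD", "postcode": "not known"},
    {"name": "D LTD"},
]
report = profile_field(rows, "postcode",
                       lambda v: bool(POSTCODE.match(v.upper())))
print(report)
# {'total': 4, 'missing': 2, 'invalid': 1, 'usable_ratio': 0.25}
```

Reporting missing and invalid counts separately matters because the two usually have different causes and different remedies.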

If there are known issues with Open Data, especially inconsistencies that might be due to differing extraction dates, the data can be linked and combined with other internal and external datasets to obtain a better consensus picture. Of course, this can only happen if those issues are known about in the first place.
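A consensus picture can be sketched as linking records on a shared key and taking the value most sources agree on. The example below assumes a simple majority vote; the source names, keys and status values are invented, and a real framework might weight sources by trust rather than counting them equally.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def consensus(values_by_source: Dict[str, Optional[str]]
              ) -> Tuple[Optional[str], float]:
    """Return the most common non-empty value and the share agreeing with it."""
    present = [v for v in values_by_source.values() if v]
    if not present:
        return None, 0.0
    # On a tie, Counter keeps the value that was seen first.
    value, count = Counter(present).most_common(1)[0]
    return value, count / len(present)


def link_and_resolve(sources: Dict[str, Dict[str, str]]
                     ) -> Dict[str, Tuple[Optional[str], float]]:
    """Link records across sources by key and resolve each to a consensus."""
    keys = set().union(*sources.values())  # every key seen in any source
    return {
        key: consensus({name: data.get(key) for name, data in sources.items()})
        for key in sorted(keys)
    }


# Invented company statuses keyed by a made-up identifier.
sources = {
    "official_register": {"001": "ACTIVE", "002": "DISSOLVED"},
    "internal_crm": {"001": "ACTIVE", "002": "ACTIVE"},
    "external_feed": {"002": "DISSOLVED"},
}
for key, (value, agreement) in link_and_resolve(sources).items():
    print(key, value, round(agreement, 2))
# 001 ACTIVE 1.0
# 002 DISSOLVED 0.67
```

The agreement share is worth keeping alongside the resolved value, since a low share flags exactly the records where extraction dates or other issues deserve a closer look.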

“Any job worth doing…”

Open Data is not perfect, but it does contain enormous value. Applying data governance and data quality processes is well worth the effort in order to establish the right level of trust in the data, whatever the application.

By establishing the details of provenance, quality and timing through correctly curating and formatting Open Data, firms can use it with confidence, contributing to incredible insights.

Originally published in Financial IT
