Discover and Classify Unstructured Data at Scale
- Type: Blogs
- Date: 08/03/2022
- Author: John Barnes
- Tags: Data Governance, big data, Data protection, Data mapping
Earlier this week I was on a call with the Head of Data at one of the world’s largest financial institutions when he brought up Donald Rumsfeld’s “known knowns” quote, applying it specifically to discovering and classifying unstructured data at scale.
Whatever your political views, it got me thinking: applied to data, the quote highlights significant risk and untapped potential within large enterprises.
The full quote is the following:
“There are known knowns — there are things we know we know. We also know there are known unknowns — that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know.”
So I thought I’d take the opportunity to break it down, put it in the context of data, and explain what we do at Ohalo to address this challenge.
Known knowns – Well, this is simple. These are things you clearly know about.
But has this data been classified correctly?
Do you know who has access to it?
How is this data governed?
What policies are applicable?
Known unknowns – These are the things you know you don’t know.
Many businesses we speak to (government agencies, banks, insurers, energy companies, etc.) are aware that there is data throughout their organizations that has not been discovered, let alone classified.
This leads to significant compliance and operational risk, not to mention exposure. You need only look at the $400m regulatory fine handed to Citi for “longstanding failure to fix its data and risk management systems”.
How are you going to discover unstructured data and classify it (within a file, down to the word level) at scale?
What are you doing to mitigate compliance and operational risk?
What are the implications of doing nothing, not only for your business but also for your customers?
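To make “classify down to the word level” a little more concrete, here is a deliberately minimal Python sketch. It is not how Data X-Ray works: the two regex detectors (`EMAIL` and `US_SSN`) and the functions `classify_words` and `redact` are hypothetical, chosen only to show what labelling individual words inside a file, and then redacting them, looks like.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical detectors, for illustration only. Production tools typically
# combine many more signals (dictionaries, checksums, ML models, context).
DETECTORS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_words(text: str) -> List[Tuple[str, str]]:
    """Label each whitespace-separated word with the first matching class, or 'NONE'."""
    labels = []
    for word in text.split():
        # Drop surrounding punctuation so "today." is checked as "today".
        token = word.strip(".,;:!?()\"'")
        label = next(
            (name for name, pattern in DETECTORS.items() if pattern.fullmatch(token)),
            "NONE",
        )
        labels.append((token, label))
    return labels


def redact(text: str) -> str:
    """Replace every detected match with a placeholder naming its class."""
    for name, pattern in DETECTORS.items():
        text = pattern.sub(f"[{name}]", text)
    return text


sample = "Contact jane.doe@example.com about SSN 123-45-6789 today."
print(classify_words(sample))
print(redact(sample))  # Contact [EMAIL] about SSN [US_SSN] today.
```

A toy like this only illustrates the idea; the hard part at enterprise scale is running detection across billions of files in disparate systems with acceptable accuracy.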
Unknown unknowns – The final part of the quote arguably points to dark data.
This is data you may not know exists, or data you fail to use for other purposes (analytics, monetisation, etc.). The main point is summarized nicely in the title of a book on the subject, “Dark Data: Why What You Don’t Know Matters”.
What are the implications of not knowing whether you have, for instance, PII stored in disparate systems across different regions of your business?
What could be the benefits of discovering, classifying and potentially redacting this data?
What would a regulator say or do if they uncovered this data before you even knew it existed?
What happens if this data is targeted in a breach?
According to industry research, over 80% of data within large organizations is unstructured; structured data is only the tip of the iceberg.
At Ohalo, it’s our mission to partner with businesses and enable them to discover, classify and redact sensitive unstructured data quickly and at scale.
The Data X-Ray platform runs in real time to help our clients address all of the questions above and significantly mitigate risk. In some cases, we have seen clients monetise data (including dark data) to open up new revenue streams.
Ultimately, discovery and classification should be front and center of your mind when you embark on large, innovative projects, and this is what we specialize in.