The General Data Protection Regulation (GDPR) is coming into force, for the first time establishing a regime of large fines for organizations that do not handle personal information properly. How should an organization with a black box architecture and potentially thousands of data sources go about complying?
The GDPR is fast becoming a problem for many. The regulation is long and burdensome, and compliance may require a variety of process, HR, and technical interventions. It is hard to know where to start.
To better understand our clients’ needs, we recently surveyed around 25 Chief Data Officers (CDOs) and Chief Information Security Officers (CISOs), mainly at large financial institutions. The results indicated that the top three technical problems they face, in order of frequency, are:
Accessing data across the organization’s many data sources to fulfill access, rectification, and erasure requests (mapping to GDPR Articles 13-15, 16, and 17, respectively)
Tracing the lineage of where data is going
Understanding where the sensitive data that needs to be protected resides in the first place
An illustrative quote from one CISO at a large bank was:
My magic wand would be a system that, from creation to consumption to dissemination, is able to label an inventory of my critical information and keep the telemetry information of its usage.
Sounds good. But how do you achieve this future when your systems are a black box? Thinking further about the three issues above: although the immediate regulatory problems the interviewees were experiencing from a technical standpoint were achieving access, rectification, and erasure, in the short term the most important of the three is actually the last, understanding where sensitive data is.
Without knowing where your sensitive data is, how could you possibly control that sensitive personally identifiable information (PII) per the regulations in the first place? The problem becomes even more complex when that data is dispersed across possibly thousands of data sources in a black box architecture.
How did we get here?
I’m reminded of Jaron Lanier’s book, Who Owns the Future?, in which he posits that the large, complex architectures that consume data are effective at monopolizing a market but amount to sloppy engineering from a data management perspective. Whether or not this is true is beside the point. Because data can be copied at will rather than kept in a single original source, it gets copied to the thousands of applications that might need to process it. Simply because it is the most convenient engineering approach, data is copied, processed, transformed, and retransmitted thousands of times.
What is the result of business models that depend upon data? Essentially, a very complex black box architecture that somehow achieves the goals of the organization, most of the time (there are a number of counterexamples where large and complex architectures like Equifax’s were defeated by nimbler foes, but that is not the point of this particular post).
But these black boxes were not built for modern times. Recent regulations like the GDPR introduce a fundamental difference in how we treat data. GDPR insists that data is not something to be processed at will by an organization but rather something that belongs to the end user whom an organization serves. The user is in essence lending their data to the organization. And, as with any valuable asset, the organization is now required to treat that data as something valuable. The organization is a custodian of the user’s data. If that custodianship is not appropriately managed, your organization could be on the hook for very large fines of up to €20M or 4% of global annual revenue, whichever is greater. Suffice it to say, GDPR is the biggest change in data protection regulation in 20 years.
Making black box architectures speak GDPR
The IT architecture of most organizations was not built for data privacy and GDPR is forcing them to adapt. So, going back to the third point mentioned above by the heroes of our story, how do organizations start to get their black box systems in order for GDPR? How does the organization understand where their sensitive data is so that they can manage it properly?
One of the highest leverage actions that an organization can take to prepare for the GDPR regulations coming in May 2018 is to show regulators (and customers if you are a Data Processor, see our other blog post) that the organization is doing something to get its data in order. This may not be easy, since you have to accommodate not only different data types (addresses, names, phone numbers, etc.) but also different data sources where sensitive PII might be stored (SQL databases, cloud file storage, cloud service providers, etc.).
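To make the data-type side of this concrete, here is a minimal sketch of pattern-based PII detection. It is a toy illustration, not how any particular product works: real detection needs many more patterns, locale handling, and techniques beyond regexes (e.g., for names and addresses).

```python
import re

# Hypothetical patterns for two common PII types. A production system
# would cover far more types (names, addresses, IDs) and locales.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def classify_pii(text):
    """Return the set of PII types detected in a free-text field."""
    return {label for label, pattern in PII_PATTERNS.items()
            if pattern.search(text)}

record = "Contact Jane at jane.doe@example.com or +44 20 7946 0958"
print(classify_pii(record))  # detects both the email and the phone number
```

The same classifier can then be run against rows from a SQL database, documents in cloud file storage, and so on, which is what makes a uniform scan across heterogeneous sources possible.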
To do this today, an organization might first initiate a data mapping exercise from a data privacy regulation standpoint. The first subtask is to compile a list of data sources and work out how to access them for review. The second is to examine the data inside each source to determine what is sensitive and what is not; data may often be sensitive for reasons beyond being PII. The third is to put a management system around the exercise: since black box architectures tend to morph over time, it is important to keep the regulatory data map up to date so that you can make sure data is being properly managed over time. This framework ensures that you always know where your sensitive PII is, so that you can then address the more direct regulatory requirements of GDPR, such as access, erasure, and rectification.
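The three subtasks above can be sketched as a simple registry. Everything here is hypothetical and illustrative (the class names, the 30-day staleness threshold): the point is that the data map is a living inventory, not a one-off spreadsheet.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class DataSourceEntry:
    name: str                      # e.g. "crm-postgres", "hr-sharepoint"
    kind: str                      # "sql", "cloud-file-storage", ...
    pii_types: set = field(default_factory=set)
    last_scanned: datetime = None

class RegulatoryDataMap:
    def __init__(self, max_age_days=30):
        self.max_age = timedelta(days=max_age_days)
        self.entries = {}

    def register(self, name, kind):
        # Subtask 1: keep a list of data sources to review.
        self.entries[name] = DataSourceEntry(name, kind)

    def record_scan(self, name, pii_types):
        # Subtask 2: note what sensitive data each source holds.
        entry = self.entries[name]
        entry.pii_types = set(pii_types)
        entry.last_scanned = datetime.now(timezone.utc)

    def stale_sources(self):
        # Subtask 3: flag sources whose scan is missing or outdated,
        # so the map stays current as the architecture morphs.
        now = datetime.now(timezone.utc)
        return [e.name for e in self.entries.values()
                if e.last_scanned is None or now - e.last_scanned > self.max_age]

data_map = RegulatoryDataMap()
data_map.register("crm-postgres", "sql")
data_map.register("hr-sharepoint", "cloud-file-storage")
data_map.record_scan("crm-postgres", {"email", "phone"})
print(data_map.stale_sources())  # ['hr-sharepoint'] has never been scanned
```

Re-running the staleness check on a schedule is what turns the one-off data mapping exercise into an ongoing management system.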
Ohalo’s Data X-Ray is a tool for taking that first step. It is a low-friction, high-impact way to make your various data sources GDPR-ready. It connects to many data source types, scans them for PII using a machine learning algorithm, and keeps your organization’s regulatory data map up to date over time. You can get started with your login details for the cloud services you use, or, if you have native databases, simply provide a connection string and we can get you ready for GDPR.
Ohalo provides the Data X-Ray tool to simplify a baseline evaluation of where your sensitive data is. By signing up for a trial here you can get started right away. If you’re not quite ready, you can get a demo of the tool first here. Once you have established a baseline of where your most sensitive data is, you can track where that data is going with the Data Protection Router, in order to assist your clients with Data Subject requests such as access, rectification, erasure, and breach notification.