GDPR Article 30 Data Lineage and Why Blockchain Can Help

This article is a bit technical. We discuss the regulatory principles of the General Data Protection Regulation (GDPR), focusing on the record-keeping requirements of Article 30, and how a blockchain architecture makes it easier to comply with them.

Article 30 of the GDPR requires data controllers and data processors to maintain records of their processing activities, including how personal data is shared. Its obligations fall mainly on larger organizations with 250 or more employees, which tend to have complex data infrastructure spread across multiple databases, both internal and external. How should a company maintain all of this metadata both inside and outside its organization? We think blockchain may hold the answer.
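Article 30(5) is more nuanced than a simple headcount test: organizations with fewer than 250 employees are exempt only if their processing is unlikely to result in a risk to the rights and freedoms of data subjects, is occasional, and does not include special categories of data (Article 9(1)) or data relating to criminal convictions and offences (Article 10). The sketch below encodes that reading as a simple predicate. The function name and its boolean inputs are hypothetical, and each input stands for a legal assessment that code cannot make on its own.

```python
def article30_records_required(employees: int,
                               likely_risk: bool,
                               occasional_only: bool,
                               special_or_criminal_data: bool) -> bool:
    """Simplified reading of GDPR Article 30(5).

    Each boolean is assumed to be the outcome of a prior legal assessment.
    """
    if employees >= 250:
        # The exemption only covers organizations below the 250-employee threshold.
        return True
    # Below the threshold, any one of these conditions removes the exemption.
    return likely_risk or not occasional_only or special_or_criminal_data


# A 40-person firm whose processing is routine (not occasional) still needs records.
print(article30_records_required(40, likely_risk=False,
                                 occasional_only=False,
                                 special_or_criminal_data=False))  # True
```

Because routine processing such as payroll or a customer database is rarely occasional, the exemption tends in practice to be narrower than the 250-employee figure suggests.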

The Requirements

“Metadata is the new black” in enterprise data management. Regulations such as GDPR Article 30 require enterprises to track metadata about how data is shared and used. Such a record is important for demonstrating the state of data management, but it is also vital for meeting the functional requirements of other Articles, such as those covering data erasure, rectification, and access: without a record of where data is, managing that data is very difficult or impossible. The information that must be made available to supervisory authorities on request includes:

  • Purposes of processing

  • Description of categories of data subjects and categories of personal data maintained about the subjects

  • Categories of recipients of the data including those recipients in third countries

  • Transfers to third countries and documentation of suitable safeguards

  • Envisaged time limits for erasure of the different categories of data (and, presumably, the date the data was obtained)

  • General description of technical and organizational security measures
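These fields map naturally onto a structured record per processing activity rather than free text, which makes gaps (such as a third-country recipient with no documented safeguard) easy to flag automatically. The sketch below is a hypothetical schema: the class name, field names, and checks are illustrative assumptions, not an official Article 30 format or Ohalo’s data model.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ProcessingRecord:
    """Hypothetical entry in an Article 30 record of processing activities."""
    purpose: str
    data_subject_categories: List[str]
    personal_data_categories: List[str]
    recipient_categories: List[str]
    third_country_recipients: List[str] = field(default_factory=list)
    safeguards: Optional[str] = None       # e.g. a reference to the transfer mechanism used
    obtained_on: Optional[date] = None     # start date, as the list above suggests
    retention_days: Optional[int] = None   # envisaged time limit for erasure
    security_measures: str = ""            # general description of security measures

    def erasure_due(self) -> Optional[date]:
        """Date by which the data should be erased, if both inputs are known."""
        if self.obtained_on is None or self.retention_days is None:
            return None
        return self.obtained_on + timedelta(days=self.retention_days)

    def problems(self) -> List[str]:
        """Flag obvious gaps; a real review would check far more."""
        issues = []
        if not self.purpose:
            issues.append("missing purpose")
        if self.third_country_recipients and not self.safeguards:
            issues.append("third-country recipients without documented safeguards")
        if self.retention_days is None:
            issues.append("no envisaged erasure time limit")
        return issues

    def to_json(self) -> str:
        """Serialize for handing to an auditor; dates become ISO strings."""
        return json.dumps(asdict(self), default=str)


record = ProcessingRecord(
    purpose="payroll",
    data_subject_categories=["employees"],
    personal_data_categories=["name", "bank account"],
    recipient_categories=["payroll provider"],
    third_country_recipients=["payroll provider (US)"],
    obtained_on=date(2018, 1, 1),
    retention_days=365,
)
print(record.erasure_due())  # 2019-01-01
print(record.problems())
```

Keeping one record per activity also means the same objects can be serialized and handed over when a supervisory authority asks for them.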

The Problem with Implementing Article 30 within Large Organizations

At first glance, this Article seems relatively straightforward for a single data source: simply maintain a general record somewhere and you should be fine. It essentially requires a glorified log file. But this simplicity breaks down quickly in a modern IT architecture that may involve complex interactions between thousands of applications, databases, and data processors. Even within a single Data Controller, data is managed across multiple business lines, legal entities, brands, and more. Once you extend this to Data Processors as well, the number of possible data flows grows far faster than the number of systems involved.

Embracing the Complexity of Distributed Systems

Tracking data in complex systems is not a new problem, but now it is a regulated problem. I speak with many Chief Data Officers, Chief Information Security Officers, IT architects, and others at Ohalo clients about large enterprise IT architecture. They often have ongoing projects that relate in one way or another to knowing what data is where and tracing how that data is used. Because these data relationships are complex, the natural reaction is to reduce the complexity: they migrate data to a centralized source that is meant to be the single source of truth for a particular data set within the organization. Over time, however, that data tends to dissipate back into the constituent applications, because it is simply easier to store data locally at the level of the individual application. So how can a large enterprise track its data with the apparent simplicity of a single data store while accepting that enterprise architectures are naturally complex distributed systems?

At Ohalo, we believe the answer is to allow systems to stay distributed. When considering the requirements of Article 30 metadata tracking across multiple entities and borders, there is really no other option, barring the creation of some supranational consortium that maintains a central database of all the world’s personal data on behalf of its members. That is why we created the Data Protection Router.

Tracking GDPR Metadata in Distributed Systems using Blockchain

The Data Protection Router is an abstraction layer on top of existing databases that tracks how data moves not only within a single organization but also between multiple organizations. It relies on the Data X-Ray to integrate with underlying databases in seconds (or through an API) and extract the relevant metadata about when, where, how, and what data is being stored in any particular database or cloud service.
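Much of the tamper-evidence a blockchain gives a record like this comes from hash chaining: each entry’s hash covers the previous entry’s hash, so altering any past entry breaks every link after it. The sketch below applies that idea to data-movement events. It is an illustration under stated assumptions, not the Data Protection Router: the event fields and peer names are hypothetical, and it deliberately omits what makes a real blockchain distributed (replication, consensus, and signatures).

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class TransferEvent:
    """One hypothetical data-movement event between two peers."""
    source: str
    destination: str
    data_categories: List[str]
    purpose: str
    timestamp: float


def chain_hash(prev_hash: str, event: TransferEvent) -> str:
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps(asdict(event), sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class LineageLog:
    """Append-only log in which every entry commits to all earlier ones."""

    def __init__(self) -> None:
        self.entries: List[Tuple[TransferEvent, str]] = []

    @property
    def head(self) -> str:
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, event: TransferEvent) -> str:
        digest = chain_hash(self.head, event)
        self.entries.append((event, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited entry breaks the chain."""
        prev = GENESIS
        for event, digest in self.entries:
            if chain_hash(prev, event) != digest:
                return False
            prev = digest
        return True

    def lineage(self, category: str) -> List[Tuple[str, str]]:
        """Every hop a given category of personal data has taken."""
        return [(e.source, e.destination)
                for e, _ in self.entries if category in e.data_categories]


log = LineageLog()
log.append(TransferEvent("crm-db", "billing-app", ["email"], "invoicing", 1.0))
log.append(TransferEvent("billing-app", "processor-x", ["email", "iban"], "payments", 2.0))
print(log.lineage("email"))  # [('crm-db', 'billing-app'), ('billing-app', 'processor-x')]
print(log.verify())          # True
log.entries[0][0].purpose = "marketing"  # tamper with history
print(log.verify())          # False
```

If two organizations each keep a copy of such a log, comparing their `head` values is a cheap way to confirm they hold identical histories; deciding which copy is authoritative when they disagree is the part a real blockchain’s consensus mechanism would add.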

We employ a blockchain to do this because we think that it is the only tool fit for the purpose of tracing data between and within the distributed system that is a modern IT architecture. The blockchain affords us three essential characteristics:

  1. Architecture: As noted above, Article 30 requirements (as well as simply wanting to make cross-enterprise data usable) include the ability to trace data movement between disparate databases. While a centralized database would achieve this, it is practically and politically impossible to implement. The blockchain allows us to easily create a completely peer-to-peer architecture in which individual nodes within a distributed system become addressable. We can maintain metadata about the activity of all nodes in the system, and even of individual records within databases, without the trouble of setting up a central system. At Ohalo we call this concept a “mini-consortium”: a metadata record for an individual data transaction that persists across multiple databases and applications.

  2. Authentication: Without a central authority, it is difficult to ensure that a peer is who it says it is. In the Ohalo architecture we rely on the public key cryptography native to the Ethereum protocol (ECDSA over the secp256k1 curve) to achieve this. Whenever two peers interact, they carry out a handshake that confirms each peer’s identity and thereby links a cryptographic address to a peer profile maintained in the smart contracts (see point 3 below).

  3. Accountability: Even if you can implement a peer-to-peer system, you still need to be able to verify the characteristics of the peer you are interacting with. Indeed, Article 30 requires this. Ohalo’s Data Protection Router uses smart contracts that link blockchain cryptographic addresses to smart contract-managed profiles of peers (a peer normally being a database or application), so that the metadata required by Article 30, such as name, owner, controller, representative, data subject and personal data categories, and more, is immediately referenceable through a cryptographic proof. We also maintain separate smart contracts to log and prove peer activity for demonstration to auditors and supervisory authorities.
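The handshake protocol itself is not specified here, so the following is only a sketch of one common pattern: a challenge-response in which a peer proves it controls the private key registered for the identity it claims, using ECDSA over secp256k1 as named in point 2. The curve arithmetic is written out in pure Python to keep the example self-contained; it is not constant-time and is for illustration only. It also simplifies Ethereum in two ways: messages are hashed with SHA-256 rather than Keccak-256, and peers are identified by raw public keys rather than Ethereum addresses (the last 20 bytes of the Keccak-256 hash of the public key). The `registry` dictionary is a hypothetical stand-in for the smart contract-managed peer profiles of point 3.

```python
import hashlib
import secrets

# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def on_curve(pt):
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def mul(k, pt):
    """Double-and-add scalar multiplication (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def digest(msg):
    # Assumption: SHA-256 here; Ethereum uses Keccak-256.
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")


def keypair():
    d = secrets.randbelow(N - 1) + 1
    return d, mul(d, G)


def sign(d, msg):
    z = digest(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
        r = mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * d) % N
        if r and s:
            return r, s


def verify(pub, msg, sig):
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    pt = add(mul(digest(msg) * w % N, G), mul(r * w % N, pub))
    return pt is not None and pt[0] % N == r


def handshake(registry, claimed_id, respond):
    """Challenge-response: the peer signs a fresh nonce with its private key."""
    profile = registry.get(claimed_id)  # hypothetical on-chain profile lookup
    if profile is None:
        return False
    nonce = secrets.token_bytes(32)
    return verify(profile["public_key"], nonce, respond(nonce))


d, pub = keypair()
registry = {"crm-db": {"owner": "Example Ltd", "public_key": pub}}
print(handshake(registry, "crm-db", lambda nonce: sign(d, nonce)))         # True
impostor, _ = keypair()
print(handshake(registry, "crm-db", lambda nonce: sign(impostor, nonce)))  # False
```

Using a fresh random nonce for every handshake prevents an eavesdropper from replaying an old signature. A production system would rely on a vetted cryptography library rather than hand-written curve arithmetic.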

Although tracking data in complex systems is not new, it is now regulated. GDPR Article 30 requires Data Controllers and Data Processors to maintain metadata about the personal data they process and the data subjects it concerns. However, existing systems are distributed by nature, and it is nearly impossible to migrate them into a manageable centralized architecture to meet these Article 30 requirements. We believe Ohalo’s implementation of the blockchain is the answer because it accommodates existing distributed architectures while also meeting the metadata management requirements inherent in Article 30.

About Ohalo

Ohalo automates data governance for GDPR requirements. The Data X-Ray connects in seconds and scans for sensitive data on a regular basis, so you always know where your sensitive data is and can demonstrate this to authorities. Once you have established a baseline of where your most sensitive data is, you can track where that data is going with the Data Protection Router to meet Article 30 requirements as well as requirements around access, rectification, erasure, and breach notification.