Streamlining Records and File Management with ROT Analysis for Enterprises

  • Type: Product Insights
  • Date: 18/08/2023
  • Author: Alistair Jones
  • Tags: Data Governance, Data minimisation, Data Management, Data mapping

As the saying goes, "Quality over quantity." This wisdom applies just about everywhere, but nowhere more so than to unstructured data in enterprises. Large organizations can create a sea of files and documents every day, but how much of that is useful, essential data that needs to stay around forever? In our experience, surprisingly little.

Enterprise IT departments all know that they should be reducing their storage footprint, but let’s review just how critical it is for a successful enterprise:

  • Security risk exposure: Every extra file is a potential weak link, increasing the chances that sensitive data could be misused, lost, or exposed in a data breach. Reducing the storage footprint reduces your attack surface.

  • Reduce operational noise: Searching through a pile of trivial and obsolete files wastes employee time. Declutter, retain only what's necessary, and your employees will thank you.

  • Reduce cost: Storage is only inexpensive when it’s cold storage, but most of an enterprise's data is hot, readily accessible data. Storage costs quickly climb to astronomical amounts when documents and data are repeatedly used in ETL operations, data migration initiatives, searchable indices, and increasingly MLOps and generative AI pipelines. Performing these on the most relevant data can greatly reduce costs.

  • Regulatory compliance: Finally, many industries operate under strict data retention guidelines. Holding onto files longer than required, or not long enough, can result in compliance violations and hefty fines.

Sounds like a problem well worth dealing with. Your enterprise will have many options for reducing its footprint, whether it’s moving documents to cold storage, archiving them, or deleting them, but the hardest part is getting started and finding which files to address first. That's where ROT analysis steps in, offering a structured methodology to refine, declutter, and optimize.

Finding the ROT

ROT stands for Redundant, Obsolete, and Trivial. In the context of records and file management, ROT analysis is a process employed by organizations to identify and manage content falling under these categories:

  1. Redundant: These are duplicate files that might be found scattered across the cloud or on premise in various document storage systems. They're probably copies that were created by emailing documents back and forth, drafting different versions with collaborators, or shuffling them from one device to another. Note that these versions will often have small discrepancies, so it’s important to find not only exact duplicates but also near-duplicates!

  2. Obsolete: This category includes outdated files or records that no longer have relevance to current operations. For instance, old marketing materials from a campaign five years ago might fall here. More importantly, a customer service interaction that might contain PII would also fall under this category: depending on your data jurisdiction and policies, it may have to be deleted after a certain amount of time.

  3. Trivial: These are the files or pieces of data that may never have been of substantial value. Think of files that were created and saved but never accessed again after that day, such as a temporary image that was added to a report. Or consider auto-generated system files that could be recreated automatically by your application if they were ever needed again. For example, if you have developer teams in your enterprise, you are likely to find code library packages that take up a large amount of storage but could easily be downloaded again in a few minutes.
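To make the "redundant" category concrete, here is a minimal Python sketch of the simplest case: exact duplicates, found by grouping files on a hash of their content. This is an illustration only and not how Data X-Ray works internally; the function names `sha256_of` and `find_exact_duplicates` are hypothetical, and the sketch assumes the files are reachable on a local or mounted file system.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under `root` whose content is byte-identical."""
    by_hash = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only hashes shared by two or more files form a duplicate group.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Note that all empty files hash to the same value, so they will show up as one large duplicate group. A single changed byte produces a different hash, which is why near-duplicates need a different approach.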

Scan for ROT with Data X-Ray

Data X-Ray is well established in the data governance scene, where it is known for its speed and accuracy in the discovery and classification of unstructured data. Some of its key functions include a variety of enterprise datasource connectors, deep metadata collection of files, and a scalable search dashboard. More recently, Ohalo has launched features to make it the perfect ROT analysis tool.

Detecting the duplicates

In the newest release of Data X-Ray, you can start detecting redundant information right in the dashboard. The screenshot below shows a list of duplicated files found in an organization. Many tools find duplicates using file size and file type, but Data X-Ray goes further than the competition by matching on select AI and regular expression annotations inside the document, allowing it to properly find near-duplicate documents.

[Screenshot: list of duplicated files in the Data X-Ray dashboard]
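For readers who want a feel for what near-duplicate detection involves, the sketch below uses a generic textbook technique: word shingles compared with Jaccard similarity. It is deliberately not Data X-Ray's annotation-based matching, whose internals are not described here. The function names and the 0.6 threshold are assumptions chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set:
    """Break text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict, threshold: float = 0.6):
    """Yield (name_a, name_b, score) for document pairs at or above `threshold`."""
    sets = {name: shingles(text) for name, text in docs.items()}
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, score
```

Comparing every pair grows quadratically with the number of documents, so this form only suits small batches. At enterprise scale, techniques such as MinHash with locality-sensitive hashing are the usual way to avoid comparing all pairs.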

Out with the old

The centerpiece of Data X-Ray is an intelligent search dashboard that gives you a view over your entire enterprise data landscape. You can view every file with a long list of datasource-specific metadata, including created, last modified, and last accessed dates. These dates are the natural starting point for spotting obsolete files.
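As a rough illustration of date-based filtering outside the dashboard, the following sketch lists files that have not been modified within a given number of days. It assumes a local or mounted file system; `stale_files` is a hypothetical helper, and the 365-day figure in the usage note is an example, not a recommended retention period.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def stale_files(root: Path, max_age_days: float,
                now: Optional[float] = None) -> list:
    """Return files under `root` not modified within `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in root.rglob("*"):
        # st_mtime is the last-modified time in seconds since the epoch.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Calling `stale_files(Path("/shared/docs"), max_age_days=365)` would list everything untouched for a year. The sketch uses modification time because last-accessed time (`st_atime`) is often unreliable: many file systems are mounted with options that update it rarely or never.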

Trivial pursuit

The definition of trivial files is often specific to your enterprise and its domain. Data X-Ray smart labels can be pre-configured to detect common sources of trivial files. Common examples of trivial files include:

  • Re-downloadable code library directories such as “node_modules” or “.env”/“.venv” directories.

  • Temporary files, such as files with “tmp” or “temp” in the file name or extension

  • Empty 0-byte files

  • Very low content files, with no Data X-Ray annotations found
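The first three rules in this list are simple enough to sketch in plain Python. The example below is an assumption-laden illustration, not the smart label configuration itself: `trivial_reason` and the constant names are hypothetical, and the last rule (no Data X-Ray annotations found) is left out because it depends on the product's own classification output.

```python
import re
from pathlib import Path
from typing import Optional

# Directory names taken from the list above; extend for your own environment.
DEPENDENCY_DIRS = {"node_modules", ".env", ".venv"}
TEMP_TOKENS = {"tmp", "temp"}


def trivial_reason(path: Path, size_bytes: int) -> Optional[str]:
    """Return why a file looks trivial, or None if no rule matches."""
    if size_bytes == 0:
        return "empty file"
    # Check every parent directory, not the file name itself.
    if any(part in DEPENDENCY_DIRS for part in path.parts[:-1]):
        return "re-downloadable dependency directory"
    # Split the name into tokens so "draft_tmp.docx" matches
    # but "template.docx" does not.
    tokens = set(re.split(r"[^a-z0-9]+", path.name.lower()))
    if tokens & TEMP_TOKENS:
        return "temporary file"
    return None
```

Matching on whole tokens rather than substrings is a design choice to cut false positives: a plain substring search for "temp" would also flag every file called "template".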

Wrapping up

Smart labels can also combine all of the above ROT criteria to further refine your ROT detection, allowing you to find, for example, all duplicates that have not been opened in a year, or all auto-generated files that were created more than 5 years ago.
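Conceptually, combining criteria is just a logical AND over individual rules. The sketch below shows the idea with the "auto-generated files created five years ago" case. It is a generic illustration with hypothetical names (`FileRecord`, `all_of`, `select`) and says nothing about how smart labels are implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable


@dataclass(frozen=True)
class FileRecord:
    path: str
    auto_generated: bool
    created: datetime


Rule = Callable[[FileRecord], bool]


def all_of(*rules: Rule) -> Rule:
    """Combine rules so a record must satisfy every one of them."""
    return lambda record: all(rule(record) for rule in rules)


def is_auto_generated(record: FileRecord) -> bool:
    return record.auto_generated


def created_more_than_years_ago(years: int, now: datetime) -> Rule:
    # Approximate a year as 365 days; leap days are ignored.
    cutoff = now - timedelta(days=365 * years)
    return lambda record: record.created < cutoff


def select(records: Iterable[FileRecord], rule: Rule) -> list:
    """Return the records that satisfy `rule`, in their original order."""
    return [record for record in records if rule(record)]
```

A combined rule is then built with `all_of(is_auto_generated, created_more_than_years_ago(5, datetime.now()))` and passed to `select`.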

With the Data X-Ray export features, you will be able to generate Excel reports of ROT files and take the right action with downstream processes.
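If you are assembling your own ROT findings for a downstream process, a CSV file is a simple interchange format that Excel opens directly. The sketch below is not the Data X-Ray export format; the column names and the `write_rot_report` helper are assumptions made for illustration.

```python
import csv
from typing import Iterable, Mapping, TextIO

# Hypothetical column layout for a ROT report.
FIELDNAMES = ["path", "category", "reason", "suggested_action"]


def write_rot_report(rows: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write ROT findings as CSV to `out` and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

When writing to disk, open the file with `open("rot_report.csv", "w", newline="", encoding="utf-8")`, since the `csv` module expects `newline=""` to handle line endings itself.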

Learn more about these features in the Data X-Ray documentation portal.

DataGuardian: Enterprise level

If you are an enterprise dealing with a particularly massive data landscape, you will need hands-on expertise and executive-level reporting. Ohalo has partnered with Coltblue Associates, who have decades of experience in enterprise data management, including cloud migration, security, and compliance. Not only are they expert Data X-Ray power users, they’ve also built a companion product, DataGuardian, with high-performance dashboards and polished, professional reporting for C-suite and board members.

Learn more about DataGuardian and Coltblue Associates.
