Data X-Ray delivers unstructured data insights in minutes.

Govern your AI training pipelines

Before inference comes training. Before training comes governance.

Training generative AI large language models (LLMs) requires knowing what data enters your training pipelines, from both a content perspective and a governance perspective. Data X-Ray lets you pull back file content and understand its metadata, such as provenance, entitlements, and age, which can be the crucial factor in successfully discovering unstructured data and pushing it into your LLM pipelines.


Our Clients

  • Home Office
  • Health and Safety Executive
  • Wood
  • HomeServe
  • Veolia
  • Costain

Connect and extract text from all data sources

Data X-Ray automatically connects to a wide variety of enterprise data sources, so you avoid the hassle of building connectors yourself. Supported sources include:

  • File Shares
  • S3 Buckets
  • Azure Blobs
  • Office 365
  • Content Management Systems
  • Cloud Storage
  • and more

Discover and classify contextually from all data sources

Data X-Ray uses petabyte-scale discovery and classification to pull back all metadata about your files, classify their content with natural language processing (NLP), and build a repository of data ready to push into your training pipelines:

  • File context
  • Regulatory requirements of data (privacy, security, and more)
  • File entitlements and ownership leveraging enterprise Active Directory
  • Content analysis to push the most relevant data into your models, whatever the use case
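
To make the idea concrete, here is a minimal sketch of how classification labels and file age might be used to decide which files enter a training corpus. It is not Data X-Ray's API: the `FileRecord` fields, the label names, and the `select_for_training` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    path: str
    owner: str
    last_modified: date
    labels: set = field(default_factory=set)  # e.g. {"privacy", "security"}


def select_for_training(records, blocked_labels, oldest_allowed):
    """Keep files that carry no blocked label and are recent enough."""
    return [
        r for r in records
        if not (r.labels & blocked_labels) and r.last_modified >= oldest_allowed
    ]


# Made-up sample records, not real output from any product.
records = [
    FileRecord("/share/handbook.docx", "alice", date(2023, 5, 1)),
    FileRecord("/share/payroll.xlsx", "bob", date(2023, 6, 1), {"privacy"}),
    FileRecord("/share/old-policy.pdf", "carol", date(2012, 1, 1)),
]

selected = select_for_training(
    records,
    blocked_labels={"privacy", "security"},
    oldest_allowed=date(2020, 1, 1),
)
print([r.path for r in selected])
```

In this sketch the payroll file is excluded because of its privacy label and the old policy because of its age, so only the handbook would be passed on.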

Effortlessly query Elasticsearch and retrieve full file contents

Along with the metadata it generates, Data X-Ray can store your full file contents in text form, so you can easily push both the text and its metadata into your training pipelines.
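
As a rough illustration, a request for file text plus metadata could be expressed in the Elasticsearch Query DSL along the following lines. The sketch only builds the request body. The field names (`path`, `content`, `classification`, `entitlements`, `last_modified`) are assumptions made for this example and may not match the actual Data X-Ray schema.

```python
import json


def build_training_query(classification, allowed_groups, modified_after):
    """Build an Elasticsearch Query DSL body as a plain dict.

    All field names used here are hypothetical placeholders.
    """
    return {
        # Return only the fields a training pipeline would need.
        "_source": ["path", "content", "classification", "entitlements"],
        "query": {
            "bool": {
                # Filter clauses match exactly and do not affect scoring.
                "filter": [
                    {"term": {"classification": classification}},
                    {"terms": {"entitlements": allowed_groups}},
                    {"range": {"last_modified": {"gte": modified_after}}},
                ]
            }
        },
    }


body = build_training_query("contract", ["legal-team"], "2022-01-01")
print(json.dumps(body, indent=2))
```

A body like this would be sent to the `_search` endpoint of whichever index holds the extracted text. The index name depends on your deployment and is deliberately left out here.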


Power automated unstructured data discovery, classification, and metadata ingestion for Generative AI at petabyte scale.

– 01

Connect to the data you care about

Connects to all of your data sources, whether on-premises, in the cloud, or in managed SaaS providers.

– 02

Auto-classify files

Uses machine learning to suggest file categories and classify down to the token level.

– 03

Easily generate metadata

Automatically generates metadata about your physical unstructured data, such as file names, entitlements, sizes, and creation dates.

– 04

Built for petabyte scale

Crawls all of your data at scale to pull in all relevant data from across the enterprise.

– 05

Build safety into your models

Metadata, including entitlements linked to your Active Directory, powers least-privilege access controls in your models.

– 06

Respect existing file entitlements

Trains your models to respond only to queries from users with valid permissions.
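
One simple way to picture entitlement-aware behaviour is a filter that drops any document the requesting user is not permitted to read before it reaches the model. The sketch below is a retrieval-time check rather than a training method, and it is not taken from Data X-Ray itself: the `visible_documents` helper, the `text` and `allowed_groups` keys, and the group names are hypothetical.

```python
def visible_documents(user_groups, documents):
    """Return only the documents the user is entitled to read.

    Each document is a dict with the hypothetical keys "text" and
    "allowed_groups". A document is visible when the user belongs to
    at least one of its allowed groups.
    """
    groups = set(user_groups)
    return [d for d in documents if groups & set(d["allowed_groups"])]


# Made-up sample data for illustration.
documents = [
    {"text": "Q3 board minutes", "allowed_groups": ["executives"]},
    {"text": "Office fire drill schedule", "allowed_groups": ["all-staff"]},
]

context = visible_documents(["all-staff", "engineering"], documents)
print([d["text"] for d in context])
```

A user with no group memberships gets an empty result, which matches the least-privilege default of denying access unless an entitlement grants it.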

Get in touch with us

Learn more about how Data X-Ray can accelerate Generative AI adoption and provide peace of mind for data owners.