I've created a toy Lakehouse Monitoring setup in Databricks to explore its features and capabilities. The goal is to understand how it works and what benefits it can bring. Here's an overview of what I cover in this post:
- How to set up a toy Lakehouse Monitor
- Dashboard
- Alerts
- Pricing
- My opinion on what I've seen
- Where's Databricks going?
- My2C
If you want to know more about what Databricks' Lakehouse Monitoring can do, I recommend checking out the official documentation. I have prepared a basic map of concepts that can help you get started.
How to set up a toy Lakehouse Monitor
Let's start by creating a table we can work with. It should be a time-series table:
create table workspace.default.sales (
timestamp TIMESTAMP,
amount DOUBLE
)
I then create a basic notebook, insert 1h of data.ipynb, to fill the table with data, and set up a job to run that notebook every hour.
I won't add the notebook code here because it is quite basic: it randomly adds records to the table, with random values and with timestamps spread within the time window of the hour.
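Roughly, it boils down to something along these lines (a SQL sketch of the idea, not the actual notebook; the row count and the value range are arbitrary, picked to match the sample data below):
INSERT INTO workspace.default.sales
SELECT
  -- random timestamp within the past hour
  timestampadd(SECOND, -CAST(rand() * 3600 AS INT), current_timestamp()) AS timestamp,
  -- random amount between 20 and 30
  20 + rand() * 10 AS amount
FROM range(1000)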
select * from workspace.default.sales
limit 10
timestamp | amount |
---|---|
2025-08-22T08:58:07.929Z | 22.570402586080093 |
2025-08-22T08:51:51.929Z | 20.713874028846366 |
2025-08-22T09:03:54.929Z | 21.97633174572098 |
2025-08-22T08:28:44.929Z | 27.94416169489641 |
2025-08-22T09:05:29.929Z | 21.307407500066127 |
2025-08-22T08:17:03.929Z | 22.37476392747984 |
2025-08-22T09:05:05.929Z | 26.446829879517953 |
2025-08-22T08:42:32.929Z | 27.86840740526422 |
2025-08-22T08:33:55.929Z | 27.236961570798947 |
2025-08-22T08:42:30.929Z | 25.395336538015343 |
Then, let's create the Monitor via Unity Catalog Explorer 👇
I set up the monitor with the TimeSeries profile, pointing it at the timestamp column with a granularity of 1 hour. The schedule of the monitor itself, however, is daily.
Below is a screenshot of the Unity Catalog Explorer page used to create the Lakehouse Monitor.
What happens after the monitor is created? By default, two new tables are created:
<table_name>_profile_metrics
<table_name>_drift_metrics
Let's inspect them:
SHOW TABLES IN workspace.default;
database | tableName | isTemporary |
---|---|---|
default | sales | false |
default | sales_drift_metrics | false |
default | sales_profile_metrics | false |
 | _sqldf | true |
select * from workspace.default.sales_profile_metrics
window | log_type | logging_table_commit_version | monitor_version | granularity | slice_key | slice_value | column_name | count | data_type | num_nulls | avg | min | max | stddev | num_zeros | num_nan | min_length | max_length | avg_length | non_null_columns | frequent_items | median | distinct_count | percent_nan | percent_null | percent_zeros | percent_distinct |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | :table | 1344 | null | null | null | null | null | null | null | null | null | null | null | List(timestamp, amount) | null | null | null | null | null | null | null |
List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | amount | 1344 | double | 0 | 25.059042979855878 | 20.0007185120017 | 29.99500143216646 | 2.8817003714622857 | 0 | 0 | null | null | null | null | null | 25.133808306174608 | 1277 | 0.0 | 0.0 | 0.0 | 95.01488095238095 |
List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | timestamp | 1344 | timestamp | 0 | null | 1.755850122929519E9 | 1.755853196019946E9 | null | null | null | null | null | null | null | null | 1.755851952929519E9 | 1161 | null | 0.0 | null | 86.38392857142857 |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | timestamp | 1192 | timestamp | 0 | null | 1.755853200929519E9 | 1.755856792565437E9 | null | null | null | null | null | null | null | null | 1.755854777019946E9 | 997 | null | 0.0 | null | 83.64093959731544 |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | :table | 1192 | null | null | null | null | null | null | null | null | null | null | null | List(timestamp, amount) | null | null | null | null | null | null | null |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | amount | 1192 | double | 0 | 24.847526487373074 | 20.01063581181195 | 29.99539398790598 | 2.8880160456500867 | 0 | 0 | null | null | null | null | null | 24.694267025212234 | 1192 | 0.0 | 0.0 | 0.0 | 100.0 |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | amount | 941 | double | 0 | 24.969054277784952 | 20.015936515703217 | 29.981071930502402 | 2.846773237150427 | 0 | 0 | null | null | null | null | null | 24.969920953185955 | 925 | 0.0 | 0.0 | 0.0 | 98.29968119022317 |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | timestamp | 941 | timestamp | 0 | null | 1.755856803565437E9 | 1.755860399121952E9 | null | null | null | null | null | null | null | null | 1.755858763121952E9 | 857 | null | 0.0 | null | 91.07332624867162 |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | :table | 941 | null | null | null | null | null | null | null | null | null | null | null | List(timestamp, amount) | null | null | null | null | null | null | null |
List(2025-08-22T11:00:00.000Z, 2025-08-22T12:00:00.000Z) | INPUT | 26 | 0 | 1 hour | null | null | :table | 995 | null | null | null | null | null | null | null | null | null | null | null | List(timestamp, amount) | null | null | null | null | null | null | null |
The profile table has a row for each pair of:
- window: the beginning and end of every hour
- column_name: every column of the table

In addition, it adds a special row, :table, to compute the table-level profile.
Optionally, it can also slice on column values, if slicing expressions are specified when the monitor is created.
For each row, it computes a bunch of statistics like avg, quantiles, min, max, etc. (when applicable, e.g. for float columns).
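For example, a query like the one below (a sketch, assuming window is a struct with start and end fields, as the output above suggests) pulls out the hourly profile of the amount column:
-- hourly volume, average, spread and null rate of `amount`
SELECT
  window.start AS window_start,
  `count`,
  avg,
  stddev,
  percent_null
FROM workspace.default.sales_profile_metrics
WHERE column_name = 'amount'
  AND log_type = 'INPUT'
  AND granularity = '1 hour'
ORDER BY window_start DESC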
select * from workspace.default.sales_drift_metrics
window | granularity | monitor_version | slice_key | slice_value | column_name | data_type | window_cmp | drift_type | count_delta | avg_delta | percent_null_delta | percent_zeros_delta | percent_distinct_delta | non_null_columns_delta | js_distance | ks_test | wasserstein_distance | population_stability_index | chi_squared_test | tv_distance | l_infinity_distance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
List(2025-08-22T11:00:00.000Z, 2025-08-22T12:00:00.000Z) | 1 hour | 0 | null | null | :table | null | List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | CONSECUTIVE | -418 | null | null | null | null | List(0, 0) | null | null | null | null | null | null | null |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | 1 hour | 0 | null | null | :table | null | List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | CONSECUTIVE | -152 | null | null | null | null | List(0, 0) | null | null | null | null | null | null | null |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | 1 hour | 0 | null | null | :table | null | List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | CONSECUTIVE | -251 | null | null | null | null | List(0, 0) | null | null | null | null | null | null | null |
List(2025-08-22T11:00:00.000Z, 2025-08-22T12:00:00.000Z) | 1 hour | 0 | null | null | timestamp | timestamp | List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | CONSECUTIVE | -418 | null | 0.0 | null | -5.604875005841791 | null | null | null | null | null | null | null | null |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | 1 hour | 0 | null | null | timestamp | timestamp | List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | CONSECUTIVE | -152 | null | 0.0 | null | -2.742988974113132 | null | null | null | null | null | null | null | null |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | 1 hour | 0 | null | null | timestamp | timestamp | List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | CONSECUTIVE | -251 | null | 0.0 | null | 7.4323866513561825 | null | null | null | null | null | null | null | null |
List(2025-08-22T11:00:00.000Z, 2025-08-22T12:00:00.000Z) | 1 hour | 0 | null | null | amount | double | List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | CONSECUTIVE | -418 | 0.013276990751066364 | 0.0 | 0.0 | -0.21172707932451829 | null | null | List(0.049, 0.3829208808885818) | 0.16058939377867895 | 0.028216393260236203 | null | null | null |
List(2025-08-22T10:00:00.000Z, 2025-08-22T11:00:00.000Z) | 1 hour | 0 | null | null | amount | double | List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | CONSECUTIVE | -251 | 0.12152779041187856 | 0.0 | 0.0 | -1.7003188097768316 | null | null | List(0.038, 0.4228041687817168) | 0.14481617304902772 | 0.021676869284417942 | null | null | null |
List(2025-08-22T09:00:00.000Z, 2025-08-22T10:00:00.000Z) | 1 hour | 0 | null | null | amount | double | List(2025-08-22T08:00:00.000Z, 2025-08-22T09:00:00.000Z) | CONSECUTIVE | -152 | -0.21151649248280435 | 0.0 | 0.0 | 4.985119047619051 | null | null | List(0.062, 0.014853915612309707) | 0.21231645982023184 | 0.022438267042335924 | null | null | null |
The drift table is similar to the profile table: it also has a row for each (window, column_name) pair, plus the special :table row for the table-level metrics.
In addition, it has a window_cmp column, where cmp stands for compare: all the statistics are compared against another window (the previous one, since the drift type here is CONSECUTIVE). There are various drift statistics, for example:
- count_delta: the difference in row count between the two windows
- ks_test: the Kolmogorov–Smirnov test which, in statistics, is used to check whether two samples came from the same distribution
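For instance, this sketch pulls the consecutive-window drift of the amount column, KS test included (again assuming window and window_cmp are structs with start/end fields):
-- drift of `amount` between each hourly window and the previous one
SELECT
  window.start AS window_start,
  window_cmp.start AS compared_window_start,
  count_delta,
  avg_delta,
  ks_test
FROM workspace.default.sales_drift_metrics
WHERE column_name = 'amount'
  AND drift_type = 'CONSECUTIVE'
ORDER BY window_start DESC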
Dashboard
Lakehouse Monitoring also automatically creates a dashboard that displays the data in these profile and drift tables.
😓 However, I find this dashboard too crowded and not ready to use as-is. You need to customize it yourself.
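If you go down that road, even a single tile backed by a simple query already tells you whether data keeps flowing in at the expected rate. A sketch, using the special :table row of the profile table:
-- hourly row count of the monitored table, taken from the table-level `:table` row
SELECT
  window.start AS window_start,
  `count` AS row_count
FROM workspace.default.sales_profile_metrics
WHERE column_name = ':table'
  AND log_type = 'INPUT'
ORDER BY window_start DESC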
Alerts
Monitor alerts are created and used the same way as other Databricks SQL alerts. You create a Databricks SQL query on the monitor profile metrics table or drift metrics table. You then create a Databricks SQL alert for this query.
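As a sketch, an alert query on the drift table could flag windows whose row count dropped sharply versus the previous hour (the -300 threshold is an arbitrary value for this toy table; the SQL alert would then trigger whenever the query returns rows):
-- candidate alert query: hourly windows that lost more than 300 rows vs the previous window
SELECT
  window.start AS window_start,
  count_delta
FROM workspace.default.sales_drift_metrics
WHERE column_name = ':table'
  AND drift_type = 'CONSECUTIVE'
  AND count_delta < -300
ORDER BY window_start DESC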
Pricing
Lakehouse Monitoring is billed under a serverless jobs SKU. You can monitor its usage via the system.billing.usage table or via the Usage dashboard in the Account console.
You need to pay attention here: I expect costs to rise for tables with a high number of columns if you don't fine-tune the monitor.
SELECT usage_date, sum(usage_quantity) as dbus
FROM system.billing.usage
WHERE
usage_date >= DATE_SUB(current_date(), 30) AND
sku_name like "%JOBS_SERVERLESS%" AND
custom_tags["LakehouseMonitoring"] = "true"
GROUP BY usage_date
ORDER BY usage_date DESC
usage_date | dbus |
---|---|
2025-08-22 | 1.852757467777777736 |
My opinion on what I've seen
Lakehouse Monitoring is all about these two tables, profile and drift. It is a kind of brute-force approach: it runs standardized monitoring over the specified table and stores the output in the metric tables. Is it convenient? It depends on what you're looking for; it is not a free lunch.
Pros 🟢
- It takes little effort to set up. By default, common controls are applied to all columns of the monitored table.
- Most common monitoring scenarios are covered by the TimeSeries or Snapshot profiles (I leave the ML inference one aside for the sake of simplicity). The setup time is shorter than for anything built by yourself.
- You get a ready-to-use framework. You save the time required to design it and avoid reinventing the wheel, so you can focus on your business needs rather than on data engineering plumbing.
- I like the simple but effective design of the drift metrics table and of the windowing. Building something like this yourself would probably make you hit some hidden edge case (as always when working with times and dates).
Cons 🔴
- Once the metrics are computed in the profile and drift tables, only half of the job is done. You still have to decide what to monitor and how. You're probably not interested in monitoring every single column in every row of the metric tables (otherwise you may be flooded by false alarms). Fine-tuning the actual alerts is still required, and it does not come for free.
- You can't know the overall cost of the monitoring in advance. You need to try it in a realistic (production-like) scenario and keep an early eye on how much you're paying. I expect the cost to depend mainly on:
  - the data volume
  - the number of columns in the table
  - the frequency of the controls
Where's Databricks going?
In addition to Lakehouse Monitoring, Databricks has released a data quality monitoring feature (in Beta). This new monitoring
- is quicker to set up: you toggle it on for an entire schema and it monitors all the tables in that schema
- monitors only simple freshness and completeness quality controls
- has no parametrization
- still needs alerts to be set up manually
I made a short recap here.
Feature | Lakehouse Monitoring | Data Quality Monitoring (Beta) |
---|---|---|
Scope | Table. It is set at table level and monitors the table and its columns. | Schema. It is set at schema level and monitors all tables in that schema. |
Setup | Choose the profile, optional slicing, window, and frequency. | On-off toggle on the schema. |
What is monitored | Various statistics, depending on the profile (snapshot, time series, or inference). | Freshness (is the data recent?) and completeness (is the volume as expected?) |
Customization | Limited | No |
Alert | To be set manually on the output table. | To be set manually on the output table. |
My2C
🟢 I think Databricks is going in the right direction: fast adoption of basic quality controls, avoiding the "didn't notice data is old in production" moments with little effort.
🔴 The alerting setup is still quite SQL-based, and there is some trial and error around it. I would expect a basic alert to be enabled by default.