Is Lakehouse Monitoring worth it?

I've created a toy Lakehouse Monitoring setup in Databricks to explore its features and capabilities. The goal is to understand how it works and what benefits it can bring.

If you want to know more about what Databricks' Lakehouse Monitoring can do, I recommend checking out the official documentation. I have prepared a basic map of concepts that can help you get started.

Lakehouse Monitoring Concepts Map

How to Setup a toy Lakehouse Monitoring

Let's start by creating a table we can work with. It should be a time-series table:

CREATE TABLE workspace.default.sales (
    timestamp TIMESTAMP,
    amount DOUBLE
)

I then create a basic notebook, insert 1h of data.ipynb, to fill the table with data, and set up a job to run that notebook every hour.

I won't include the code here because it is quite basic: it adds a random number of records to the table with random values, with timestamps spread within the hour's time window.
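For reference, here is a minimal sketch of what the notebook does, expressed in SQL (the actual code also randomizes the number of rows per run):

-- Insert rows with timestamps spread over the past hour and
-- random amounts between 20 and 30
INSERT INTO workspace.default.sales
SELECT
  timestampadd(SECOND, -CAST(rand() * 3600 AS INT), current_timestamp()) AS timestamp,
  20 + rand() * 10 AS amount
FROM range(1000);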

select * from workspace.default.sales
limit 10
timestamp                | amount
2025-08-22T08:58:07.929Z | 22.570402586080093
2025-08-22T08:51:51.929Z | 20.713874028846366
2025-08-22T09:03:54.929Z | 21.97633174572098
2025-08-22T08:28:44.929Z | 27.94416169489641
2025-08-22T09:05:29.929Z | 21.307407500066127
2025-08-22T08:17:03.929Z | 22.37476392747984
2025-08-22T09:05:05.929Z | 26.446829879517953
2025-08-22T08:42:32.929Z | 27.86840740526422
2025-08-22T08:33:55.929Z | 27.236961570798947
2025-08-22T08:42:30.929Z | 25.395336538015343

Then, let's create the Monitor via Unity Catalog Explorer 👇

I set up the monitor with the TimeSeries profile, pointing it at the timestamp column with a granularity of 1 hour. Note that while the metrics are computed per hour, the monitor itself is scheduled to run daily.

Below, a screenshot of the Unity Catalog Explorer page to create the Lakehouse Monitoring.

Screenshot of Unity Catalog Explorer page to create the Lakehouse Monitoring

What happens after the monitor is created? By default, two new metric tables are created:

  • <table_name>_profile_metrics
  • <table_name>_drift_metrics

Let's inspect them

SHOW TABLES IN workspace.default;
database | tableName             | isTemporary
default  | sales                 | false
default  | sales_drift_metrics   | false
default  | sales_profile_metrics | false
         | _sqldf                | true
select * from workspace.default.sales_profile_metrics
Below is a condensed view of the output (selected columns, values truncated for readability; the full table also includes log_type, granularity, slice_key/slice_value, num_nulls, percent_null, frequent_items, and more):

window         | column_name | count | data_type | avg    | min        | max        | stddev | median     | percent_distinct
[08:00, 09:00) | :table      | 1344  |           |        |            |            |        |            |
[08:00, 09:00) | amount      | 1344  | double    | 25.059 | 20.001     | 29.995     | 2.882  | 25.134     | 95.01
[08:00, 09:00) | timestamp   | 1344  | timestamp |        | 1.755850E9 | 1.755853E9 |        | 1.755852E9 | 86.38
[09:00, 10:00) | :table      | 1192  |           |        |            |            |        |            |
[09:00, 10:00) | amount      | 1192  | double    | 24.848 | 20.011     | 29.995     | 2.888  | 24.694     | 100.0
[09:00, 10:00) | timestamp   | 1192  | timestamp |        | 1.755853E9 | 1.755857E9 |        | 1.755855E9 | 83.64
[10:00, 11:00) | :table      | 941   |           |        |            |            |        |            |
[10:00, 11:00) | amount      | 941   | double    | 24.969 | 20.016     | 29.981     | 2.847  | 24.970     | 98.30
[10:00, 11:00) | timestamp   | 941   | timestamp |        | 1.755857E9 | 1.755860E9 |        | 1.755859E9 | 91.07
[11:00, 12:00) | :table      | 995   |           |        |            |            |        |            |

The profile table has a row for each (window, column_name) pair

  • window: the beginning and end of every hour
  • column_name: every column of the monitored table. In addition, a special :table row holds the table-level profile.

Optionally, it can also slice on column values, if slicing expressions were specified when the monitor was created.

For each row, it computes a bunch of statistics like avg, quantiles, min, max, etc. (when applicable, e.g. for numeric columns).
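For example, assuming the window column is a struct with start and end fields (as it appears in my workspace), you can pull the hourly statistics of a single column with a query along these lines:

-- Hourly profile of the amount column (unsliced rows only)
SELECT window.start, count, avg, stddev, percent_null
FROM workspace.default.sales_profile_metrics
WHERE column_name = 'amount'
  AND slice_key IS NULL
ORDER BY window.start;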

select * from workspace.default.sales_drift_metrics
A condensed view of the output (selected columns, values truncated; every row here has drift_type = CONSECUTIVE):

window         | window_cmp     | column_name | count_delta | avg_delta | ks_test (stat, p-value) | wasserstein_distance | population_stability_index
[09:00, 10:00) | [08:00, 09:00) | :table      | -152        |           |                         |                      |
[09:00, 10:00) | [08:00, 09:00) | timestamp   | -152        |           |                         |                      |
[09:00, 10:00) | [08:00, 09:00) | amount      | -152        | -0.212    | (0.062, 0.015)          | 0.212                | 0.022
[10:00, 11:00) | [09:00, 10:00) | :table      | -251        |           |                         |                      |
[10:00, 11:00) | [09:00, 10:00) | timestamp   | -251        |           |                         |                      |
[10:00, 11:00) | [09:00, 10:00) | amount      | -251        | 0.122     | (0.038, 0.423)          | 0.145                | 0.022
[11:00, 12:00) | [10:00, 11:00) | :table      | -418        |           |                         |                      |
[11:00, 12:00) | [10:00, 11:00) | timestamp   | -418        |           |                         |                      |
[11:00, 12:00) | [10:00, 11:00) | amount      | -418        | 0.013     | (0.049, 0.383)          | 0.161                | 0.028

The drift table is similar to the profile table: it also has a row for each (window, column_name) pair

  • window: the beginning and end of every hour
  • column_name: every column of the monitored table, plus the special :table row for table-level metrics

In addition, it has a window_cmp column, where cmp stands for comparison: every statistic is compared against another window (by default, the previous consecutive one). Among the statistics (see the example query after this list):

  • count_delta: the change in row count between the two windows
  • ks_test: the Kolmogorov–Smirnov test, used in statistics to check whether two samples come from the same distribution
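As an example of how these metrics can be used, a query along these lines (a sketch; in my tables ks_test is a struct with statistic and pvalue fields) flags the hours where the distribution of amount shifted significantly compared to the previous hour:

-- Flag hour-over-hour drift on the amount column: a low KS p-value
-- suggests the two windows do not come from the same distribution
SELECT window.start, count_delta, ks_test.statistic, ks_test.pvalue
FROM workspace.default.sales_drift_metrics
WHERE column_name = 'amount'
  AND drift_type = 'CONSECUTIVE'
  AND ks_test.pvalue < 0.05
ORDER BY window.start;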

Dashboard

Lakehouse Monitoring also automatically creates a dashboard that displays the data in the profile and drift tables.

😓 However, I find this dashboard too crowded and not ready to use out of the box. You need to customize it yourself.

Alerts

Monitor alerts are created and used the same way as other Databricks SQL alerts. You create a Databricks SQL query on the monitor profile metrics table or drift metrics table. You then create a Databricks SQL alert for this query.
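For example, a query like the following could back an alert on sudden volume drops (the :table row carries the table-level counts; the threshold is up to you):

-- Latest table-level comparison against the previous hour: the alert
-- fires when count_delta falls below a threshold of your choice
SELECT window.start, count_delta
FROM workspace.default.sales_drift_metrics
WHERE column_name = ':table'
  AND drift_type = 'CONSECUTIVE'
ORDER BY window.start DESC
LIMIT 1;

You would then set the alert condition on count_delta (e.g. trigger when it falls below -500).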

Pricing

Lakehouse Monitoring is billed under a serverless jobs SKU. You can track its usage via the system.billing.usage table or via the usage dashboard in the account console.

Pay attention here: I expect costs to rise for tables with a high number of columns if you don't fine-tune the monitor.

SELECT usage_date, sum(usage_quantity) as dbus
FROM system.billing.usage
WHERE
  usage_date >= DATE_SUB(current_date(), 30) AND
  sku_name like "%JOBS_SERVERLESS%" AND
  custom_tags["LakehouseMonitoring"] = "true"
GROUP BY usage_date
ORDER BY usage_date DESC
usage_date | dbus
2025-08-22 | 1.852757467777777736

My opinion on what I've seen

Lakehouse Monitoring is all about these two tables, profile and drift. It is a kind of brute-force approach that runs standardized monitoring over the specified table and stores the output in the metric tables. Is it convenient? It depends on what you're looking for. It is not a free lunch.

Pros 🟢

  • It takes little effort to set up. By default, common controls are applied to all columns of the monitored table.
  • Most common monitoring scenarios are covered by the TimeSeries or Snapshot profiles (I leave the ML-inference profile aside for simplicity). The setup time is shorter than anything you would build yourself.
  • You get a ready-to-use framework: you save the time required to design it and avoid reinventing the wheel, so you can focus on your business needs rather than on data engineering plumbing.
  • I like the simple but effective design of the drift metrics table and of the windowing. Building something like this yourself would probably run into hidden edge cases (as always when working with times and dates).

Cons 🔴

  • Once the metrics land in the profile and drift tables, only half of the job is done. You still have to decide what to monitor and how. You're probably not interested in monitoring every single column in every row of the metric tables (otherwise you may be flooded with false alarms). Fine-tuning the actual alerts is still required, and it does not come for free.
  • You can't know the overall cost of the monitoring in advance. You need to try it in a realistic (production-like) scenario and keep an eye on what you're paying from the start. I expect the cost to depend mainly on
    • the data volume
    • the number of columns in the table
    • the frequency of the controls

Where's Databricks going?

In addition to Lakehouse Monitoring, Databricks has released a data quality monitoring feature (in Beta). This new monitoring

  • is quicker to set up: you toggle it on for an entire schema, and it monitors all the tables in that schema
  • covers only simple freshness and completeness quality controls
  • has no parametrization
  • still needs alerts to be set up manually

I made a short recap in the table below.

Feature           | Lakehouse Monitoring                                            | Data Quality Monitoring (Beta)
Scope             | Table: set at table level; monitors the table and its columns. | Schema: set at schema level; monitors all tables in that schema.
Setup             | Choose the profile, optional slicing, window, and frequency.   | On/off toggle on the schema.
What is monitored | Various statistics, as snapshot, time series, or inference.    | Freshness (is the data recent?) and completeness (is the volume as expected?).
Customization     | Limited                                                         | No
Alerts            | To be set manually on the output tables.                       | To be set manually on the output table.

My2C

🟢 I think Databricks is going in the right direction: fast adoption of basic quality controls, avoiding the "didn't notice the data is stale in production" moments with little effort.

🔴 The alerting setup is still quite SQL-based, and there is some trial and error around it. I would expect a basic alert to be enabled by default.
