Error when restarting Databricks streaming job

This is an error I encountered while running a Spark Streaming job on Databricks 6.1. Consider the case where I have to update a running streaming query. Databricks recommends always starting (and restarting too?) a streaming query on a new dedicated cluster. However, in some scenarios you might not be able to do so, and you may want to:

  • cancel the job run
  • update the notebooks
  • restart the job run

After taking these steps, I encountered one of these errors:

Concurrent update to the log. Multiple streaming jobs detected for ...

# or

Multiple streaming queries are concurrently using ... [checkpoint]

They did not occur every time I restarted the query, but most of the time. After restarting 2-3 times, the issue went away and the streaming query ran smoothly. Investigating the error a bit further, we found that cancelling a job run via the Databricks CLI did not let the streaming query shut down gracefully. What happened? The running query was not closing its checkpoint cleanly. So, when a new job run started, it raised an error because it found a corrupted checkpoint.
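To make "closing cleanly" concrete, here is a minimal sketch of stopping streaming queries explicitly and waiting for them to terminate before a job run is cancelled. It is an illustration under assumptions, not necessarily the fix used in this post: the helper name stop_queries and its timeout_s parameter are hypothetical, and it only relies on the stop(), awaitTermination(timeout), name and id members that PySpark's StreamingQuery exposes.

```python
from typing import Iterable, List


def stop_queries(queries: Iterable, timeout_s: float = 60.0) -> List[str]:
    """Stop each streaming query and wait for it to terminate.

    `queries` is any iterable of objects that behave like PySpark's
    StreamingQuery (hypothetical helper, not part of any library).
    Returns the labels of queries that did not terminate within the timeout.
    """
    still_running = []
    for query in list(queries):  # copy, since the active list changes as queries stop
        label = query.name or str(query.id)  # unnamed queries have name None
        query.stop()
        # With a timeout, awaitTermination returns True only if the query
        # has terminated within that many seconds.
        if not query.awaitTermination(timeout_s):
            still_running.append(label)
    return still_running


# In a Databricks notebook, where `spark` is already defined, this could be
# called before cancelling or redeploying the job, for example:
#   leftover = stop_queries(spark.streams.active)
```

If the returned list is not empty, some query was still active when the timeout expired, so restarting against the same checkpoint right away may still hit the errors above.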

Solution

You can
