Monday, September 1, 2025

Spark : Cancelling potential speculative or zombie tasks for this job

 

>PROBLEM

While running a long iteration, Spark was logging messages like these:

[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 433 (save at PostgresRepository.java:64) finished in 328 ms

[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Job 433 is finished. Cancelling potential speculative or zombie tasks for this job

[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Canceling stage 433

[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Killing all running tasks in stage 433: Stage finished


>SOLUTION


It was due to a memory leak involving the SparkSession.

A zombie is reminiscent of C/C++ programming, where a reference gets lost, or of Java programming, where a resource is never closed.

The SparkSession must be treated like a Java resource.

Rule of thumb: 

Make sure that you are not returning any Spark object after closing a SparkSession, as shown in the sketch below.
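
For example, here is a minimal sketch of the anti-pattern and a safer version. The class and method names (OrderRepository, loadOrdersLeaky, countOrders, the "orders" app name and parquet path) are illustrative assumptions, not taken from the original job; the point is only that SparkSession implements Closeable and can be handled with try-with-resources:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OrderRepository {

    // ANTI-PATTERN: the Dataset escapes the session that created it.
    // Any action triggered later on the returned Dataset runs against a
    // stopped session and can leave speculative/zombie tasks behind.
    public Dataset<Row> loadOrdersLeaky(String path) {
        SparkSession spark = SparkSession.builder().appName("orders").getOrCreate();
        Dataset<Row> orders = spark.read().parquet(path);
        spark.close();     // session is gone, but the Dataset still references it
        return orders;     // caller now holds a "zombie"
    }

    // SAFER: treat the SparkSession like any other Java resource and make sure
    // every Spark action finishes before the session is closed.
    public long countOrders(String path) {
        try (SparkSession spark = SparkSession.builder().appName("orders").getOrCreate()) {
            return spark.read().parquet(path).count();   // action completes inside the session
        }                                                // close() is called automatically
    }
}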


TIP:
If it is necessary to return data across different sessions, convert it to plain Java objects (POJOs) first.
For instance, map a Dataset to MyPojo instances, as in the sketch below.
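
A minimal sketch of that conversion, assuming a hypothetical MyPojo bean whose field names match the Dataset's column names (a requirement of the bean encoder); the "pojo-export" app name and parquet source are likewise illustrative:

import java.io.Serializable;
import java.util.List;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class MyPojo implements Serializable {
    private String name;
    private long amount;

    public MyPojo() {}                       // no-arg constructor required by Encoders.bean
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public long getAmount() { return amount; }
    public void setAmount(long amount) { this.amount = amount; }

    public static List<MyPojo> loadAsPojos(String path) {
        try (SparkSession spark = SparkSession.builder().appName("pojo-export").getOrCreate()) {
            // Materialise the rows as plain Java objects while the session is still open;
            // the returned List no longer depends on Spark once the session is closed.
            return spark.read().parquet(path)
                        .as(Encoders.bean(MyPojo.class))
                        .collectAsList();
        }
    }
}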

>ENV

Spark 2.x
Java 17
