How to Debug Queries by Just Using Spark UI
You already have the thing you need to debug a query
Spark is the most widely used big data computation engine, capable of running jobs on petabytes of data. It ships with a suite of web user interfaces (UIs) that you can use to monitor the status and resource consumption of your Spark cluster, and most of the issues you encounter while running a job can be debugged by heading to the Spark UI.
spark2-shell --queue=P0 --num-executors 20
Spark context Web UI available at http://<hostname>:<port>
Spark context available as 'sc'
Spark session available as 'spark'
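If you miss the URL in the startup logs, you can also pull it straight from the SparkContext with sc.uiWebUrl; by default the UI listens on port 4040 on the driver host:

scala> sc.uiWebUrl
res0: Option[String] = Some(http://<hostname>:4040)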
In this article, I will showcase how to debug a Spark job using only the Spark UI. I will run a few Spark jobs, show how the Spark UI reflects their execution, and add some tips and tricks along the way.
This is what the Spark UI looks like:
We will start with the SQL tab, which packs enough information for an initial review. Note that if you are working with RDDs directly, the SQL tab may not appear, since it only tracks queries that run through the DataFrame/SQL engine.
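To see that difference for yourself, here is a minimal sketch you can run in the same shell (the values are made up): an action on a plain RDD shows up only under the Jobs and Stages tabs, while the equivalent DataFrame query also registers an entry under the SQL tab.

// RDD action: visible in the Jobs/Stages tabs, but produces no SQL tab entry
sc.parallelize(1 to 1000).map(_ * 2).count()

// DataFrame action: the same work also appears as a query in the SQL tab
spark.range(1000).selectExpr("id * 2 as doubled").count()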
Here is a query I ran for reference:
spark.sql("select id, count(1) from table1 group by id").show(10, false)
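table1 here is just a table from my environment; if you want to follow along without it, a minimal stand-in (table and column names are illustrative) is to register a small temp view first:

// Stand-in data so the query above is reproducible; in spark-shell,
// spark.implicits._ is already in scope (needed for toDF)
Seq(1, 1, 2, 3, 3, 3).toDF("id").createOrReplaceTempView("table1")
spark.sql("select id, count(1) from table1 group by id").show(10, false)
// yields counts: id=1 -> 2, id=2 -> 1, id=3 -> 3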
