
How to Debug Queries by Just Using Spark UI

Cinto · Published in The Startup · 9 min read · Aug 23, 2020


Spark is the most widely used big data computation engine, capable of running jobs on petabytes of data. It provides a suite of web user interfaces (UIs) that you can use to monitor the status and resource consumption of your Spark cluster, and most of the issues you encounter while running a job can be debugged by heading to the Spark UI.

spark2-shell --queue=P0 --num-executors 20
Spark context Web UI available at http://<hostname>:<port>
Spark context available as 'sc'
Spark session available as 'spark'
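If you miss the URL in the startup banner, the UI is also reachable directly on the driver host: by default it binds to port 4040 (falling back to 4041, 4042, and so on if the port is taken). The relevant settings, shown here with Spark's defaults, are:

```
spark.ui.enabled  true    # turn the Web UI on or off
spark.ui.port     4040    # driver UI port; Spark retries successive ports if it is in use
```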

In this document, I will showcase how to debug a Spark job just by using the Spark UI. I will run a few Spark jobs and show how the Spark UI reflects each run, adding some tips and tricks along the way.

This is what the Spark UI looks like (screenshot not reproduced here): a row of tabs — Jobs, Stages, Storage, Environment, Executors, and SQL.

We will start with the SQL tab, which surfaces enough information for an initial review. Note that if you are working with raw RDDs rather than DataFrames or SQL, your jobs may not appear in the SQL tab.
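To see why, consider the two kinds of workload side by side. This is a small sketch assuming a running spark-shell session (where `spark` and `sc` are predefined and the implicits are already imported):

```scala
// DataFrame query: planned by Catalyst, so it shows up in the SQL tab
// with a physical plan and per-operator metrics.
spark.range(1000).groupBy($"id" % 10).count().collect()

// Raw RDD job: bypasses the SQL engine entirely, so it appears only
// under the Jobs and Stages tabs -- there is nothing for the SQL tab to show.
sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
```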

Here is the query I ran for reference:

spark.sql("select id, count(1) from table1 group by id").show(10, false)
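The same aggregation can also be written with the DataFrame API; both forms go through Catalyst and compile to the same physical plan, so the SQL tab renders them identically. A sketch, assuming `table1` is registered in the catalog:

```scala
// DataFrame-API equivalent of the SQL query above.
spark.table("table1")
  .groupBy("id")
  .count()
  .show(10, false)

// To inspect the plan the SQL tab will visualize, without running the job:
spark.table("table1").groupBy("id").count().explain()
```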
