The simplest approach is to take a thread dump, which doesn't require any fancy tooling
(it's available in the Spark UI).
Without a thread dump it's hard to say anything...
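
Once you have a dump, a quick first step is to count threads by state and see what is waiting or blocked. Below is a minimal sketch in Python, assuming the dump is in the plain-text format that jstack prints (a quoted thread name on the header line, then a `java.lang.Thread.State:` line). The function name `summarize_thread_dump` and the tiny `SAMPLE` dump are made up for illustration; the sample is hand-written, not output from a real job.

```python
import re
from collections import Counter

# Thread header lines in a jstack-style dump start with the quoted thread name.
NAME_RE = re.compile(r'^"([^"]+)"')
# The state line follows the header, e.g. "   java.lang.Thread.State: WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s+(\w+)")


def summarize_thread_dump(dump_text):
    """Return (Counter of states, {state: [thread names]}) for a jstack-style dump."""
    counts = Counter()
    by_state = {}
    current = None  # name of the thread whose header was seen most recently
    for line in dump_text.splitlines():
        header = NAME_RE.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            counts[state.group(1)] += 1
            by_state.setdefault(state.group(1), []).append(current)
            current = None  # threads without a state line are simply skipped
    return counts, by_state


# Hand-written illustrative sample (hypothetical thread names and addresses).
SAMPLE = '''"main" #1 prio=5 os_prio=0 tid=0x1 nid=0x2 waiting on condition [0x3]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"worker-1" #50 daemon prio=5 os_prio=0 tid=0x4 nid=0x5 waiting on condition [0x6]
   java.lang.Thread.State: WAITING (parking)

"worker-2" #60 daemon prio=5 os_prio=0 tid=0x7 nid=0x8 runnable [0x9]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
'''

if __name__ == "__main__":
    counts, by_state = summarize_thread_dump(SAMPLE)
    for state, n in counts.most_common():
        print(state, n, by_state[state])
```

If most threads are WAITING and nothing is RUNNABLE, the stack frames of the waiting threads are the place to look next.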

> Here is another tool I use, Logic Analyser.
> You could take some suggestions from it for improving query performance.
> This tool may be useful for troubleshooting your problems.
> "APM tools typically use a waterfall-type view to show the blocking time
> of different components cascading through the control flow within an
> application.
> These types of visualizations are useful, and AppOptics has them, but they
> can be difficult to understand for those of us without a PhD."
> It is especially helpful if you want to understand through visualisation and
> you do not have a PhD.
> You seem to be implying the error is intermittent.
> You also seem to be implying data is being ingested via JDBC, so the
> connection has proven itself to be working, unless no data is arriving from
> the JDBC channel at all. If no data is arriving, then the problem could
> be the JDBC connection.
> If the error is intermittent, then it is likely that a resource involved in
> processing is filling to capacity.
> Try reducing the data ingestion volume and see if the job completes, then
> increase the volume ingested incrementally.
> I assume you have run the job on a small amount of data, so you have
> completed your prototype stage successfully.
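
The "reduce the volume, then increase it incrementally" suggestion above can be sketched as a small ramp-up loop. This is only an illustration in Python under stated assumptions: `run_job` is a hypothetical callable that runs the ingestion limited to `n` rows and returns True if it completed (for example within a timeout), and `find_failing_volume` is a made-up helper name, not part of Spark.

```python
def find_failing_volume(run_job, start_rows, max_rows, factor=2):
    """Run run_job(n) with growing n.

    Returns (last_ok, first_failed): the largest volume that completed and
    the first volume that did not (None if every tried volume completed).
    """
    if start_rows <= 0 or factor <= 1:
        raise ValueError("start_rows must be positive and factor greater than 1")
    last_ok = None
    n = start_rows
    while n <= max_rows:
        if not run_job(n):  # hypothetical: False means the job hung or failed
            return last_ok, n
        last_ok = n
        n *= factor
    return last_ok, None


if __name__ == "__main__":
    # Stand-in for a real job: pretend it stops completing above 50,000 rows.
    def fake_job(n):
        return n <= 50_000

    print(find_failing_volume(fake_job, 1_000, 1_000_000))
```

The gap between the two returned volumes narrows down where a resource starts filling up, which is where it makes sense to look at memory and connection limits.
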
> Hi,
> Have you checked your JDBC connections from Spark to Oracle? What is
> Oracle saying? Is it doing anything or hanging?
> set pagesize 9999
> set linesize 140
> set heading off
> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
> set heading on
> column spid heading "OS PID" format a6
> column process format a13 heading "Client ProcID"
> column username  format a15
> column sid       format 999
> column serial#   format 99999
> column STATUS    format a3 HEADING 'ACT'
> column last      format 9,999.99
> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
> column phyRds    format 999,999,999 HEADING 'Physical I/O'
> column total_memory format 999,999,999 HEADING 'MEM/KB'
> --
> select
>           substr(a.username,1,15) "LOGIN"
>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>         , substr(a.machine,1,10) HOST
>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>         , substr(a.program,1,15) PROGRAM
>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>         , (
>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>                 where ss.sid = a.sid and
>                         sn.statistic# = ss.statistic# and
>                         -- sn.name in ('session pga memory')
>                         sn.name in ('session pga memory','session uga memory')
>           ) AS total_memory
>         , (b.block_gets + b.consistent_gets) TotGets
>         , b.physical_reads phyRds
>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
> from
>          v$process p
>         ,v$session a
>         ,v$sess_io b
> where
> a.paddr = p.addr
> AND p.background IS NULL
> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
> AND a.sid = b.sid
> AND a.username is not null
> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
> --AND CURRENT_DATE - logon_time > 0
> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
> --AND (b.block_gets + b.consistent_gets) > 0
> ORDER BY a.username;
> exit
> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <> wrote:
> Hi all,
> I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on
> Mesos. I am ingesting from an Oracle database using I am
> seeing a strange issue where spark just hangs and does nothing, not
> starting any new tasks. Normally this job finishes in 30 stages but
> sometimes it stops at 29 completed stages and doesn’t start the last stage.
> The spark job is idling and there is no pending or active task. What could
> be the problem? Thanks.
> Cheers,
> Ruijing Li

