No, that's nothing to apologize for. It's just your call; less context brings less reaction and interest.
On Wed, Apr 22, 2020 at 11:50 AM Ruijing Li <liruijin...@gmail.com> wrote:

> I apologize, but I cannot share it, even if it is just typical spark
> libraries. I definitely understand that limits debugging help, but wanted
> to understand if anyone has encountered a similar issue.
>
> On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> wrote:
>
>> If there are no third-party libraries in the dump, then why not share the
>> thread dump? (I mean, the output of jstack.)
>>
>> A stack trace would be more helpful for finding which thread acquired the
>> lock and which other threads are waiting to acquire it, if we suspect a
>> deadlock.
>>
>> On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <liruijin...@gmail.com> wrote:
>>
>>> After refreshing a couple of times, I notice the lock is being swapped
>>> between these 3. The other 2 will be blocked by whoever gets this lock,
>>> in a cycle of 160 has lock -> 161 -> 159 -> 160.
>>>
>>> On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <liruijin...@gmail.com>
>>> wrote:
>>>
>>>> In the thread dump, I do see this:
>>>> - SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE |
>>>> Monitor
>>>> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED |
>>>> Blocked by Thread(Some(160)) Lock
>>>> - SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED |
>>>> Blocked by Thread(Some(160)) Lock
>>>>
>>>> Could the fact that 160 has the monitor but is not running be causing a
>>>> deadlock preventing the job from finishing?
>>>>
>>>> I do see my Finalizer and main method are waiting. I don't see any
>>>> other threads from 3rd party libraries or my code in the dump. I do see
>>>> the spark context cleaner is in timed waiting.
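[Editor's note] For anyone digging through raw jstack output rather than the Spark UI, the owner/waiter relationship can be pulled out with grep. Below is a minimal sketch; the file name `dump.txt`, the thread names, and the monitor address are illustrative placeholders, not taken from the real job.

```shell
# Synthetic jstack-style excerpt; with a real job you would run
# `jstack <driver-pid> > dump.txt` instead of this heredoc.
cat > dump.txt <<'EOF'
"SparkUI-160" RUNNABLE
   - locked <0x00000000f1a2b3c4> (a java.lang.Object)
"SparkUI-161" BLOCKED
   - waiting to lock <0x00000000f1a2b3c4> (a java.lang.Object)
"SparkUI-159" BLOCKED
   - waiting to lock <0x00000000f1a2b3c4> (a java.lang.Object)
EOF

# Which thread owns the monitor (line before "locked <addr>"):
grep -B1 'locked <' dump.txt
# Which threads are waiting on that monitor:
grep -B1 'waiting to lock' dump.txt
```

Note that a true deadlock would show two or more threads each holding a monitor that another waits on; in the pattern above, 160 merely holds the monitor while 161 and 159 wait, which by itself is contention, not deadlock.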
>>>>
>>>> Thanks
>>>>
>>>> On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <liruijin...@gmail.com>
>>>> wrote:
>>>>
>>>>> Strangely enough, I found an old issue that is exactly the same as
>>>>> mine:
>>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
>>>>>
>>>>> However, I'm using Spark 2.4.4, so the issue should have been solved
>>>>> by now.
>>>>>
>>>>> Like the user in the JIRA issue I am using Mesos, but I am reading
>>>>> from Oracle instead of writing to Cassandra and S3.
>>>>>
>>>>> On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <wezh...@outlook.com> wrote:
>>>>>
>>>>>> The Thread Dump result table of the Spark UI can provide some clues
>>>>>> for finding thread lock issues, such as:
>>>>>>
>>>>>> Thread ID | Thread Name                  | Thread State | Thread Locks
>>>>>> 13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951)
>>>>>> 48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951)
>>>>>>
>>>>>> And each thread row shows its call stack after being clicked, so you
>>>>>> can check the root cause of the held locks, like this (Thread 48
>>>>>> above):
>>>>>>
>>>>>>   org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
>>>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
>>>>>>   org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
>>>>>>   org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
>>>>>>   jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
>>>>>>   <snip...>
>>>>>>
>>>>>> Hope it can help you.
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> -z
>>>>>>
>>>>>> On Thu, 16 Apr 2020 16:36:42 +0900
>>>>>> Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>> > Take thread dumps continuously, at a specific period (like 1s), and
>>>>>> > watch how the stack / lock of each thread changes. (This is not
>>>>>> > easy to do in the UI, so doing it manually may be the only option.
>>>>>> > I'm not sure whether the Spark UI provides the same; I haven't used
>>>>>> > it.)
>>>>>> >
>>>>>> > It will tell you which thread is being blocked (even if it's shown
>>>>>> > as running) and which point to look at.
>>>>>> >
>>>>>> > On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijin...@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Once I do a thread dump, what should I be looking for to tell
>>>>>> > > where it is hanging? I am seeing a lot of timed_waiting and
>>>>>> > > waiting on the driver. The driver is also being blocked by the
>>>>>> > > Spark UI. If there are no tasks, is there a point in doing a
>>>>>> > > thread dump of the executors?
>>>>>> > >
>>>>>> > > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <
>>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>> > >
>>>>>> > >> The simplest way is to do a thread dump, which doesn't require
>>>>>> > >> any fancy tool (it's available in the Spark UI).
>>>>>> > >> Without a thread dump it's hard to say anything...
>>>>>> > >>
>>>>>> > >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe
>>>>>> > >> <janethor...@aol.com.invalid> wrote:
>>>>>> > >>
>>>>>> > >>> Here is another tool I use, Logic Analyser 7:55
>>>>>> > >>> https://youtu.be/LnzuMJLZRdU
>>>>>> > >>>
>>>>>> > >>> You could take some suggestions for improving performance
>>>>>> > >>> queries.
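[Editor's note] A minimal sketch of the manual dump-every-second approach described above, assuming a Unix shell on the driver host; the PID value and the iteration count are placeholders (take the real PID from `jps` or `ps`, and use a longer window in practice).

```shell
# Take a thread dump once per second and keep each snapshot; afterwards,
# diff the files to see which threads stay BLOCKED on the same monitor.
PID=12345             # placeholder: substitute the Spark driver's JVM PID
for i in 1 2 3; do    # placeholder count: use e.g. 60 iterations in practice
  jstack "$PID" > "dump_$i.txt" 2>/dev/null
  sleep 1
done
ls dump_*.txt         # one snapshot file per iteration
```

A thread that appears RUNNABLE in every snapshot but whose stack never changes is just as suspicious as one shown BLOCKED, which is the point of comparing consecutive dumps rather than reading a single one.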
>>>>>> > >>>
>>>>>> > >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
>>>>>> > >>>
>>>>>> > >>> Jane thorpe
>>>>>> > >>> janethor...@aol.com
>>>>>> > >>>
>>>>>> > >>> -----Original Message-----
>>>>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>>>>> > >>> To: janethorpe1 <janethor...@aol.com>; mich.talebzadeh <
>>>>>> > >>> mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>;
>>>>>> > >>> user <user@spark.apache.org>
>>>>>> > >>> Sent: Mon, 13 Apr 2020 8:32
>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>>>>> > >>> Removing Guess work from trouble shooting
>>>>>> > >>>
>>>>>> > >>> This tool may be useful for you to troubleshoot your problems
>>>>>> > >>> away.
>>>>>> > >>>
>>>>>> > >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
>>>>>> > >>>
>>>>>> > >>> "APM tools typically use a waterfall-type view to show the
>>>>>> > >>> blocking time of different components cascading through the
>>>>>> > >>> control flow within an application. These types of
>>>>>> > >>> visualizations are useful, and AppOptics has them, but they can
>>>>>> > >>> be difficult to understand for those of us without a PhD."
>>>>>> > >>>
>>>>>> > >>> Especially helpful if you want to understand through
>>>>>> > >>> visualisation and you do not have a PhD.
>>>>>> > >>>
>>>>>> > >>> Jane thorpe
>>>>>> > >>> janethor...@aol.com
>>>>>> > >>>
>>>>>> > >>> -----Original Message-----
>>>>>> > >>> From: jane thorpe <janethor...@aol.com.INVALID>
>>>>>> > >>> To: mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09 <
>>>>>> > >>> liruijin...@gmail.com>; user <user@spark.apache.org>
>>>>>> > >>> CC: user <user@spark.apache.org>
>>>>>> > >>> Sent: Sun, 12 Apr 2020 4:35
>>>>>> > >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
>>>>>> > >>>
>>>>>> > >>> You seem to be implying the error is intermittent.
>>>>>> > >>> You seem to be implying data is being ingested via JDBC. So the
>>>>>> > >>> connection has proven itself to be working, unless no data is
>>>>>> > >>> arriving from the JDBC channel at all. If no data is arriving,
>>>>>> > >>> then one could say it could be the JDBC.
>>>>>> > >>> If the error is intermittent, then it is likely that a resource
>>>>>> > >>> involved in processing is filling to capacity.
>>>>>> > >>> Try reducing the data ingestion volume and see if that
>>>>>> > >>> completes, then increase the data ingested incrementally.
>>>>>> > >>> I assume you have run the job on a small amount of data, so you
>>>>>> > >>> have completed your prototype stage successfully.
>>>>>> > >>>
>>>>>> > >>> ------------------------------
>>>>>> > >>> On Saturday, 11 April 2020 Mich Talebzadeh <
>>>>>> > >>> mich.talebza...@gmail.com> wrote:
>>>>>> > >>> Hi,
>>>>>> > >>>
>>>>>> > >>> Have you checked your JDBC connections from Spark to Oracle?
>>>>>> > >>> What is Oracle saying? Is it doing anything or hanging?
>>>>>> > >>>
>>>>>> > >>> set pagesize 9999
>>>>>> > >>> set linesize 140
>>>>>> > >>> set heading off
>>>>>> > >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE,
>>>>>> > >>> 'MON DD YYYY HH:MI AM') from v$database;
>>>>>> > >>> set heading on
>>>>>> > >>> column spid heading "OS PID" format a6
>>>>>> > >>> column process format a13 heading "Client ProcID"
>>>>>> > >>> column username format a15
>>>>>> > >>> column sid format 999
>>>>>> > >>> column serial# format 99999
>>>>>> > >>> column STATUS format a3 HEADING 'ACT'
>>>>>> > >>> column last format 9,999.99
>>>>>> > >>> column TotGets format 999,999,999,999 HEADING 'Logical I/O'
>>>>>> > >>> column phyRds format 999,999,999 HEADING 'Physical I/O'
>>>>>> > >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
>>>>>> > >>> --
>>>>>> > >>> SELECT
>>>>>> > >>>        substr(a.username,1,15) "LOGIN"
>>>>>> > >>>      , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
>>>>>> > >>>      , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
>>>>>> > >>>      , substr(a.machine,1,10) HOST
>>>>>> > >>>      , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
>>>>>> > >>>      , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
>>>>>> > >>>      , substr(a.program,1,15) PROGRAM
>>>>>> > >>>      --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
>>>>>> > >>>      , (
>>>>>> > >>>          select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
>>>>>> > >>>          where ss.sid = a.sid and
>>>>>> > >>>                sn.statistic# = ss.statistic# and
>>>>>> > >>>                -- sn.name in ('session pga memory')
>>>>>> > >>>                sn.name in ('session pga memory','session uga memory')
>>>>>> > >>>        ) AS total_memory
>>>>>> > >>>      , (b.block_gets + b.consistent_gets) TotGets
>>>>>> > >>>      , b.physical_reads phyRds
>>>>>> > >>>      , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
>>>>>> > >>>      , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1)
>>>>>> > >>>        THEN '<-- YOU' ELSE ' ' END "INFO"
>>>>>> > >>> FROM
>>>>>> > >>>        v$process p
>>>>>> > >>>       ,v$session a
>>>>>> > >>>       ,v$sess_io b
>>>>>> > >>> WHERE
>>>>>> > >>>       a.paddr = p.addr
>>>>>> > >>> AND   p.background IS NULL
>>>>>> > >>> --AND a.sid NOT IN (select sid from v$mystat where rownum = 1)
>>>>>> > >>> AND   a.sid = b.sid
>>>>>> > >>> AND   a.username is not null
>>>>>> > >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
>>>>>> > >>> --AND CURRENT_DATE - logon_time > 0
>>>>>> > >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1) -- exclude me
>>>>>> > >>> --AND (b.block_gets + b.consistent_gets) > 0
>>>>>> > >>> ORDER BY a.username;
>>>>>> > >>> exit
>>>>>> > >>>
>>>>>> > >>> HTH
>>>>>> > >>>
>>>>>> > >>> Dr Mich Talebzadeh
>>>>>> > >>>
>>>>>> > >>> LinkedIn *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> > >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>> > >>>
>>>>>> > >>> http://talebzadehmich.wordpress.com
>>>>>> > >>>
>>>>>> > >>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>> > >>> for any loss, damage or destruction of data or any other property
>>>>>> > >>> which may arise from relying on this email's technical content is
>>>>>> > >>> explicitly disclaimed. The author will in no case be liable for
>>>>>> > >>> any monetary damages arising from such loss, damage or
>>>>>> > >>> destruction.
>>>>>> > >>>
>>>>>> > >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijin...@gmail.com>
>>>>>> > >>> wrote:
>>>>>> > >>>
>>>>>> > >>> Hi all,
>>>>>> > >>>
>>>>>> > >>> I am on Spark 2.4.4, using Scala 2.11.12, and running in cluster
>>>>>> > >>> mode on Mesos. I am ingesting from an Oracle database using
>>>>>> > >>> spark.read.jdbc.
>>>>>> > >>> I am seeing a strange issue where Spark just hangs and does
>>>>>> > >>> nothing, not starting any new tasks. Normally this job finishes
>>>>>> > >>> in 30 stages, but sometimes it stops at 29 completed stages and
>>>>>> > >>> doesn't start the last stage. The Spark job is idling and there
>>>>>> > >>> is no pending or active task. What could be the problem? Thanks.
>>>>>> > >>> --
>>>>>> > >>> Cheers,
>>>>>> > >>> Ruijing Li
>>>>>> > >>>
>>>>>> > > --
>>>>>> > > Cheers,
>>>>>> > > Ruijing Li
>>>>>> > >
>>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Ruijing Li
>>>>
>>>> --
>>>> Cheers,
>>>> Ruijing Li
>>>
>>> --
>>> Cheers,
>>> Ruijing Li
>>
>> --
> Cheers,
> Ruijing Li
>