Mich: The OutOfOrderScannerNextException indicates a problem reading from HBase.
How did you know the connection to the Spark cluster was lost?

Cheers

On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Looks like it lost the connection to the Spark cluster.
>
> What mode are you using with Spark: Standalone, YARN, or something else?
> The issue looks like a resource manager issue.
>
> I have seen this when running Zeppelin with Spark on HBase.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 28 October 2016 at 16:38, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > I’m getting data from HBase using a large Spark cluster with parallelism
> > of near 400. The query fails quite often with the message below. Sometimes
> > a retry will work and sometimes the ultimate failure results (below).
> >
> > If I reduce parallelism in Spark it slows other parts of the algorithm
> > unacceptably. I have also experimented with very large RPC/Scanner timeouts
> > of many minutes, to no avail.
> >
> > Any clues about what to look for or what may be set up wrong in my tables?
> >
> > Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> > most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> > ip-172-16-3-9.eu-central-1.compute.internal):
> > org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
> > OutOfOrderScannerNextException: was there a rpc timeout?
> >     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
> >     at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
> >     at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
> >     at