Yes, it can be large. But that still does not answer the question of why it works in smaller environments, e.g. local[32], or in cluster mode when using SQLContext instead of HiveContext.
The process in general is a RowNumber() HiveQL operation; that is why I need HiveContext. I have the feeling there is something wrong with HiveContext. I don't have a Hive/Hadoop database; I only enabled HiveContext to use its functions on my JSON-loaded DataFrame.

I am new to Spark, so please don't hesitate to ask for more information, as I am still not sure what would be relevant.

Saif

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Wednesday, October 07, 2015 2:38 PM
To: Ellafi, Saif A.
Cc: user
Subject: Re: Spark standalone hangup during shuffle flatMap or explode in cluster

-dev

Is r.getInt(ind) very large in some cases? I think there's not quite enough info here.

On Wed, Oct 7, 2015 at 6:23 PM, <saif.a.ell...@wellsfargo.com> wrote:
> When running a stand-alone cluster-mode job, the process hangs up
> randomly during a DataFrame flatMap or explode operation, in HiveContext:
>
> -->> df.flatMap(r => for (n <- 1 to r.getInt(ind)) yield r)
>
> This does not happen with SQLContext in cluster mode, or with Hive/SQL in
> local mode, where it works fine.
>
> A couple of minutes after the hangup, executors start dropping. I am
> attaching the logs.
>
> Saif
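For context on Sean's question: the quoted flatMap emits each row r.getInt(ind) times, so a single row with a very large count makes one task produce an enormous partition, which can look like a hang during the shuffle. A minimal plain-Scala sketch of that amplification and one way to bound it (the Row class, the count field, and the maxRepeat cap are all hypothetical illustrations, not from the thread):

```scala
object FlatMapBlowup {
  // Hypothetical stand-in for a DataFrame row; `count` plays the role
  // of r.getInt(ind) in the original snippet.
  case class Row(id: Int, count: Int)

  // Same shape as df.flatMap(r => for (n <- 1 to r.getInt(ind)) yield r),
  // but with the repeat count clamped to [0, maxRepeat] so one skewed
  // value cannot explode a single task's output.
  def boundedCopies(r: Row, maxRepeat: Int): Seq[Row] = {
    val n = math.min(math.max(r.count, 0), maxRepeat)
    for (_ <- 1 to n) yield r
  }

  def main(args: Array[String]): Unit = {
    // Unbounded, the second row alone would yield a billion copies.
    val rows = Seq(Row(1, 2), Row(2, 1000000000))
    val safe = rows.flatMap(boundedCopies(_, maxRepeat = 10000))
    println(safe.size) // 2 copies of the first row + 10000 of the second
  }
}
```

Logging or capping the maximum value of that column before the flatMap would quickly confirm or rule out skew as the cause of the hangup.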