Re: bug? using withColumn with colName with dot can't replace column

2016-03-15 Thread Jan Štěrba
First off, I would advise against having dots in column names; that's just playing with fire. Second, the exception is really strange, since Spark is complaining about a completely unrelated column. I would like to see the df schema before the exception was thrown. -- Jan Sterba https://twitter.com/h

Re: adding rows to a DataFrame

2016-03-11 Thread Jan Štěrba
It very much depends on the logic that generates the new rows. Is it per row (i.e. without context)? Then you can just convert to an RDD and perform a map operation on each row. JavaPairRDD> grouped = dataFrame.javaRDD().groupBy( group by what you need, probably ID ); return grouped.mapValues(rowsIt

Re: Spark on YARN memory consumption

2016-03-11 Thread Jan Štěrba
so, > typically the memory increments for YARN containers is 1GB. > > > > This gives a good overview: > http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ > > > > Thanks, > > Silvio > > > > > > > > From: Jan Š
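The 1 GB container increment mentioned in the quoted reply can be illustrated with a bit of arithmetic. The following is only a rough sketch in plain Python, not Spark or YARN code: the function name `yarn_container_mb` is hypothetical, and the overhead rule (the larger of 384 MB or 10% of executor memory) is an assumption about the default of spark.yarn.executor.memoryOverhead in this Spark version. The actual increment depends on the scheduler configuration of the cluster.

```python
import math


def yarn_container_mb(executor_memory_mb, increment_mb=1024,
                      overhead_fraction=0.10, min_overhead_mb=384):
    """Estimate the YARN container size (in MB) for one executor.

    Assumptions (not taken from the thread): the overhead is
    max(min_overhead_mb, overhead_fraction * executor memory), and YARN
    rounds the request up to the next multiple of increment_mb.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    requested_mb = executor_memory_mb + overhead_mb
    # Round the request up to the next full increment.
    return math.ceil(requested_mb / increment_mb) * increment_mb


if __name__ == "__main__":
    for memory_mb in (640, 1024, 4096):
        print(memory_mb, "->", yarn_container_mb(memory_mb))
```

Under these assumptions, a small change in spark.executor.memory can push the request over an increment boundary, so the container that YARN allocates may be noticeably larger than the configured executor memory.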

Spark on YARN memory consumption

2016-03-11 Thread Jan Štěrba
Hello, I am experimenting with tuning an on-demand Spark cluster on top of our Cloudera Hadoop. I am running Cloudera 5.5.2 with Spark 1.5 right now and I am running Spark in yarn-client mode. Right now my main experimentation is about the spark.executor.memory property and I have noticed a strange be

Re: updating the Books section on the Spark documentation page

2016-03-08 Thread Jan Štěrba
You could try creating a pull request on GitHub. -Jan -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Wed, Mar 9, 2016 at 2:45 AM, Mohammed Guller wrote: > Hi - > > > > The Spark documentation page (http://spark.apache.org/document

Re: Saving multiple outputs in the same job

2016-03-08 Thread Jan Štěrba
Hi Andy, it's nice to see that we are not the only ones with the same issues. So far we have not gone as far as you have. What we have done is that we cache whatever DataFrames/RDDs are shared for computing different outputs. This has brought us quite the speedup, but we still see that saving some l

Re: 1.6.0 spark.sql datetime conversion problem

2016-03-05 Thread Jan Štěrba
I don't know what's wrong, but I can suggest looking up the source of the UDF and debugging from there. I would think this is some JDK API caveat and not a Spark bug. -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Mar 4, 2016 at 6

Re: [Help]: DataframeNAfunction fill method throwing exception

2016-02-25 Thread Jan Štěrba
Just use the coalesce function: df.selectExpr("name", "coalesce(age, 0) as age") -- Jan Sterba https://twitter.com/honzasterba | http://flickr.com/honzasterba | http://500px.com/honzasterba On Fri, Feb 26, 2016 at 5:27 AM, Divya Gehlot wrote: > Hi, > I have dataset which looks like below > name age
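To make the effect of coalesce(age, 0) concrete, here is a small stand-in in plain Python. It is not the Spark API, only a sketch of the per-row semantics: coalesce returns its first non-null argument. The sample rows and the helper name are hypothetical and are not taken from the original question.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical sample data with a missing age.
rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": None},
]

# Per-row equivalent of selecting "name" and "coalesce(age, 0) as age".
filled = [{"name": row["name"], "age": coalesce(row["age"], 0)} for row in rows]

if __name__ == "__main__":
    print(filled)
```

Rows that already have an age keep it; only the null value is replaced by the default.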

Re: Running executors missing in sparkUI

2016-02-25 Thread Jan Štěrba
5, 2016 at 4:28 AM, Jan Štěrba wrote: >> >> Hello, >> >> I have quite a weird behaviour that I can't quite wrap my head around. >> I am running Spark on a Hadoop YARN cluster. I have Spark configured >> in such a way that it utilizes all free vcores in the c

Running executors missing in sparkUI

2016-02-25 Thread Jan Štěrba
Hello, I have quite a weird behaviour that I can't quite wrap my head around. I am running Spark on a Hadoop YARN cluster. I have Spark configured in such a way that it utilizes all free vcores in the cluster (setting max vcores per executor and number of executors to use all vcores in cluster).