First off, I would advise against having dots in column names; that's
just playing with fire.
Second, the exception is really strange, since Spark is complaining
about a completely unrelated column. I would like to see the df schema
from just before the exception was thrown.
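If you are stuck with such names, backtick-escaping usually works so
the dot is not parsed as struct-field access. A minimal sketch (the
column name and the df variable are hypothetical, and support may vary
by Spark version):

import org.apache.spark.sql.DataFrame;

// escape the dot so Spark treats "some.col" as a single column name
DataFrame renamed = df.selectExpr("`some.col` as some_col");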
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
It very much depends on the logic that generates the new rows. Is it
per row (i.e. without context)? Then you can just convert to an RDD and
perform a map operation on each row.
JavaPairRDD<Object, Iterable<Row>> grouped =
    dataFrame.javaRDD().groupBy(/* group by what you need, probably ID */);
return grouped.mapValues(/* generate the new rows for each group */);
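For illustration, a minimal runnable sketch of that idea (the key
column in position 0 and the generated row are hypothetical):

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// group rows by a hypothetical ID in column 0
JavaPairRDD<Object, Iterable<Row>> grouped =
    dataFrame.javaRDD().groupBy(row -> row.get(0));

// per group, keep the original rows and append generated ones
JavaPairRDD<Object, Iterable<Row>> expanded =
    grouped.mapValues(rows -> {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(r);
            out.add(RowFactory.create(r.get(0), "generated")); // hypothetical new row
        }
        return out;
    });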
so,
> typically the memory increment for YARN containers is 1 GB.
>
> This gives a good overview:
> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>
> Thanks,
>
> Silvio
>
> From: Jan Š
Hello,
I am experimenting with tuning an on-demand Spark cluster on top of our
Cloudera Hadoop. I am running Cloudera 5.5.2 with Spark 1.5 right now,
and I am running Spark in yarn-client mode.
Right now my main experimentation is with the spark.executor.memory
property, and I have noticed a strange behaviour
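For reference, the knob in question can be set on spark-submit or
programmatically; a minimal sketch with hypothetical values (note that
YARN rounds executor memory plus overhead up to its allocation
increment, typically 1 GB):

import org.apache.spark.SparkConf;

// hypothetical values; YARN rounds (executor memory + overhead)
// up to its allocation increment
SparkConf conf = new SparkConf()
    .setAppName("memory-tuning-experiment")
    .set("spark.executor.memory", "4g")
    .set("spark.yarn.executor.memoryOverhead", "512");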
You could try creating a pull request on GitHub.
-Jan
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
On Wed, Mar 9, 2016 at 2:45 AM, Mohammed Guller wrote:
> Hi -
>
>
>
> The Spark documentation page (http://spark.apache.org/document
Hi Andy,
it's nice to see that we are not the only ones with the same issues. So
far we have not gone as far as you have. What we have done is cache
whatever dataframes/RDDs are shared for computing different outputs.
This has brought us quite a speedup, but we still see that
saving some l
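Concretely, the caching amounts to something like this (a minimal
sketch; the shared intermediate and the two outputs are hypothetical):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.storage.StorageLevel;

// hypothetical shared intermediate feeding several outputs
DataFrame shared = df.groupBy("id").count();
shared.persist(StorageLevel.MEMORY_AND_DISK());

long a = shared.filter("count > 10").count();  // output 1
long b = shared.filter("count <= 10").count(); // output 2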
I don't know what's wrong, but I can suggest looking up the source of the
UDF and debugging from there. I would think this is some JDK API caveat
and not a Spark bug.
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
On Fri, Mar 4, 2016 at 6
Just use the coalesce function:
df.selectExpr("name", "coalesce(age, 0) as age")
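If the goal is just replacing nulls in a numeric column, the DataFrame
na functions do the same thing (a sketch, assuming a DataFrame named df
with columns name and age):

import org.apache.spark.sql.DataFrame;

// replace nulls in the age column with 0; other columns are untouched
DataFrame fixed = df.na().fill(0, new String[] {"age"});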
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
On Fri, Feb 26, 2016 at 5:27 AM, Divya Gehlot
wrote:
> Hi,
> I have dataset which looks like below
> name age
5, 2016 at 4:28 AM, Jan Štěrba wrote:
>>
>> Hello,
>>
>> I have quite a weird behaviour that I can't quite wrap my head around.
>> I am running Spark on a Hadoop YARN cluster. I have Spark configured
>> in such a way that it utilizes all free vcores in the c
Hello,
I have quite a weird behaviour that I can't quite wrap my head around.
I am running Spark on a Hadoop YARN cluster. I have Spark configured
in such a way that it utilizes all free vcores in the cluster (setting
the max vcores per executor and the number of executors so that all
vcores in the cluster are used).
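For concreteness, the relevant settings look roughly like this (numbers
hypothetical, e.g. 10 executors x 4 cores to cover 40 free vcores):

import org.apache.spark.SparkConf;

// hypothetical sizing: 10 executors x 4 cores = 40 vcores total
SparkConf conf = new SparkConf()
    .set("spark.executor.instances", "10")
    .set("spark.executor.cores", "4");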