We have also found LIMIT to take an unacceptable amount of time when
reading Parquet-formatted data from S3.
LIMIT was not strictly needed for our use case, so we worked around it
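For what it's worth, a minimal sketch of the kind of workaround we mean, assuming you only need a small number of rows on the driver (the bucket path and row count below are placeholders, not our actual job):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("limit-workaround").getOrCreate()

// Illustrative S3 path only.
val df = spark.read.parquet("s3a://my-bucket/data/")

// df.limit(100).collect() can scan far more than needed on S3-backed
// Parquet; take(n) launches incremental jobs (first on one partition,
// then more only if needed), which is often much cheaper when you just
// want a handful of rows.
val firstRows = df.take(100)
```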
--
Asher Krim
Senior Software Engineer
On Fri, Oct 28, 2016 at 5:36 AM, Liz Bai wrote:
> Sorry
Hi,
While training a NaiveBayes classifier, I am getting an OOM.
What is wrong with these parameters?
Here is the spark-submit command: ./spark-submit --class main.scala.Test1
--master local[*] --driver-memory 60g /home/user1/project_2.11-1.0.jar
PS: The OS is Ubuntu 14.04 and the system has
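One thing worth checking (a hedged suggestion, not a confirmed fix): with --master local[*] everything, including the executors, runs inside the single driver JVM, so --driver-memory is the only heap that matters, and local[*] uses one task slot per core. Bounding parallelism can reduce peak memory, e.g.:

```
# Illustrative variant of the original command; the class, jar path and
# 60g figure are from the original post, the core count is a guess.
./spark-submit --class main.scala.Test1 \
  --master "local[8]" \
  --driver-memory 60g \
  /home/user1/project_2.11-1.0.jar
```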
I tested your claims that "it used to work that way", and was unable
to reproduce them. As far as I can tell, streams have always failed
the very first time you start them in that situation. As Chris and I
pointed out, there are good reasons for that.
If you don't want to operationalize topic
Avg is an aggregation function. You need to write XYZ as a user-defined
aggregate function (UDAF).
On Sat, Oct 29, 2016 at 9:28 PM, Manjunath, Kiran wrote:
> Is there a way to get user defined operation to be used for rolling window
> operation?
>
>
>
> Like – Instead of
>
>
Cody,
Thanks for your comments.
The way I'm reading the Kafka documentation (
https://kafka.apache.org/documentation) is that auto.create.topics.enable
is set to true by default. Right now it's not set in our server.properties
on the Kafka broker side so I would imagine that the first request to
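For reference, this is the broker-side setting in question as I understand it; if we wanted to rely on it explicitly rather than on the default, server.properties would carry something like the following (values shown are the documented Kafka defaults):

```
# server.properties (Kafka broker) - illustrative
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1
```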
On 27 Oct 2016, at 23:04, adam kramer wrote:
Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases?
Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
2.0? The version of aws-sdk in 3.0 actually works for
After I set spark.driver.userClassPathFirst=true, my spark-submit --master
yarn-client job fails with the error below; it works fine if I remove the
userClassPathFirst setting. I need this setting to avoid class conflicts
in another job, so I am trying to make the setting work in a
simple job first &
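For context, this is roughly the submit command I am testing with (the class name and jar are placeholders for my simple job; the properties can also be passed with --conf, and there is a matching executor-side setting):

```
./spark-submit \
  --class com.example.SimpleJob \
  --master yarn-client \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  simple-job.jar
```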
Another thing I forgot to mention is that it happens after running for
several hours (say 4 to 5 hours). I am not sure why it is creating so many
threads. Is there any way to control them?
On Fri, Oct 28, 2016 at 12:47 PM, kant kodali wrote:
> "dag-scheduler-event-loop"
Is there a way to get user defined operation to be used for rolling window
operation?
Like – Instead of
val wSpec1 = Window.orderBy("c1").rowsBetween(-20, +20)
val dfWithMovingAvg = df.withColumn("Avg", avg(df("c2")).over(wSpec1))
Something like
val wSpec1 =
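To make the question concrete, here is a rough sketch of what such a user-defined window aggregation could look like with Spark 2.x's UserDefinedAggregateFunction; the "XYZ" logic (sum of squares) is purely a placeholder, and I have not verified this against every Spark 2.x release:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction, Window}
import org.apache.spark.sql.types._

// Placeholder "XYZ" aggregation: sum of squares over the window frame.
class SumOfSquares extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("acc", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + math.pow(input.getDouble(0), 2)
  def merge(b1: MutableAggregationBuffer, b2: Row): Unit =
    b1(0) = b1.getDouble(0) + b2.getDouble(0)
  def evaluate(buffer: Row): Double = buffer.getDouble(0)
}

val xyz = new SumOfSquares
val wSpec1 = Window.orderBy("c1").rowsBetween(-20, +20)
val dfWithXyz = df.withColumn("Xyz", xyz(df("c2")).over(wSpec1))
```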