Re: LIMIT issue of SparkSQL

2016-10-29 Thread Asher Krim
We have also found LIMIT to take an unacceptable amount of time when reading Parquet-formatted data from S3. LIMIT was not strictly needed for our use case, so we worked around it. -- Asher Krim Senior Software Engineer On Fri, Oct 28, 2016 at 5:36 AM, Liz Bai wrote: > Sorry

Out Of Memory issue

2016-10-29 Thread Kürşat Kurt
Hi, while training a NaiveBayes classifier, I am getting an OOM error. What is wrong with these parameters? Here is the spark-submit command: ./spark-submit --class main.scala.Test1 --master local[*] --driver-memory 60g /home/user1/project_2.11-1.0.jar PS: OS is Ubuntu 14.04 and system has

Re: Reason for Kafka topic existence check / "Does the topic exist?" error

2016-10-29 Thread Cody Koeninger
I tested your claims that "it used to work that way", and was unable to reproduce them. As far as I can tell, streams have always failed the very first time you start them in that situation. As Chris and I pointed out, there are good reasons for that. If you don't want to operationalize topic

Re: spark dataframe rolling window for user define operation

2016-10-29 Thread ayan guha
Avg is an aggregation function. You need to write XYZ as a user-defined aggregate function (UDAF). On Sat, Oct 29, 2016 at 9:28 PM, Manjunath, Kiran wrote: > Is there a way to get user defined operation to be used for rolling window > operation? > > > > Like – Instead of > >
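The UDAF suggestion above could be sketched roughly as follows, assuming the Spark 2.x UserDefinedAggregateFunction API (deprecated from Spark 3.0 onward). The thread never says what XYZ computes, so the geometric mean used here is only a hypothetical stand-in, and the names GeoMean, RollingUdafSketch, dfWithMovingXyz and the sample rows are invented for illustration. The columns c1 and c2 and the window bounds are taken from the original question.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction, Window}
import org.apache.spark.sql.types._

// Hypothetical stand-in for "XYZ": geometric mean of the non-null values
// in the window frame. Assumes all inputs are strictly positive.
class GeoMean extends UserDefinedAggregateFunction {
  // One Double input column.
  override def inputSchema: StructType =
    StructType(StructField("value", DoubleType) :: Nil)

  // Intermediate state: number of values seen and the sum of their logs.
  override def bufferSchema: StructType =
    StructType(StructField("count", LongType) :: StructField("logSum", DoubleType) :: Nil)

  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0.0
  }

  // Fold one input row into the buffer, skipping nulls.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + 1L
      buffer(1) = buffer.getDouble(1) + math.log(input.getDouble(0))
    }
  }

  // Combine two partial buffers.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getDouble(1) + buffer2.getDouble(1)
  }

  // Final result: exp(mean of logs), or null for an empty frame.
  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(0) == 0L) null
    else math.exp(buffer.getDouble(1) / buffer.getLong(0))
}

object RollingUdafSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rolling-udaf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Invented sample data using the column names from the question.
    val df = Seq((1, 2.0), (2, 8.0), (3, 4.0), (4, 16.0)).toDF("c1", "c2")

    // Same window as in the question: 20 rows before to 20 rows after.
    val wSpec1 = Window.orderBy("c1").rowsBetween(-20, 20)

    // A UDAF instance is applied like avg(...) and then bound to the window.
    val geoMean = new GeoMean
    val dfWithMovingXyz = df.withColumn("Xyz", geoMean(df("c2")).over(wSpec1))

    dfWithMovingXyz.show()
    spark.stop()
  }
}
```

The only change relative to the avg-based version in the question is that the built-in avg(df("c2")) is replaced by an instance of the custom aggregate. A window without partitionBy moves all rows to a single partition, so this form suits small data only.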

Re: Reason for Kafka topic existence check / "Does the topic exist?" error

2016-10-29 Thread Dmitry Goldenberg
Cody, Thanks for your comments. The way I'm reading the Kafka documentation (https://kafka.apache.org/documentation) is that auto.create.topics.enable is set to true by default. Right now it's not set in our server.properties on the Kafka broker side so I would imagine that the first request to

Re: Spark 2.0 with Hadoop 3.0?

2016-10-29 Thread Steve Loughran
On 27 Oct 2016, at 23:04, adam kramer wrote: Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases? Is there any reason why Hadoop 3.0 is a non-starter for use with Spark 2.0? The version of aws-sdk in 3.0 actually works for

spark-submit fails after setting userClassPathFirst to true

2016-10-29 Thread sudhir patil
After I set spark.driver.userClassPathFirst=true, my spark-submit --master yarn-client fails with the error below, and it works fine if I remove the userClassPathFirst setting. I need to add this setting to avoid class conflicts in some other job, so I am trying to make this setting work in a simple job first &

Re: java.lang.OutOfMemoryError: unable to create new native thread

2016-10-29 Thread kant kodali
Another thing I forgot to mention is that it happens after running for several hours (say, 4 to 5 hours). I am not sure why it is creating so many threads. Is there any way to control them? On Fri, Oct 28, 2016 at 12:47 PM, kant kodali wrote: > "dag-scheduler-event-loop"

spark dataframe rolling window for user define operation

2016-10-29 Thread Manjunath, Kiran
Is there a way to use a user-defined operation in a rolling window operation? Like – Instead of val wSpec1 = Window.orderBy("c1").rowsBetween(-20, +20) var dfWithMovingAvg = df.withColumn("Avg", avg(df("c2")).over(wSpec1)) Something like val wSpec1 =