OutOfDirectMemoryError for Spark 2.2

2018-03-05 Thread Chawla,Sumit
Hi All, I have a job which processes a large dataset. All items in the dataset are unrelated. To save on cluster resources, I process these items in chunks. Since chunks are independent of each other, I start and shut down the Spark context for each chunk. This allows me to keep the DAG smaller
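The chunking idea described above can be sketched in plain Scala. This is a minimal, hypothetical sketch (the item list, chunk size, and `processChunk` helper are assumptions, not the poster's code); the per-chunk Spark session lifecycle is shown only as comments:

```scala
// Hypothetical sketch: split unrelated items into independent chunks so
// each chunk can be processed with its own short-lived Spark context,
// keeping the DAG/lineage small.
val items = (1 to 10).toList
val chunkSize = 4
val chunks = items.grouped(chunkSize).toList  // 3 chunks: 4 + 4 + 2 items

chunks.foreach { chunk =>
  // In the real job one would (Spark calls shown as comments only):
  //   val spark = SparkSession.builder().appName("chunk").getOrCreate()
  //   try processChunk(spark, chunk) finally spark.stop()
  println(s"would process ${chunk.size} items in a fresh Spark context")
}
```

Stopping and recreating the context trades startup overhead for a bounded DAG per chunk, which is the point of the approach above.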

Re: Silencing messages from Ivy when calling spark-submit

2018-03-05 Thread Nicholas Chammas
Oh, I didn't know about that. I think that will do the trick. Would you happen to know what setting I need? I'm looking here, but it's a bit overwhelming. I'm basically looking for a way to set the overall Ivy log level to WARN or

Re: Welcoming some new committers

2018-03-05 Thread Seth Hendrickson
Thanks all! :D

On Mon, Mar 5, 2018 at 9:01 AM, Bryan Cutler wrote:
> Thanks everyone, this is very exciting! I'm looking forward to working with you all and helping out more in the future. Also, congrats to the other committers as well!!

Re: [Spark][Scheduler] Spark DAGScheduler scheduling performance hindered on JobSubmitted Event

2018-03-05 Thread Reynold Xin
Rather than using a separate thread pool, perhaps we can just move the prep code to the call site thread?

On Sun, Mar 4, 2018 at 11:15 PM, Ajith shetty wrote:
> DAGScheduler becomes a bottleneck in cluster when multiple JobSubmitted events has to be processed as
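The suggestion above — do the expensive preparation on the submitting thread rather than in the single scheduler event loop — can be illustrated with a toy sketch. All names here (`PreparedJob`, `Scheduler`, `submit`) are hypothetical stand-ins, not the actual DAGScheduler API:

```scala
// Hypothetical sketch: costly job prep happens on the call-site thread;
// the event loop only ever sees cheap, already-prepared events.
import java.util.concurrent.LinkedBlockingQueue

final case class PreparedJob(id: Int, stageCount: Int)

object Scheduler {
  val eventQueue = new LinkedBlockingQueue[PreparedJob]()

  // Runs on the caller's thread: do planning work here, not in the loop.
  def submit(id: Int, partitions: Seq[Int]): Unit = {
    val stageCount = partitions.length          // stand-in for expensive prep
    eventQueue.put(PreparedJob(id, stageCount)) // loop just dequeues this
  }
}

Scheduler.submit(1, Seq(0, 1, 2))
```

With prep moved to the call site, concurrent submitters pay their own prep cost in parallel instead of serializing behind one event-loop thread.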

Re: Silencing messages from Ivy when calling spark-submit

2018-03-05 Thread Bryan Cutler
Hi Nick,

Not sure about changing the default to warnings only because I think some might find the resolution output useful, but you can specify your own Ivy settings file with "spark.jars.ivySettings" to point to your ivysettings.xml file. Would that work for you to configure it there?

Bryan
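The `spark.jars.ivySettings` configuration Bryan mentions is typically passed on the command line. A hedged sketch of the invocation (the settings-file path, package coordinate, and job script are all hypothetical placeholders):

```shell
# Sketch only: point Spark at a custom Ivy settings file when resolving
# --packages. Paths and the example coordinate are placeholders.
spark-submit \
  --conf spark.jars.ivySettings=/path/to/ivysettings.xml \
  --packages com.example:some-artifact:1.0 \
  my_job.py
```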

Spark+AI Summit 2018 - San Francisco June 4-6, 2018

2018-03-05 Thread Scott walent
Early Bird pricing ends on Friday. Book now to save $200+. The full agenda is available at www.databricks.com/sparkaisummit

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Anthony May
We use sbt for easy cross-project dependencies with multiple Scala versions in a mono-repo, for which it is pretty good, albeit with some quirks. As our projects have matured and change less, we have moved away from cross-project dependencies, but it was extremely useful early in the projects. We knew that a
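The setup described above — cross-project dependencies with multiple Scala versions in one repo — might look like this minimal `build.sbt` fragment (project names and version numbers are illustrative, not from the poster's build):

```scala
// Sketch of a mono-repo build.sbt with cross Scala versions and a
// cross-project dependency; names and versions are hypothetical.
ThisBuild / crossScalaVersions := Seq("2.11.12", "2.12.4")

lazy val core = (project in file("core"))

lazy val app = (project in file("app"))
  .dependsOn(core)   // the easy cross-project dependency sbt gives you
```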

Re: Welcoming some new committers

2018-03-05 Thread Bryan Cutler
Thanks everyone, this is very exciting! I'm looking forward to working with you all and helping out more in the future. Also, congrats to the other committers as well!!

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Sean Owen
Spark uses Maven as the primary build, but SBT works as well. It reads the Maven build to some extent. Zinc incremental compilation works with Maven (with the Scala plugin for Maven). Myself, I prefer Maven, for some of the reasons it is the main build in Spark: declarative builds end up being a
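Sean's note that Zinc incremental compilation works with Maven via the Scala plugin can be sketched as a `pom.xml` fragment. This is a hedged example, not Spark's actual build file; the plugin version and configuration values are illustrative:

```xml
<!-- Sketch: Scala compilation in Maven with scala-maven-plugin;
     version and settings are illustrative, not Spark's own pom. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.2.2</version>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <recompileMode>incremental</recompileMode>
  </configuration>
</plugin>
```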

Silencing messages from Ivy when calling spark-submit

2018-03-05 Thread Nicholas Chammas
I couldn’t get an answer anywhere else, so I thought I’d ask here. Is there a way to silence the messages that come from Ivy when you call spark-submit with --packages? (For the record, I asked this question on Stack Overflow.) Would it be a good

Re: Spark scala development in Sbt vs Maven

2018-03-05 Thread Jörn Franke
I think most of the Scala development in Spark happens with sbt - in the open source world. However, you can do it with Gradle and Maven as well. It depends on your organization etc. what your standard is. Some things might be more cumbersome to reach in non-sbt Scala scenarios, but this is

CSV reader 2.2.0 issue

2018-03-05 Thread SNEHASISH DUTTA
Hi, I am using the Spark 2.2 CSV reader. I have data in the following format: 123|123|"abc"||""|"xyz". The requirement is that || has to be treated as null and "" has to be treated as a blank string of length 0. I was using the option sep as pipe and the option quote as "". Parsed the data, and using regex I was able
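The distinction the poster needs — an unquoted empty field (`||`) becoming null while a quoted `""` becomes a zero-length string — can be sketched in plain Scala on a single line of input. This is a hypothetical illustration of the parsing rule, not the poster's actual regex-based code, and `parseField` is an invented helper:

```scala
// Hypothetical sketch: per-field rule for pipe-separated input like
// 123|123|"abc"||""|"xyz" — unquoted empty -> null, quoted "" -> "".
def parseField(raw: String): Option[String] = raw match {
  case "" => None                                   // || field -> null
  case s if s.length >= 2 && s.startsWith("\"") && s.endsWith("\"") =>
    Some(s.substring(1, s.length - 1))              // strip quotes; "" -> ""
  case s => Some(s)                                 // plain field, as-is
}

val line = "123|123|\"abc\"||\"\"|\"xyz\""
// split with limit -1 so trailing empty fields are kept
val fields = line.split("\\|", -1).map(parseField)
// fields(3) is None (the || field); fields(4) is Some("") (the quoted "")
```

In Spark's own CSV reader this distinction hinges on the `nullValue` (and, in later releases, `emptyValue`) reader options, so a version upgrade may be simpler than post-processing with regexes.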