Re: Is Spark in Java a bad idea?

2014-10-28 Thread Kevin Markey
Don't be too concerned about the Scala hoop. Before committing to Scala, I had coded up a modest analytic prototype in Hadoop MapReduce. Once I made the commitment, it took 10 days to (1) learn enough Scala and (2) rewrite the prototype in Spark in Scala.

Re: Running Spark shell on YARN

2014-08-15 Thread Kevin Markey
Sandy and others: Is there a single source of YARN/Hadoop properties that should be set or reset for running Spark on YARN? We've sort of stumbled through one property after another, and (unless there's an update I've not yet seen) CDH5 Spark-related properties

Re: Comparative study

2014-07-08 Thread Kevin Markey
When you say "large data sets", how large? Thanks On 07/07/2014 01:39 PM, Daniel Siegmann wrote: From a development perspective, I vastly prefer Spark to MapReduce. The MapReduce API is very constrained; Spark's

Re: Comparative study

2014-07-08 Thread Kevin Markey
, Kevin Markey kevin.mar...@oracle.com wrote: When you say "large data sets", how large? Thanks On 07/07/2014 01:39 PM, Daniel Sieg

Re: Comparative study

2014-07-08 Thread Kevin Markey
anything unusual. Did you do any custom configuration? Any advice would be appreciated. -Suren On Tue, Jul 8, 2014 at 1:54 PM, Kevin Markey kevin.mar

Re: trying to understand yarn-client mode

2014-06-19 Thread Kevin Markey
location by exporting its location as SPARK_JAR. Kevin Markey On 06/19/2014 11:22 AM, Koert Kuipers wrote: i am trying to understand how yarn-client mode works. i am not using spark-submit, but instead launching a spark job

Re: Failed RC-10 yarn-cluster job for FS closed error when cleaning up staging directory

2014-05-22 Thread Kevin Markey
Tom On Wednesday, May 21, 2014 6:10 PM, Kevin Markey kevin.mar...@oracle.com

Failed RC-10 yarn-cluster job for FS closed error when cleaning up staging directory

2014-05-21 Thread Kevin Markey
these two anomalies. Thanks Kevin Markey

Re: Job initialization performance of Spark standalone mode vs YARN

2014-04-03 Thread Kevin Markey
We are now testing precisely what you ask about in our environment. But Sandy's questions are relevant. The bigger issue is not Spark vs. YARN but "client" vs. "standalone" and where the client is located on the network relative to the cluster. The "client" options

Re: Is there a way to get the current progress of the job?

2014-04-01 Thread Kevin Markey
. But what if -- as occurs in another application -- there's only one or two stages, but lots of data passing through those 1 or 2 stages? Kevin Markey On 04/01/2014 09:55 AM, Mark Hamstra wrote: Some related discussion: https://github.com/apache