Re: Connecting an Application to the Cluster

2014-02-17 Thread Michael (Bach) Bui
ect information. Where can I pick up these > architectural/design concepts for Spark? > I seem to have misunderstood the responsibilities of the master and the > driver. > > > On Mon, Feb 17, 2014 at 10:51 PM, Michael (Bach) Bui > wrote: > Spark has the concept of Driver and

Re: Connecting an Application to the Cluster

2014-02-17 Thread Michael (Bach) Bui
Spark has the concepts of Driver and Master. The Driver is the Spark program that you run on your local machine. SparkContext resides in the driver together with the DAG scheduler. The Master is responsible for managing cluster resources, e.g. giving the Driver the workers that it needs. The Master

Re: How to map each line to (line number, line)?

2013-12-30 Thread Michael (Bach) Bui
boundary is a newline character. I think this usage pattern is important; if it is not yet available, I can try to pull it in. Michael (Bach) Bui, PhD, Senior Staff Architect, ADATAO Inc. www.adatao.com On Dec 30, 2013, at 6:28 AM, Aureliano Buendia
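
For readers following this thread, here is a minimal sketch of one common way to attach a global line number to each line when the input is split into ordered partitions. It is plain Python rather than Spark code, and it is not taken from the message above: the function name `number_lines` and the list-of-lists input are assumptions made only for illustration. The idea is two passes: count the lines per partition, then let each partition number its own lines starting from the total count of the partitions before it.

```python
from itertools import accumulate


def number_lines(partitions):
    """Attach a global line number to every line across ordered partitions.

    `partitions` is a hypothetical stand-in for the partitions of a
    distributed dataset: a list of lists of lines, in file order.
    """
    # Pass 1: count the lines in each partition.
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: offsets[i] is the number of lines before partition i.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own lines starting from its offset.
    return [
        [(offset + i, line) for i, line in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


partitions = [["a", "b"], [], ["c"], ["d", "e"]]
numbered = number_lines(partitions)
```

Only the per-partition counts need to be collected in one place, so the second pass can run independently on each partition.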

Re: debugging NotSerializableException while using Kryo

2013-12-23 Thread Michael (Bach) Bui
What Spark version are you using? By looking at the code at Executor.scala line 195, you will at least know what causes the NPE. We can start from there. On Dec 23, 2013, at 10:21 AM, Ameet Kini wrote: > Thanks Imran. > > I tried setting "spark.closure.serializer" to > "org.apache.spark.seriali

Re: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered

2013-12-20 Thread Michael (Bach) Bui
Check if your worker is “alive”. Also take a look at your master log and see if there is an error message about the worker. This can usually be fixed by restarting Spark. On Dec 20, 2013, at 3:12 PM, Michael Kun Yang wrote: > Hi, > > I really need help, I went through previous posts on the mailin

Re: How to access a sub matrix in a spark task?

2013-12-20 Thread Michael (Bach) Bui
partitions like this [100 x col[1, n*50] ] , [100 x col[(n-1)*50+1, (2n-1)*50] ] … then we can assign each partition to a mapper to do mapPartition on it. Michael (Bach) Bui, PhD, Senior Staff Architect, ADATAO Inc. www.adatao.com On Dec 20, 2013
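
As a rough illustration of the column-block idea in this snippet, the sketch below splits a 100-row matrix into blocks of 50 adjacent columns and runs an independent computation on each block, the way a mapper would handle one partition. It is plain Python, not Spark code; the helper names `column_blocks` and `block_column_sums`, the 120-column test matrix, and the column-sum workload are all assumptions chosen for illustration, and the exact index ranges in the truncated message are not reproduced.

```python
def column_blocks(matrix, block_width=50):
    """Split a row-major matrix (a list of rows) into blocks of adjacent columns."""
    n_cols = len(matrix[0]) if matrix else 0
    return [
        [row[start:start + block_width] for row in matrix]
        for start in range(0, n_cols, block_width)
    ]


def block_column_sums(block):
    # Hypothetical stand-in for the per-partition work a mapper would do.
    return [sum(col) for col in zip(*block)]


# A 100 x 120 matrix whose entry at (r, c) is r * 120 + c.
matrix = [[r * 120 + c for c in range(120)] for r in range(100)]
blocks = column_blocks(matrix)
sums = [block_column_sums(b) for b in blocks]
```

Each block keeps all 100 rows, so any computation that needs whole columns can run on a block without touching the others.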

Re: How to access a sub matrix in a spark task?

2013-12-20 Thread Michael (Bach) Bui
. Michael (Bach) Bui, PhD, Senior Staff Architect, ADATAO Inc. www.adatao.com On Dec 20, 2013, at 12:38 PM, Aureliano Buendia wrote: > > > > On Fri, Dec 20, 2013 at 6:00 PM, Tom Vacek wrote: > Oh, I see. I was thinking that there was a computational dependency on o

Re: SPARK + YARN the general case

2013-11-15 Thread Michael (Bach) Bui
01), but I haven't had > time to actually try these yet. > > Tom > > > On Friday, November 15, 2013 10:45 AM, Michael (Bach) Bui > wrote: > Hi Tom, > > I have another question on SoY. Seems like the current implementation will > not support int

Re: Continued performance issues on a small EC2 Spark cluster

2013-11-15 Thread Michael (Bach) Bui
Hi Gary, What other frameworks are running on your Mesos cluster? If they are all Spark frameworks, another option you may want to consider (in order to improve your cluster utilization) is to let all of them share a single SparkContext. We also experienced degraded performance while running mu

Re: SPARK + YARN the general case

2013-11-15 Thread Michael (Bach) Bui
Hi Tom, I have another question on SoY. It seems like the current implementation will not support interactive applications like Shark, right? Thanks. On Nov 15, 2013, at 8:15 AM, Tom Graves wrote: > Hey Bill, > > Currently the Spark on Yarn only supports batch mode where you submit your