Re: Is There Any Benchmarks Comparing C++ MPI with Spark

2014-06-19 Thread ldmtwo
Here is a partial comparison: http://dspace.mit.edu/bitstream/handle/1721.1/82517/MIT-CSAIL-TR-2013-028.pdf?sequence=2 SciDB uses MPI with Intel hardware and libraries. Amazing performance, at the cost of more work. In case the link stops working: A Complex Analytics Genomics Benchmark Rebecca Taft-,

How do you run your spark app?

2014-06-19 Thread ldmtwo
I want to ask this, not because I can't read endless documentation and several tutorials, but because there seem to be many ways of doing things and I keep having issues. How do you run /your/ Spark app? I had it working when I was only using yarn+hadoop1 (Cloudera), then I had to get Spark and S

Re: Initial job has not accepted any resources

2014-08-11 Thread ldmtwo
I see this error too. I have never found a fix and I've been working on this for a few months. For me, I have 4 nodes with 46GB and 8 cores each. If I change the executor to use 8GB, it fails. If I use 6GB, it works. I request 2 cores only. On another cluster, I have different limits. My workloa

Spark Akka/actor failures.

2014-08-13 Thread ldmtwo
Need help getting around these errors. I have this program that runs fine on smaller input sizes. As the input gets larger, Spark has increasing difficulty staying efficient and running without errors. We have about 46GB free on each node. The workers and executors are configured to use this up (th

Re: Spark Akka/actor failures.

2014-08-14 Thread ldmtwo
The reason we are not using MLlib and Breeze is the lack of control over the data and performance. After computing the covariance matrix, there isn't much more we can do with it. Many of the methods are private. For now, we need the max value and the corresponding pair of columns. Later, we may do
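
To make the "max value and the corresponding pair of columns" step concrete, here is a minimal single-machine sketch in plain Python. It is not the poster's Spark code. The function names and the sample data are hypothetical, and it assumes "max" means the largest signed off-diagonal covariance (not the largest absolute value) and that a column is never paired with itself.

```python
from itertools import combinations


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`.

    `rows` is a list of equal-length numeric rows (hypothetical input layout).
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i, d):
            s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            # The matrix is symmetric, so fill both halves at once.
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


def max_covariance_pair(cov):
    """Return (value, (i, j)) for the largest off-diagonal entry, with i < j.

    Assumption: the diagonal (variances) is excluded from the search.
    """
    d = len(cov)
    return max((cov[i][j], (i, j)) for i, j in combinations(range(d), 2))


# Made-up data: column 1 is twice column 0, column 2 moves the other way.
rows = [[1, 2, 10], [2, 4, 8], [3, 6, 7], [4, 8, 3]]
value, pair = max_covariance_pair(covariance_matrix(rows))
print(pair, value)
```

Only the upper triangle is scanned because the matrix is symmetric. If the strongest relationship regardless of sign is wanted instead, compare `abs(cov[i][j])` in the `max` call.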

experimental solution to nesting RDDs

2014-09-24 Thread ldmtwo
I want to share and brainstorm on an experiment before I try it all the way. I hope that Spark contributors can comment. To be clear, it is not my intent to use MLlib, where I get only partial control over the work being done and which I'm not yet seeing scale well enough. I have fundamental questions about S