I want to share and brainstorm on an experiment before I try it all the way.
I hope that Spark contributors can comment. To be clear, it is not my intent
to use MLlib, where I get only partial control over the work being done, and
I'm not seeing it scale well enough yet. I have fundamental questions about S
The reason we are not using MLlib and Breeze is the lack of control over the
data and performance. After computing the covariance matrix, there isn't much
we can do with it, since many of the methods are private. For now, we need
the maximum value and the corresponding pair of columns. Later, we may do
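Since the methods we'd want are private, the post-covariance step is easier to do by hand. A minimal pure-Python sketch of that step, finding the maximum off-diagonal entry and its column pair (the covariance matrix here is made-up example data; in practice it would be the matrix collected back from the Spark job):

```python
def max_covariance_pair(cov):
    """Return the largest off-diagonal covariance and its (i, j) column pair."""
    best_val, best_pair = float("-inf"), None
    n = len(cov)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; skip the diagonal
            if cov[i][j] > best_val:
                best_val, best_pair = cov[i][j], (i, j)
    return best_val, best_pair

# Example 3x3 covariance matrix (illustrative values only).
cov = [
    [2.0, 0.5, 1.5],
    [0.5, 1.0, 0.2],
    [1.5, 0.2, 3.0],
]
print(max_covariance_pair(cov))  # -> (1.5, (0, 2))
```

Scanning only the upper triangle is enough because a covariance matrix is symmetric.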
Need help getting around these errors.
I have this program that runs fine on smaller input sizes. As the input gets
larger, Spark has increasing difficulty staying efficient and running
without errors. We have about 46GB free on each node. The workers and
executors are configured to use this up (th
I see this error too. I have never found a fix and I've been working on this
for a few months.
For me, I have 4 nodes with 46GB and 8 cores each. If I change the executor
to use 8GB, it fails. If I use 6GB, it works. I request only 2 cores. On
another cluster, I have different limits. My workloa
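For reference, the limits described above would correspond to a submit invocation roughly like the following. This is a sketch, not my exact command: the master URL, application class, and jar name are placeholders; only the 6GB executor memory and 2 cores come from the numbers in this post.

```shell
# Placeholder master URL and application class; memory/core values
# are the ones that work on the 46GB / 8-core nodes described above.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 2 \
  --executor-memory 6G \
  --class com.example.MyApp \
  myapp.jar
```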
I want to ask this, not because I can't read endless documentation and
several tutorials, but because there seem to be many ways of doing things
and I keep having issues. How do you run *your* Spark app?
I had it working when I was only using YARN + Hadoop 1 (Cloudera), then I had
to get Spark and S
Here is a partial comparison.
http://dspace.mit.edu/bitstream/handle/1721.1/82517/MIT-CSAIL-TR-2013-028.pdf?sequence=2
SciDB uses MPI with Intel hardware and libraries. The performance is
impressive, at the cost of more work.
In case the link stops working:
A Complex Analytics Genomics Benchmark, Rebecca Taft,