Matt
I’ll create a feature branch of Mahout in my git repo for simplicity (we are in
code freeze for Mahout right now). Then if you could peel off your changes and
make a PR against it, everyone can have a look before any change is made to the
ASF repos.
Do a PR against this
I should mention that the density is currently set quite high, and we've been
discussing a user-defined setting for this, something that we have not worked
in yet.
From: Andrew Palumbo
Sent: Monday, August 21, 2017 2:44:35 PM
To:
We do currently have optimizations based on density analysis in use, e.g. in
AtB:
https://github.com/apache/mahout/blob/08e02602e947ff945b9bd73ab5f0b45863df3e53/math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala#L431
+1 to PR. Thanks for pointing this out.
--andy
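
To illustrate the kind of density-based dispatch discussed above, here is a
minimal Java sketch. It is not Mahout's implementation: the class name, the
method names, and the 0.25 threshold are all hypothetical, and plain arrays
and maps stand in for Mahout's matrix and vector types. It only shows the
idea of measuring density first and then choosing between a path that visits
every element and one that skips zeros.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Mahout.
final class DensitySketch {
    // Assumed cutoff: treat a vector as dense when more than 25% of its
    // entries are non-zero. A real setting would be tuned or user-defined.
    static final double ASSUMED_DENSITY_THRESHOLD = 0.25;

    // Fraction of entries that are non-zero (0.0 for an empty vector).
    static double density(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        int nonZero = 0;
        for (double v : values) {
            if (v != 0.0) {
                nonZero++;
            }
        }
        return (double) nonZero / values.length;
    }

    static boolean looksDense(double[] values, double threshold) {
        return density(values) > threshold;
    }

    // Copy only the non-zero entries into an index -> value map.
    static Map<Integer, Double> toSparse(double[] values) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                sparse.put(i, values[i]);
            }
        }
        return sparse;
    }

    // Dense path: visits every index, zeros included.
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sparse path: visits only the stored non-zero entries of a.
    static double sparseDot(Map<Integer, Double> a, double[] b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b[e.getKey()];
        }
        return sum;
    }

    // Pick a path from the measured density of the left operand.
    static double dot(double[] a, double[] b, double threshold) {
        return looksDense(a, threshold)
                ? denseDot(a, b)
                : sparseDot(toSparse(a), b);
    }
}
```

Both paths return the same result; only the amount of work differs. With the
threshold set high, more inputs take the zero-skipping path, which is where
the trade-off raised in this thread comes in.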
Is it possible to add it to Mahout so as to get the unit tests run? If so we
also have a bunch of integration tests as well as my real-world data.
Again, I don’t see anything wrong with skipping zeros in any case but this
method is known to be slower for certain types of math (IIRC). So I’d bet
That looks like ancient code from the old mapreduce days. If it passes unit
tests, create a PR.
Just a guess here, but there are times when this might not speed things up but
slow them down. However, for very sparse matrices that you might see in CF this
could work quite well. Some of the GPU
Good question :D
For the dataset I mentioned in my first message, the entire run is almost 10x
faster (I expect that speedup to be non-linear since it nearly eliminates a for
loop...bigger gains for bigger datasets). It's possible there are other
sections of the code I can't override (e.g.
Interesting indeed. What is “massive”? Does the change pass all unit tests?
On Aug 17, 2017, at 1:04 PM, Scruggs, Matt wrote:
Thanks for the remarks guys!
I profiled the code running locally on my machine and discovered this loop is
where these setQuick() and