srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-840779747
It comes up a lot. Sparse is important at scale. Anywhere that plugs into
native code, the data has to be made dense, so native acceleration can't be
applied in some cases. Anything that can operate on sparse…
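A minimal sketch of the densification cost mentioned above (a hypothetical helper, not Spark's actual `Vector` code): a native BLAS routine expects a contiguous array of the full vector length, so handing it a sparse vector means materializing all the zeros first, which defeats the memory advantage of sparsity.

```java
// Hedged sketch: why calling into native BLAS forces sparse vectors dense.
// A sparse vector stores only (index, value) pairs; a native routine
// expects a contiguous double[] covering the full length.
public class SparseToDense {
    // Densify a sparse vector given its size, nonzero indices, and values.
    static double[] toDense(int size, int[] indices, double[] values) {
        double[] dense = new double[size]; // O(size) memory, not O(nnz)
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }

    public static void main(String[] args) {
        // 2 nonzeros out of 8 entries; densifying allocates all 8 slots.
        double[] d = toDense(8, new int[]{1, 5}, new double[]{2.0, 3.0});
        System.out.println(java.util.Arrays.toString(d));
    }
}
```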
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-839796652
Merged to master.
Thanks again @luhenry for hanging in there - just wanted to be pretty sure
about the change. It's a good one.
@zhengruifeng this change is in.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-839063011
OK, in any event `core` is fine; I'm just not sure why it doesn't show up in
the transitive dependencies then. Yeah, we don't want to depend on `all`
except within the profile. I think…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-836946172
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-836724744
Just catching up on the state here: so we need to put back the netlib-lgpl
profile? Anything else pending?
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-833142563
Just stating the obvious here, maybe, but @fommil is the author of
`netlib-java` and a far better reviewer of these changes than I would be. He
has done a lot to make native acceleration…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832721754
Hm, it does seem very sensitive to partitioning then. That's not good; maybe
somewhat understandable if the data set is so small that each partition has
just a few elements. The e…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832333786
Yeah, it does seem like the variation here is due to distributing the
computation. It might even be 'reasonable' to expect given the tiny data set.
But it isn't very good for confidence…
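One well-known source of this kind of run-to-run variation (a hedged illustration, not the actual test code under discussion) is that floating-point addition is not associative: the same values combined under a different partitioning can produce a slightly different total, and with a tiny data set each partition's partial sum is computed from only a few elements.

```java
// Hedged sketch: grouping the same values into differently sized
// "partitions" and combining partial sums can change the result in the
// low-order bits, because floating-point addition is not associative.
public class PartitionSum {
    // Sum xs in chunks of chunkSize, like per-partition partials
    // combined by a driver.
    static double sumInChunks(double[] xs, int chunkSize) {
        double total = 0.0;
        for (int start = 0; start < xs.length; start += chunkSize) {
            double partial = 0.0; // per-partition partial sum
            int end = Math.min(start + chunkSize, xs.length);
            for (int i = start; i < end; i++) {
                partial += xs[i];
            }
            total += partial; // combining order depends on partitioning
        }
        return total;
    }

    public static void main(String[] args) {
        double[] xs = new double[1000];
        java.util.Arrays.fill(xs, 0.1); // 0.1 is not exactly representable
        // Both results are close to 100.0 but can differ in the last bits.
        System.out.println(sumInChunks(xs, 1));
        System.out.println(sumInChunks(xs, 128));
    }
}
```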
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832163055
Hm. I know @zhengruifeng increased the iterations in this test to improve
the stability. I wonder if 30 is still not really enough? If you have time and
willingness, what happens…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832104453
Huh. The last time this was changed was in
https://github.com/apache/spark/pull/27519, which would be in Spark 3.1.1.
That is a very different answer from both of the ones you're…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831545290
Right, yeah: if there is a comparable implementation in R or sklearn, and
it gives a certain answer, that's decent evidence that it's more correct. Could
be due to different c…
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831519563
It's entirely possible that 93.3 is the more correct log-likelihood. Usually
we check against some other implementation, if possible, to verify.
srowen commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831281894
Jenkins retest this please