[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-13 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-840779747 It comes up a lot. Sparse is important at scale. Anywhere that plugs into native code, the data has to be made dense, so it can't be applied in some cases. Anything that can operate on spa

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-12 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-839796652 Merged to master. Thanks again @luhenry for hanging in there - just wanted to be pretty sure about the change. It's a good one. @zhengruifeng this change is in.

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-11 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-839063011 OK, in any event `core` is fine, just not sure why it doesn't show up in the transitive dependencies then. Yeah we don't want to depend on `all` except within the profile. I thin

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-10 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-836946172

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-10 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-836724744 Just catching up on the state here - so we need to put back the netlib-lgpl profile? Anything else pending?

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-05 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-833142563 Just stating the obvious here, maybe, but @fommil is the author of `netlib-java` and a far better reviewer of these changes than I would be. He has done a lot to make native acce

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-05 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832721754 Hm, it does seem very sensitive to partitioning then. That's not good; maybe kind of understandable if the data set is so small that each partition has just a few elements. The e

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832333786 Yeah it does seem like the variation here is due to distributing the computation. It might even be 'reasonable' to expect given the tiny data set. But isn't very good for confide

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832163055 Hm. I know @zhengruifeng increased the iterations in this test to improve the stability. I wonder if 30 is still not really enough? If you have the time and willingness, what happens

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832104453 Huh. The last time this was changed was in https://github.com/apache/spark/pull/27519 which would be in Spark 3.1.1. That is a very different answer from both of the ones you'r

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831545290 Right yeah like if there is a comparable implementation in R or sklearn, and it gives a certain answer, that's decent evidence that it's more correct. Could be due to different c

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831519563 It's entirely possible that 93.3 is a more correct log-likelihood. Usually we check against some other implementation, if possible, to verify.

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831281894 Jenkins retest this please