[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-13 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-840835158 IIUC, Spark uses the CSC representation. @fommil is that format represented in MTJ as well? -- This is an automated message from the Apache Git Service. To respond to the
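
For reference, a minimal sketch of the CSC (compressed sparse column) layout mentioned in the comment above, assuming the conventional three-array form: column pointers, row indices, and non-zero values stored column by column. The function names `dense_to_csc` and `csc_matvec` are hypothetical and exist only for illustration; this is not Spark's or MTJ's implementation.

```python
from typing import List, Tuple


def dense_to_csc(dense: List[List[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense row-major matrix to CSC arrays (illustrative helper)."""
    num_rows = len(dense)
    num_cols = len(dense[0]) if dense else 0
    col_ptrs = [0]                  # col_ptrs[j] is where column j starts in `values`
    row_indices: List[int] = []     # row index of each stored value
    values: List[float] = []        # non-zero entries, column by column
    for j in range(num_cols):
        for i in range(num_rows):
            v = dense[i][j]
            if v != 0.0:
                row_indices.append(i)
                values.append(v)
        col_ptrs.append(len(values))  # end of column j, start of column j + 1
    return col_ptrs, row_indices, values


def csc_matvec(num_rows: int, col_ptrs: List[int], row_indices: List[int],
               values: List[float], x: List[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSC form."""
    y = [0.0] * num_rows
    for j in range(len(col_ptrs) - 1):
        # Entries of column j live in values[col_ptrs[j]:col_ptrs[j + 1]].
        for k in range(col_ptrs[j], col_ptrs[j + 1]):
            y[row_indices[k]] += values[k] * x[j]
    return y


dense = [
    [1.0, 0.0, 4.0],
    [0.0, 3.0, 5.0],
    [2.0, 0.0, 6.0],
]
col_ptrs, row_indices, values = dense_to_csc(dense)
product = csc_matvec(3, col_ptrs, row_indices, values, [1.0, 1.0, 1.0])
```

Because the values are grouped by column, a matrix-vector product walks each column once and scatters into the output, which is why the loop above iterates over columns rather than rows.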

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-13 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-840777158 @srowen it makes perfect sense that such a change gets thoroughly tested! The next step is to add support for sparse matrices and vectors. In your experience, how widely used are

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-11 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-839125900 It does show at https://github.com/apache/spark/blob/420802efbf9aabe3d3f709ec21102510b51dcfc0/dev/deps/spark-deps-hadoop-2.7-hive-2.3#L56 and

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-11 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-839056925
> So the `core` artifact is no longer part of the transitive deps? I would think `all` needs it, still, but, not sure.
`core` is still part of the transitive dependencies

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-10 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-836932214 @srowen the profile is already back at https://github.com/apache/spark/pull/32415/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R3499. I think it's

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-07 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-834226560 @fommil @srowen I got `com.github.fommil.netlib:all` running this morning, and here are the results with jmh: ``` Benchmark(implementation) (k) (m)

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-06 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-833459432 > @luhenry btw netlib-java might still be pulled in transitively from the Breeze project. You might need to send PRs over there too (I'm not sure if it's still a dependency) 

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-05 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832996541
> Oh I see, this is native via ludovic netlib. Can you provide benchmarks vs netlib-java to confirm that there is no obvious perf regression?
Let me get some numbers for

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-05 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832527738 @srowen I am not observing much convergence even when ramping up `maxIter` to 1000. __`local[2]` + `maxIter=1000`:__

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832290654 And the results for the last commits are at https://github.com/apache/spark/runs/2502327271 (I don't know why it doesn't show in the "All checks have passed" at the end of the

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832285023 Some runs with various values of `max_iter`: __100:__ ![image](https://user-images.githubusercontent.com/660779/117075564-4fe1c480-ad35-11eb-889a-6df72eecacbc.png)

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832114209 And AFAIU, that's with sklearn: ![image](https://user-images.githubusercontent.com/660779/117044505-cc14e180-ad0e-11eb-8874-9df9c75dc2e5.png) I've no confidence

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832092686 From an older version of Spark (3.1.1), I get the following results:

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-04 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-832072136 From looking at https://github.com/luhenry/spark/runs/2500079223 and https://github.com/luhenry/spark/runs/2500065249, it looks like an intermittent failure. I haven't had a

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831542712
> It's entirely possible that 93.3 is a more correct log-likelihood. Usually we check some other implementation if possible to verify.
"Other implementation" as in

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831514938 The error is the following: ``` File "/__w/spark/spark/python/pyspark/ml/clustering.py", line 276, in __main__.GaussianMixture Failed example:

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831212439 /cc @srowen I have released `dev.ludovic.netlib:2.0.0` and I've updated this PR accordingly.