luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-840835158
IIUC, Spark uses the CSC representation. @fommil is that format represented
in MTJ as well?
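For context, a minimal self-contained sketch of the CSC (compressed sparse column) layout referenced here, matching the `colPtrs`/`rowIndices`/`values` structure of Spark's `SparseMatrix` (the example matrix is made up for illustration):

```java
// Minimal sketch of the CSC (compressed sparse column) layout used by
// Spark's SparseMatrix: nonzero values are stored column by column, with
// colPtrs marking where each column starts in the flat arrays.
public class CscSketch {
    // 3x3 matrix [[1,0,4],[0,3,5],[2,0,6]], nonzeros grouped by column:
    // col 0 -> (row 0, 1.0), (row 2, 2.0); col 1 -> (row 1, 3.0);
    // col 2 -> (row 0, 4.0), (row 1, 5.0), (row 2, 6.0)
    static final int[] colPtrs = {0, 2, 3, 6}; // colPtrs[j+1]-colPtrs[j] = nnz in col j
    static final int[] rowIndices = {0, 2, 1, 0, 1, 2};
    static final double[] values = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};

    // Look up entry (i, j) by scanning column j's slice of the arrays.
    static double get(int i, int j) {
        for (int k = colPtrs[j]; k < colPtrs[j + 1]; k++) {
            if (rowIndices[k] == i) return values[k];
        }
        return 0.0; // entry not stored, hence zero
    }

    public static void main(String[] args) {
        System.out.println(get(2, 0)); // prints 2.0
        System.out.println(get(0, 1)); // prints 0.0 (not stored)
    }
}
```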
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-840777158
@srowen, it makes perfect sense that such a change gets thoroughly tested! The next
step is to add support for sparse matrices and vectors. In your experience, how
widely used are
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-839125900
It does show at
https://github.com/apache/spark/blob/420802efbf9aabe3d3f709ec21102510b51dcfc0/dev/deps/spark-deps-hadoop-2.7-hive-2.3#L56
and
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-839056925
> So the `core` artifact is no longer part of the transitive deps? I would
think `all` needs it, still, but, not sure.
`core` is still part of the transitive dependencies
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-836932214
@srowen the profile is already back at
https://github.com/apache/spark/pull/32415/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R3499.
I think it's
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-834226560
@fommil @srowen I got `com.github.fommil.netlib:all` running this morning,
and here are the results with jmh:
```
Benchmark  (implementation)  (k)  (m)
```
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-833459432
> @luhenry btw netlib-java might still be pulled in transitively from the
Breeze project. You might need to send PRs over there too (I'm not sure if it's
still a dependency)
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832996541
> Oh I see, this is native via ludovic netlib. Can you provide benchmarks vs
netlib-java to confirm that there is no obvious perf regression?
Let me get some numbers for
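For reference, a self-contained stand-in for the `dgemm` workload such a benchmark would measure: a naive triple loop computing C += A * B on column-major arrays. The real comparison runs netlib-java vs dev.ludovic.netlib implementations of the same BLAS routine; the naive loop below only fixes the semantics, not the performance.

```java
// Naive reference dgemm on column-major arrays. This is NOT how either
// netlib-java or dev.ludovic.netlib implements it; it is only a sketch
// of the operation whose performance the benchmarks compare.
public class DgemmSketch {
    // m x k matrix a, k x n matrix b, m x n result c, all column-major.
    static void dgemm(int m, int n, int k, double[] a, double[] b, double[] c) {
        for (int j = 0; j < n; j++) {
            for (int l = 0; l < k; l++) {
                double bv = b[j * k + l];
                for (int i = 0; i < m; i++) {
                    c[j * m + i] += bv * a[l * m + i];
                }
            }
        }
    }

    public static void main(String[] args) {
        // A = [[1,2],[3,4]] column-major; B = identity, so C must equal A.
        double[] a = {1.0, 3.0, 2.0, 4.0};
        double[] b = {1.0, 0.0, 0.0, 1.0};
        double[] c = new double[4];
        dgemm(2, 2, 2, a, b, c);
        System.out.println(java.util.Arrays.toString(c)); // prints [1.0, 3.0, 2.0, 4.0]
    }
}
```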
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832527738
@srowen I am not observing much convergence even when ramping up `maxIter`
to 1000.
__`local[2]` + `maxIter=1000`:__
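One reason ramping up `maxIter` can change little: an EM-style fit stops as soon as the per-iteration improvement falls below `tol`, so the extra iteration budget is never used. A toy sketch of that interaction (the `step()` sequence below is made up and is not Spark's GaussianMixture objective):

```java
// Toy sketch: an EM-style loop stops when either maxIter is hit or the
// objective improves by less than tol. Once the tol criterion fires,
// raising maxIter does nothing. step() is a made-up convergent sequence,
// not Spark's GaussianMixture log-likelihood.
public class ConvergenceSketch {
    // Stand-in log-likelihood after `iter` steps (assumption: toy sequence
    // converging geometrically to -100.0).
    static double step(int iter) {
        return -100.0 - Math.pow(0.5, iter);
    }

    // Iterate until maxIter is reached or the improvement drops below tol;
    // returns the number of steps actually taken.
    static int run(int maxIter, double tol) {
        double prev = Double.NEGATIVE_INFINITY;
        int iter = 0;
        double ll = step(iter);
        while (iter < maxIter && Math.abs(ll - prev) > tol) {
            prev = ll;
            iter++;
            ll = step(iter);
        }
        return iter;
    }

    public static void main(String[] args) {
        // Both budgets stop at the same step: tol binds, not maxIter.
        System.out.println(run(100, 1e-3));  // prints 10
        System.out.println(run(1000, 1e-3)); // prints 10
    }
}
```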
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832290654
And the results for the last commits are at
https://github.com/apache/spark/runs/2502327271 (I don't know why it doesn't
show in the "All checks have passed" at the end of the
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832285023
Some runs with various values of `max_iter`:
__100:__
![image](https://user-images.githubusercontent.com/660779/117075564-4fe1c480-ad35-11eb-889a-6df72eecacbc.png)
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832114209
And AFAIU, that's with sklearn:
![image](https://user-images.githubusercontent.com/660779/117044505-cc14e180-ad0e-11eb-8874-9df9c75dc2e5.png)
I've no confidence
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832092686
From an older version of Spark (3.1.1), I get the following results:
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-832072136
From looking at https://github.com/luhenry/spark/runs/2500079223 and
https://github.com/luhenry/spark/runs/2500065249, it looks like an intermittent
failure. I haven't had a
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831542712
> It's entirely possible that 93.3 is a more correct log-likelihood. Usually
we check some other implementation if possible to verify.
"Other implementation" as in
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831514938
The error is the following:
```
File "/__w/spark/spark/python/pyspark/ml/clustering.py", line 276, in __main__.GaussianMixture
Failed example:
```
luhenry commented on pull request #32415:
URL: https://github.com/apache/spark/pull/32415#issuecomment-831212439
/cc @srowen I have released `dev.ludovic.netlib:2.0.0` and I've updated this
PR accordingly.