How to specify “positive class” in sparkml classification?

2021-07-07 Thread Reed Villanueva
How to specify the "positive class" in sparkml binary classification? (Or perhaps: How does a MulticlassClassificationEvaluator
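
One way to see why the choice of positive class matters: precision and recall are computed relative to whichever label is treated as positive, so the same predictions give different numbers for each label. The sketch below is plain Python, not the sparkml API; the function name `precision_recall` and the sample data are made up for illustration. As far as I know, Spark 3.0+ exposes a comparable choice through the `metricLabel` parameter of MulticlassClassificationEvaluator (used with the "...ByLabel" metrics), but check the API docs for your version.

```python
from typing import Sequence, Tuple


def precision_recall(
    labels: Sequence[float],
    predictions: Sequence[float],
    positive: float,
) -> Tuple[float, float]:
    """Return (precision, recall), treating `positive` as the positive class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and predictions, for illustration only.
labels = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
predictions = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]

for positive in (1.0, 0.0):
    p, r = precision_recall(labels, predictions, positive)
    print(f"positive={positive}: precision={p:.2f}, recall={r:.2f}")
```

Note that the label indices themselves depend on how the label column was encoded (for example, StringIndexer by default assigns index 0 to the most frequent label), so it is worth checking which index corresponds to the class of interest.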

Why does sparkml random forest classifier not support maxBins < number of total categorical values?

2021-06-16 Thread Reed Villanueva
Why does sparkml's random forest classifier not support maxBins (M) < (K) number of total categorical values? My

Re: What happens if a random forest max bins is set too high?

2021-06-16 Thread Reed Villanueva
I *think* I solved the issue. Will update w/ details after further testing / inspection.

On Mon, Jun 14, 2021 at 8:50 PM Reed Villanueva wrote:
> What happens if a random forest "max bins" hyperparameter is set too high?
>
> When training a sparkml random forest (
> https:/

What happens if a random forest max bins is set too high?

2021-06-15 Thread Reed Villanueva
What happens if a random forest "max bins" hyperparameter is set too high? When training a sparkml random forest ( https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier ) with maxBins set roughly equal to the max number of distinct categorical values for

Re: sparkml random forest classifier not learning (at all) compared to H2O implementation (on same data)?

2021-06-13 Thread Reed Villanueva
I *think* I solved the issue. Will update w/ details after further testing / inspection.

On Sun, Jun 13, 2021 at 3:29 PM Reed Villanueva wrote:
> I am trying to train a random forest classifier w/ sparkml
> <https://spark.apache.org/docs/latest/ml-classification-regression.html#rand

sparkml random forest classifier not learning (at all) compared to H2O implementation (on same data)?

2021-06-13 Thread Reed Villanueva
I am trying to train a random forest classifier w/ sparkml and am seeing that the *accuracy etc. is very bad (about the same as the dataset's response distribution itself), yet when using the same