What happens if a random forest "max bins" hyperparameter is set too high?
When training a Spark ML random forest ( https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier ) with maxBins set roughly equal to the maximum number of distinct categorical values for any given feature, I see OK performance metrics. But when I set it closer to 2x or 3x the number of distinct categorical values, performance is terrible (e.g. accuracy, in the case of a binary classifier, no better than the baseline implied by the actual distribution of responses in the dataset), and the feature importances ( https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.classification.RandomForestClassificationModel.html?highlight=featureimportances#pyspark.ml.classification.RandomForestClassificationModel.featureImportances ) are all zeros (whereas with the lower initial maxBins value it at least shows *something* for the importances).

I would not have expected such a huge difference from a change in maxBins alone (especially the difference between seeing *something* vs. absolutely nothing / all zeros for the feature importances). What could be happening under the hood of the algorithm that causes such different outcomes when this parameter is changed like this?
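For concreteness, here is a minimal sketch of the kind of setup I mean. The toy data, column names, and exact parameter values are made up for illustration (my real pipeline indexes several categorical features); the only point is the contrast between maxBins roughly equal to the number of distinct categorical values vs. 2-3x that number.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the real data: one categorical column with 8 distinct
# values and a binary label (names and values are hypothetical).
rows = [(c, 1.0 if c in "abcd" else 0.0) for c in list("abcdefgh") * 50]
df = spark.createDataFrame(rows, ["cat", "label"])

# StringIndexer attaches nominal metadata, so the trees treat "cat_idx" as a
# categorical feature with 8 values rather than a continuous one.
indexed = StringIndexer(inputCol="cat", outputCol="cat_idx").fit(df).transform(df)
data = VectorAssembler(inputCols=["cat_idx"], outputCol="features").transform(indexed)

# Case 1: maxBins roughly equal to the number of distinct categorical values.
rf_low = RandomForestClassifier(labelCol="label", featuresCol="features",
                                numTrees=20, maxBins=8, seed=42)
print(rf_low.fit(data).featureImportances)   # on my real data: nonzero importances

# Case 2: maxBins set to roughly 3x the number of distinct values.
rf_high = RandomForestClassifier(labelCol="label", featuresCol="features",
                                 numTrees=20, maxBins=24, seed=42)
print(rf_high.fit(data).featureImportances)  # on my real data: all zeros, poor accuracy
```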