Hi Anoop,

The Spark decision tree implementation supports regression and multiclass classification, continuous and categorical features, and pruning. It does not support missing feature values at present. You can probably think of it as distributed CART, though personally I always find the acronyms confusing.

How much difference are you seeing? There is a very small difference in how the candidate split thresholds are calculated in various libraries (there is no single right way), but it should not lead to a significant difference in performance.

-Manish

On Monday, December 29, 2014, Anoop Shiralige <anoop.shiral...@gmail.com> wrote:
> Hi All,
>
> I am trying to do a comparison, by building the model locally using R and
> on cluster using spark.
> There is some difference in the results.
>
> Any idea what is the internal implementation of Decision Tree in Spark
> MLLib.. (ID3 or C4.5 or C5.0 or CART algorithm).
>
> Thanks,
> AnoopShiralige
>
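
As a rough illustration of the threshold point in the reply above, here is a minimal sketch in plain Python of two common ways to pick candidate split thresholds for a continuous feature. It is not Spark's or R's actual code: the function names `midpoint_thresholds` and `quantile_thresholds` are hypothetical, and the quantile rule is a simplified stand-in for the quantile-based binning that MLlib applies to continuous features (bounded by its `maxBins` parameter). The midpoint rule is what many single-machine CART-style implementations do.

```python
def midpoint_thresholds(values):
    """Candidate splits halfway between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]


def quantile_thresholds(values, max_bins):
    """Candidate splits at evenly spaced ranks: at most max_bins - 1 of them.

    Simplified assumption: exact ranks over all values, no sampling.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Take the value found at each evenly spaced rank.
    picks = [ordered[(i * n) // max_bins] for i in range(1, max_bins)]
    # Repeated values can yield the same threshold twice; keep each once.
    return sorted(set(picks))


feature = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
mid = midpoint_thresholds(feature)
quant = quantile_thresholds(feature, max_bins=4)
print("midpoints:", mid)
print("quantiles:", quant)
```

The two rules propose different thresholds for the same data, so the trees they grow can differ slightly in where they cut, even when both libraries implement the same splitting criterion.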