Hi Akshay,

When you don't use the "-p" parameter, the builder loads the whole dataset
into memory on every computing node, so every tree is grown from the whole
dataset (with bagging selecting a sample of it for each tree). When you use
"-p", every computing node loads only a part of the dataset (hence the name
"partial"), so each tree is trained on one part of the dataset. The training
algorithm is the same in both implementations; the partial implementation
is meant for datasets that are too big to fit in memory.
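
To make the difference concrete, here is a small toy sketch in Python. It is
not Mahout code: the function names, the round-robin assignment of trees to
partitions, and the use of row indices in place of real training rows are all
assumptions made for illustration. It only shows which rows each tree gets to
see in the two modes.

```python
import random


def bag(rows, rng):
    """Bootstrap sample: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def in_memory_samples(dataset, num_trees, seed=0):
    """Without -p: every tree is bagged from the whole dataset."""
    rng = random.Random(seed)
    return [bag(dataset, rng) for _ in range(num_trees)]


def partial_samples(dataset, num_partitions, num_trees, seed=0):
    """With -p: each node holds one partition, and a tree is bagged
    from that partition only.

    Assumption: trees are assigned to partitions round-robin.
    """
    rng = random.Random(seed)
    size = -(-len(dataset) // num_partitions)  # ceiling division
    partitions = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    samples = []
    for t in range(num_trees):
        part = partitions[t % len(partitions)]
        samples.append(bag(part, rng))
    return samples


if __name__ == "__main__":
    data = list(range(12))  # row indices stand in for training rows
    for sample in in_memory_samples(data, num_trees=3):
        print("in-memory tree sees rows:", sorted(set(sample)))
    for sample in partial_samples(data, num_partitions=3, num_trees=3):
        print("partial tree sees rows:  ", sorted(set(sample)))
```

In the in-memory case any row can end up in any tree's sample; in the partial
case a tree only ever sees rows from the partition held by its node.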

On Thu, Jul 5, 2012 at 4:38 AM, Nowal, Akshay <akshay_no...@syntelinc.com> wrote:

> Hi All,
>
> I am running Decision Forest in Mahout. Below are the commands that I
> have used to run the algorithm:
>
> Info file:
>
> mahout org.apache.mahout.df.tools.Describe -p
> /user/an32665/KDD/KDDTrain+.arff -f /user/an32665/KDD/KDDTrain+.info -d
> N 3 C 2 N C 4 N C 8 N 2 C 19 N L
>
> Building Forest:
>
> mahout org.apache.mahout.df.mapreduce.BuildForest
> -Dmapred.max.split.size=1874231 -oob -d /user/an32665/KDD/KDDTrain+.arff
> -ds /user/an32665/KDD/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
>
> Testing Forest:
>
> mahout org.apache.mahout.df.mapreduce.TestForest -i
> /user/an32665/KDD/KDDTest+.arff -ds /user/an32665/KDD/KDDTrain+.info -m
> nsl-forest -a -mr -o predictions
>
> So while building the forest we use "-p" to select the partial
> implementation. I just wanted to know how the algorithm differs when
> we use "-p" and when we don't use "-p".
>
> Regards,
>
> Akshay Nowal
