That seems to make sense. What do you mean by "Mahout will not report any of those unless the support is strictly greater than 3"? Is there a way for me to get all the patterns with support strictly greater than a particular value?
Thanks
Gaurav

On Mon, Dec 19, 2011 at 4:58 PM, Tom Pierce <t...@cloudera.com> wrote:
> One possible explanation is that Mahout's FPG avoids reporting
> patterns that are subsumed by others.
>
> For example, if you have pattern [a, b, c] with support 3, you clearly
> must also have [a, b], [b, c] and [a, c] with support >= 3. Mahout
> will not report any of those unless the support is strictly greater
> than 3.
>
> Does that help explain your discrepancies? If not, can you share an
> example data set along with a missed pattern?
>
> -tom
>
> On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <gauravonlin...@gmail.com> wrote:
> > Hi All,
> >
> > I am using Mahout on Ubuntu 10.04 from the repository and running it on a
> > data set of 1472 rows. I am running it in sequential mode with k=200,000 and
> > s=400. I have implemented FP-growth in PHP, but when I compare the output
> > of my implementation with Mahout's FPG, I find that Mahout's output
> > consists of just 17,500 patterns, whereas my implementation gives around
> > 65,000 unique patterns (I have verified their uniqueness!) for the same
> > support threshold. I have also checked my outputs against the actual data
> > set and found that all my patterns are correct and do exist in the data
> > set with the correct support values.
> >
> > Can anyone please explain the reason?
> >
> > Thanks!!
> >
> > --
> > regards
> > Gaurav Singh

--
regards
Gaurav Singh
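Tom's description matches what is usually called closed frequent patterns: a pattern is kept only if no proper superset has exactly the same support. Below is a minimal brute-force Python sketch of that idea, not Mahout's code. It assumes Mahout's output is exactly the set of closed patterns (and that k is large enough not to truncate it); the toy transactions and the function names `frequent_itemsets`, `closed_only` and `expand_closed` are hypothetical. It also sketches one possible answer to the question about getting all patterns: expand every reported pattern into its subsets and give each subset the largest support among the reported patterns that contain it.

```python
from itertools import combinations

# Hypothetical toy data (not the 1472-row data set from this thread).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "d"},
]


def frequent_itemsets(transactions, min_support):
    """Every pattern with support >= min_support (brute force, toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                result[pattern] = support
    return result


def closed_only(patterns):
    """Drop a pattern if some proper superset has the same support (it is subsumed)."""
    return {
        p: s
        for p, s in patterns.items()
        if not any(p < q and s == sq for q, sq in patterns.items())
    }


def expand_closed(closed, min_support):
    """Rebuild all patterns with support >= min_support from the closed ones.

    The support of any pattern equals the largest support among the closed
    patterns that contain it, so each subset keeps the maximum seen.
    """
    result = {}
    for pattern, support in closed.items():
        if support < min_support:
            continue
        for size in range(1, len(pattern) + 1):
            for combo in combinations(sorted(pattern), size):
                subset = frozenset(combo)
                result[subset] = max(result.get(subset, 0), support)
    return result


if __name__ == "__main__":
    every = frequent_itemsets(TRANSACTIONS, min_support=2)
    closed = closed_only(every)
    print(len(every), "frequent patterns,", len(closed), "closed")
    # prints "7 frequent patterns, 3 closed"
    # With integer counts, "support strictly greater than 3" means min_support=4.
    print(sorted((sorted(p), s) for p, s in expand_closed(closed, 4).items()))
```

On this toy data [a, b, c] has support 3, so [a, c] and [b, c] (also support 3) are dropped as subsumed, while [a, b] survives because its support is 4, which is the situation Tom describes. Brute-force counting is exponential in the number of distinct items, so this only illustrates the definitions; it is not a substitute for an FP-growth run on real data.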