You were a real help, Tom!

Thanks
Gaurav

On Mon, Dec 19, 2011 at 5:33 PM, Tom Pierce <t...@cloudera.com> wrote:

> Maybe it's easiest to give an example.
>
> If you have input:
>
> a b c
> a b c d
> a c d
> a b c
>
> You should expect Mahout to output (say, for support 2):
>
> [a, b, c],3
> [a, c, d],2
> [a, c],4
>
> You might also expect to see [a],4 or [a, b],3 but these are implied
> by the other patterns. Note that [a, b] and [a, c] are both
> subpatterns of [a, b, c]. Only [a, c] is emitted because it has
> greater support than [a, b, c]; [a, b] is not emitted, since it has
> support equal to a reported superpattern.
>
> If you were to create all possible subsets of each output pattern
> (with the same support as the generating pattern), then dedup these
> by taking max-support, you'd have "fully-verbose" results.
>
> Currently there is no trivial way to disable this behavior; it would
> require code changes. I'm not sure how easy it would be in the
> current code, but I think it'd be reasonably easy in an alternate
> implementation I've been trying to contribute.
>
> -tom
>
> On Mon, Dec 19, 2011 at 6:34 AM, gaurav singh <gauravonlin...@gmail.com> wrote:
> > That seems to make sense. What do you mean by "Mahout will not report
> > any of those unless the support is strictly greater than 3"? Is there
> > a way for me to get all the patterns with support strictly greater
> > than a particular value?
> >
> > Thanks
> > Gaurav
> >
> > On Mon, Dec 19, 2011 at 4:58 PM, Tom Pierce <t...@cloudera.com> wrote:
> >
> >> One possible explanation is that Mahout's FPG avoids reporting
> >> patterns that are subsumed by others.
> >>
> >> For example, if you have pattern [a, b, c] with support 3, you clearly
> >> must also have [a, b], [b, c] and [a, c] with support >= 3. Mahout
> >> will not report any of those unless the support is strictly greater
> >> than 3.
> >>
> >> Does that help explain your discrepancies? If not, can you share an
> >> example data set along with a missed pattern?
> >>
> >> -tom
> >>
> >> On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <gauravonlin...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I am using Mahout on Ubuntu 10.04 from the repository and running it
> >> > on a data set of 1472 rows. I am running it in sequential mode with
> >> > k=200,000 and s=400. I have implemented fp-growth in PHP, but when I
> >> > compare the output of my implementation of fp-growth and Mahout FPG,
> >> > I find that the Mahout output consists of just 17,500 patterns,
> >> > whereas from my implementation I get around 65,000 unique patterns
> >> > (I have verified their uniqueness!) for the same value of the support
> >> > threshold. I have also verified my outputs against the actual data
> >> > set and have found that all my patterns are correct and do exist in
> >> > the data set with the correct value of their support.
> >> >
> >> > Can anyone please explain the reason?
> >> >
> >> > Thanks!!
> >> >
> >> > --
> >> > regards
> >> > Gaurav Singh
> >
> > --
> > regards
> > Gaurav Singh

--
regards
Gaurav Singh
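
For anyone finding this thread later: Tom's four-line example can be checked with a small brute-force sketch. This is plain Python, not Mahout code, and it makes no claim about how Mahout's FPG is implemented internally. The function names (frequent_itemsets, closed_only, expand) are made up for illustration. It counts every itemset directly, so it is only suitable for toy inputs. It assumes "subsumed" means "has a proper superpattern with the same support", as Tom describes, and expand() follows his recipe for getting back the fully-verbose list.

```python
from itertools import combinations

# Tom's four example transactions from the thread.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b", "c"},
]


def frequent_itemsets(transactions, min_support):
    """Brute force: count every candidate itemset (tiny inputs only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result


def closed_only(frequent):
    """Drop any pattern that has a proper superpattern with equal support."""
    return {
        pattern: support
        for pattern, support in frequent.items()
        if not any(
            pattern < other and other_support == support
            for other, other_support in frequent.items()
        )
    }


def expand(closed):
    """Rebuild the verbose list: all subsets, deduplicated by max support."""
    verbose = {}
    for pattern, support in closed.items():
        for size in range(1, len(pattern) + 1):
            for subset in combinations(sorted(pattern), size):
                key = frozenset(subset)
                verbose[key] = max(verbose.get(key, 0), support)
    return verbose


all_patterns = frequent_itemsets(TRANSACTIONS, min_support=2)
closed = closed_only(all_patterns)

# Print the reduced pattern list, highest support first.
for pattern, support in sorted(
    closed.items(), key=lambda kv: (-kv[1], sorted(kv[0]))
):
    print(sorted(pattern), support)
```

On this input the reduced list holds the same three patterns Tom gives ([a, c] with 4, [a, b, c] with 3, [a, c, d] with 2), while all_patterns holds 11, and expand(closed) reproduces all 11 with their supports. A gap of that kind, at a larger scale, is consistent with the 17,500 versus 65,000 difference reported at the start of the thread.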