You were a real help, Tom!

Thanks
Gaurav

On Mon, Dec 19, 2011 at 5:33 PM, Tom Pierce <t...@cloudera.com> wrote:

> Maybe it's easiest to give an example.
>
> If you have input:
>
> a b c
> a b c d
> a    c d
> a b c
>
> You should expect Mahout to output (say, for support 2):
>
> [a, b, c],3
> [a, c, d],2
> [a, c],4
>
> You might also expect to see [a],4 or [a, b],3 but these are implied
> by the other patterns.  Note that [a, b] and [a, c] are both
> subpatterns of [a, b, c].  Only [a, c] is emitted because it has
> greater support than [a, b, c]; [a, b] is not emitted, since it has
> support equal to a reported superpattern.
>
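The filtering described above can be reproduced on the toy input with a brute-force sketch. This is not Mahout's code; the names `frequent_patterns` and `drop_subsumed` are hypothetical, and enumerating every itemset is only practical for tiny data sets like this one.

```python
from itertools import combinations

# Toy transactions from the example above.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b", "c"},
]
MIN_SUPPORT = 2


def frequent_patterns(transactions, min_support):
    """Brute-force every itemset and count its support (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= t)
            if support >= min_support:
                result[frozenset(combo)] = support
    return result


def drop_subsumed(patterns):
    """Keep a pattern only if no proper superset has the same support."""
    return {
        p: s
        for p, s in patterns.items()
        if not any(p < q and s == s_q for q, s_q in patterns.items())
    }


all_patterns = frequent_patterns(transactions, MIN_SUPPORT)
closed = drop_subsumed(all_patterns)
for pattern, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), support)
```

On this input the brute-force pass finds 11 frequent patterns, and the filter keeps only [a, b, c], [a, c] and [a, c, d], which matches the three lines of output listed above.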
> If you were to create all possible subsets of each output pattern
> (with the same support as the generating pattern), then dedup these
> by taking max-support, you'd have "fully-verbose" results.
>
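A minimal Python sketch of that expansion, assuming the reported patterns have already been parsed into a dict of pattern to support. The `reported` dict and the `expand` helper are hypothetical names used for illustration, not part of Mahout.

```python
from itertools import combinations

# Output from the example above: pattern -> support.
reported = {
    ("a", "b", "c"): 3,
    ("a", "c", "d"): 2,
    ("a", "c"): 4,
}


def expand(reported):
    """Generate every non-empty subset of each pattern, keeping the max support seen."""
    verbose = {}
    for pattern, support in reported.items():
        for size in range(1, len(pattern) + 1):
            for subset in combinations(pattern, size):
                key = frozenset(subset)
                # Dedup: a subset inherits the largest support among its generators.
                verbose[key] = max(verbose.get(key, 0), support)
    return verbose


verbose = expand(reported)
for pattern, support in sorted(verbose.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), support)
```

For the three reported patterns this recovers 11 patterns in total, including [a] with support 4 and [a, b] with support 3. The number of subsets grows exponentially with pattern length, so on a real data set with long patterns the expansion can get large.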
> Currently there is no trivial way to disable this behavior; it would
> require code changes.  I'm not sure how easy it would be in the
> current code, but I think it'd be reasonably easy in an alternate
> implementation I've been trying to contribute.
>
> -tom
>
> On Mon, Dec 19, 2011 at 6:34 AM, gaurav singh <gauravonlin...@gmail.com> wrote:
> > That seems to make sense. What do you mean by "Mahout will not report any
> > of those unless the support is strictly greater than 3"? Is there a way
> > for me to get all the patterns with support strictly greater than a
> > particular value?
> >
> > Thanks
> > Gaurav
> >
> > On Mon, Dec 19, 2011 at 4:58 PM, Tom Pierce <t...@cloudera.com> wrote:
> >
> >> One possible explanation is that Mahout's FPG avoids reporting
> >> patterns that are subsumed by others.
> >>
> >> For example, if you have pattern [a, b, c] with support 3, you clearly
> >> must also have [a, b], [b, c] and [a, c] with support >= 3.  Mahout
> >> will not report any of those unless the support is strictly greater
> >> than 3.
> >>
> >> Does that help explain your discrepancies?  If not can you share an
> >> example data set along with a missed pattern?
> >>
> >> -tom
> >>
> >> On Mon, Dec 19, 2011 at 1:37 AM, gaurav singh <gauravonlin...@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > I am using Mahout on Ubuntu 10.04 from the repository and running it
> >> > on a data set of 1472 rows. I am running it in sequential mode with
> >> > k=200,000 and s=400. I have implemented FP-growth in PHP, but when I
> >> > compare the output of my implementation with Mahout's FPG, I find
> >> > that Mahout's output consists of just 17,500 patterns, whereas my
> >> > implementation gives around 65,000 unique patterns (I have verified
> >> > their uniqueness!) for the same support threshold. I have also
> >> > checked my output against the actual data set and found that all my
> >> > patterns are correct and do exist in the data set with the correct
> >> > support values.
> >> >
> >> > Can anyone please explain the reason?
> >> >
> >> > Thanks!!
> >> >
> >> > --
> >> > regards
> >> > Gaurav Singh
> >>
> >
> >
> >
> > --
> > regards
> > Gaurav Singh
>



-- 
regards
Gaurav Singh
