Re: Spark FP-Growth algorithm for frequent sequential patterns

Xiangrui Meng Sun, 28 Jun 2015 22:29:36 -0700

Hi Ping,

FYI, we just merged Feynman's PR:
https://github.com/apache/spark/pull/6997 that adds sequential pattern
support. Please check out master branch and help test. Thanks!


Best,
Xiangrui

On Wed, Jun 24, 2015 at 2:16 PM, Feynman Liang <fli...@databricks.com> wrote:
> There is a JIRA for this which I just submitted a PR for :)
>
> On Tue, Jun 23, 2015 at 6:09 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> This is on the wish list for Spark 1.5. Assuming that the items from
>> the same transaction are distinct. We can still follow FP-Growth's
>> steps:
>>
>> 1. find frequent items
>> 2. filter transactions and keep only frequent items
>> 3. do NOT order by frequency
>> 4. use suffix to partition the transactions (whether to use prefix or
>> suffix doesn't really matter in this case)
>> 5. grow FP-tree locally on each partition (the data structure should
>> be the same)
>> 6. generate frequent sub-sequences
>>
>> +Feynman
>>
>> Best,
>> Xiangrui
>>
>> On Fri, Jun 19, 2015 at 10:51 AM, ping yan <sharon...@gmail.com> wrote:
>> > Hi,
>> >
>> > I have a use case where I'd like to mine frequent sequential patterns
>> > (consider the clickpath scenario). Transaction A -> B doesn't equal
>> > Transaction B->A..
>> >
>> > From what I understand about FP-growth in general and the MLlib
>> > implementation of it, the orders are not preserved. Anyone can provide
>> > some
>> > insights or ideas in extending the algorithm to solve frequent
>> > sequential
>> > pattern mining problems?
>> >
>> > Thanks as always.
>> >
>> >
>> > Ping
>> >
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: Spark FP-Growth algorithm for frequent sequential patterns

Reply via email to