Hi Ping, FYI, we just merged Feynman's PR: https://github.com/apache/spark/pull/6997 that adds sequential pattern support. Please check out master branch and help test. Thanks!
Best, Xiangrui On Wed, Jun 24, 2015 at 2:16 PM, Feynman Liang <fli...@databricks.com> wrote: > There is a JIRA for this which I just submitted a PR for :) > > On Tue, Jun 23, 2015 at 6:09 PM, Xiangrui Meng <men...@gmail.com> wrote: >> >> This is on the wish list for Spark 1.5. Assuming that the items from >> the same transaction are distinct. We can still follow FP-Growth's >> steps: >> >> 1. find frequent items >> 2. filter transactions and keep only frequent items >> 3. do NOT order by frequency >> 4. use suffix to partition the transactions (whether to use prefix or >> suffix doesn't really matter in this case) >> 5. grow FP-tree locally on each partition (the data structure should >> be the same) >> 6. generate frequent sub-sequences >> >> +Feynman >> >> Best, >> Xiangrui >> >> On Fri, Jun 19, 2015 at 10:51 AM, ping yan <sharon...@gmail.com> wrote: >> > Hi, >> > >> > I have a use case where I'd like to mine frequent sequential patterns >> > (consider the clickpath scenario). Transaction A -> B doesn't equal >> > Transaction B->A.. >> > >> > From what I understand about FP-growth in general and the MLlib >> > implementation of it, the orders are not preserved. Anyone can provide >> > some >> > insights or ideas in extending the algorithm to solve frequent >> > sequential >> > pattern mining problems? >> > >> > Thanks as always. >> > >> > >> > Ping >> > > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org