Re: MLlib Prefixspan implementation

Feynman Liang Wed, 26 Aug 2015 00:16:32 -0700

ReversedPrefix is used because Scala's List is a singly linked list, which
has constant-time prepend (at the head) but linear-time append (at the tail).
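
To illustrate the idea, here is a minimal sketch. The names
ReversedPrefixSketch, extend, toSequence and build are hypothetical and are
not copied from the MLlib source; items are simplified to Int. It only shows
why storing the prefix in reverse keeps each extension cheap, with a single
reversal when the pattern is emitted.

```scala
// Illustrative sketch only: names and types here are assumptions,
// not the actual MLlib implementation.
object ReversedPrefixSketch {
  // Items are stored most-recent-first, so extending the prefix is a
  // constant-time prepend (::) instead of a linear-time append (:+).
  final case class ReversedPrefix(items: List[Int]) {
    def extend(item: Int): ReversedPrefix = ReversedPrefix(item :: items)

    // Reverse once, only when the finished pattern is turned into a sequence.
    def toSequence: List[Int] = items.reverse
  }

  val empty: ReversedPrefix = ReversedPrefix(Nil)

  // Grow a prefix one item at a time, as a pattern-growth search would.
  def build(items: Seq[Int]): ReversedPrefix =
    items.foldLeft(empty)((prefix, item) => prefix.extend(item))
}
```

Growing a prefix of length k this way costs O(k) in total, whereas appending
to the tail of a List at every step would cost O(k^2).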


I'm aware that there are use cases for the gap constraints. My question was
more about whether any users of Spark/MLlib have an immediate application
for these features.

On Wed, Aug 26, 2015 at 12:10 AM, alexis GILLAIN <ila...@hotmail.com> wrote:

> A first use case of gap constraint is included in the article.
> Another application would be customer-shopping sequence analysis where you
> want to put a constraint on the duration between two purchases for them to
> be considered as a pertinent sequence.
>
> Additional question regarding the code: what's the point of using
> ReversedPrefix in LocalPrefixSpan? The prefix is used neither in finding
> frequent items of a projected database nor in computing a new projected
> database, so it looks like it's appended in inverse order just to be
> reversed when transformed to a sequence.
>
> 2015-08-25 12:15 GMT+08:00 Feynman Liang <fli...@databricks.com>:
>
>> CCing the mailing list again.
>>
>> It's currently not on the radar. Do you have a use case for it? I can
>> bring it up during 1.6 roadmap planning tomorrow.
>>
>> On Mon, Aug 24, 2015 at 8:28 PM, alexis GILLAIN <ila...@hotmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I just realized the article I mentioned is cited in the jira and not in
>>> the code so I guess you didn't use this result.
>>>
>>> Do you plan to implement sequences with timestamps and gap constraints,
>>> as in:
>>>
>>>
>>> https://people.mpi-inf.mpg.de/~rgemulla/publications/miliaraki13mg-fsm.pdf
>>>
>>> 2015-08-25 7:06 GMT+08:00 Feynman Liang <fli...@databricks.com>:
>>>
>>>> Hi Alexis,
>>>>
>>>> Unfortunately, both of the papers you referenced appear to be
>>>> translations and are quite difficult to understand. We followed
>>>> http://doi.org/10.1109/ICDE.2001.914830 when implementing PrefixSpan.
>>>> Perhaps you can find the relevant lines in there so I can elaborate
>>>> further?
>>>>
>>>> Feynman
>>>>
>>>> On Thu, Aug 20, 2015 at 9:07 AM, alexis GILLAIN <ila...@hotmail.com>
>>>> wrote:
>>>>
>>>>> I want to use prefixspan so I had a look at the code and the cited
>>>>> paper : "Distributed PrefixSpan Algorithm Based on MapReduce".
>>>>>
>>>>> There is a result in the paper I didn't really understand, and I
>>>>> couldn't find where it is used in the code.
>>>>>
>>>>> Suppose a sequence database S = {1,2...n}, a sequence <a...> is a
>>>>> length-(L-1) (2≤L≤n) sequential pattern, in projected databases which is a
>>>>> prefix of a length-(L-1) sequential pattern <a...a>, when the support count
>>>>> of <a> is not less than min_support, it is equal to obtaining a length-L
>>>>> sequential pattern < a ... a > from projected databases that obtaining a
>>>>> length-L sequential pattern < a ... a > from a sequence database S.
>>>>>
>>>>> According to the paper, it's supposed to add a pruning step in the
>>>>> reduce function, but I couldn't find where.
>>>>>
>>>>> This result seems to come from a previous paper : "Wang Linlin, Fan
>>>>> Jun. Improved Algorithm for Sequential Pattern Mining Based on PrefixSpan
>>>>> [J]. Computer Engineering, 2009, 35(23): 56-61" but it didn't help me to
>>>>> understand it and how it can improve the algorithm.
>>>>>
>>>>
>>>>
>>>
>>
>
