Re: How to create concatenated token

Erick Erickson Wed, 17 Jun 2015 06:47:39 -0700

If you used the JIRA I linked, vote for it and add any improvements.
Anyone can attach a patch to a JIRA; you just have to create a login.


That said, this may be too rare a use case to deal with. I just thought
of shingling, which I should have suggested earlier. It will work for
concatenating small numbers of tokens which, I'd guess, is the case
here. I mean, do you really want to concatenate 50 tokens?
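
A minimal sketch of what the shingling approach might look like in a
schema, assuming a field type named text_concat (a hypothetical name) and
a simplified chain of whitespace tokenizer, lowercasing and Porter
stemming. The LowerCaseFilterFactory step and the shingle sizes are
illustrative assumptions, not something taken from this thread. The
stemmer is written with Solr's shipped class name,
solr.PorterStemFilterFactory. Setting tokenSeparator to an empty string
is what joins the tokens without a space.

```xml
<!-- Hypothetical field type: name and shingle sizes are illustrative only -->
<fieldType name="text_concat" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: lowercase so "Solr training" yields "solr", "train" -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <!-- Emit the single tokens plus shingles of 2 to 3 tokens, joined with no separator -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="3"
            outputUnigrams="true"
            tokenSeparator=""/>
  </analyzer>
</fieldType>
```

With this chain, "Solr training" should produce "solr", "train" and the
shingle "solrtrain". Two caveats: the filter emits every shingle between
the minimum and maximum size rather than one token for the whole input,
and, as far as I know, a shingle is placed at the position of its first
token rather than its last.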

Best,
Erick

On Wed, Jun 17, 2015 at 12:07 AM, Aman Tandon <amantandon...@gmail.com> wrote:
> Dear Erick,
>
> e.g. Solr training
>
> Porter:          "solr"    "train"
> Position:          1          2
>
> Concatenated:    "solr"    "train"
>                            "solrtrain"
> Position:          1          2
>
>
> I have implemented the filter as per my requirement. Thank you so much for
> your help and guidance. How can I contribute it to Solr?
>
> With Regards
> Aman Tandon
>
> On Wed, Jun 17, 2015 at 10:14 AM, Aman Tandon <amantandon...@gmail.com>
> wrote:
>
>> Hi Erick,
>>
>> Thank you so much. It will help me learn how to save the state of a
>> token. I had no idea how to save the state of previous tokens, which
>> made it difficult to generate a concatenated token at the end.
>>
>> Is there anything I should read to learn more about it?
>>
>> With Regards
>> Aman Tandon
>>
>> On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> I really question the premise, but have a look at:
>>> https://issues.apache.org/jira/browse/SOLR-7193
>>>
>>> Note that this is not committed and I haven't reviewed
>>> it so I don't have anything to say about that. And you'd
>>> have to implement it as a custom Filter.
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon <amantandon...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > Any suggestions on how I could achieve this behaviour?
>>> >
>>> > With Regards
>>> > Aman Tandon
>>> >
>>> > On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon <amantandon...@gmail.com>
>>> > wrote:
>>> >
>>> >> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>>> >>
>>> >>
>>> >> Typo correction:
>>> >> e.g. Intent for solr training: fq=id:(234 456 545) title:("solr training")
>>> >>
>>> >> With Regards
>>> >> Aman Tandon
>>> >>
>>> >> On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon <amantandon...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> We have some business logic to search the user query in "user intent"
>>> >>> or "finding the exact matching products".
>>> >>>
>>> >>> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>>> >>>
>>> >>> As we can see, it is a phrase query, so it will take more time than a
>>> >>> single stemmed-token query. There are also phrase queries of 5-7 words,
>>> >>> so we want to reduce the search time by implementing this feature.
>>> >>>
>>> >>> With Regards
>>> >>> Aman Tandon
>>> >>>
>>> >>> On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
>>> >>> benedetti.ale...@gmail.com> wrote:
>>> >>>
>>> >>>> Can I ask why you need to concatenate the tokens? Maybe we can find a
>>> >>>> better solution to concat all the tokens in one single big token.
>>> >>>> I find it difficult to understand the reasons behind tokenising, token
>>> >>>> filtering and then un-tokenizing again :)
>>> >>>> It would be great if you could explain a little better what you would
>>> >>>> like to do!
>>> >>>>
>>> >>>>
>>> >>>> Cheers
>>> >>>>
>>> >>>> 2015-06-16 13:26 GMT+01:00 Aman Tandon <amantandon...@gmail.com>:
>>> >>>>
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > I have a requirement to create a concatenated token of all the
>>> >>>> > tokens created by the last item of my analyzer chain.
>>> >>>> >
>>> >>>> > Suppose my analyzer chain is:
>>> >>>> >
>>> >>>> > <tokenizer class="solr.WhitespaceTokenizerFactory" />
>>> >>>> > <filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>>> >>>> >         splitOnNumerics="1" preserveOriginal="1"/>
>>> >>>> > <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>> >>>> >         maxGramSize="15" side="front" />
>>> >>>> > <filter class="solr.PorterStemmerFilterFactory"/>
>>> >>>> >
>>> >>>> > I want to create a concatenated-token plugin that adds a concatenated
>>> >>>> > token along with the last token.
>>> >>>> >
>>> >>>> > e.g. Solr training
>>> >>>> >
>>> >>>> > Porter:          "solr"    "train"
>>> >>>> > Position:          1          2
>>> >>>> >
>>> >>>> > Concatenated:    "solr"    "train"
>>> >>>> >                            "solrtrain"
>>> >>>> > Position:          1          2
>>> >>>> >
>>> >>>> > Please help me out. How can I create a custom filter for this
>>> >>>> > requirement?
>>> >>>> >
>>> >>>> > With Regards
>>> >>>> > Aman Tandon
>>> >>>> >
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> --------------------------
>>> >>>>
>>> >>>> Benedetti Alessandro
>>> >>>> Visiting card : http://about.me/alessandro_benedetti
>>> >>>>
>>> >>>> "Tyger, tyger burning bright
>>> >>>> In the forests of the night,
>>> >>>> What immortal hand or eye
>>> >>>> Could frame thy fearful symmetry?"
>>> >>>>
>>> >>>> William Blake - Songs of Experience -1794 England
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>>
>>
>>
