Re: Percolate feature?

Mark Fri, 09 Aug 2013 10:02:43 -0700

I'll look into this. Thanks for the concrete example; I didn't even know which
classes to start looking at to implement such a feature.


On Aug 9, 2013, at 9:49 AM, Roman Chyla <roman.ch...@gmail.com> wrote:

> On Fri, Aug 9, 2013 at 11:29 AM, Mark <static.void....@gmail.com> wrote:
> 
>>> *All* of the terms in the field must be matched by the query....not vice-versa.
>> 
>> Exactly. This is why I was trying to explain it as a reverse search.
>> 
>> I just realized I described it as a *large* list of known keywords when
>> really it's small: no more than 1000. Forgetting about performance, how hard
>> do you think this would be to implement? How should I even start?
>> 
> 
> Not hard. Index all terms into a field, making sure there are no duplicates
> since you want to count them. Then I can imagine at least two options: save
> the number of terms as a payload together with the terms, or, in a second
> step (in a collector, for example), load the document and count the terms in
> the field. If the count matches the query size, you are done.
> 
> A trivial, naive implementation (as you say, 'forget performance') could be:
> 
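> // sketch against the Lucene 4.x Collector API; query is a final BooleanQuery
> final List<ScoreDoc> hits = new ArrayList<ScoreDoc>();
> final int numClauses = query.clauses().size();
> searcher.search(query, null, new Collector() {
>   private Scorer scorer;
>   private AtomicReader reader;
>   private int docBase;
>   public void setScorer(Scorer s) { scorer = s; }
>   public void setNextReader(AtomicReaderContext ctx) { reader = ctx.reader(); docBase = ctx.docBase; }
>   public boolean acceptsDocsOutOfOrder() { return true; }
>   public void collect(int doc) throws IOException {
>     // assumes each term was stored as a separate value of fieldToLoad
>     Document d = reader.document(doc, Collections.singleton(fieldToLoad));
>     if (d.getValues(fieldToLoad).length == numClauses) {
>       hits.add(new ScoreDoc(docBase + doc, scorer.score()));
>     }
>   }
> });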
> 
> So if your query contains no duplicates and all terms must match, you can
> be sure that you are collecting docs only when the number of terms in the
> field matches the number of clauses in the query.
> 
> roman
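The counting idea can also be tried outside Lucene first. The following is a hypothetical in-memory sketch, assuming lowercase whitespace tokenization in place of a real analyzer; `KeywordCountMatcher`, `addKeyword` and `match` are made-up names. It compares the number of a keyword's distinct terms found in the title with that keyword's own stored term count (the per-document term count Yonik mentions), so extra title words such as "S4" do not block a match.

```java
import java.util.*;

// Hypothetical in-memory version of the counting idea: a keyword matches a
// title when the number of its distinct terms found in the title equals its
// own distinct-term count. Lowercase whitespace tokenization stands in for
// a real analyzer (assumption).
public class KeywordCountMatcher {
    private final Map<String, List<String>> postings = new HashMap<>(); // term -> keywords containing it
    private final Map<String, Integer> termCounts = new HashMap<>();    // keyword -> distinct term count

    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    public void addKeyword(String keyword) {
        if (termCounts.containsKey(keyword)) return; // ignore duplicate keywords
        Set<String> ts = terms(keyword);
        termCounts.put(keyword, ts.size());
        for (String t : ts) {
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(keyword);
        }
    }

    public List<String> match(String title) {
        Map<String, Integer> matched = new HashMap<>(); // keyword -> its terms seen in the title
        for (String t : terms(title)) {
            for (String kw : postings.getOrDefault(t, Collections.emptyList())) {
                matched.merge(kw, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : matched.entrySet()) {
            // every term of the keyword was found in the title
            if (e.getValue().equals(termCounts.get(e.getKey()))) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        KeywordCountMatcher m = new KeywordCountMatcher();
        m.addKeyword("Sony");
        m.addKeyword("Samsung Galaxy");
        System.out.println(m.match("Sony Experia"));      // [Sony]
        System.out.println(m.match("Samsung 52inch LC")); // []
        System.out.println(m.match("Galaxy Samsung S4")); // [Samsung Galaxy]
    }
}
```

Since the list is small (no more than 1000 keywords), an inverted map like this fits comfortably in memory; the same comparison is what a Lucene collector would do with a stored or payload term count.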
> 
> 
>> Thanks for the input
>> 
>> On Aug 9, 2013, at 6:56 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>> 
>>> *All* of the terms in the field must be matched by the query....not vice-versa.
>>> And no, we don't have a query for that out of the box.  To implement,
>>> it seems like it would require the total number of terms indexed for a
>>> field (for each document).
>>> I guess you could also index start and end tokens and then use query
>>> expansion to all possible combinations... messy though.
>>> 
>>> -Yonik
>>> http://lucidworks.com
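Yonik's second suggestion (start/end tokens plus expansion to all combinations) amounts to testing every subset of the title's terms as an exact, whole-keyword match. A rough in-memory sketch, assuming a hash set of normalized keywords stands in for an index with start/end tokens; all names are hypothetical. The number of candidates doubles with each title term (2^n), which is the messy part.

```java
import java.util.*;

// Hypothetical sketch of "expand the query to all combinations": keywords are
// normalized to their sorted, de-duplicated lowercase terms (a HashSet stands
// in for an index with start/end tokens), and every non-empty subset of the
// title's terms is looked up as an exact whole-keyword match.
public class SubsetExpansionMatcher {
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    static String normalize(List<String> terms) {
        return String.join(" ", new TreeSet<>(terms)); // sorted, no duplicates
    }

    static Set<String> match(Set<String> normalizedKeywords, String title) {
        List<String> terms = new ArrayList<>(new TreeSet<>(tokenize(title)));
        int n = terms.size(); // 2^n candidates, so only viable for short titles
        Set<String> found = new TreeSet<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) subset.add(terms.get(i));
            }
            // terms is sorted, so the subset is already in normalized order
            String candidate = String.join(" ", subset);
            if (normalizedKeywords.contains(candidate)) found.add(candidate);
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>();
        keywords.add(normalize(tokenize("Sony")));
        keywords.add(normalize(tokenize("Samsung Galaxy")));
        System.out.println(match(keywords, "Galaxy Samsung S4")); // [galaxy samsung]
        System.out.println(match(keywords, "Samsung 52inch LC")); // []
    }
}
```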
>>> 
>>> On Fri, Aug 9, 2013 at 8:19 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> This _looks_ like simple phrase matching (no slop) and highlighting...
>>>> 
>>>> But whenever I think the answer is really simple, it usually means
>>>> that I'm missing something....
>>>> 
>>>> Best
>>>> Erick
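The phrase-matching reading can be checked against Mark's last example. This hypothetical sketch, assuming plain lowercase whitespace tokens rather than a Lucene analyzer, compares an exact phrase match (slop 0: contiguous, same order) with the "all keyword terms present, in any order" test; the two agree on "Samsung Galaxy S4" but differ on "Galaxy Samsung S4".

```java
import java.util.*;

// Hypothetical comparison of exact phrase matching (slop 0) with the
// "all keyword terms present in any order" rule, on whitespace tokens.
public class PhraseVsTermSet {
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // keyword terms must appear contiguously and in the same order
    static boolean phraseMatch(String keyword, String title) {
        return Collections.indexOfSubList(tokenize(title), tokenize(keyword)) >= 0;
    }

    // keyword terms may appear anywhere in the title, in any order
    static boolean allTermsMatch(String keyword, String title) {
        return new HashSet<>(tokenize(title)).containsAll(tokenize(keyword));
    }

    public static void main(String[] args) {
        System.out.println(phraseMatch("Samsung Galaxy", "Samsung Galaxy S4"));   // true
        System.out.println(phraseMatch("Samsung Galaxy", "Galaxy Samsung S4"));   // false
        System.out.println(allTermsMatch("Samsung Galaxy", "Galaxy Samsung S4")); // true
    }
}
```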
>>>> 
>>>> 
>>>> On Thu, Aug 8, 2013 at 11:18 PM, Mark <static.void....@gmail.com> wrote:
>>>> 
>>>>> OK, forget the mention of percolate.
>>>>> 
>>>>> We have a large list of known keywords we would like to match against.
>>>>> 
>>>>> Product keyword:  "Sony"
>>>>> Product keyword:  "Samsung Galaxy"
>>>>> 
>>>>> Given a product title, we would like to be able to detect whether or not
>>>>> it matches any known keyword. For a keyword to match, all of its terms
>>>>> must be present in the given product title.
>>>>> 
>>>>> Product Title: "Sony Experia"
>>>>> Matches and returns a highlight: "<em>Sony</em> Experia"
>>>>> 
>>>>> Product Title: "Samsung 52inch LC"
>>>>> Does not match
>>>>> 
>>>>> Product Title: "Samsung Galaxy S4"
>>>>> Matches and returns a highlight: "<em>Samsung Galaxy</em>"
>>>>> 
>>>>> Product Title: "Galaxy Samsung S4"
>>>>> Matches and returns a highlight: "<em>Galaxy Samsung</em>"
>>>>> 
>>>>> What would be the best way to approach this?
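One possible reading of the highlight examples, sketched under assumptions: whitespace tokens compared case-insensitively, and each maximal run of consecutive matching title words wrapped in a single `<em>` element, with non-matching words left in place as in the "Sony Experia" example. `KeywordHighlighter.highlight` is a hypothetical name; it returns `null` when some keyword term is missing. Note that it yields "<em>Samsung Galaxy</em> S4" rather than dropping "S4".

```java
import java.util.*;

// Hypothetical highlighter: returns the title with each maximal run of
// consecutive keyword terms wrapped in <em>...</em>, or null when some
// keyword term is missing from the title. Whitespace tokenization and
// case-insensitive comparison are assumptions.
public class KeywordHighlighter {
    static String highlight(String title, String keyword) {
        Set<String> kwTerms = new HashSet<>();
        for (String t : keyword.trim().split("\\s+")) {
            kwTerms.add(t.toLowerCase(Locale.ROOT));
        }
        Set<String> seen = new HashSet<>();   // keyword terms found in the title
        StringBuilder out = new StringBuilder();
        boolean inRun = false;                // currently inside an <em> run?
        String[] tokens = title.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String lower = tokens[i].toLowerCase(Locale.ROOT);
            boolean hit = kwTerms.contains(lower);
            if (hit) seen.add(lower);
            if (inRun && !hit) { out.append("</em>"); inRun = false; }
            if (i > 0) out.append(' ');
            if (hit && !inRun) { out.append("<em>"); inRun = true; }
            out.append(tokens[i]);
        }
        if (inRun) out.append("</em>");
        // all-terms rule: every keyword term must occur somewhere in the title
        return seen.size() == kwTerms.size() ? out.toString() : null;
    }

    public static void main(String[] args) {
        System.out.println(highlight("Sony Experia", "Sony"));                // <em>Sony</em> Experia
        System.out.println(highlight("Samsung 52inch LC", "Samsung Galaxy")); // null
        System.out.println(highlight("Galaxy Samsung S4", "Samsung Galaxy")); // <em>Galaxy Samsung</em> S4
    }
}
```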
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Aug 5, 2013, at 7:02 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>>>>> 
>>>>>> 
>>>>>> : Subject: Percolate feature?
>>>>>> 
>>>>>> Can you give a more concrete, realistic example of what you are trying
>>>>>> to do? Your synthetic hypothetical example is kind of hard to make sense
>>>>>> of.
>>>>>> 
>>>>>> Your Subject line, and your comment that the "percolate" feature of
>>>>>> Elasticsearch sounds like what you want, seem to have led some people
>>>>>> down a path of assuming you want to run these types of queries as
>>>>>> documents are indexed -- but other than that, it isn't at all clear to
>>>>>> me from the way you worded your question.
>>>>>> 
>>>>>> It's also not clear what aspect of the "results" you really care about
>>>>>> -- are you only looking for the *number* of documents that "match"
>>>>>> according to your concept of matching, or are you looking for a list of
>>>>>> matches? If multiple documents have all of their terms in the query
>>>>>> string, how should they score relative to each other?  If a document
>>>>>> contains the same term multiple times, do you expect it to match a query
>>>>>> only if that term appears in the query multiple times as well?  Do you
>>>>>> care about the ordering of the terms in the query? The ordering of the
>>>>>> terms in the document?
>>>>>> 
>>>>>> Ideally: describe for us what you want to do, without assuming Solr,
>>>>>> Elasticsearch, or anything specific about the implementation -- just
>>>>>> describe your actual use case for us, with several real document/query
>>>>>> examples.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> https://people.apache.org/~hossman/#xyproblem
>>>>>> XY Problem
>>>>>> 
>>>>>> Your question appears to be an "XY Problem" ... that is: you are dealing
>>>>>> with "X", you are assuming "Y" will help you, and you are asking about "Y"
>>>>>> without giving more details about the "X" so that we can understand the
>>>>>> full issue.  Perhaps the best solution doesn't involve "Y" at all?
>>>>>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Hoss
>>>>> 
>>>>> 
>> 
>>
