I'll look into this. Thanks for the concrete example, as I don't even know which classes to start looking at to implement such a feature.
On Aug 9, 2013, at 9:49 AM, Roman Chyla <roman.ch...@gmail.com> wrote:

> On Fri, Aug 9, 2013 at 11:29 AM, Mark <static.void....@gmail.com> wrote:
>
>>> *All* of the terms in the field must be matched by the query....not vice-versa.
>>
>> Exactly. This is why I was trying to explain it as a reverse search.
>>
>> I just realized I described it as a *large* list of known keywords when
>> really it's small; no more than 1000. Forgetting about performance, how hard
>> do you think this would be to implement? How should I even start?
>>
>
> not hard, index all terms into a field - make sure there are no duplicates,
> as you want to count them - then I can imagine at least two options: save
> the number of terms as a payload together with the terms, or in a second step
> (in a collector, for example), load the document and count the terms in
> the field - if they match the query size, you are done
>
> a trivial, naive implementation (as you say 'forget performance') could be:
>
> searcher.search(query, null, new Collector() {
>   ...
>   public void collect(int i) throws Exception {
>     d = reader.document(i, fieldsToLoad);
>     if (d.getValues(fieldToLoad).length == query.size()) {
>       PriorityQueue.add(new ScoreDoc(i + docBase, score));
>     }
>   }
> }
>
> so if your query contains no duplicates and all terms must match, you can
> be sure that you are collecting docs only when the number of terms matches
> the number of clauses in the query
>
> roman
>
>
>> Thanks for the input
>>
>> On Aug 9, 2013, at 6:56 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>>
>>> *All* of the terms in the field must be matched by the query....not vice-versa.
>>> And no, we don't have a query for that out of the box. To implement,
>>> it seems like it would require the total number of terms indexed for a
>>> field (for each document).
>>> I guess you could also index start and end tokens and then use query
>>> expansion to all possible combinations... messy though.
>>>
>>> -Yonik
>>> http://lucidworks.com
>>>
>>> On Fri, Aug 9, 2013 at 8:19 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> This _looks_ like simple phrase matching (no slop) and highlighting...
>>>>
>>>> But whenever I think the answer is really simple, it usually means
>>>> that I'm missing something....
>>>>
>>>> Best
>>>> Erick
>>>>
>>>>
>>>> On Thu, Aug 8, 2013 at 11:18 PM, Mark <static.void....@gmail.com> wrote:
>>>>
>>>>> Ok forget the mention of percolate.
>>>>>
>>>>> We have a large list of known keywords we would like to match against.
>>>>>
>>>>> Product keyword: "Sony"
>>>>> Product keyword: "Samsung Galaxy"
>>>>>
>>>>> We would like to be able to detect, given a product title, whether or not it
>>>>> matches any known keywords. For a keyword to be matched, all of its terms
>>>>> must be present in the product title given.
>>>>>
>>>>> Product Title: "Sony Experia"
>>>>> Matches and returns a highlight: "<em>Sony</em> Experia"
>>>>>
>>>>> Product Title: "Samsung 52inch LC"
>>>>> Does not match
>>>>>
>>>>> Product Title: "Samsung Galaxy S4"
>>>>> Matches and returns a highlight: "<em>Samsung Galaxy</em>"
>>>>>
>>>>> Product Title: "Galaxy Samsung S4"
>>>>> Matches and returns a highlight: "<em>Galaxy Samsung</em>"
>>>>>
>>>>> What would be the best way to approach this?
>>>>>
>>>>>
>>>>> On Aug 5, 2013, at 7:02 PM, Chris Hostetter <hossman_luc...@fucit.org>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> : Subject: Percolate feature?
>>>>>>
>>>>>> can you give a more concrete, realistic example of what you are trying to
>>>>>> do? your synthetic hypothetical example is kind of hard to make sense of.
>>>>>>
>>>>>> your Subject line and comment that the "percolate" feature of elastic
>>>>>> search sounds like what you want seems to have led some people down a
>>>>>> path of assuming you want to run these types of queries as documents are
>>>>>> indexed -- but that isn't at all clear to me from the way you worded your
>>>>>> question other than that.
>>>>>>
>>>>>> it's also not clear what aspect of the "results" you really care about --
>>>>>> are you only looking for the *number* of documents that "match" according
>>>>>> to your concept of matching, or are you looking for a list of matches?
>>>>>> what if multiple documents have all of their terms in the query string --
>>>>>> how should they score relative to each other? what if a document contains
>>>>>> the same term multiple times, do you expect it to be a match of a query
>>>>>> only if that term appears in the query multiple times as well? do you care
>>>>>> about the ordering of the terms in the query? the ordering of the terms in
>>>>>> the document?
>>>>>>
>>>>>> Ideally: describe for us what you want to do, w/o assuming
>>>>>> solr/elasticsearch/anything specific about the implementation -- just
>>>>>> describe your actual use case for us, with several real document/query
>>>>>> examples.
>>>>>>
>>>>>>
>>>>>> https://people.apache.org/~hossman/#xyproblem
>>>>>> XY Problem
>>>>>>
>>>>>> Your question appears to be an "XY Problem" ... that is: you are dealing
>>>>>> with "X", you are assuming "Y" will help you, and you are asking about "Y"
>>>>>> without giving more details about the "X" so that we can understand the
>>>>>> full issue. Perhaps the best solution doesn't involve "Y" at all?
>>>>>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>>>>>
>>>>>> -Hoss
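
For reference, the matching rule discussed in this thread (a keyword matches only when *all* of its terms appear in the product title) can be sketched in plain Java, without Lucene. This is only an illustration under stated assumptions: KeywordMatcher, terms and matchingKeywords are hypothetical names, tokenization is naive lowercase-and-split-on-whitespace, and the "matched term count equals stored term count" check is a stand-in for the collector idea quoted above, not an actual Lucene/Solr implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class KeywordMatcher {

    // Naive tokenizer (an assumption): lowercase, split on whitespace,
    // and drop duplicates so that terms can be counted reliably.
    static Set<String> terms(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return new LinkedHashSet<>();
        }
        return new LinkedHashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    // Returns every keyword whose distinct terms all occur in the title.
    static List<String> matchingKeywords(List<String> keywords, String title) {
        Set<String> titleTerms = terms(title);
        List<String> matches = new ArrayList<>();
        for (String keyword : keywords) {
            Set<String> keywordTerms = terms(keyword);
            // Count how many of the keyword's terms the title contains.
            int matched = 0;
            for (String term : keywordTerms) {
                if (titleTerms.contains(term)) {
                    matched++;
                }
            }
            // Accept only when the matched count equals the number of
            // terms stored for the keyword (the "count and compare" check).
            if (!keywordTerms.isEmpty() && matched == keywordTerms.size()) {
                matches.add(keyword);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Sony", "Samsung Galaxy");
        List<String> titles = Arrays.asList(
                "Sony Experia", "Samsung 52inch LC",
                "Samsung Galaxy S4", "Galaxy Samsung S4");
        for (String title : titles) {
            System.out.println(title + " -> " + matchingKeywords(keywords, title));
        }
    }
}
```

With the example keywords from the thread, "Sony Experia" matches "Sony", "Samsung 52inch LC" matches nothing, and both "Samsung Galaxy S4" and "Galaxy Samsung S4" match "Samsung Galaxy", since term order is ignored. A linear scan like this is only reasonable because the keyword list is small (no more than 1000); highlighting is left out.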