Starting with the presumption that Solr is a "search engine" for user
queries, what exactly would a user query look like?
Are you really requiring your users to enter long, carefully constructed,
full-length product titles?
What kind of application would force its users to do such a thing?
Put another way, if the user has entered what they consider important terms
in their query, why are you so ready to ignore a lot of those terms?
Or is this simply a case where some old software had a feature that, for
reasons unknown, behaved this way, and you are trying to replicate that
feature merely in the name of compatibility, without thinking about whether
the feature actually makes sense in a modern software environment? (Or,
maybe your manager or marketing "invented" this feature and you're just
trying to implement it as stated without trying to decide whether it makes
sense?) The point is that you are making us try to guess what the actual use
case is, rather than simply telling us what it is!
Please clarify what your use case really is. If you would explain the use
case (not some proposed solution), maybe we could offer suggestions for
solutions.
Put another way, what exactly do you perceive to be wrong with normal,
traditional, simple query matching that causes you to go to such great
lengths to avoid using it?
IOW, why are you trying to re-invent and re-imagine a wheel that doesn't
appear to need to be re-invented or re-imagined?
I'm sure you must have some reason for doing that, but why not disclose that
reason so that we can utilize it in understanding what you are trying to do?
-- Jack Krupansky
-----Original Message-----
From: Mark
Sent: Friday, August 09, 2013 11:29 AM
To: solr-user@lucene.apache.org
Subject: Re: Percolate feature?
*All* of the terms in the field must be matched by the query....not
vice-versa.
Exactly. This is why I was trying to explain it as a reverse search.
I just realized I described it as a *large* list of known keywords when really
it's small; no more than 1000. Forgetting about performance, how hard do you
think this would be to implement? How should I even start?
Thanks for the input
On Aug 9, 2013, at 6:56 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
*All* of the terms in the field must be matched by the query....not
vice-versa.
And no, we don't have a query for that out of the box. To implement,
it seems like it would require the total number of terms indexed for a
field (for each document).
I guess you could also index start and end tokens and then use query
expansion to all possible combinations... messy though.
-Yonik
http://lucidworks.com
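The term-count idea in the message above can be sketched outside Solr. This is a minimal illustration in plain Python, not a Solr or Lucene API: the names `build_index` and `reverse_match` are hypothetical, tokenization is a naive lowercase whitespace split, and each keyword is treated as a tiny "document" whose distinct-term count is stored next to the postings. A keyword matches when the number of its terms hit by the query equals its stored count.

```python
from collections import defaultdict

def build_index(keywords):
    """Build term -> keyword-id postings plus a per-keyword distinct-term count."""
    postings = defaultdict(set)   # term -> ids of keywords containing that term
    term_counts = {}              # keyword id -> number of distinct terms
    for kw_id, kw in enumerate(keywords):
        terms = set(kw.lower().split())
        term_counts[kw_id] = len(terms)
        for term in terms:
            postings[term].add(kw_id)
    return postings, term_counts

def reverse_match(query, keywords, postings, term_counts):
    """Return keywords whose terms are ALL present in the query (not vice versa)."""
    hits = defaultdict(int)       # keyword id -> how many of its terms the query hit
    for term in set(query.lower().split()):
        for kw_id in postings.get(term, ()):
            hits[kw_id] += 1
    # A keyword matches only if every one of its distinct terms was hit.
    return [keywords[i] for i, n in sorted(hits.items()) if n == term_counts[i]]

keywords = ["Sony", "Samsung Galaxy"]
postings, term_counts = build_index(keywords)
print(reverse_match("Samsung Galaxy S4", keywords, postings, term_counts))
print(reverse_match("Samsung 52inch LC", keywords, postings, term_counts))
```

Counting distinct terms sidesteps the repeated-term question; whether a keyword with a repeated term should require the repeat in the query is a separate design decision this sketch does not address.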
On Fri, Aug 9, 2013 at 8:19 AM, Erick Erickson <erickerick...@gmail.com>
wrote:
This _looks_ like simple phrase matching (no slop) and highlighting...
But whenever I think the answer is really simple, it usually means
that I'm missing something....
Best
Erick
On Thu, Aug 8, 2013 at 11:18 PM, Mark <static.void....@gmail.com> wrote:
Ok forget the mention of percolate.
We have a large list of known keywords we would like to match against.
Product keyword: "Sony"
Product keyword: "Samsung Galaxy"
We would like to be able to detect, given a product title, whether or not it
matches any known keywords. For a keyword to be matched, all of its terms
must be present in the product title given.
Product Title: "Sony Experia"
Matches and returns a highlight: "<em>Sony</em> Experia"
Product Title: "Samsung 52inch LC"
Does not match
Product Title: "Samsung Galaxy S4"
Matches and returns a highlight: "<em>Samsung Galaxy</em>"
Product Title: "Galaxy Samsung S4"
Matches and returns a highlight: "<em> Galaxy Samsung</em>"
What would be the best way to approach this?
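For reference, the matching rule described above (every term of a keyword must appear in the title, in any order) can be written down as a small plain-Python sketch. It is only an illustration of the rule, not a Solr feature: `tokenize`, `matched_keywords`, and `highlight` are hypothetical names, tokenization is a simple lowercase word split, and the highlighter wraps each run of adjacent matched words in `<em>` while keeping the rest of the title.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def matched_keywords(title, keywords):
    """Return keywords whose terms all occur in the title, in any order."""
    title_terms = set(tokenize(title))
    return [kw for kw in keywords if set(tokenize(kw)) <= title_terms]

def highlight(title, keywords):
    """Wrap runs of matched words in <em>; return None if no keyword matches."""
    hit_terms = {t for kw in matched_keywords(title, keywords) for t in tokenize(kw)}
    if not hit_terms:
        return None
    pieces, run = [], []
    for word in title.split():
        terms = tokenize(word)
        if terms and all(t in hit_terms for t in terms):
            run.append(word)          # extend the current highlighted run
        else:
            if run:                   # close the run before a non-matching word
                pieces.append("<em>" + " ".join(run) + "</em>")
                run = []
            pieces.append(word)
    if run:
        pieces.append("<em>" + " ".join(run) + "</em>")
    return " ".join(pieces)

KEYWORDS = ["Sony", "Samsung Galaxy"]
for title in ["Sony Experia", "Samsung 52inch LC", "Samsung Galaxy S4", "Galaxy Samsung S4"]:
    print(title, "->", highlight(title, KEYWORDS))
```

With about 1000 keywords, a linear scan like this is likely cheap enough per title; an inverted index over the keywords would be the next step if it is not.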
On Aug 5, 2013, at 7:02 PM, Chris Hostetter <hossman_luc...@fucit.org>
wrote:
: Subject: Percolate feature?
can you give a more concrete, realistic example of what you are trying to
do? your synthetic hypothetical example is kind of hard to make sense of.
your Subject line and comment that the "percolate" feature of elastic
search sounds like what you want seem to have led some people down a
path of assuming you want to run these types of queries as documents are
indexed -- but that isn't at all clear to me from the way you worded your
question, other than that.
it's also not clear what aspect of the "results" you really care about --
are you only looking for the *number* of documents that "match" according
to your concept of matching, or are you looking for a list of matches?
what if multiple documents have all of their terms in the query string --
how should they score relative to each other? what if a document contains
the same term multiple times, do you expect it to be a match of a query
only if that term appears in the query multiple times as well? do you care
about the ordering of the terms in the query? the ordering of the terms in
the document?
Ideally: describe for us what you want to do, w/o assuming
solr/elasticsearch/anything specific about the implementation -- just
describe your actual use case for us, with several real document/query
examples.
https://people.apache.org/~hossman/#xyproblem
XY Problem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue. Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
-Hoss