not matter.
Uwe
On 19 August 2014 22:05:23 MESZ, Tri Cao wrote:
>OR operator does that, AND only returns docs with ALL terms present.
>
>Note that you have two options here
>1. Create a BooleanQuery object (see the Java doc I linked below) and
Whoops, the constraint should be MUST to force all terms present:
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanClause.Occur.html#MUST
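A minimal sketch of what MUST clauses do, using a toy in-memory postings map instead of Lucene itself (the real query is a BooleanQuery with one TermQuery per term, each added with BooleanClause.Occur.MUST):

```java
import java.util.*;

// Toy model of BooleanClause.Occur.MUST semantics: only docs containing
// ALL query terms survive. With Lucene 4.6 itself this would be
// new BooleanQuery() plus bq.add(new TermQuery(...), Occur.MUST) per term.
public class ConjunctiveSearch {
    // index: term -> postings (IDs of docs containing that term)
    static Set<Integer> mustMatchAll(Map<String, Set<Integer>> index, List<String> terms) {
        Set<Integer> result = null;
        for (String t : terms) {
            Set<Integer> postings = index.getOrDefault(t, Collections.emptySet());
            if (result == null) result = new HashSet<>(postings);
            else result.retainAll(postings); // AND: intersect the postings lists
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new HashMap<>();
        // doc 1 = "United States of America", doc 2 = "United States"
        index.put("united", new HashSet<>(Arrays.asList(1, 2)));
        index.put("states", new HashSet<>(Arrays.asList(1, 2)));
        index.put("america", new HashSet<>(Arrays.asList(1)));
        // doc 2 lacks "america", so only doc 1 matches all three MUST terms
        System.out.println(mustMatchAll(index, Arrays.asList("united", "states", "america")));
    }
}
```

With MUST on every clause, the doc that lacks any one term drops out, which is exactly why only doc1 should come back for the three-term query in this thread.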
On Aug 19, 2014, at 01:05 PM, "Tri Cao" wrote:
OR operator does that, AND only returns docs with ALL terms present.
Note that you have two options here
On Aug 19, 2014, at 12:17 PM, Jin Guang Zheng wrote:
Thanks for reply, but won't BooleanQuery return both doc1 and doc2 with
query:
label:States AND label:America AND label:United
Best,
Jin
On Tue, Aug 19, 2014 at 2:07 PM, Tri Cao wrote:
given that example, the easy way is a boolean AND query of all the terms:
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/BooleanQuery.html
However, if your corpus is more sophisticated you'll find that relevance
ranking is not always that trivial :)
On Aug 19, 2014, at 11:00
Erick, the Solr termfreq implementation also uses DocsEnum, with the assumption that
freqs are requested on ascending doc IDs, which is valid when scoring from the hit
list. If a freq is requested for an out-of-order doc, a new DocsEnum has to be created.
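A toy model of that forward-only behavior (illustrative names, not the actual Lucene DocsEnum API):

```java
// Sketch of why termfreq works cheaply only for ascending doc IDs: the
// underlying postings cursor (DocsEnum in Lucene 4.x) moves forward only,
// so out-of-order access forces building a fresh cursor.
public class ForwardOnlyFreqSource {
    private final int[] docs;   // postings: doc IDs in ascending order
    private final int[] freqs;  // term frequency per posting
    private int pos = 0;
    private int lastDoc = -1;
    int recreations = 0;        // how many times the cursor had to be rebuilt

    ForwardOnlyFreqSource(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }

    int freq(int doc) {
        if (doc < lastDoc) { pos = 0; recreations++; } // out-of-order: "new DocsEnum"
        lastDoc = doc;
        while (pos < docs.length && docs[pos] < doc) pos++; // forward-only advance
        return (pos < docs.length && docs[pos] == doc) ? freqs[pos] : 0;
    }

    public static void main(String[] args) {
        ForwardOnlyFreqSource f = new ForwardOnlyFreqSource(new int[]{1, 5, 9}, new int[]{2, 3, 4});
        System.out.println(f.freq(5) + " " + f.freq(9) + " then out-of-order " + f.freq(1)
                + " (recreations=" + f.recreations + ")");
    }
}
```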
Bianca, can you explain your use case in more
On Jul 14, 2014, at 03:09 AM, Ganesh wrote:
How does Solr handle this scenario? Does it reopen the reader after every
delete, or does it maintain a list of deleted documents in a cache?
Regards
Ganesh
On 7/11/2014 4:00 AM, Tri Cao wrote:
> You need to reopen your searcher after
This is actually a tough problem in general: word sense disambiguation for polysemous names. In your case, I think
you'll probably need to do some named entity resolution to differentiate
"George Washington" from "George Washington Carver", as they are two different
entities.
Do you have a list o
You need to reopen your searcher after deleting. From Java doc for
SearcherManager:
In addition you should periodically call maybeRefresh. While it's possible to
call this just before running each query, this is discouraged since it
penalizes the unlucky queries that do the reopen. It's better
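The pattern the javadoc describes can be sketched with a background scheduler. PeriodicRefresher is an illustrative name; in real code the Runnable would wrap searcherManager::maybeRefresh:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the periodic-refresh pattern the SearcherManager javadoc
// recommends: refresh on a background thread so no query thread pays the
// reopen cost. Class and method names here are illustrative, not Lucene API.
public class PeriodicRefresher {
    static ScheduledExecutorService start(Runnable maybeRefresh, long periodMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(maybeRefresh, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger refreshes = new AtomicInteger(); // stand-in for maybeRefresh()
        ScheduledExecutorService ses = start(refreshes::incrementAndGet, 10);
        Thread.sleep(200);
        ses.shutdown();
        System.out.println("background refreshes so far: " + refreshes.get());
    }
}
```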
I think emitting two tokens for "vans" is the right (potentially only) way to
do it. You could
also control the dictionary of terms that require this special treatment.
Is there any reason you're not happy with this approach?
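The two-token idea could be sketched as a dictionary-driven expansion; in Lucene this logic would live inside a custom TokenFilter, and the names here are illustrative:

```java
import java.util.*;

// Sketch of "emit two tokens": a small dictionary maps a surface form like
// "vans" to every token it should be indexed as. Terms not in the dictionary
// pass through unchanged.
public class TokenExpander {
    static List<String> expand(List<String> tokens, Map<String, List<String>> dict) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            // emit the expansion set if present, otherwise the original token
            out.addAll(dict.getOrDefault(t, Collections.singletonList(t)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = new HashMap<>();
        dict.put("vans", Arrays.asList("vans", "van")); // the brand and the plural of "van"
        System.out.println(expand(Arrays.asList("red", "vans"), dict));
    }
}
```

Keeping the treatment dictionary-driven is what makes the special-case list controllable, as suggested above.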
On Jul 06, 2014, at 11:48 AM, Arjen van der Meijden
wrote:
Hello list,
I would just use S3 as a data push mechanism. In your servlet's init(), you
could download the index from S3 and unpack it to a local directory, then
initialize your Lucene searcher to that directory.
Downloading from S3 to EC2 instances is free, and 5GB would take a minute or two.
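Assuming the index is pushed to S3 as a single zip, the unpack half of init() might look like this (the S3 download itself, via the AWS SDK, is omitted; this starts from the downloaded archive on local disk):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.*;

// Sketch of the unpack step: extract the downloaded index archive into the
// local directory the Lucene searcher will be opened on.
public class IndexUnpacker {
    static void unzip(Path zipFile, Path destDir) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            for (ZipEntry e; (e = zin.getNextEntry()) != null; ) {
                Path target = destDir.resolve(e.getName()).normalize();
                if (!target.startsWith(destDir)) continue; // guard against zip-slip paths
                if (e.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zin, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // after unpacking: open a Lucene Directory / IndexSearcher on destDir
    }
}
```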
Also, if you p
This is an interesting performance problem and I think there is probably not
a single answer here, so I'll just lay out the steps I would take to tackle this:
1. What is the variance of the query latency? You said the average is 5 minutes,
but is it due to some really bad queries or most queries h
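For step 1, a quick way to see the distribution is to compute percentiles over a sample of per-query latencies; this nearest-rank sketch assumes latencies collected in milliseconds:

```java
import java.util.Arrays;

// Step 1 above: look at the latency distribution, not just the mean.
// Nearest-rank percentile over a sample of per-query latencies (millis).
public class LatencyStats {
    static double percentile(double[] latencies, double p) {
        double[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // nearest-rank method
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] ms = {120, 90, 300000, 110, 95, 100, 130, 105, 98, 115};
        // a single 5-minute outlier vs. a bad median tell very different stories
        System.out.println("p50=" + percentile(ms, 50) + " p99=" + percentile(ms, 99));
    }
}
```

If p50 is healthy and p99 is terrible, you hunt for the pathological queries; if p50 is already minutes, the problem is systemic.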
I ran into this issue before, and after some digging I don't think there is an easy way to accommodate long IDs in Lucene. So I decided to shard documents into multiple indexes. It turned out to be a good decision in my case because I would have to shard the index anyway for performance
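A sketch of the routing side of that sharding decision, assuming external long IDs are mapped to shards by hash (shardFor is a hypothetical helper):

```java
// Sketch of hash-based shard routing for IDs that don't fit Lucene's int
// docID space: each external long ID maps deterministically to one of N
// smaller indexes.
public class ShardRouter {
    static int shardFor(long externalId, int numShards) {
        // floorMod keeps the result non-negative even for negative hash values
        return Math.floorMod(Long.hashCode(externalId), numShards);
    }

    public static void main(String[] args) {
        long id = 8_589_934_592L; // an ID larger than Integer.MAX_VALUE
        System.out.println("index/update doc in shard " + shardFor(id, 4)
                + "; at query time, search all 4 shards and merge results");
    }
}
```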
analyzer/parser would you recommend? Thank you again,
Natalia
On Mon, Mar 17, 2014 at 3:35 PM, Tri Cao <tm...@me.com> wrote:
Natalia,
First make sure that your analyzers (both index and query analyzers) do not filter out these as stop words. I think the standard StopFilter list has "no" and "not". You can try to see if your index has these terms by querying for "no" as a TermQuery. If there is no match for that query, th
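A toy illustration of why querying for "no" is a good sanity check: if a stop filter ran at index time, the term never made it into the index. The stop set below is a subset of the classic English list; the real check should of course use your actual analyzer:

```java
import java.util.*;

// Why "no" can vanish before it ever reaches the index: a stop filter drops
// it during analysis. The classic English stop list includes "no" and "not".
public class StopFilterDemo {
    static final Set<String> STOP = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
            "if", "in", "into", "is", "it", "no", "not", "of", "on", "or"));

    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) if (!STOP.contains(t)) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        // "no match found" indexes only "match" and "found" --
        // a TermQuery for "no" then finds nothing
        System.out.println(filter(Arrays.asList("no", "match", "found")));
    }
}
```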
ing if there's anything I should be aware of.
Regards,
John
On 2/14/14 4:37 PM, Tri Cao wrote:
As doc IDs are ints too, it's most likely he'll hit the limit of 2B documents per index with that approach though :) I do agree that indexing huge documents doesn't seem to have a lot of value. Even when you know a doc is a hit for a certain query, how are you going to display the results to use
If I understand correctly, you'd like to shortcut the execution when you reach the desired number of hits. Unfortunately, I don't think there's a graceful way to do that right now in Collector. To stop further collecting, you need to throw an IOException (or a subtype of it) and catch the exception la
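A minimal sketch of that exception trick, with a plain loop standing in for the scorer/Collector machinery (Lucene's own CollectionTerminatedException plays a related role for terminating per-segment collection):

```java
import java.util.*;

// Sketch of stopping collection early by exception: the collector throws
// once it has enough hits, and the caller catches and swallows it.
public class EarlyTermination {
    static class StopCollecting extends RuntimeException {}

    static List<Integer> collectUpTo(int[] docs, int wanted) {
        List<Integer> hits = new ArrayList<>();
        try {
            for (int doc : docs) { // stands in for the scorer feeding collect(doc)
                hits.add(doc);
                if (hits.size() >= wanted) throw new StopCollecting(); // abort the loop
            }
        } catch (StopCollecting expected) {
            // swallow: this is the normal "we have enough hits" path
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(collectUpTo(new int[]{3, 7, 11, 42, 99}, 3));
    }
}
```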
If you want to index your hard drive, you'll need to keep a copy
of the current file system's directory/file structure. Otherwise, you
won't be able to remove files that have been deleted from your index.
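The snapshot diff could be as simple as a set difference between the previous and current file listings (deletedSince is a hypothetical helper; each resulting path would feed an IndexWriter.deleteDocuments call):

```java
import java.util.*;

// Sketch of the snapshot diff: keep the file list from the previous crawl,
// compare with the current walk, and delete the difference from the index.
public class IndexSync {
    static Set<String> deletedSince(Set<String> previous, Set<String> current) {
        Set<String> deleted = new HashSet<>(previous);
        deleted.removeAll(current); // files seen last time but gone now
        return deleted;
    }

    public static void main(String[] args) {
        Set<String> prev = new HashSet<>(Arrays.asList("/a.txt", "/b.txt", "/c.txt"));
        Set<String> curr = new HashSet<>(Arrays.asList("/a.txt", "/c.txt"));
        // each path here would become an
        // indexWriter.deleteDocuments(new Term("path", p)) call
        System.out.println(deletedSince(prev, curr));
    }
}
```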
On Jul 5, 2012, at 12:18 PM, Erick Erickson wrote:
> Hmmm, it's not quite clear what the p
through the TopDocs and apply the constraints I need to. I think this will work, but have some concern about performance. What would you think?
Thanks,
Tri.
On Apr 06, 2012, at 10:06 AM, Tri Cao wrote:
Hi all,
What would be the best approach for a custom scoring that requires a "global" view of the result set? For example, I have a field called "color" and I would like constraints such as: at most 3 docs with color:red and 4 docs with color:blue in the first 16 hits. And the items should
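One post-processing sketch for this kind of global constraint: over-fetch TopDocs, then demote docs whose color has already hit its cap inside the window (applyCaps and its parameters are hypothetical):

```java
import java.util.*;

// Re-rank a hit list so that within the first `window` positions no color
// exceeds its cap; demoted docs slide to just after the window, and the
// rest of the list is untouched.
public class ColorCappedRerank {
    static List<String> applyCaps(List<String> colors, Map<String, Integer> caps, int window) {
        List<String> head = new ArrayList<>(), deferred = new ArrayList<>();
        Map<String, Integer> counts = new HashMap<>();
        int i = 0;
        for (; i < colors.size() && head.size() < window; i++) {
            String c = colors.get(i);
            if (counts.getOrDefault(c, 0) < caps.getOrDefault(c, Integer.MAX_VALUE)) {
                head.add(c);
                counts.merge(c, 1, Integer::sum);
            } else {
                deferred.add(c); // over the cap inside the window: push it later
            }
        }
        List<String> out = new ArrayList<>(head);
        out.addAll(deferred);                         // demoted docs come right after the window
        out.addAll(colors.subList(i, colors.size())); // remainder keeps its original order
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> caps = new HashMap<>();
        caps.put("red", 3);
        // fourth red is demoted behind blue within a window of 4
        System.out.println(applyCaps(Arrays.asList("red", "red", "red", "red", "blue"), caps, 4));
    }
}
```

This is the "iterate through the TopDocs" idea from the reply above; the over-fetch factor is the main performance knob.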