Re: Zero-position query?

2013-06-02 Thread Israel Tsadok
You can do this with a PhraseQuery[1]. Just add more terms with position 0. [1] http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/PhraseQuery.html#add(org.apache.lucene.index.Term, int) On Mon, Jun 3, 2013 at 6:46 AM, Lance Norskog wrote: > What is a Lucene query that will fin

Re: index searcher leading to system freeze ?

2012-07-11 Thread Israel Tsadok
I'm not sure this is at all related, but we've had high cpu loads on our servers due to the leap second kernel bug - http://serverfault.com/questions/403732/ . On Wed, Jul 11,

Re: QueryParser, double quotes and wilcard inside the double quotes

2012-07-05 Thread Israel Tsadok
s/weird/word/ Sorry, autocorrect. On Jul 5, 2012 4:01 PM, "Israel Tsadok" wrote: > A hacky trick I used was put a stop weird instead of the asterisk. If you > search for "foo a test" and use an analyzer that includes a stop filter > (like StandardAnalyzer does

Re: QueryParser, double quotes and wilcard inside the double quotes

2012-07-05 Thread Israel Tsadok
A hacky trick I used was put a stop weird instead of the asterisk. If you search for "foo a test" and use an analyzer that includes a stop filter (like StandardAnalyzer does), it will match docs 1 and 2. On Jul 4, 2012 10:13 AM, "Jochen Hebbrecht" wrote: > > Thanks Ian, I'll give it a try! > > 20

Re: Extracting all documents for a given search

2011-09-19 Thread Israel Tsadok
Ahem, sorry. I quoted an old answer of mine, but HitCollector has been gone for a while now... This is the modern version:
final ArrayList docs = new ArrayList();
searcher.search(query, new Collector() {
    private int docBase;
    // ignore scorer
    public void setScorer(Scorer scorer) { }

Re: Extracting all documents for a given search

2011-09-19 Thread Israel Tsadok
If you just want to fetch all the matching documents for a given query, implement a collector that just saves the document data.
final ArrayList docs = new ArrayList();
searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
        docs.add(searcher.doc(doc));

Re: How to search multi-indice at the same time?

2011-07-23 Thread Israel Tsadok
http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/MultiReader.html

Re: HighFreqTerms for results set

2011-07-20 Thread Israel Tsadok
This is very interesting. Do you know how query faceting is implemented?

Re: Re: HighFreqTerms for results set

2011-07-19 Thread Israel Tsadok
On Tue, Jul 19, 2011 at 12:20 PM, wrote: > Israel, if you have this implemented, I'd appreciate if you can crunch some > numbers so I know how slow it actually is, for future comparison? Let's say > on 100.000 results, each of which have up to 50 words, or 50.000 results > with 100 words each ...

Re: HighFreqTerms for results set

2011-07-19 Thread Israel Tsadok
We faced this problem a long time ago, and ended up just extracting all the matching documents, re-analyzing and counting the terms using a MultiSet. It was very slow, but it worked. You might

Re: basic example of lucene not working(must be user error, but I am just missing it)

2011-06-19 Thread Israel Tsadok
You're creating the TopScoreDocCollector with numHits=1. This means the collector only retains one result, but keeps track of the total number of results. Imagine a situation where there's a million hits. You want to know the number, but you usually don't need all their doc ids. That's why scoreDoc

Re: RemoteSearchable deprecated. What to replace it with?

2011-06-16 Thread Israel Tsadok
Great. Thanks!

Re: RemoteSearchable deprecated. What to replace it with?

2011-06-16 Thread Israel Tsadok
Thanks for answering. If I understand it correctly, I can use IndexSearcher concurrently over many IndexReaders. But since there's no RemoteIndexReader, I'm still left with the same basic problem. How do I search across several servers?

RemoteSearchable deprecated. What to replace it with?

2011-06-15 Thread Israel Tsadok
I use a ParallelMultiSearcher to search across a bunch of RemoteSearchables, pretty much as recommended in Lucene In Action, First Edition, with the appropriate adjustments for Lucene 3.0. This seems to be completely deprecated in 3.1. What is the simplest way for me to continue having the ability

Re: Why has PerFieldAnalyzerWrapper been made final in Lucene 3.1 ?

2011-05-03 Thread Israel Tsadok
On Tue, May 3, 2011 at 7:03 PM, Paul Taylor wrote: > We subclassed PerFieldAnalyzerWrapper as follows: > > public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper { > >public PerFieldEntityAnalyzer(Class indexFieldClass) { >super(new StandardUnaccentAnalyzer()); > >

Re: Analyzer which creates terms of one to n words

2011-04-07 Thread Israel Tsadok
Take a look at http://lucene.apache.org/java/3_0_3/api/contrib-analyzers/org/apache/lucene/analysis/shingle/package-summary.html On Thu, Apr 7, 2011 at 11:30 AM, Clemens Wyss wrote: > Is there an analyzer which takes a text and creates search terms based on > the following rules: > - all single

Re: TopDocsCollector and sorting

2011-03-15 Thread Israel Tsadok
TopDocsCollector.topDocs() does return the results sorted, by score. It basically returns a slice of a PriorityQueue. You can take a look at the source, it's one of the easier parts of t

Re: How to combine QueryParser and Wildcard search

2010-11-19 Thread Israel Tsadok
I'm not sure what you're trying to do, but it seems to me that your best bet is to rewrite the query returned from the QueryParser. Just traverse the BooleanQuery clauses, converting any TermQuery to a WildcardQuery. You can then have control over what transformation exactly you want to perform. I

Re: Newbie Question

2010-11-07 Thread Israel Tsadok
(If I may) In Lucene terminology, an "index" is what would be a "database" in RDBMS terminology. It's the whole thing. A document is akin to a row in a table. Most of the interesting stuff in Lucene revolves around locating the document, not retrieving the data actually stored inside it. This is do

Re: Any way to ignore repeated terms in TF calculation?

2009-01-15 Thread Israel Tsadok
Hi Umesh, > I am trying to put the problem more concisely. > 1. Fields where term frequency is very very relevant. E.g. > Body: > Example: >if TF of badger in Body of doc 1 > TF of badger in Body of doc 2 > doc 1 scores higher. > > 2. Fields where term frequency is irrevalent >

Re: Any way to ignore repeated terms in TF calculation?

2009-01-11 Thread Israel Tsadok
> > you can solve your problem at search time by passing a custom Similarity > class that looks something like this: > > private Similarity similarity = new DefaultSimilarity() { >>public float tf(float v) { >> return 1f; >>} >>public float tf(int i) { >> return 1f; >>}

Any way to ignore repeated terms in TF calculation?

2008-12-25 Thread Israel Tsadok
A recurring problem I have with Lucene results is when a document contains the same word over and over again. If for some reason I have a document containing "badger badger badger badger badger badger badger badger", it would appear high on the search results for "badger", even though it's usually

Re: Search on tag / category / label / keyword ...

2008-10-27 Thread Israel Tsadok
ew DefaultSimilarity() {
    public float lengthNorm(String fieldName, int numTerms) {
        numTerms = numTerms < 15 ? 15 : numTerms;
        return super.lengthNorm(fieldName, numTerms);
    }
});
The code above eliminates the advantage of documents with fewer than 15 terms. In your case, you probably want to replace 15 with 1000 (or as high as you need). Note that I'm not sure if this is the preferred method to achieve what you're looking for, but it works for me. Israel Tsadok
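To see what the clamp described above does to the numbers, here is a minimal standalone sketch. It does not use Lucene at all. It assumes the classic DefaultSimilarity length norm is 1/sqrt(numTerms) and ignores the lossy one-byte encoding Lucene applies when storing norms. The class and method names (LengthNormClamp, defaultLengthNorm, clampedLengthNorm) are hypothetical and exist only for illustration.

```java
public class LengthNormClamp {

    // Assumption: mirrors the classic DefaultSimilarity formula, 1/sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Same idea as the override above: fields shorter than minTerms
    // are treated as if they had exactly minTerms terms.
    static float clampedLengthNorm(int numTerms, int minTerms) {
        return defaultLengthNorm(Math.max(numTerms, minTerms));
    }

    public static void main(String[] args) {
        int minTerms = 15; // threshold used in the post
        for (int n : new int[] {3, 15, 60}) {
            System.out.printf("%d terms: default=%.4f clamped=%.4f%n",
                    n, defaultLengthNorm(n), clampedLengthNorm(n, minTerms));
        }
    }
}
```

With the clamp in place, any field at or below the threshold gets the same norm, so very short fields no longer outrank slightly longer ones on length alone. Fields above the threshold are scored as before.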

Re: Why use RMI search very slow! when have 13 TermQuery! Return cost 7500 ms .

2007-06-13 Thread Israel Tsadok
that returns Hits. (You can use null for the second parameter) Israel Tsadok On 6/12/07, 童小军 <[EMAIL PROTECTED]> wrote: I am use RMI search two server date! When I use one TermQuery return 30ms (very good)! But when I use booleanQuery add tow termQuery return must 150 ms :( And three is

Re: MultiSearcher, Hits and createWeight

2007-06-13 Thread Israel Tsadok
In case anyone was interested, making the change I described surfaced a bug that causes the custom similarity to not work. There is a patch: https://issues.apache.org/jira/browse/LUCENE-789 On 5/29/07, Israel Tsadok <[EMAIL PROTECTED]> wrote: Hi, I am developing a distributed index,

MultiSearcher, Hits and createWeight

2007-05-29 Thread Israel Tsadok
know enough about Lucene to be sure that it's safe. Have I got something wrong? Is it safe to make that change? Thanks, Israel Tsadok