Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
Given that I have 3 documents with exactly one field, and the fields have the following contents: This is a moon / The moon is bright / moon. If I analyze these documents they all hit on "moon". But how do I need to analyze/search my index in order to get the following "sort order": moon / The moon is b

33 Days left to Berlin Buzzwords 2011

2011-05-04 Thread Simon Willnauer
hey folks, BerlinBuzzwords 2011 is close: only 33 days left until the big Search, Store and Scale open-source crowd gathers in Berlin on June 6th/7th. The conference again focuses on the topics of search, data analysis and NoSQL. It is to take place on June 6/7th 2011 in Berlin. We are looking f

Information on index fields created by Lucene when using Nutch

2011-05-04 Thread Allel BenBrahim
Hello, I'm using Lucene & Nutch, but I don't know which fields the documents created by Nutch have. I developed this program in Java: Directory dir = FSDirectory.open(new File("C:/Users/MyWebPage/index")); IndexSearcher search = new IndexSearcher(dir);

Re: Why has PerFieldAnalyzerWrapper been made final in Lucene 3.1 ?

2011-05-04 Thread Paul Taylor
On 04/05/2011 07:56, Israel Tsadok wrote: On Tue, May 3, 2011 at 7:03 PM, Paul Taylor > wrote: We subclassed PerFieldAnalyzerWrapper as follows: public class PerFieldEntityAnalyzer extends PerFieldAnalyzerWrapper { public PerFieldEntityAnalyzer(
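When a wrapper class is `final`, one common workaround is composition: configure an instance of the wrapper in a factory method instead of subclassing it. The plain-Java sketch below shows only that per-field dispatch idea. `TextAnalyzer`, `PerFieldDispatcher`, `EntityAnalyzers` and the field names are hypothetical stand-ins, not Lucene's `Analyzer` or `PerFieldAnalyzerWrapper` API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an analyzer: turns text into a list of tokens.
interface TextAnalyzer {
    List<String> analyze(String text);
}

// Hypothetical final dispatcher: a default analyzer plus per-field overrides.
final class PerFieldDispatcher {
    private final TextAnalyzer defaultAnalyzer;
    private final Map<String, TextAnalyzer> perField = new HashMap<>();

    PerFieldDispatcher(TextAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    PerFieldDispatcher addAnalyzer(String field, TextAnalyzer analyzer) {
        perField.put(field, analyzer);
        return this;
    }

    // Delegates to the field's analyzer, or to the default if none is registered.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).analyze(text);
    }
}

// Hypothetical factory that replaces the former subclass: callers receive a
// fully configured dispatcher instead of extending a final class.
final class EntityAnalyzers {
    private EntityAnalyzers() {}

    static PerFieldDispatcher create() {
        TextAnalyzer whitespace = text -> Arrays.asList(text.trim().split("\\s+"));
        TextAnalyzer keyword = text -> Collections.singletonList(text); // whole value as one token
        TextAnalyzer lowercase = text -> whitespace.analyze(text.toLowerCase(Locale.ROOT));
        return new PerFieldDispatcher(lowercase).addAnalyzer("id", keyword);
    }
}
```

The design choice is that the configuration which used to live in a subclass constructor moves into `create()`, so nothing needs to extend the final class.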

Re: MultiPhraseQuery slowing down over time in Lucene 3.1

2011-05-04 Thread Tomislav Poljak
Hi, it seems there is a custom implementation of MultiPhraseQuery used in the system, which uses (and maybe misuses) Lucene's MultiPhraseQuery; that could be the reason for the slowdown. I've tried running a sample Lucene MultiPhraseQuery in an infinite while loop, printing out times for every 1000 executions, and coul
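The test described above (run the query in a loop and print the time for every 1000 executions) can be sketched as a small batch timer. `BatchTimer` is a hypothetical helper; the `Runnable` would wrap whatever query execution is being measured. If the per-batch times keep growing, something is slowing down over time; the first batches also include JIT warm-up, so they are usually slower.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: times a piece of work in fixed-size batches so that a
// gradual slowdown shows up as steadily increasing per-batch times.
final class BatchTimer {
    private BatchTimer() {}

    static long[] timeBatches(Runnable work, int batchSize, int batches) {
        if (batchSize <= 0 || batches <= 0) {
            throw new IllegalArgumentException("batchSize and batches must be positive");
        }
        long[] elapsedNanos = new long[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < batchSize; i++) {
                work.run(); // e.g. execute the query under test
            }
            elapsedNanos[b] = System.nanoTime() - start;
            System.out.printf("batch %d: %d ms%n", b, TimeUnit.NANOSECONDS.toMillis(elapsedNanos[b]));
        }
        return elapsedNanos;
    }
}
```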

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Ahmet Arslan
I'm receiving a number of searches with many ORs, so that the total number of matches is huge (> 1 million), although only the first 20 results are required. Analysis shows most time is spent scoring the results. Now it seems to me if you send a query with 10 OR components, documents that mat
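Asking for only the top 20 results does not by itself avoid scoring the other matches: each matching document is still scored, and a bounded priority queue simply keeps the best N. The plain-Java sketch below illustrates that idea with `java.util.PriorityQueue`; `Hit` and `TopNCollector` are hypothetical names, and this is a sketch of the general technique, not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical scored hit: document id plus score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical collector: keeps only the best n hits using a min-heap of size n.
final class TopNCollector {
    private final int n;
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));

    TopNCollector(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
    }

    // Called once per matching document; the score has already been computed.
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // drop the current weakest hit
            heap.add(new Hit(doc, score));
        }
        // otherwise the hit was scored but is discarded immediately
    }

    // Returns the kept hits, best first.
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

The heap keeps memory at O(n), but the cost of computing a score for every match remains, which is why limiting the result count alone does not make scoring cheaper.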

Re: AW: AW: AW: "fuzzy prefix" search

2011-05-04 Thread Erick Erickson
Shingles won't do that either, so I suspect you'll have to write a custom tokenizer. Best Erick On Wed, May 4, 2011 at 2:07 AM, Clemens Wyss wrote: > I know this is just an example. > But even the WhitespaceAnalyzer takes the words apart, which I don't want. I > would like the phrases as they a

Re: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Erick Erickson
What is the problem you're trying to solve? I'm wondering if this is an XY problem. See: http://people.apache.org/~hossman/#xyproblem Best Erick On Wed, May 4, 2011 at 3:16 AM, Clemens Wyss wrote: > Given that I have 3 documents with exactly one field and the fields have the > following contents

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Paul Taylor
On 04/05/2011 12:39, Ahmet Arslan wrote: I'm receiving a number of searches with many ORs, so that the total number of matches is huge (> 1 million), although only the first 20 results are required. Analysis shows most time is spent scoring the results. Now it seems to me if you send a query

AW: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
I'll try to be more specific: given the three documents below, when I search for "moon" I'd like to get the following order in my search result: moon / The moon is bright / This is a moon, i.e. the "leftmost hit" of my search term should be rated highest/best... How should I analyze/search my documen
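The requested ordering amounts to ranking each document by the token position of the first occurrence of the search term. Inside Lucene this would need position information at search time (for example `SpanFirstQuery`, which only matches when the span ends within the first `end` positions of a field); the plain-Java sketch below shows just the ranking rule itself. `LeftmostHitRanker` is a hypothetical name, and the simple lowercase/non-word tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical ranker: orders documents by the token position of the first
// occurrence of a term, so the "leftmost hit" ranks best.
final class LeftmostHitRanker {
    private LeftmostHitRanker() {}

    // Returns the 0-based token position of the first match, or -1 if absent.
    // Assumes a simple lowercase, split-on-non-word-characters tokenization.
    static int firstPosition(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue; // leading punctuation yields an empty token
            if (token.equals(needle)) return pos;
            pos++;
        }
        return -1;
    }

    // Keeps only documents containing the term, ordered by first position.
    static List<String> rank(List<String> docs, String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (firstPosition(doc, term) >= 0) hits.add(doc);
        }
        hits.sort(Comparator.comparingInt((String doc) -> firstPosition(doc, term)));
        return hits;
    }
}
```

With the three example documents, "moon" has its first hit at position 0, 1 and 3 respectively, which yields exactly the order asked for.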

Re: MultiPhraseQuery slowing down over time in Lucene 3.1

2011-05-04 Thread Michael McCandless
OK, phew :) Thanks for bringing closure... Mike http://blog.mikemccandless.com On Wed, May 4, 2011 at 6:52 AM, Tomislav Poljak wrote: > Hi, > seems there is a custom impl of MultiPhraseQuery used in the system, > which uses (and maybe misuses) Lucene's MultiPhraseQuery that could be > the reas

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Paul Taylor
On 04/05/2011 12:51, Paul Taylor wrote: On 04/05/2011 12:39, Ahmet Arslan wrote: I'm receiving a number of searches with many ORs, so that the total number of matches is huge (> 1 million), although only the first 20 results are required. Analysis shows most time is spent scoring the results.

Re: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Erick Erickson
I didn't ask a clear question. *Why* do you want to do this? What is the use-case you're trying to solve? Is relevance not what you want? Are you just experimenting? The statement of *what* you want to do is clear, but I don't know an easy way to do that. Perhaps there's a better approach to solving t

AW: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
Ok, I got you ;) Besides my "real index" (which is being analyzed through a ShingleAnalyzerWrapper) I implicitly/transparently build up a "search term index" which I populate with the terms (being shingles) of my "real index". The "search term index" is being used to provide search term suggest
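Shingles are word n-grams: for "the moon is bright" the two-word shingles are "the moon", "moon is" and "is bright". The plain-Java sketch below shows how such terms could be generated for a search term index like the one described; `Shingles.shingles` is a hypothetical helper, whitespace splitting is an assumption, and it is not Lucene's `ShingleFilter` (whose exact output options differ and are configurable).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds word shingles (word n-grams) from
// whitespace-separated text, position by position, shortest first.
final class Shingles {
    private Shingles() {}

    static List<String> shingles(String text, int maxSize, boolean includeUnigrams) {
        if (maxSize < 2) throw new IllegalArgumentException("maxSize must be at least 2");
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return out;
        String[] words = trimmed.split("\\s+");
        int minSize = includeUnigrams ? 1 : 2;
        for (int start = 0; start < words.length; start++) {
            // emit every shingle starting at this word, up to maxSize words long
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }
}
```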

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Ahmet Arslan
> Thanks for the hint, so this could be done by overriding getBooleanQuery() in > QueryParser ? > I think something like this should do the trick. Without overriding anything. Query query= QueryParser.parse("User Entered String"); if (query instanceof BooleanQuery) ((BooleanQuery)query).se
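One way to cut down the number of matches for a large OR query is to require that a minimum number of the optional clauses match. The plain-Java sketch below illustrates only those semantics; `MinShouldMatch` and its methods are hypothetical, and it is not the code from the reply above, which is truncated here.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of "minimum should match": a document matches an
// OR query only if at least minShouldMatch of the query terms occur in it.
final class MinShouldMatch {
    private MinShouldMatch() {}

    static int countMatchingClauses(List<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) count++;
        }
        return count;
    }

    static boolean matches(List<String> queryTerms, Set<String> docTerms, int minShouldMatch) {
        return countMatchingClauses(queryTerms, docTerms) >= minShouldMatch;
    }

    // Number of documents that still match once the threshold is applied.
    static long countMatchingDocs(List<String> queryTerms, List<Set<String>> docs, int minShouldMatch) {
        return docs.stream().filter(d -> matches(queryTerms, d, minShouldMatch)).count();
    }
}
```

As the later replies in this thread point out, fewer matching documents did not noticeably reduce query time in practice, because every clause still has to be checked against candidate documents.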

Re: AW: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Ahmet Arslan
Besides my "real index" (which is being analyzed through a ShingleAnalyzerWrapper) I implicitly/transparently build up a "search term index" which I populate with the terms (being shingles) of my "real index". The "search term index" is being used to provide search term suggestions when the u

Solr/Lucene 3.1 | apache ab benchmark | org.mortbay.jetty.EofException | java.net.SocketException: Broken pipe

2011-05-04 Thread Johannes Goll
Hi, I am running ab, Apache's HTTP server benchmarking tool to evaluate the query performance of a Solr/Lucene 3.1 instance. The test index has 12 million documents. The search returns the first 10 rows of 8 stored fields that match a standard query (q=field:value). The box has 256G RAM and 32

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Paul Taylor
On 04/05/2011 15:02, Ahmet Arslan wrote: Thanks for the hint, so this could be done by overriding getBooleanQuery() in QueryParser ? I think something like this should do the trick. Without overriding anything. Query query= QueryParser.parse("User Entered String"); if (query instanceof B

Re: AW: AW: AW: AW: "fuzzy prefix" search

2011-05-04 Thread Otis Gospodnetic
We do have EdgeNGramTokenizer if that is what you are after. See how Solr uses it here: http://search-lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizerFactory.java||EdgeNGramTokenizer Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search
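Edge n-grams are the prefixes of a token, e.g. "m", "mo", "moo" for "moon", which is what makes prefix-style matching work with plain term lookups. The plain-Java sketch below generates front edge n-grams of a whole input string; `EdgeNGrams.frontEdgeNGrams` is a hypothetical helper, not Lucene's `EdgeNGramTokenizer` API, and it counts Java `char`s, so supplementary characters are not handled specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: front edge n-grams, i.e. every prefix of the input
// whose length lies between minGram and maxGram characters.
final class EdgeNGrams {
    private EdgeNGrams() {}

    static List<String> frontEdgeNGrams(String input, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, input.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(input.substring(0, len));
        }
        return grams;
    }
}
```

Because the whole input is treated as one unit, a phrase such as "the moon" keeps its words together, and a user's partial input like "the m" matches one of its indexed grams directly.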

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Ahmet Arslan
> Thanks again, now done that but still not having much > effect on total > time. So your main concern is improving the running time, not decreasing the number of returned results? Additionally: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Re: Anyway to not bother scoring less good matches ?

2011-05-04 Thread Chris Hostetter
: Well I did extend QueryParser, and the method is being called but rather : disappointingly it had no noticeable effect on how long queries took. I : really thought by reducing the number of matches the corresponding scoring : phase would be quicker. "matching" and "scoring" go hand in hand ...

AW: AW: AW: AW: AW: "fuzzy prefix" search

2011-05-04 Thread Clemens Wyss
What I am looking for is the autosuggestion implemented here (@solr) http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest How "easily" can I switch from plain Lucene to Solr? Or (even better), can I just make use of "solr-suggestion"? Clemens > -Original Message- >
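The core of prefix autosuggestion can be shown with a sorted set: all terms sharing a prefix are contiguous in sorted order, so a lookup starts at the prefix and stops at the first non-matching term. The sketch below is plain Java, not Solr's suggester; `PrefixSuggester` is a hypothetical name, and real suggesters typically add weighting and other features not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical prefix suggester over a sorted set of terms (for example,
// shingles harvested from a main index).
final class PrefixSuggester {
    private final NavigableSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term);
    }

    // Returns up to `limit` terms starting with `prefix`, in sorted order.
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // Terms with the prefix form a contiguous run starting at tailSet(prefix).
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() >= limit) break;
            out.add(term);
        }
        return out;
    }
}
```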