Boost Problem (again), need example !

2010-02-22 Thread pdaures
Hi, I know that there are many topics about scoring issues, but I didn't find an answer in the topics. This is the problem : Imagine I'm a teacher, and I have to index all the results, comments and score about students. Student : String name (eg : John Smith) String comments : (eg: John is a

Re: Boost Problem (again), need example !

2010-02-22 Thread Ian Lea
Can't you simply sort by descending score (your score, not Lucene's)? Seems to me that would give you what you are asking for. The setBoost() method is unlikely to work consistently because it only influences the score rather than setting it. If your John Mickeal doc happens to have a higher
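The idea of ordering by your own score rather than Lucene's can be sketched in plain Java, independent of any Lucene API. This is only an illustration: the `Hit` class, the method name and the numeric scores are hypothetical, and in a real index the ordering would come from sorting the search on the indexed score field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortByOwnScore {
    // Hypothetical stand-in for a search hit that carries the teacher's score.
    static final class Hit {
        final String name;
        final int teacherScore;

        Hit(String name, int teacherScore) {
            this.name = name;
            this.teacherScore = teacherScore;
        }
    }

    // Returns the student names ordered by descending teacher score.
    // Relevance plays no part here, so the order is strict and repeatable.
    static List<String> namesByDescendingScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.teacherScore).reversed());
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // The scores below are made up for the example.
        List<Hit> hits = Arrays.asList(
                new Hit("John Smith", 12),
                new Hit("John Mickeal", 17));
        System.out.println(namesByDescendingScore(hits));
    }
}
```

Because the sort key is the stored value itself, a boost can never reorder the results, which is the consistency problem with setBoost() described above.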

RE: Boost Problem (again), need example !

2010-02-22 Thread Uwe Schindler
It's CustomScoreQuery in 2.9 and 3.0. Please wait for 2.9.2 and 3.0.1 for an important API change in this experimental query type to work correctly with the new per-segment search! You can test the release artifacts of both new versions here:

range of scores : queryNorm()

2010-02-22 Thread Smith G
Hello, I have observed that even if we change boosting drastically, scores are being normalized at the end because of the queryNorm value. Is there anything (regarding the queryNorm) that we can rely on? Like the score will always be under 10 or some fixed value? The main objective is to

RE: Boost Problem (again), need example !

2010-02-22 Thread pdaures
Hi! Thank you for your help. I think I don't use CustomScoreQuery correctly when I do a search.

    BooleanQuery combinedQuery = new BooleanQuery();
    combinedQuery.add(textQuery, Occur.MUST);
    combinedQuery.add(titleQuery, Occur.MUST);
    CustomScoreQuery customQuery = new

Re: Boost Problem (again), need example !

2010-02-22 Thread Ian Lea
boostField needs to be indexed to be used in the FieldScoreQuery. Are you now using one of the latest releases that Uwe mentioned, with fixes for CustomScoreQuery? And unless you provide your own implementation of CustomScoreQuery.customScore() I think that you are still not guaranteed to

RE: Boost Problem (again), need example !

2010-02-22 Thread Uwe Schindler
The simple fix for that is to wrap the subQuery using: new ConstantScoreQuery(new QueryWrapperFilter(query)) - after that its score is constant and the ValueSource only scores. I recommend to use NumericField for indexing this boost (no storing needed, only indexing,
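Why the wrapping helps can be shown with a small plain-Java model. It assumes the default CustomScoreQuery.customScore() multiplies the sub-query score by the value-source score, which is my understanding of the 2.9/3.0 behaviour. The class name and all numbers are hypothetical, and the constant 1.0f merely stands in for a score that is identical for every hit; no Lucene classes are used.

```java
class ConstantScoreModel {
    // Assumed default combination: sub-query score times value-source score.
    static float customScore(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    public static void main(String[] args) {
        // Hypothetical documents: A matches the text better, B has the higher boost field.
        float textScoreA = 2.0f, boostA = 3.0f;
        float textScoreB = 0.5f, boostB = 5.0f;

        // Unwrapped sub-query: relevance leaks into the result, so A outranks B.
        System.out.println("unwrapped A=" + customScore(textScoreA, boostA)
                + " B=" + customScore(textScoreB, boostB));

        // Sub-query score held constant: only the boost field decides, so B outranks A.
        float constant = 1.0f;
        System.out.println("wrapped   A=" + customScore(constant, boostA)
                + " B=" + customScore(constant, boostB));
    }
}
```

With the sub-query score identical for every hit, the product is monotonic in the field value, so ranking follows the indexed boost alone.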

RE: Boost Problem (again), need example !

2010-02-22 Thread pdaures
It WORKS ! Thank you so much, I spent a lot of time trying to do that, thank you again ! Uwe Schindler wrote: The simple fix for that is to wrap the subQuery using: new ConstantScoreQuery(new QueryWrapperFilter(query)) - after that its score is constant and the ValueSource only scores.

Re: PayloadNearSpanScorer explain method

2010-02-22 Thread Peter Keegan
Patch is in JIRA: LUCENE-2272 On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan peterlkee...@gmail.com wrote: Yes, I will provide a patch. Our new proxy server has broken my access to the svn repository, though :-( On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll gsing...@apache.org wrote: That

Re: Boost Problem (again), need example !

2010-02-22 Thread Erick Erickson
I still don't understand why a simple sort as suggested by Ian wouldn't work. It'd be a lot more reliable than fiddling with doc scores if you want a strict ordering on a particular field (make sure it's untokenized though). Erick On Mon, Feb 22, 2010 at 8:19 AM, pdaures patrick.dau...@gmail.com

Re: range of scores : queryNorm()

2010-02-22 Thread Ian Lea
> I have observed that even if we change boosting drastically, scores are being normalized at the end because of queryNorm value. Is there anything (regarding the queryNorm) that we can rely on?
Dunno.
> like score will always be under 10
No.
> or some fixed value?
I think not. The

Re: range of scores : queryNorm()

2010-02-22 Thread Erick Erickson
Could you back up a step and tell us what the upper-level task you're trying to accomplish is? That is, why the partner wants the number? Because the raw score in Lucene is only relevant within that single query, and then only for ranking. The normalized score *is* in a fixed range already,

Scanning docs at index time

2010-02-22 Thread Nigel
I'd like to scan documents as they're being indexed, to find out immediately if any of them match certain queries. The goal is to find out if there are any new hits for these queries as soon as possible, without re-searching the index over and over (which would be inefficient, and higher

IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals:
1. FSDirectory dir = FSDirectory.open(java.io.File);
2. dir.setLockFactory(new SingleInstanceLockFactory());
3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
4.

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Jason Rutherglen
Peter, Perhaps other concurrent operations? Jason On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan peterlkee...@gmail.com wrote: Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals: 1. FSDirectory dir = FSDirectory.open(java.io.File); 2.

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Michael McCandless
That's curious. It's only on prepareCommit (or, commit, if you didn't first prepare, since that will call prepareCommit internally) that this version should increase. Is there only 1 thread doing this? Oh, and, are you passing false for autoCommit? Mike On Mon, Feb 22, 2010 at 11:43 AM, Peter

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
Only one writer thread and one writer process. I'm calling IndexWriter(Directory d, Analyzer a, boolean create, MaxFieldLength mfl), which sets autocommit=false. Peter On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless luc...@mikemccandless.com wrote: That's curious. It's only on

Re: IndexWriter.getReader.getVersion behavior

2010-02-22 Thread Peter Keegan
I'm pretty sure there are flushes and segment merges going on, but as you said, that shouldn't affect the version increment. I'll see what I can do to get infoStream output. Thanks, Peter On Mon, Feb 22, 2010 at 2:30 PM, Michael McCandless luc...@mikemccandless.com wrote: Well I'm at a loss

can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread jchang
When I call IndexWriter.addIndexes, is there anything I can do to make it filter out duplicates based on a certain field (or group of fields)? If I know that the id field of the document is unique, can I make addIndexes know that if it finds a new document with the same id, the new one is valid and

Re: can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread Michael McCandless
addIndexes doesn't make this possible. Maybe add the indexes but then make a 2nd pass to dedup? Mike On Mon, Feb 22, 2010 at 4:26 PM, jchang jchangkihat...@gmail.com wrote: When I call IndexWriter.addIndexes, is there anything I can do to make it filter out duplicates based on a certain field
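The second-pass dedup can be sketched in plain Java, leaving the Lucene side out entirely. It assumes the rule "the most recently added document with a given id wins"; that rule, the `Doc` class and the sample data are all hypothetical, and a real pass would still have to delete the losing documents from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSecondPass {
    // Hypothetical minimal document: a unique id plus some content.
    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    // Keeps one body per id. A later document replaces an earlier one with the same id,
    // while the id keeps the position of its first appearance (LinkedHashMap semantics).
    static List<String> keepNewestPerId(List<Doc> docsInAddOrder) {
        Map<String, String> newestById = new LinkedHashMap<>();
        for (Doc d : docsInAddOrder) {
            newestById.put(d.id, d.body);
        }
        return new ArrayList<>(newestById.values());
    }

    public static void main(String[] args) {
        List<Doc> merged = Arrays.asList(
                new Doc("1", "old"),
                new Doc("2", "x"),
                new Doc("1", "new"));
        System.out.println(keepNewestPerId(merged));
    }
}
```

Any other rule (keep the oldest, keep the longest, and so on) only changes the body of the loop, which is why the choice of rule matters more than the mechanism.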

Re: can IndexWriter.addIndexes de-dupe documents?

2010-02-22 Thread Erick Erickson
What sorts of rules would govern which one should be kept? Say you were adding three indexes and there was a document in each that was identical. Which one should be kept? I suspect any rule would be wrong at least part of the time FWIW Erick On Mon, Feb 22, 2010 at 5:02 PM, Michael

Re: Scanning docs at index time

2010-02-22 Thread Apoorv Sharma
I don't know of any classes that would be suitable, but if they are ordered queries, some simple code could easily be written. On Mon, Feb 22, 2010 at 9:59 PM, Nigel nigelspl...@gmail.com wrote: I'd like to scan documents as they're being indexed, to find out immediately if any of them match certain