RE: GData, updateable IndexSearcher

2006-04-27 Thread Vanlerberghe, Luc
Here are some remarks from what I learned by inspecting the code (quite a while ago now, but the principle shouldn't have changed)... When an IndexReader opens the segments of an index it - grabs the commit lock, - reads the "segments" file for the list of segment names. - opens the files for ea

[jira] Commented: (LUCENE-507) CLONE -[PATCH] remove unused variables

2006-04-27 Thread Andi Vajda (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12376874 ] Andi Vajda commented on LUCENE-507: --- My apologies, I didn't notice this until it was mentioned today. The "//required by gcj" comment is not something I added or need. The f

Re: 2.0 release

2006-04-27 Thread Chuck Williams
Any chance at a last plea for LUCENE-362? It saves me an enormous amount of unnecessary allocation for the common case of a single large compressed field. It is an expert-level API that needs to be used carefully, but has no effect on any behavior if you don't use it. http://issues.apache.org/ji

[jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Chuck Williams (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=comments#action_12376849 ] Chuck Williams commented on LUCENE-558: --- There is one potentially important benefit of this approach over LUCENE-545. By having the narrower more concrete API (list of

Re: 2.0 release

2006-04-27 Thread Chris Hostetter
: I should have been more clear: I'm not asking for new feature requests. : Rather for known, high-priority, bugs. I don't know if it's high priority, but LUCENE-546 seems to be a trivial bug with a trivial fix ("seems to be", i'm judging purely by the patch) 2.0 also seems like the best time

Re: 2.0 release

2006-04-27 Thread Yonik Seeley
On 4/27/06, Robert Engels <[EMAIL PROTECTED]> wrote: > What about making IndexReader & IndexWriter interfaces? Or creating > interfaces for these (IReader & IWriter?), and making all of the classes use > the interfaces? There is a drawback to interfaces too... you can't easily add an extra method
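
To illustrate the drawback mentioned above, here is a minimal sketch of how a published type evolves as an abstract class versus as an interface, in the Java of that era (before Java 8 introduced default methods). All names are hypothetical and none of them are Lucene classes; this is not how IndexReader or IndexWriter are defined.

```java
// Hypothetical names throughout; none of these types are Lucene classes.
public class ApiEvolutionSketch {

    // A library type published as an abstract class.
    abstract static class AbstractReader {
        abstract int numDocs();

        // Imagine this method was added in a later release. Subclasses
        // written against the earlier release inherit this body and
        // keep compiling unchanged.
        int maxDoc() {
            return numDocs();
        }
    }

    // The same contract published as an interface. Adding maxDoc() here
    // as a plain abstract method would stop every existing implementor
    // from compiling until it is updated by hand.
    interface ReaderInterface {
        int numDocs();
    }

    // User code written against the earlier release and never touched again.
    static class LegacyUserReader extends AbstractReader {
        @Override
        int numDocs() {
            return 42;
        }
    }

    // Calls the later-added method on the untouched subclass.
    static int maxDocOfLegacyReader() {
        return new LegacyUserReader().maxDoc();
    }

    public static void main(String[] args) {
        System.out.println(maxDocOfLegacyReader());
    }
}
```

The abstract class can grow a method with a default body without touching user code; the interface cannot, which is the trade-off to weigh before a major release freezes the API.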

Re: 2.0 release

2006-04-27 Thread Doug Cutting
Robert Engels wrote: What about making IndexReader & IndexWriter interfaces? Or creating interfaces for these (IReader & IWriter?), and making all of the classes use the interfaces? I should have been more clear: I'm not asking for new feature requests. Rather for known, high-priority, bugs.

Re: 2.0 release

2006-04-27 Thread Yonik Seeley
Maybe a fix for http://issues.apache.org/jira/browse/LUCENE-556 might be warranted? -Yonik On 4/27/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > Are there any changes folks think we need before we make the 2.0 > release? The major change from 1.9, removal of deprecated items, has > been made. A

RE: 2.0 release

2006-04-27 Thread Robert Engels
What about making IndexReader & IndexWriter interfaces? Or creating interfaces for these (IReader & IWriter?), and making all of the classes use the interfaces? -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Thursday, April 27, 2006 5:20 PM To: java-dev@lucene.apache

Re: 2.0 release

2006-04-27 Thread Doug Cutting
karl wettin wrote: Not critical in any way, but I would not mind if Term and Document were interfaces instead of final classes. That's not likely to happen before the 2.0 release. We're looking for high-priority, back-compatible bug fixes at this point. Doug

Re: 2.0 release

2006-04-27 Thread karl wettin
On 28 Apr 2006, at 00:19, Doug Cutting wrote: Are there any changes folks think we need before we make the 2.0 release? The major change from 1.9, removal of deprecated items, has been made. Anything else critical? Not critical in any way, but I would not mind if Term and Document were int

Re: Rich positions (was "boosting fields")

2006-04-27 Thread karl wettin
On 28 Apr 2006, at 00:30, Marvin Humphrey wrote: On Apr 27, 2006, at 2:35 PM, karl wettin wrote: What will be required in the IndexReader? Is it enough to add getBoost() in the TermEnum? How would the value be sent to the scorer? It wouldn't be the TermEnum, it would be a TermDocs subclass.

Re: Rich positions (was "boosting fields")

2006-04-27 Thread Marvin Humphrey
On Apr 27, 2006, at 2:35 PM, karl wettin wrote: What will be required in the IndexReader? Is it enough to add getBoost() in the TermEnum? How would the value be sent to the scorer? It wouldn't be the TermEnum, it would be a TermDocs subclass. If we're talking BOOST_PER_POSITION, it would

2.0 release

2006-04-27 Thread Doug Cutting
Are there any changes folks think we need before we make the 2.0 release? The major change from 1.9, removal of deprecated items, has been made. Anything else critical? Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For ad

Re: [jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Doug Cutting
Chuck Williams (JIRA) wrote: 545 is a good improvement. [ ... ] Is there interest in committing 545? I think we should probably get the 2.0 release out the door before we do that. Doug

[jira] Updated: (LUCENE-559) Turkish Analyzer for Lucene

2006-04-27 Thread Emre Bayram (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-559?page=all ] Emre Bayram updated LUCENE-559: --- Attachment: TurkishAnalyzer.java TurkishStemFilter.java TurkishStemmer.java > Turkish Analyzer for Lucene > --

[jira] Created: (LUCENE-559) Turkish Analyzer for Lucene

2006-04-27 Thread Emre Bayram (JIRA)
Turkish Analyzer for Lucene --- Key: LUCENE-559 URL: http://issues.apache.org/jira/browse/LUCENE-559 Project: Lucene - Java Type: Improvement Components: Analysis Reporter: Emre Bayram I have developed an Analyzer for Turkish, thank

[jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Chuck Williams (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=comments#action_12376828 ] Chuck Williams commented on LUCENE-558: --- You're right about IndexReader.document(int), although it appears you removed (package api) FieldsReader.doc(int). I've been re

Re: Rich positions (was "boosting fields")

2006-04-27 Thread Doug Cutting
Marvin Humphrey wrote: Incidentally, how about calling it BOOST_PER_POSITION instead? +1, that is more consistent with other naming. Doug

Re: Rich positions (was "boosting fields")

2006-04-27 Thread karl wettin
On 27 Apr 2006, at 18:41, Doug Cutting wrote: karl wettin wrote: Boost per position, etc. sounds very expensive. Indeed. It will probably nearly double the size of indexes and also increase search time. But it is also very powerful. Consider the posting representation Google describes on

Re: Rich positions (was "boosting fields")

2006-04-27 Thread Marvin Humphrey
Now that I think about it, putting the score-multiplier into the FreqFile does offer a benefit I hadn't considered before. It makes it possible to tie the score multiplier to a term within a doc, rather than a field within a doc. Say you have a doc with a "body" field that's 1000 terms l

Re: Rich positions (was "boosting fields")

2006-04-27 Thread Marvin Humphrey
On Apr 27, 2006, at 12:17 PM, Doug Cutting wrote: Marvin Humphrey wrote: Moving away from cached norms was the second of three major changes to the file format on my agenda, and the one I was all but certain I wouldn't be able to sell to the Lucene community. The first was using bytec

[jira] Updated: (LUCENE-545) Field Selection and Lazy Field Loading

2006-04-27 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-545?page=all ] Grant Ingersoll updated LUCENE-545: --- Attachment: newFiles.tar.gz Forgot the new files. > Field Selection and Lazy Field Loading > -- > > Key: L

[jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=comments#action_12376784 ] Grant Ingersoll commented on LUCENE-558: IndexReader.document(int n) is still in there. All prior APIs work and the introduction of Fieldable is a drop in replacement

Re: Rich positions (was "boosting fields")

2006-04-27 Thread Doug Cutting
Marvin Humphrey wrote: Moving away from cached norms was the second of three major changes to the file format on my agenda, and the one I was all but certain I wouldn't be able to sell to the Lucene community. The first was using bytecounts at the head of Strings. The third was storing st

[jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Chuck Williams (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=comments#action_12376781 ] Chuck Williams commented on LUCENE-558: --- 545 is certainly more general and could handle all the cases. I looked at it briefly before doing this version and was concerne

Rich positions (was "boosting fields")

2006-04-27 Thread Marvin Humphrey
On Apr 27, 2006, at 9:41 AM, Doug Cutting wrote: karl wettin wrote: My own immediate thought is to compromise by allowing boost per term in document. Simply remove the norms-methods from the IndexReader and add a new one to the TermEnum and fall back on the field boost. How would the v

[jira] Commented: (LUCENE-140) docs out of order

2006-04-27 Thread Jason Lambert (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-140?page=comments#action_12376780 ] Jason Lambert commented on LUCENE-140: -- I was having this problem intermittently while indexing over multiple threads and I have found that the following steps can cause

Re: GData, updateable IndexSearcher

2006-04-27 Thread Yonik Seeley
On 4/27/06, Robert Engels <[EMAIL PROTECTED]> wrote: > I thought each segment maintained its own list of deleted documents Right. > (since segments are WRITE ONCE Yes, but deletions are the exception to that rule. Once written, segment files never change, except for the file that tracks deleted
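
The write-once rule and its one exception can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `Segment`, its fields, and its methods are hypothetical names made up for illustration, and it says nothing about Lucene's actual on-disk files.

```java
import java.util.BitSet;

// Hypothetical in-memory model; not Lucene's segment implementation.
public class SegmentDeletionsSketch {

    static final class Segment {
        // Write-once part: set in the constructor and never modified.
        private final String[] docs;
        // The single mutable part: one bit per document, set when deleted.
        private final BitSet deleted;

        Segment(String... docs) {
            this.docs = docs.clone();
            this.deleted = new BitSet(docs.length);
        }

        // Marks a document as deleted without touching the stored docs.
        void delete(int docId) {
            if (docId < 0 || docId >= docs.length) {
                throw new IndexOutOfBoundsException("no such doc: " + docId);
            }
            deleted.set(docId);
        }

        boolean isDeleted(int docId) {
            return deleted.get(docId);
        }

        // Number of document slots, including deleted ones.
        int maxDoc() {
            return docs.length;
        }

        // Number of documents that are still live.
        int numDocs() {
            return docs.length - deleted.cardinality();
        }
    }

    public static void main(String[] args) {
        Segment segment = new Segment("a", "b", "c");
        segment.delete(1);
        System.out.println(segment.numDocs() + " live of " + segment.maxDoc());
    }
}
```

In this model a deletion changes only the bit set, so the document slots stay in place until the segment is rewritten, for example by a merge.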

RE: GData, updateable IndexSearcher

2006-04-27 Thread Robert Engels
Doug can you please elaborate on this. I thought each segment maintained its own list of deleted documents (since segments are WRITE ONCE, and when that segment is merged or optimized it would "go away" anyway, as the deleted documents are removed. In my reopen() implementation, I check to see if

[jira] Commented: (LUCENE-557) search vs explain - score discrepancies

2006-04-27 Thread Hoss Man (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-557?page=comments#action_12376775 ] Hoss Man commented on LUCENE-557: - In my haste to upload the testing patch before I left work, I failed to mention that it exposes 9 test failures, suggesting at least two bug

Re: GData, updateable IndexSearcher

2006-04-27 Thread jason rutherglen
> I think the 'public static IndexReader.reopen(IndexReader old)' method I > proposed can easily compare the current list of segments for the directory of > old to those that old already has open, and determine which can be reused and > which new segments must be opened. This makes sense. Coul
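
The comparison described in the quoted proposal boils down to set arithmetic on segment names. The following is a rough sketch of that idea only, not the proposed `reopen` implementation: `ReopenPlanSketch`, `Plan`, and `plan` are hypothetical names, and it ignores the case where a reused segment has new deletions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of Lucene's API.
public class ReopenPlanSketch {

    static final class Plan {
        final Set<String> reuse;  // segments the old reader already has open
        final Set<String> open;   // segments that are new in the directory
        final Set<String> close;  // segments the old reader holds that are gone

        Plan(Set<String> reuse, Set<String> open, Set<String> close) {
            this.reuse = reuse;
            this.open = open;
            this.close = close;
        }
    }

    // alreadyOpen: segment names held by the old reader.
    // current: segment names currently listed for the directory.
    static Plan plan(Collection<String> alreadyOpen, Collection<String> current) {
        Set<String> reuse = new TreeSet<String>(current);
        reuse.retainAll(alreadyOpen);          // intersection
        Set<String> open = new TreeSet<String>(current);
        open.removeAll(alreadyOpen);           // current minus old
        Set<String> close = new TreeSet<String>(alreadyOpen);
        close.removeAll(current);              // old minus current
        return new Plan(reuse, open, close);
    }

    public static void main(String[] args) {
        Plan p = plan(Arrays.asList("_a", "_b", "_c"),
                      Arrays.asList("_b", "_c", "_d"));
        System.out.println("reuse=" + p.reuse + " open=" + p.open + " close=" + p.close);
    }
}
```

Only the segments in `open` cost new file handles and loading time, which is why reopening can be much cheaper than opening a fresh reader after a small update.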

Re: boosting fields

2006-04-27 Thread Doug Cutting
karl wettin wrote: My own immediate thought is to compromise by allowing boost per term in document. Simply remove the norms-methods from the IndexReader and add a new one to the TermEnum and fall back on the field boost. How would the value be picked up by the scorer? Boost per position,

[jira] Commented: (LUCENE-556) MatchAllDocsQuery, MultiSearcher and a custom HitCollector throwing exception

2006-04-27 Thread jm (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-556?page=comments#action_12376758 ] jm commented on LUCENE-556: --- I used a custom version and my queries work now, but I am not sure whether this is ok...it's mostly an easy shot I took: public class LuceneMatchAllDocs

RE: lucene search sentence

2006-04-27 Thread Robert Engels
Ask the question on the lucene users list, not the dev-list. And, Read a book. Read the javadoc. Read the samples. -Original Message- From: Anton Feldmann [mailto:[EMAIL PROTECTED] Sent: Thursday, April 27, 2006 10:05 AM To: java-dev@lucene.apache.org; java-user@lucene.apache.org Subject:

lucene search sentence

2006-04-27 Thread Anton Feldmann
Hi, I wrote an Indexer that indexes all the contents of a text, and the sentences are separated into another Document. "Document document = new Document(new Field ("contents", reader )); StringTokenizer token = new StringTokenizer(contents.replaceAll(". ", "\\.x\\") , "\\.x\

[jira] Commented: (LUCENE-558) Selective field loading

2006-04-27 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=comments#action_12376723 ] Grant Ingersoll commented on LUCENE-558: This is pretty much what I started out with when I first started working on Lazy/Selective Field loading and I think it is a l

[jira] Updated: (LUCENE-558) Selective field loading

2006-04-27 Thread Chuck Williams (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-558?page=all ] Chuck Williams updated LUCENE-558: -- Attachment: LuceneTrunk.patch > Selective field loading > --- > > Key: LUCENE-558 > URL: http://issues.apache.org/jira

[jira] Created: (LUCENE-558) Selective field loading

2006-04-27 Thread Chuck Williams (JIRA)
Selective field loading --- Key: LUCENE-558 URL: http://issues.apache.org/jira/browse/LUCENE-558 Project: Lucene - Java Type: New Feature Components: Index Versions: 2.0 Environment: All Reporter: Chuck Williams Provides a new

Re: boosting fields

2006-04-27 Thread karl wettin
On 26 Apr 2006, at 19:18, Doug Cutting wrote: karl wettin wrote: How about refactoring fields to something like: [Document](fieldName)<#> {0..1} ->[Field +boost]<#> {0..*} -> [FieldValue +store +index +termVector] If you think you have a simple, back-compatible way to do this, pleas