ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Uwe Schindler
Hello Lucene users, On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2: Both releases fix bugs in the previous versions: - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4 - 3.0.1 has the same

Re: problem about backup index file

2010-02-26 Thread Michael McCandless
Well, Lucene is write once and then, eventually, delete once ;) I.e., files are eventually deleted (when they are merged away). So when you do the incremental backup, any file not listed in the current commit can be removed from your backup (assuming you only want to back up the last commit). Don't
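
The pruning rule described above (drop any backed-up file the current commit no longer lists) can be sketched in plain Java as a set difference. This is a minimal illustration, not code from the thread: the class and method names are hypothetical, and it assumes the caller already has the commit's file names, for example from IndexCommit.getFileNames().

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: decides what an incremental
// backup of the last commit should copy and what it may delete.
final class BackupPruner {
    private BackupPruner() {}

    /** Files present in the backup that the current commit no longer references. */
    static Set<String> staleFiles(Collection<String> backedUp, Collection<String> currentCommit) {
        Set<String> stale = new TreeSet<>(backedUp);
        stale.removeAll(currentCommit);
        return stale;
    }

    /** Files referenced by the current commit that the backup does not hold yet. */
    static Set<String> missingFiles(Collection<String> backedUp, Collection<String> currentCommit) {
        Set<String> missing = new TreeSet<>(currentCommit);
        missing.removeAll(backedUp);
        return missing;
    }
}
```

Because index files are written once, a file name that appears in both sets does not need to be copied again; only the "missing" set is copied and only the "stale" set is removed.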

Re: If you could have one feature in Lucene...

2010-02-26 Thread Paul Taylor
Glen Newton wrote: +2 On 25 February 2010 04:45, Avi Rosenschein arosensch...@gmail.com wrote: Similarity can only be set per index, but I want to adjust scoring behaviour at a field level. To facilitate this, could we make the field name available to all score methods? Currently it is only

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor
Uwe Schindler wrote: Hello Lucene users, On behalf of the Lucene development community I would like to announce the release of Lucene Java versions 3.0.1 and 2.9.2: Both releases fix bugs in the previous versions: - 2.9.2 is a bugfix release for the Lucene Java 2.x series, based on Java 1.4

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Robert Muir
Such projects can do this, in one place: public static final Version MY_APP_CURRENT = Version.LUCENE_30; then later StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT); then they have complete control of this, independent of when they upgrade Lucene's jar file! On Fri, Feb 26,

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Paul Taylor
Robert Muir wrote: such projects can do this, in one place: public static final Version MY_APP_CURRENT = Version.LUCENE_30; then later StandardAnalyzer analyzer = new StandardAnalyzer(MY_APP_CURRENT); then they have complete control of this, independent of when they upgrade Lucene's jar

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Johannes Zillmann
Just one thought... For me it would be natural never to be confronted with the Version.xx thing in the API unless you really need it. So, e.g., having new QueryParser(, new KeywordAnalyzer()).parse(content: the); as a default (probably using Version.LUCENE_CURRENT under the hood), but

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Michael McCandless
That would be more natural/convenient, but it'd unfortunately defeat the whole reason Version was added in the first place. By making Version required, we force callers to be explicit to Lucene about what level of back compat is required. This then enables Lucene to improve its defaults with
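
The reasoning above (a required version argument lets defaults improve without silently changing behaviour for existing callers) can be illustrated with a toy, Lucene-free sketch. Everything here is hypothetical: the CompatVersion enum, the normalize method and the lowercasing change are invented for illustration and do not describe any real Lucene default.

```java
import java.util.Locale;

// Toy illustration only; none of these names or behaviours are Lucene API.
final class VersionedDefaults {
    private VersionedDefaults() {}

    /** Hypothetical compatibility levels, ordered oldest to newest. */
    enum CompatVersion { V2_9, V3_0 }

    /** The application pins its level in one place, as suggested earlier in the thread. */
    static final CompatVersion MY_APP_CURRENT = CompatVersion.V2_9;

    /**
     * Invented example of a default that "improved" in V3_0: newer callers get
     * lowercased tokens, while callers pinned to V2_9 keep the old behaviour.
     */
    static String normalize(CompatVersion matchVersion, String token) {
        if (matchVersion.compareTo(CompatVersion.V3_0) >= 0) {
            return token.toLowerCase(Locale.ROOT);
        }
        return token;
    }
}
```

An application that passes MY_APP_CURRENT everywhere keeps its old behaviour after upgrading the jar, and opts in to the new default by changing that single constant.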

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Marcelo Ochoa
Hi Ian: Just out of curiosity ;) Which distributed file system are you using on top of your NAS storage? Best regards, Marcelo. On Thu, Feb 25, 2010 at 6:54 AM, Ian Lea ian@gmail.com wrote: We've run Lucene on NAS, although not with indexes anything like as large as 1 TB, and gave up

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Ian Lea
NFS. It works fine for simple, essentially static Lucene indexes and we still use it for that, but things tended to fall apart with dynamic indexes. -- Ian. On Fri, Feb 26, 2010 at 11:06 AM, Marcelo Ochoa marcelo.oc...@gmail.com wrote: Hi Ian: Only as curiosity ;) Which distributed file

Re: ANNOUNCE: Release of Lucene Java 3.0.1 and 2.9.2

2010-02-26 Thread Ian Lea
Could there be a Version value called LUCENE_LATEST_DANGER_USE_AT_YOUR_OWN_RISK, or whatever you want to call it? I understand the argument about backwards compatibility but I'm with Johannes on making things easier for those who have code which doesn't require the compatibility. Like me. I've

NumericField exact match

2010-02-26 Thread Ivan Vasilev
Hi Guys, Is it possible to make exact searches on fields that are of type NumericField and if yes how? In the LIA book part 2 I found only information about Range searches on such fields and how to Sort them. Example - I have field size that can take integers as values. I want to get docs

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Is there a way for the application to wait for the BG commit to finish before it calls IW.close? If so, would this prevent the extra version? The extra version causes the app. to think that the external data it committed is out of synch with the index, which requires the app to do extra processing

RE: NumericField exact match

2010-02-26 Thread Uwe Schindler
It's very easy: NumericRangeQuery.newXxxRange(field, val, val, true, true) - val is the exact match. This is not slower, as it automatically rewrites to a non-scored TermQuery. If you already changed QueryParser, you can also override the method for exact matches (newTermQuery). - Uwe

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
Note that it's a BG merge (not commit)... You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API? If you call that, then call .getReader().getVersion(), then close and open the writer, I think (but you better test to be sure!) the next .getReader().getVersion() should always match.

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Great, I'll give it a try. Thanks! On Fri, Feb 26, 2010 at 3:11 PM, Michael McCandless luc...@mikemccandless.com wrote: Note that it's a BG merge (not commit)... You can use the new (as of 2.9 I think) IndexWriter.waitForMerges API? If you call that, then call .getReader().getVersion(),

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Peter Keegan
Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data. Peter On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan peterlkee...@gmail.com wrote: Great, I'll give it a try. Thanks! On Fri, Feb 26, 2010 at 3:11 PM, Michael

Re: IndexWriter.getReader.getVersion behavior

2010-02-26 Thread Michael McCandless
That should be fine! Mike On Fri, Feb 26, 2010 at 3:26 PM, Peter Keegan peterlkee...@gmail.com wrote: Can IW.waitForMerges be called between 'prepareCommit' and 'commit'? That's when the app calls 'getReader' to create external data. Peter On Fri, Feb 26, 2010 at 3:15 PM, Peter Keegan

Re: NumericField exact match

2010-02-26 Thread Ivan Vasilev
Thanks for the answer Uwe, Does the precision step matter when I use NumericRangeQuery for exact matches? I mean, if I use the default precision step when indexing those fields, is it guaranteed that: 1. With this query I will always hit the docs that contain val for the field; 2. I will never
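
As a hedged aside on the precision step: in the trie encoding behind NumericField, as understood here, each value is indexed at shifts 0, precisionStep, 2 * precisionStep and so on, up to the value's bit width. If that holds, the full-precision term (shift 0) is indexed for every precision step, and that is the term an exact-match range resolves to. The sketch below only enumerates those shifts; the class and method names are illustrative and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: lists the shift levels at which a value of the given
// bit width would be indexed for a given precision step.
final class TrieShifts {
    private TrieShifts() {}

    static List<Integer> indexedShifts(int valueBits, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> shifts = new ArrayList<>();
        // Shift 0 (full precision) is always emitted first, whatever the step.
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            shifts.add(shift);
        }
        return shifts;
    }
}
```

Under this assumption the precision step changes how many lower-precision terms are indexed (and so how fast wide ranges run), not whether an exact value can be matched.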

recovering payload from fields

2010-02-26 Thread Christopher Condit
I'm trying to store semantic information in payloads at index time. I believe this part is successful - but I'm having trouble getting access to the payload locations after the index is created. I'd like to know the offset in the original text for the token with the payload - and get this

Re: recovering payload from fields

2010-02-26 Thread Christopher Tignor
Hello, To my knowledge, the character position of the tokens is not preserved by Lucene - only the ordinal position of tokens within a document / field is preserved. Thus you need to store this character offset information separately, say, as Payload data. best, CT On Fri, Feb 26, 2010 at

RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
Hi Chris- To my knowledge, the character position of the tokens is not preserved by Lucene - only the ordinal position of tokens within a document / field is preserved. Thus you need to store this character offset information separately, say, as Payload data. Thanks for the information. So

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Petite Abeille
On Feb 25, 2010, at 12:54 AM, Andrew Bruno wrote: Since the disk IO on the server is high, our datacenter engineers suggested we look at NAS or SAN, for performance gain, and for future growth. Alternatively, get a stack of RamSan and call it a day:

Infinite loop when searching empty index

2010-02-26 Thread Justin
Is this a bug in Lucene Java as of tr...@915399?
int numDocs = reader.numDocs(); // = 0 (empty index)
TopDocsCollector collector = TopScoreDocCollector.create(numDocs, true);
searcher.search(new MatchAllDocsQuery(), collector); // never returns
// Searcher public void

RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
Payload Data is accessed through PayloadSpans, so using SpanQueries is the entry point, it seems. There are tools like PayloadSpanUtil that convert other queries into SpanQueries for this purpose if needed, but the bottom line is that the API for Payloads looks like it goes through Spans. So