Re: Sort difference between 2.1 and 2.3

2008-04-08 Thread Michael McCandless
You're right, Lucene changed with respect to the 0xffff character: 2.3 now uses this character internally as an end-of-term marker when storing term text. This was done as part of LUCENE-843 (speeding up indexing). Technically that character is an invalid UTF-16 character (for interchange), but it looks like

Re: StandardTokenizerConstants in 2.3

2008-04-08 Thread Michael McCandless
Unfortunately, we lost the StandardTokenizerConstants interface as part of this: https://issues.apache.org/jira/browse/LUCENE-966 which was a speedup to StandardTokenizer by switching to JFlex instead of JavaCC. But, the constants that are used by StandardTokenizer are still available as

Re: Sort difference between 2.1 and 2.3

2008-04-08 Thread Antony Bowesman
Thanks for the explanation, Mike. It's not a big issue; it's just a test case where I needed to ensure ordering for the test, so I'll just use a valid high UTF-16 character. It just seemed odd that the field was showing strangely in Luke. Your explanation gives the reason, thanks.
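
A minimal sketch of the workaround described above, assuming plain Java String ordering (UTF-16 code-unit comparison) rather than anything Lucene-specific. The class name, the method name, and the choice of U+FFFD as the "valid high" character are illustrative assumptions; the thread does not say which character was actually used.

```java
import java.util.Arrays;

public class HighCharSortDemo {
    // Hypothetical choice: U+FFFD (REPLACEMENT CHARACTER) is a valid BMP
    // character that sorts just below the reserved U+FFFF.
    static final char HIGH_VALID = '\uFFFD';

    // Returns the given values plus a one-character sentinel string,
    // sorted by String.compareTo (UTF-16 code-unit order).
    static String[] sortedWithSentinel(String... values) {
        String[] copy = Arrays.copyOf(values, values.length + 1);
        copy[values.length] = String.valueOf(HIGH_VALID);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortedWithSentinel("zebra", "apple", "\u00e9clair");
        // The sentinel ends up in the last slot for ordinary BMP text.
        System.out.println(Arrays.toString(sorted));
    }
}
```

This only shows that such a sentinel sorts after ordinary text in Java; whether an index sorts the same way depends on the Lucene version and the sort type in use.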

Pooling of posting objects in DocumentsWriter

2008-04-08 Thread Michael Busch
Hi, this is most likely a question for Mike. I'm trying to figure out what changes we need to make in order to support flexible indexing and LUCENE-1231. Currently I'm looking into the DocumentsWriter. If we want to support different posting lists, then we probably want to change the

[jira] Created: (LUCENE-1261) Impossible to use custom norm encoding/decoding

2008-04-08 Thread John Adams (JIRA)
Impossible to use custom norm encoding/decoding --- Key: LUCENE-1261 URL: https://issues.apache.org/jira/browse/LUCENE-1261 Project: Lucene - Java Issue Type: Bug Components:

[jira] Commented: (LUCENE-1261) Impossible to use custom norm encoding/decoding

2008-04-08 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586880#action_12586880 ] Karl Wettin commented on LUCENE-1261: Hi John, see LUCENE-1260. karl

Re: Pooling of posting objects in DocumentsWriter

2008-04-08 Thread Michael McCandless
Hi Michael, I've actually been working on factoring DocumentsWriter, as a first step towards flexible indexing. I agree we would have an abstract base Posting class that just tracks the term text. Then, DocumentsWriter manages inverting each field, maintaining the per-field hash of term Text -

Re: shingles and punctuations

2008-04-08 Thread Mathieu Lecarme
setting a flag in a filter is easy : 8--- package org.apache.lucene.analysis.shingle; import java.io.IOException; import org.apache.lucene.analysis.Token; import org.apache.lucene.analysis.TokenFilter; import org.apache.lucene.analysis.TokenStream; /** * @author Mathieu

[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

2008-04-08 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586954#action_12586954 ] Hoss Man commented on LUCENE-1260: bq. I haven't thought too much about it yet, but it

Re: StandardTokenizerConstants in 2.3

2008-04-08 Thread Antony Bowesman
But, the constants that are used by StandardTokenizer are still available as static ints in the StandardTokenizer class (ie, ALPHANUM, APOSTROPHE, etc.). Does that work? Problem as mentioned below is that the StandardTokenizerImpl.java is package private and even though the ints and string

Re: StandardTokenizerConstants in 2.3

2008-04-08 Thread Michael McCandless
But, StandardTokenizer is public? It exports those constants for you? Mike Antony Bowesman wrote: But, the constants that are used by StandardTokenizer are still available as static ints in the StandardTokenizer class (ie, ALPHANUM, APOSTROPHE, etc.). Does that work? Problem as

Re: StandardTokenizerConstants in 2.3

2008-04-08 Thread Antony Bowesman
But, StandardTokenizer is public? It exports those constants for you? Really? Sorry, but I can't find them - in 2.3.1 sources, there are no references to those statics. Javadocs have no reference to them in StandardTokenizer

Optimise Indexing time using lucene..

2008-04-08 Thread lucene4varma
Hi all, I am new to Lucene and am using it for text search in my web application, and for that I need to index records in a database. We are using a JDBC directory to store the indexes. Now the problem is that when I start the process of indexing the records for the first time, it is taking a huge amount

Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread robert engels
That is the opposite of my testing:... The 'foreach' is consistently faster. The time difference is independent of the size of the array. From what I know about JVM implementations, the foreach version SHOULD always be faster, because no bounds checking needs to be done on the element

Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread Yonik Seeley
On Tue, Apr 8, 2008 at 7:48 PM, robert engels wrote: That is opposite of my testing:... The 'foreach' is consistently faster. It's consistently slower for me (I tested java5 and java6 both with -server on a P4). I'm a big fan of testing different methods in different test

[jira] Created: (LUCENE-1262) NullPointerException from FieldsReader after problem reading the index

2008-04-08 Thread Trejkaz (JIRA)
NullPointerException from FieldsReader after problem reading the index -- Key: LUCENE-1262 URL: https://issues.apache.org/jira/browse/LUCENE-1262 Project: Lucene - Java

Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread Yonik Seeley
foreach vs explicit loop counter is pretty academic for Lucene anyway I think. I can't think of any inner loops where it would really matter. -Yonik
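
For readers who want to repeat the comparison discussed in this thread, here is a rough sketch of the two loop styles over an int array, with a naive timing harness. All names are hypothetical, the array size and round counts are arbitrary, and this is not the code either poster ran. Timings from a harness this simple vary with JVM, flags, and hardware, which may be part of why the two posters saw opposite results.

```java
public class LoopTimingSketch {
    // Explicit loop counter.
    static long sumIndexed(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Java 5 enhanced for ("foreach").
    static long sumForEach(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }

    // Times `rounds` passes over the array with one of the two loop styles.
    static long timeNanos(boolean useForEach, int[] data, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += useForEach ? sumForEach(data) : sumIndexed(data);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the loops entirely.
        if (sink == 42) System.out.println("unlikely");
        return elapsed;
    }

    public static void main(String[] args) {
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        // Warm up both variants before measuring.
        timeNanos(false, data, 200);
        timeNanos(true, data, 200);

        System.out.println("indexed ns: " + timeNanos(false, data, 500));
        System.out.println("foreach ns: " + timeNanos(true, data, 500));
    }
}
```

Both methods compute the same sum, so any measured difference comes from how the JIT compiles each loop form rather than from the work done.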

Re: StandardTokenizerConstants in 2.3

2008-04-08 Thread Chris Hostetter
: But, StandardTokenizer is public? It exports those constants for you? : : Really? Sorry, but I can't find them - in 2.3.1 sources, there are no : references to those statics. Javadocs have no reference to them in : StandardTokenizer I think Michael is forgetting that he re-added those

Re: Help migrating from 1.9.1 to 2.3.0 (Newbie)

2008-04-08 Thread Chris Hostetter
There is a FAQ covering this question... http://wiki.apache.org/lucene-java/LuceneFAQ#head-86d479476c63a2579e867b75d4faa9664ef6cf4d start by getting your code to compile against 1.9.1 without any deprecation warnings. The deprecation messages in the 1.9.1 javadocs will tell you which new