Fwd: [jira] Updated: (LUCENE-437) SnowballFilter loses token position offset

2005-09-22 Thread Erik Hatcher
Begin forwarded message: From: "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> Date: September 21, 2005 5:45:27 PM EDT To: java-dev@lucene.apache.org Subject: [jira] Updated: (LUCENE-437) SnowballFilter loses token position offset Reply-To: java-dev@lucene.apache.org [ http://issues.apache.or

Re: Can any one help

2005-09-22 Thread Erik Hatcher
What is the offending .toString() of your query? You're building a big boolean query somehow - via wildcards? range? fuzzy? The queries you mention below would not cause that error. Also, please follow-up to java-user list, not java-dev. Erik On Sep 22, 2005, at 1:55 AM, santosh w

Re: Can any one help

2005-09-22 Thread santosh
Hi Erik, Thanks for the support. We are passing this string to the searcher: booleanQuery.toString() = "+contents:java* +contents:oracle*". I also observed that internally it is creating queries which are added to the BooleanQuery, and the clause size keeps increasing. Any suggestions? -Santhosh - Or
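The clause growth described here is what a wildcard such as contents:java* does when the query is rewritten. Each indexed term that matches the pattern becomes its own OR-ed clause. The sketch below models that expansion, using a plain sorted set as a stand-in for the term dictionary. The class and method names are hypothetical and are not Lucene's API. The limit of 1024 mirrors Lucene's default BooleanQuery clause limit, and TooManyClausesException is an invented stand-in for BooleanQuery.TooManyClauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Toy model (not Lucene's code) of how a prefix term such as "java*" is
 * rewritten into one OR-ed clause per matching indexed term, and why a large
 * term dictionary can push the expansion past a fixed clause limit.
 */
public class PrefixExpansionSketch {

    /** Hypothetical stand-in for Lucene's BooleanQuery.TooManyClauses. */
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("expansion exceeds " + limit + " clauses");
        }
    }

    /** Returns every dictionary term starting with prefix; fails once maxClauses is exceeded. */
    static List<String> expandPrefix(SortedSet<String> dictionary, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet begins at the first term >= prefix; all terms sharing the prefix are contiguous.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            if (clauses.size() == maxClauses) {
                throw new TooManyClausesException(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> dict = new TreeSet<>();
        dict.add("java");
        dict.add("javadoc");
        dict.add("javascript");
        dict.add("oracle");
        System.out.println(expandPrefix(dict, "java", 1024)); // [java, javadoc, javascript]

        // Simulate a larger index: 2000 more distinct terms beginning with "java".
        for (int i = 0; i < 2000; i++) {
            dict.add("java" + i);
        }
        try {
            expandPrefix(dict, "java", 1024);
        } catch (TooManyClausesException e) {
            System.out.println("rewrite failed: " + e.getMessage());
        }
    }
}
```

On a real index a short prefix can match thousands of terms. The usual levers are a longer prefix or a higher limit, which Lucene exposes through BooleanQuery.setMaxClauseCount.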

[jira] Resolved: (LUCENE-437) SnowballFilter loses token position offset

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-437?page=all ] Erik Hatcher resolved LUCENE-437: - Fix Version: unspecified Resolution: Fixed Yonik - thanks for the patch! It has been applied. > SnowballFilter loses token position offset > -

Re: UTF-8 and unit test failure for org.apache.analysis.ru.RussianStem in build with Kaffe

2005-09-22 Thread Ken Krugler
Hi Barry, Hello, it's those pesky Debian Lucene package maintainers again :-). Lucene currently builds and passes all but one unit test against Kaffe[0] 1.1.6. In debugging the failure of the unit test for org.apache.analysis.ru.RussianStem, I enabled a build of the JUnit test reports. A

Re: UTF-8 and unit test failure for org.apache.analysis.ru.RussianStem in build with Kaffe

2005-09-22 Thread Steven Rowe
Barry Hawkins wrote: Guys, Hello, it's those pesky Debian Lucene package maintainers again :-). Lucene currently builds and passes all but one unit test against Kaffe[0] 1.1.6. In debugging the failure of the unit test for org.apache.analysis.ru.RussianStem, I enabled a build of the JUnit t

working towards a release...

2005-09-22 Thread Erik Hatcher
Cross-posting... In gearing up for a 1.9 release, I've been perusing our issue tracker (http://issues.apache.org/jira/browse/LUCENE) to see what low-hanging fruit there is that I can help address. This message is a prod for others to do the same. We have >100 open issues, some of which ma

[jira] Updated: (LUCENE-101) Selecting a language-specific analyzer according to a locale.

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-101?page=all ] Erik Hatcher updated LUCENE-101: Bugzilla Id: (was: 18934) Component: Analysis (was: Other) Description: Moved from todo.xml: Now we rewrite parts of Lucene co

[jira] Updated: (LUCENE-429) Little improvement for SimpleHTMLEncoder

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-429?page=all ] Erik Hatcher updated LUCENE-429: Bugzilla Id: (was: 36333) Component: Examples (was: Other) Description: The SimpleHTMLEncoder could be improved slightly: all c

[jira] Commented: (LUCENE-328) Some utilities for a compact sparse filter

2005-09-22 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-328?page=comments#action_12330216 ] Yonik Seeley commented on LUCENE-328: - How about adding a next() or nextDocNr() to DocNrSkipper that doesn't take the current id as a parameter? It would allow more effici

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2005-09-22 Thread Mark Harwood (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-329?page=comments#action_12330222 ] Mark Harwood commented on LUCENE-329: - This has been partially addressed. Issue 1, the coord factor has been rectified with the introduction of new constructor BooleanQu

UInt32 or Int32

2005-09-22 Thread Marvin Humphrey
Greets, The File Formats document indicates that Lucene's primitive datatypes include a UInt32, but writeInt actually writes signed ints. In fact, if that weren't the case, FORMAT couldn't be specified as a negative number. Should UInt32 be Int32 instead? It looks like the same holds tr

[jira] Commented: (LUCENE-124) Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc

2005-09-22 Thread Mark Harwood (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-124?page=comments#action_12330223 ] Mark Harwood commented on LUCENE-124: - I would suggest this is a duplicate of http://issues.apache.org/jira/browse/LUCENE-329 The idf rating of expanded terms should be t

[jira] Commented: (LUCENE-126) [PATCH] Modifications for retrieval of terms

2005-09-22 Thread Mark Harwood (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-126?page=comments#action_12330228 ] Mark Harwood commented on LUCENE-126: - Suggest we reject this one. query.rewrite is now the standard way of resolving such queries into simpler TermQueries that can be ex

Re: UInt32 or Int32

2005-09-22 Thread Yonik Seeley
I'd lean toward keeping UInt32 in general, so at least that will scale to 4B documents. SegSize is the only place where UInt32 is used that it will matter (all of the other uses will never approach that size). writeInt() writes both signed and unsigned integers (or rather the bit pattern could be
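A few lines are enough to show Yonik's point that writeInt() writes a bit pattern and does not fix a signedness. The sketch below encodes an int as four bytes, high-order byte first, which is the layout the File Formats document gives for Int32/UInt32. It then decodes the same bytes both ways. The helpers writeInt, readInt and readUInt32 are hypothetical and are not Lucene's classes. A segment size of 3,000,000,000 overflows a signed int but survives the round trip when it is read as unsigned.

```java
/**
 * Sketch: a 32-bit value is just four bytes (high-order byte first here), so
 * the same bytes can be read back as a signed int or as an unsigned value.
 */
public class UInt32Sketch {

    /** Encodes the low 32 bits of i, most significant byte first. */
    static byte[] writeInt(int i) {
        return new byte[] {
            (byte) (i >> 24), (byte) (i >> 16), (byte) (i >> 8), (byte) i
        };
    }

    /** Decodes four big-endian bytes as a signed 32-bit int. */
    static int readInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    /** Reinterprets the same four bytes as an unsigned value in [0, 2^32). */
    static long readUInt32(byte[] b) {
        return readInt(b) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long segSize = 3_000_000_000L;            // > Integer.MAX_VALUE, < 2^32
        byte[] encoded = writeInt((int) segSize); // narrowing cast keeps the low 32 bits
        System.out.println(readInt(encoded));     // -1294967296
        System.out.println(readUInt32(encoded));  // 3000000000
    }
}
```

The negative FORMAT values Marvin mentions are the same bytes read through readInt. Only the reader's interpretation differs.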

[jira] Updated: (LUCENE-437) SnowballFilter loses token position offset

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-437?page=all ] Erik Hatcher updated LUCENE-437: Fix Version: 1.9 (was: unspecified) Version: unspecified (was: CVS Nightly - Specify date in submission) > Sno

TokenFilters eating position increments

2005-09-22 Thread Erik Hatcher
Yonik identified an interesting issue with LUCENE-437 (http://issues.apache.org/jira/browse/LUCENE-437). I patched the SnowballFilter, but then looked at other filters and we have the same issue with some of them (like StandardFilter, GermanStemFilter, GreekLowerCaseFilter, and others that c
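LUCENE-437 concerns a filter that builds a fresh token for its modified text. Doing so resets the position increment to the default of 1, which discards any non-default increment set by an earlier stage (for example 0 for an injected synonym, or a gap larger than 1). The sketch below models the problem with a hypothetical Tok class and a placeholder trailing-"s" stemmer, not Lucene's Token or Snowball classes. It contrasts a lossy filter with one that copies the increment across.

```java
/**
 * Toy model (hypothetical classes, not Lucene's API): rebuilding a token to
 * hold changed text silently resets its position increment to 1 unless the
 * filter copies the original value over.
 */
public class PositionIncrementSketch {

    /** Minimal stand-in for a token: text, offsets, and a position increment. */
    static final class Tok {
        final String text;
        final int start, end;
        int positionIncrement = 1; // default: directly follows the previous token

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Lossy: copies text and offsets but not the position increment. */
    static Tok stemLossy(Tok in) {
        return new Tok(stem(in.text), in.start, in.end);
    }

    /** Preserving: carries the original position increment onto the new token. */
    static Tok stemPreserving(Tok in) {
        Tok out = new Tok(stem(in.text), in.start, in.end);
        out.positionIncrement = in.positionIncrement;
        return out;
    }

    /** Placeholder "stemmer" that strips one trailing 's' (illustration only). */
    static String stem(String s) {
        return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // Suppose an earlier stage left a one-position gap before "cats".
        Tok cats = new Tok("cats", 8, 12);
        cats.positionIncrement = 2;
        System.out.println(stemLossy(cats).positionIncrement);      // 1 (gap lost)
        System.out.println(stemPreserving(cats).positionIncrement); // 2
    }
}
```

Changing the text on the original token in place avoids the copy entirely. The LUCENE-438 thread is about making that possible.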

[jira] Resolved: (LUCENE-126) [PATCH] Modifications for retrieval of terms

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-126?page=all ] Erik Hatcher resolved LUCENE-126: - Resolution: Won't Fix Assign To: (was: Lucene Developers) See Mark's comments > [PATCH] Modifications for retrieval of terms >

[jira] Created: (LUCENE-438) add Token.setTermText(), remove final

2005-09-22 Thread Yonik Seeley (JIRA)
add Token.setTermText(), remove final - Key: LUCENE-438 URL: http://issues.apache.org/jira/browse/LUCENE-438 Project: Lucene - Java Type: Improvement Versions: CVS Nightly - Specify date in submission Reporter: Yonik Se

[jira] Updated: (LUCENE-438) add Token.setTermText(), remove final

2005-09-22 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-438?page=all ] Yonik Seeley updated LUCENE-438: Attachment: yonik_Token.txt > add Token.setTermText(), remove final > - > > Key: LUCENE-438 > URL: http://

Re: TokenFilters eating position increments

2005-09-22 Thread Yonik Seeley
> Thoughts? LOL! You're psychic. http://issues.apache.org/jira/browse/LUCENE-438 -Yonik Now hiring -- http://tinyurl.com/7m67g On 9/22/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > Yonik identified an interesting issue with LUCENE-437 - http://issues.apache.org/jira/browse/LUCENE-437

Re: TokenFilters eating position increments

2005-09-22 Thread Erik Hatcher
Actually, to reply to myself, the filters that are simply changing the term text shouldn't be creating a new term anyway - but rather just setting term.termText = ... on the original term. I'll see about modifying our core and contrib filters to do this. Erik On Sep 22, 2005, at 4:29

Re: UInt32 or Int32

2005-09-22 Thread Marvin Humphrey
On Sep 22, 2005, at 1:16 PM, Yonik Seeley wrote: I'd lean toward keeping UInt32 in general, so at least that will scale to 4B documents. SegSize is the only place where UInt32 is used that it will matter (all of the other uses will never approach that size). OK, sounds good. writeInt() wr

[jira] Commented: (LUCENE-438) add Token.setTermText(), remove final

2005-09-22 Thread Erik Hatcher (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12330239 ] Erik Hatcher commented on LUCENE-438: - Yes, please elaborate on why you need to subclass Token. > add Token.setTermText(), remove final > -

Re: UTF-8 and unit test failure for org.apache.analysis.ru.RussianStem in build with Kaffe

2005-09-22 Thread Barry Hawkins
Steven Rowe wrote: [...] > Transliterated into the Latin-1 alphabet, this is "a\r\nbe", where "\r" > and "\n" are carriage return and newline, resp., and the "b" is the > Cyrillic character that sounds like English "b". > > So, it looks to me like the

[jira] Commented: (LUCENE-438) add Token.setTermText(), remove final

2005-09-22 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12330250 ] Yonik Seeley commented on LUCENE-438: - Mostly to convey information across TokenFilters, and the single type string isn't sufficient. For example, I'd like to have an int o
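The motivation for LUCENE-438 is carrying information between TokenFilters that a single type string cannot hold. The sketch below illustrates the idea with hypothetical classes: a non-final BaseToken standing in for Token, a FlaggedToken subclass with an invented int field, and a setTermText() used to change the text in place. The comment above is truncated before it says what the int would hold, so the all-caps flag used here is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: if the token class is not final, an upstream filter can
 * attach typed data that a downstream filter reads back. "flags" is an
 * invented example field, not something Lucene defines.
 */
public class TokenSubclassSketch {

    /** Stand-in for a token class that is NOT final, so it can be extended. */
    static class BaseToken {
        String termText;

        BaseToken(String termText) {
            this.termText = termText;
        }

        void setTermText(String text) {
            this.termText = text;
        }
    }

    /** Subclass carrying an extra int alongside the usual fields. */
    static class FlaggedToken extends BaseToken {
        final int flags;

        FlaggedToken(String termText, int flags) {
            super(termText);
            this.flags = flags;
        }
    }

    static final int FLAG_ALL_CAPS = 1;

    /** Upstream step: records whether the raw text was all upper case, then lowercases in place. */
    static BaseToken markAndLowercase(String raw) {
        boolean allCaps = raw.equals(raw.toUpperCase(Locale.ROOT))
                && !raw.equals(raw.toLowerCase(Locale.ROOT));
        FlaggedToken t = new FlaggedToken(raw, allCaps ? FLAG_ALL_CAPS : 0);
        t.setTermText(raw.toLowerCase(Locale.ROOT)); // same token object, new text
        return t;
    }

    /** Downstream step: reads the extra field when the token carries it. */
    static boolean wasAllCaps(BaseToken t) {
        return t instanceof FlaggedToken && (((FlaggedToken) t).flags & FLAG_ALL_CAPS) != 0;
    }

    public static void main(String[] args) {
        List<BaseToken> tokens = new ArrayList<>();
        for (String raw : new String[] {"NASA", "launches", "Rockets"}) {
            tokens.add(markAndLowercase(raw));
        }
        for (BaseToken t : tokens) {
            System.out.println(t.termText + " allCaps=" + wasAllCaps(t));
        }
    }
}
```

The downstream check degrades gracefully. A plain BaseToken simply reports no flags, so filters that do not know about the subclass keep working.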