cvs commit: jakarta-lucene-sandbox/contributions/ant example.xml

2004-02-24 Thread ehatcher
ehatcher 2004/02/24 17:11:24 Added: contributions/ant example.xml Log: example of antlib usage Revision Changes Path 1.1 jakarta-lucene-sandbox/contributions/ant/example.xml Index: example.xml ===

Re: Dmitry's Term Vector stuff, plus some

2004-02-24 Thread markharw00d
I'm not sure what applications people have in mind for Term Vector support, but I would prefer to have the original text positions (not term sequence positions) stored so I can offer this: 1) Significant terms/phrases identification, like "Gigabits" on gigablast.com, used to offer choices of (uns

Re: Dmitry's Term Vector stuff, plus some

2004-02-24 Thread Grant Ingersoll
This is provided by the Token.startOffset() and Token.endOffset() at indexing time, I think. I don't know if this is accessible at run time. A good place to see what is stored in the files is the File Formats section located at http://jakarta.apache.org/lucene/docs/fileformats.html. (Get the
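Grant's point is that character offsets come from Token.startOffset() and Token.endOffset() at indexing time. These are distinct from the term sequence positions Mark mentions, and the difference can be illustrated without Lucene at all. The following is a minimal, self-contained Java sketch. The class OffsetTokenizer and its Tok holder are hypothetical names, not Lucene's API, and the letter-or-digit splitting rule is an assumption standing in for a real analyzer. Each token keeps both its ordinal position and its [start, end) character span in the original text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene's Token/TokenStream API. */
public class OffsetTokenizer {

    /** One token: its text, its ordinal position, and its character span in the source. */
    public static final class Tok {
        public final String text;
        public final int position;    // term sequence position: 0, 1, 2, ...
        public final int startOffset; // index of the first char in the original text
        public final int endOffset;   // index one past the last char

        Tok(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Assumed rule: terms are runs of letters/digits, lowercased; everything else separates. */
    public static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0, pos = 0, n = input.length();
        while (i < n) {
            // skip separator characters
            while (i < n && !Character.isLetterOrDigit(input.charAt(i))) i++;
            int start = i;
            // consume one term
            while (i < n && Character.isLetterOrDigit(input.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(input.substring(start, i).toLowerCase(), pos++, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Term vectors, with offsets!";
        for (Tok t : tokenize(doc)) {
            // the offsets recover the exact original text, including its case
            System.out.println(t.position + " " + t.text + " [" + t.startOffset + "," + t.endOffset
                    + ") -> " + doc.substring(t.startOffset, t.endOffset));
        }
    }
}
```

Because the offsets index into the original string, a highlighter or a "significant phrases" feature can cut the exact source text back out, which the term position alone cannot do once separators and case have been discarded.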

Re: Dmitry's Term Vector stuff, plus some

2004-02-24 Thread Bruce Ritchie
Grant Ingersoll wrote: It is the location of the token in the document (see IndexReader.termPositions()). This information is already being stored in other parts of the index, it just isn't very efficient to get at it. Ok, that wasn't the answer I was hoping for :) I was hoping that the positi

Re: Dmitry's Term Vector stuff, plus some

2004-02-24 Thread Grant Ingersoll
It is the location of the token in the document (see IndexReader.termPositions()). This information is already being stored in other parts of the index, it just isn't very efficient to get at it. I think it would be useful to add to the IndexReader a way to get a list of positions given a te
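Grant's suggestion of a way to get the list of positions for a given term can be pictured as a positional inverted index. The sketch below is a hypothetical in-memory structure written for illustration; PositionalIndex, addDocument and positions are made-up names, not IndexReader methods. It assumes documents arrive already tokenized and stores, per term, each document's positions in ascending order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical positional inverted index; not Lucene's IndexReader. */
public class PositionalIndex {
    // term -> (docId -> ascending list of term positions in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexes a pre-tokenized document, recording the position of every term. */
    public void addDocument(int docId, String[] terms) {
        for (int pos = 0; pos < terms.length; pos++) {
            postings.computeIfAbsent(terms[pos], k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos); // positions are appended in increasing order
        }
    }

    /** Returns the positions of term in docId, or an empty list if it does not occur there. */
    public List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null) return Collections.emptyList();
        List<Integer> p = byDoc.get(docId);
        return p == null ? Collections.<Integer>emptyList() : Collections.unmodifiableList(p);
    }

    public static void main(String[] args) {
        PositionalIndex idx = new PositionalIndex();
        idx.addDocument(0, "to be or not to be".split(" "));
        System.out.println(idx.positions("be", 0)); // [1, 5]
        System.out.println(idx.positions("or", 1)); // []
    }
}
```

Lucene itself keeps this data in its on-disk postings and exposes it through the enumeration returned by IndexReader.termPositions(); the sketch only shows the shape of the per-term lookup being proposed.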

Re: Dmitry's Term Vector stuff, plus some

2004-02-24 Thread Bruce Ritchie
Doug Cutting wrote: Grant Ingersoll wrote: Do you see any reason to write position information at all for the term vectors? It could be useful to some folks. If, for example, you only want to expand a query with terms that occur near query terms, like automatic phrase identification. In ge
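Doug's example of expanding a query only with terms that occur near query terms is one concrete use for stored positions. What follows is a minimal sketch under stated assumptions: the document is given as an array of terms indexed by position, a candidate is any non-query term within a fixed window of a query-term occurrence, and candidates are scored only by raw co-occurrence count. ProximityExpansion and nearbyTerms are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: expansion candidates are terms within a window of a query term. */
public class ProximityExpansion {

    /**
     * Counts how often each non-query term appears within `window` positions
     * of an occurrence of any query term in the position-ordered document.
     */
    public static Map<String, Integer> nearbyTerms(String[] doc, Set<String> query, int window) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for deterministic output
        for (int i = 0; i < doc.length; i++) {
            if (!query.contains(doc[i])) continue;
            int lo = Math.max(0, i - window);
            int hi = Math.min(doc.length - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                // skip the query term itself and any other query terms in the window
                if (j != i && !query.contains(doc[j])) {
                    counts.merge(doc[j], 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] doc = "the term vector stores term frequencies per field".split(" ");
        Set<String> query = new HashSet<>(Arrays.asList("term"));
        System.out.println(nearbyTerms(doc, query, 1));
        System.out.println(nearbyTerms(doc, query, 2));
    }
}
```

With a window of 1, each neighbour of "term" is counted once; widening the window to 2 counts "vector" and "stores" twice because each falls near both occurrences. A practical version would likely drop stop words such as "the" and weight candidates by distance.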

cvs commit: jakarta-lucene/src/java/org/apache/lucene/index IndexReader.java

2004-02-24 Thread cutting
cutting 2004/02/24 12:43:35 Modified: src/java/org/apache/lucene/index IndexReader.java Log: Fixed javadoc. Revision Changes Path 1.27 +3 -3 jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java Index: IndexReader.java ===

DO NOT REPLY [Bug 26702] - [PATCH] arbitrary sorting

2004-02-24 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://nagoya.apache.org/bugzilla/show_bu

cvs commit: jakarta-lucene/src/java/org/apache/lucene/search StringSortedHitQueue.java

2004-02-24 Thread cutting
cutting 2004/02/24 12:41:16 Modified: src/java/org/apache/lucene/search StringSortedHitQueue.java Log: Fixed problem with sorting. Revision Changes Path 1.3 +62 -14 jakarta-lucene/src/java/org/apache/lucene/search/StringSortedHitQueue.java Index: StringSorted

DO NOT REPLY [Bug 26702] - [PATCH] arbitrary sorting

2004-02-24 Thread bugzilla

DO NOT REPLY [Bug 18927] - [PATCH] Term Vector support

2004-02-24 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://nagoya.apache.org/bugzilla/show_bu

cvs commit: jakarta-lucene/src/java/org/apache/lucene/search QueryTermVector.java

2004-02-24 Thread cutting
cutting 2004/02/24 11:46:43 Modified: src/java/org/apache/lucene/index SegmentTermVector.java TermFreqVector.java src/java/org/apache/lucene/search QueryTermVector.java Log: Removed some dead code and redundant javadoc. Fixed a few javadoc bugs.

cvs commit: jakarta-lucene/src/test/org/apache/lucene/search TestSort.java

2004-02-24 Thread cutting
cutting 2004/02/24 11:34:58 Modified: src/java/org/apache/lucene/search FieldDocSortedHitQueue.java FieldSortedHitQueue.java FloatSortedHitQueue.java IntegerSortedHitQueue.java MultiFieldSorted

Re: Porter Stemmer

2004-02-24 Thread Erik Hatcher
On Feb 24, 2004, at 12:33 PM, Michael McGrady wrote: This conversation is a mystery to me. Is there some different Porter stemmer than the one available in the Lucene source code? Yes. As mentioned, the snowball analyzer family lives in the sandbox. The CVS repository is jakarta-lucene-sandbox

cvs commit: jakarta-lucene-sandbox/contributions common.xml

2004-02-24 Thread cutting
cutting 2004/02/24 11:08:07 Modified: contributions common.xml Log: Added support for javadoc, releases, etc. Revision Changes Path 1.6 +115 -4 jakarta-lucene-sandbox/contributions/common.xml Index: common.xml

cvs commit: jakarta-lucene-sandbox/contributions/snowball/src/java/org/apache/lucene/analysis/snowball SnowballAnalyzer.java SnowballFilter.java

2004-02-24 Thread cutting
cutting 2004/02/24 11:07:36 Modified: contributions/snowball LICENSE.txt build.xml contributions/snowball/docs index.html contributions/snowball/src/java/org/apache/lucene/analysis/snowball SnowballAnalyzer.java SnowballFilter.java A

DO NOT REPLY [Bug 18927] - [PATCH] Term Vector support

2004-02-24 Thread bugzilla

Important Deadline: Contributor License Agreement, March 1st

2004-02-24 Thread Otis Gospodnetic
Developers, please see this: http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=16081 I see some important jakarta-lucene names on the list (link in email above and below) of people who have not sent in their CLAs. We'd hate to lose you and your contributions. If you need more time, plea

Re: Porter Stemmer

2004-02-24 Thread Michael McGrady
This conversation is a mystery to me. Is there some different Porter stemmer than the one available in the Lucene source code? At 09:03 AM 2/24/2004, you wrote: On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote: Is there any reason why the PorterStemmer can't be made public? I know several p

Re: Porter Stemmer

2004-02-24 Thread Erik Hatcher
On Feb 24, 2004, at 10:03 AM, Grant Ingersoll wrote: Is there any reason why the PorterStemmer can't be made public? I know several people have submitted this patch, both separately and as part of other patches. I, for one, am using it in other places as part of my overall search solution and

Porter Stemmer

2004-02-24 Thread Grant Ingersoll
Hi, Is there any reason why the PorterStemmer can't be made public? I know several people have submitted this patch, both separately and as part of other patches. I, for one, am using it in other places as part of my overall search solution and I bet others are as well. I guess I could under

DO NOT REPLY [Bug 27182] - [PATCH] Thai Analysis Enhancement

2004-02-24 Thread bugzilla
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT . ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://nagoya.apache.org/bugzilla/show_bu

DO NOT REPLY [Bug 27182] - [PATCH] Thai Analysis Enhancement

2004-02-24 Thread bugzilla