Highlighter API

2005-02-18 Thread Daniel Naber
Hi, the Highlighter's getBestFragment method takes a TokenStream and a text. Wouldn't it be easier to give it just the text and an analyzer so the user doesn't have to care about building a TokenStream? Like this: public final String getBestFragment(Analyzer analyzer, String text) throws

Re: [VOTE] Incubate lucene4c?

2005-02-17 Thread Daniel Naber
On Thursday 17 February 2005 12:11, Erik Hatcher wrote: The Incubator requires the Lucene PMC vote on whether to accept the lucene4c codebase. +1 from me. +1 -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL

Re: Incubating Lucene.Net

2005-02-17 Thread Daniel Naber
On Thursday 17 February 2005 17:14, George Aroush wrote: Proposal for new project Lucene.Net (aka dotLucene) +1 -- http://www.danielnaber.de

removing the old FAQ

2005-02-16 Thread Daniel Naber
Hi, could someone (Doug?) make me an administrator for the old Lucene project at sourceforge? I'd like to replace the old and outdated FAQ script there with a link to the new FAQ. The old FAQ still comes out on top when searching for lucene faq on Google. Regards Daniel --

Re: removing the old FAQ

2005-02-16 Thread Daniel Naber
On Wednesday 16 February 2005 21:01, Doug Cutting wrote: could someone (Doug?) make me an administrator for the old Lucene project at sourceforge? Done. Thanks, the old FAQ now displays a link to the Wiki FAQ. I assume it's okay if I also remove the other old pages there, e.g.

Re: BitSet implementation and large index

2005-02-14 Thread Daniel Naber
On Monday 14 February 2005 16:29, [EMAIL PROTECTED] wrote: It seems that for a huge index, it might be a good idea to use a different implementation of the BitSet when doing filtering (assuming the non-filtered set is relatively small). This would really help minimize the memory required for

Re: Lucene contribution

2005-02-11 Thread Daniel Naber
On Thursday 10 February 2005 18:01, Rida Benjelloun wrote: I was wondering if you will be interested to add LIUS as a contribution of Lucene (http://jakarta.apache.org/lucene/docs/contributions.html) in the Miscellaneous category?

ApacheCon Europe 2005

2005-02-08 Thread Daniel Naber
ApacheCon Europe will take place in Stuttgart, Germany, 18-22 July 2005. Anybody planning to be there, maybe even planning to give a talk? -- http://www.danielnaber.de

Re: branch jakarta...

2005-02-08 Thread Daniel Naber
On Tuesday 08 February 2005 19:48, Erik Hatcher wrote: We can remove them (svn remove) if they have no value - and in those first two cases they probably don't (svn diff to find out, maybe). I assume remove works similarly to CVS, where you can always get things back from Attic? Regards

Re: Refactoring suggestion for query parsing and creation

2005-02-07 Thread Daniel Naber
On Monday 07 February 2005 17:56, Matthew Denner wrote: QueryParser parser = new QueryParser( description, new StandardAnalyzer(), new SpecialQueryFactory(new QueryFactoryImpl())); This sounds interesting, could you create a bug report (see Lucene Bugs on the homepage) and then

Re: single field code ready - Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-02-07 Thread Daniel Naber
On Tuesday 08 February 2005 00:06, David Spencer wrote: So, does this make sense and is it a useful way of trying to evaluate the Similarities? Is this the MultiFieldQueryParser from Lucene 1.4? Then it's buggy anyway, so it probably doesn't make sense to test it. But even with the current SVN

Re: Subversion conversion

2005-02-03 Thread Daniel Naber
On Wednesday 02 February 2005 21:20, Erik Hatcher wrote: For committers, check out the repository using https and your Apache username/password. I guess this is the password one uses to log into cvs.apache.org with ssh? Then it doesn't work for me, I can check out stuff but not commit. I get

Re: [PATCH] Highlighter and FrenchStemFilter problem

2005-01-31 Thread Daniel Naber
On Monday 31 January 2005 15:14, Erik Hatcher wrote: Thanks for that fix. I have committed the correction. Could you have a look at the br stemmer, it seems to contain similar code which might also be buggy. Regards Daniel -- http://www.danielnaber.de

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-29 Thread Daniel Naber
On Saturday 29 January 2005 00:37, David Spencer wrote: Hmmm, is it safe to assume I can build the index w/ lucene-1.4.3.jar but deploy the webapp for searching w/ lucene-1.5-rc1-dev.jar? Yes, everything else would be a bug. And is the current code supposed to build with so many

Re: Passage Search

2005-01-28 Thread Daniel Naber
On Friday 28 January 2005 01:10, Joaquin Delgado wrote: man bites dog~DOCSIZE that would execute a phrasequery with the slop being the individual document size in number of characters of each hit. You can use the value of Integer.MAX_VALUE as a slop, Nutch does that to boost those matches

Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-28 Thread Daniel Naber
On Friday 28 January 2005 17:53, Chuck Williams wrote: I think the baseline should use Lucene's MultiFieldQueryParser to expand the query to search both title and body fields, as this is presumably the current out-of-the-box solution. Please remember that this is kind of buggy in Lucene 1.4:

Re: cvs commit: jakarta-lucene-sandbox/contributions/lucli build.xml

2005-01-23 Thread Daniel Naber
On Sunday 23 January 2005 13:04, Erik Hatcher wrote: I have gotten lucli to work fine by putting the additional JAR files on the CLASSPATH. When I try to start it with java -jar lucli-dev.jar it seems to ignore the CLASSPATH. Also the -cp option doesn't seem to have any effect when using it

Re: IOException on Windows XP -- Cannot delete deleteable

2005-01-04 Thread Daniel Naber
On Tuesday 04 January 2005 21:35, Wu, Daniel wrote: I implemented the following patch to minimize the need to rename the deleteable file and the IOException doesn't seem to happen any more. Thanks -- could you please recreate that patch with the -u option so that the context is visible and

Re: Lucene jGuru FAQ

2004-12-31 Thread Daniel Naber
On Friday 31 December 2004 00:52, Erik Hatcher wrote: The best option for dealing with the jGuru situation is probably to simply make the wiki the official FAQ and not link to jGuru's site as the official one any more (but still link to it somewhere on the wiki for additional resources).

Re: do we need two FAQs?

2004-12-20 Thread Daniel Naber
On Monday 20 December 2004 23:23, Otis Gospodnetic wrote: I'd remove the older, unmaintained one. jGuru one is more up to date. Unfortunately the start page of it contains three fat ads, that's three too many for an Open Source project. Also it cannot be searched, and it's difficult to read

Re: do we need two FAQs?

2004-12-19 Thread Daniel Naber
On Tuesday 07 December 2004 02:23, Otis Gospodnetic wrote: jGuru can provide XML dump of a FAQ, and I believe I can obtain it, if you want to use that to seed the Wiki FAQ. Could you try to get that XML for me? I'll then semi-automatically import it to the existing FAQ page in our wiki.

Deleting document in IndexWriter

2004-12-16 Thread Daniel Naber
Hi, the request to delete documents in IndexWriter instead of IndexReader comes up regularly. What if we implement a delete() method in IndexWriter like this: public synchronized void delete(int docNum) throws IOException { IndexReader ireader = IndexReader.open(directory);

Re: Boolean Scorer

2004-12-12 Thread Daniel Naber
On Sunday 12 December 2004 04:01, Chuck Williams wrote: I maintain the belief that max is *required* to implement reasonable multi-field searching (1). Could you give a small example -- preferably a test case -- that shows what the problem is? I know it has been discussed before but I hadn't

Re: [Jakarta Lucene Wiki] Updated: PoweredBy

2004-12-11 Thread Daniel Naber
On Saturday 11 December 2004 17:12, Murray Altheim wrote: Is this list just for web sites, or to include all projects that use Lucene internally? The first part of the page (http://wiki.apache.org/jakarta-lucene/PoweredBy) lists any applications and web applications that use Lucene. Regards

setLowercaseWildcardTerms and FuzzyQueries

2004-12-10 Thread Daniel Naber
Hi, QueryParser's setLowercaseWildcardTerms doesn't affect FuzzyQueries. So qp.parse("Axaaa~0.7"); will parse to field:Axaaa~0.7 by default (note the uppercase A). Is there any reason for this? I'd suggest changing setLowercaseWildcardTerms so that it also affects FuzzyQueries -- as it's true

Re: Release 1.4.3

2004-12-10 Thread Daniel Naber
On Tuesday 07 December 2004 00:54, Erik Hatcher wrote: http://jakarta.apache.org/site/binindex.cgi http://jakarta.apache.org/site/sourceindex.cgi could update those pages? Done. Could you maybe also remove the link to the nightly build? They are not up-to-date anymore.

Re: HTMLParser.jj still has ASL 1.1

2004-12-08 Thread Daniel Naber
On Tuesday 07 December 2004 23:15, Eric Isakson wrote: Just wanted to note that the \src\demo\org\apache\lucene\demo\html\HTMLParser.jj file was missed in the license upgrade. Thanks, it's fixed now. Regards Daniel -- http://www.danielnaber.de

[ANNOUNCE] Lucene 1.4.3 released

2004-12-07 Thread Daniel Naber
I'd like to officially announce Lucene 1.4.3. This release fixes two bugs, the list of changes is so short that I will simply paste it here: 1. The JSP demo page (src/jsp/results.jsp) now properly escapes error messages which might contain user input (e.g. error messages about query

Re: Release 1.4.3

2004-12-06 Thread Daniel Naber
On Monday 06 December 2004 10:47, Erik Hatcher wrote: Christoph - one remaining task before officially announcing Lucene 1.4.3 is to update the main Jakarta site's binindex and sourceindex files: http://jakarta.apache.org/site/binindex.cgi http://jakarta.apache.org/site/sourceindex.cgi I

do we need two FAQs?

2004-12-06 Thread Daniel Naber
Hi, there are currently two FAQs for Lucene: http://www.jguru.com/faq/Lucene http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi To my mind that leads to redundancy and decreases the motivation to update at least one of them. As the jguru FAQ is full of ads and more difficult to navigate

old license in StandardTokenizer.jj

2004-11-29 Thread Daniel Naber
Hi, StandardTokenizer.jj still contains version 1.1 of the Apache license. Is it okay to update it to 2.0? Regards Daniel -- http://www.danielnaber.de

Re: encoding of german analyzer source files

2004-11-26 Thread Daniel Naber
On Friday 26 November 2004 11:42, Stefan Wachter wrote: With UTF-8 encoding these source files look rather strange when viewed on an ISO-8859-1 development environment because they contain German umlauts and the sharp s. Your editor / IDE needs to be Unicode-aware and you have to set it up

Re: encoding of german analyzer source files

2004-11-26 Thread Daniel Naber
On Friday 26 November 2004 23:40, Murray Altheim wrote: In grepping through the source I noted nine instances of a lowercase use of !doctype, which isn't valid. This should probably be registered as a bug. Kinda makes me wonder what's generating that, because when I run javadoc on my own

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/document DateField.java

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 15:17, [EMAIL PROTECTED] wrote: + * + * @deprecated Use {@link DateTools} instead. The new DateTools class isn't compatible with DateField, i.e. it returns different Strings. If we force people to use the new class it means they have to re-index. It

Re: typo in javadoc

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 22:39, Paul wrote: org.apache.lucene.index.IndexWriter the links within the first paragraph don't work, their targets don't exist because of overloaded functions This has already been fixed some time ago in CVS. Regards Daniel -- http://www.danielnaber.de

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/index SegmentReader.java

2004-11-16 Thread Daniel Naber
On Tuesday 16 November 2004 22:56, [EMAIL PROTECTED] wrote: + throw new RuntimeException("cannot load SegmentReader class: " + e.getMessage()); I think it's better to leave out the call to getMessage(), as the toString() which is then used automatically is slightly more verbose, it contains
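
The difference between the two forms discussed in this message can be shown with plain Java, without Lucene. This is only a sketch: the class name ExceptionMessageDemo and the example class name com.example.MissingReader are made up for illustration. It relies on the documented behaviour of Throwable.toString(), which prefixes the message with the exception's class name.

```java
public class ExceptionMessageDemo {

    // Form from the quoted commit: only the message text is appended.
    static String withGetMessage(Exception e) {
        return "cannot load SegmentReader class: " + e.getMessage();
    }

    // Suggested form: string concatenation calls e.toString(),
    // which yields "<exception class name>: <message>".
    static String withToString(Exception e) {
        return "cannot load SegmentReader class: " + e;
    }

    public static void main(String[] args) {
        // Hypothetical failure, used only to produce some output.
        Exception e = new ClassNotFoundException("com.example.MissingReader");
        System.out.println(withGetMessage(e));
        System.out.println(withToString(e));
    }
}
```

With the second form the reader of the log can tell what kind of exception occurred, not just which class name was involved.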

Re: the future of MultiFieldQueryParser

2004-11-15 Thread Daniel Naber
On Monday 15 November 2004 09:50, Christiaan Fluit wrote: Is it correct that Lucene 1.4.2 already had the desired behaviour? Its parse method looks like this (MFQP revision 1.4): That version doesn't work as expected for queries where all the terms are required, it parses them in a way so

Re: the future of MultiFieldQueryParser

2004-11-15 Thread Daniel Naber
On Monday 15 November 2004 20:00, Christiaan Fluit wrote: I'm not sure whether we are thinking alike here. Judging from the code in 1.4.2, I expect the query X AND Y to be evaluated as: (field1:X AND field1:Y) OR (field2:X AND field2:Y) OR ... (fieldn:X AND fieldn:Y) Yes, that's the way

Re: the future of MultiFieldQueryParser

2004-11-15 Thread Daniel Naber
On Monday 15 November 2004 22:04, Bill Janssen wrote: I've already addressed this issue a few months ago on the lucene-users list. My improved version of MultiFieldQueryParser is at ftp://ftp.parc.xerox.com/transient/janssen/SearchTest.java. Do you see any advantage of your implementation

Re: the future of MultiFieldQueryParser

2004-11-15 Thread Daniel Naber
On Tuesday 16 November 2004 00:09, Bill Janssen wrote: Aside from the fact that the URL you cited doesn't work for me :-? http://issues.apache.org/eyebrowse/[EMAIL PROTECTED] apache.orgmsgId=1798116 Here's an alternative URL: http://java2.5341.com/msg/75416.html Regards Daniel --

the future of MultiFieldQueryParser

2004-11-13 Thread Daniel Naber
Hi, I'd like to fix MultiFieldQueryParser so that it properly works with AND queries. Currently it rewrites AND queries so that all terms must appear in all fields, which rarely makes sense. Eric Jain suggested a new class that works for AND and OR queries:

Re: Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-12 Thread Daniel Naber
On Friday 12 November 2004 15:11, Giulio Cesare Solaroli wrote: - select, between the possible terms, only the first 1024 (or what ever the limit is) more meaningful ones, leaving out all the others. This is, BTW, what FuzzyQuery in CVS HEAD does now. For FuzzyQuery, however, it's easier to

Re: Propose Bernhard as committer

2004-11-09 Thread Daniel Naber
On Tuesday 09 November 2004 10:44, Bernhard Messer wrote: I already got my userid from ASF and can successfully connect to apache.org. In the mail from ASF, there is a note, that the Project Management Committee responsible for the project has to grant me access to CVS. Does anybody know how

Re: [PATCH] FuzzyTermEnum optimization and refactor

2004-10-23 Thread Daniel Naber
On Saturday 23 October 2004 01:08, Jonathan Hager wrote: Since fuzzy searching is kind of slow, I took a look at it to see if it could be improved. I saw speed improvements of 10% - 60% by making a couple changes. Thanks for your patch. I did not yet have time to look at the patch in detail,

Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2004-10-21 Thread Daniel Naber
On Thursday 21 October 2004 20:00, Chuck Williams wrote: Thanks Otis. Other than trying to get some consensus a) that this is a problem worth fixing, and b) on the best approach to fix it, my central question is, if I fix it is it likely to get incorporated back into Lucene? Chuck, sorry, I

Re: Two formal questions

2004-10-21 Thread Daniel Naber
On Thursday 21 October 2004 19:41, Christoph Goller wrote: I think there are enough votes for Bernhard. As I nominated Bernhard, what am I supposed to do now? I looked already through the Apache pages, but didn't find a HOWTO :-) See http://jakarta.apache.org/site/roles.html Regards Daniel

Re: idf and explain(), was Re: Search and Scoring

2004-10-19 Thread Daniel Naber
On Tuesday 19 October 2004 04:03, Chuck Williams wrote: On another note, I had to remove the German analyzer in my current 1.4.2 source configuration because GermanStemmer failed to compile due to what are apparently Unicode character constants that I've now got as illegal two-character

Re: Propose Bernhard as committer

2004-10-18 Thread Daniel Naber
On Monday 18 October 2004 18:35, Christoph Goller wrote: I would like to propose Bernhard as Lucene committer. +1 -- http://www.danielnaber.de

equals() is implemented in some Querys only

2004-10-18 Thread Daniel Naber
Hi, what's the reason that e.g. PhraseQuery implements equals, but FuzzyQuery doesn't? Could this become a problem in some situations? Regards Daniel -- http://www.danielnaber.de

Re: Failure of wildcard search in the middle of a term

2004-10-15 Thread Daniel Naber
On Friday 15 October 2004 19:46, Lagerloef Kris-P54513 wrote: A search of Cal*s finds nothing. Try searching cal*s, the analyzer probably indexed the word in lowercase, but the QueryParser by default doesn't do that for WildcardTerms. See QueryParser.setLowercaseWildcardTerms(). Regards

Re: documentation in fileformats.html

2004-10-14 Thread Daniel Naber
On Wednesday 13 October 2004 23:18, Doug Cutting wrote: We should note that when compression is enabled, gzip is used. Actually it's ZLIB, isn't it? Also, byte[] is not a type defined in the file. In the formalism used in fileformats.html, this should be: Value - String | BinaryValue
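
The gzip/ZLIB distinction raised here is visible in the first bytes of the compressed data. The sketch below uses only java.util.zip and assumes, purely for illustration, that compression goes through a default Deflater; the excerpt does not say which class Lucene uses. The class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class ZlibVersusGzip {

    // A default Deflater wraps the deflate stream in the zlib format.
    static byte[] zlibCompress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // GZIPOutputStream wraps the same deflate data in the gzip format.
    static byte[] gzipCompress(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] z = zlibCompress(data);
        byte[] g = gzipCompress(data);
        // zlib streams begin with 0x78; gzip streams begin with 0x1f 0x8b.
        System.out.printf("zlib: %02x%n", z[0] & 0xff);
        System.out.printf("gzip: %02x %02x%n", g[0] & 0xff, g[1] & 0xff);
    }
}
```

Both formats carry the same deflate payload, so the wording in a file format description matters mainly to someone writing an independent reader.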

Re: FuzzyQuery prefix length

2004-10-13 Thread Daniel Naber
On Wednesday 13 October 2004 00:42, Daniel Naber wrote: I'll try to do some performance tests with fuzzy query tomorrow on a 250,000 document index. Searching for Photokopie~ on a 230,000 document corpus takes 2.3 seconds here (AMD Athlon 2600+; other fuzzy terms get similar performance

Re: Re[2]: DateTools tests failed

2004-10-13 Thread Daniel Naber
On Tuesday 12 October 2004 08:58, Anatol Pomozov wrote: Your patch cures one of two failures. But there another test failure. Thanks, I applied your patch. Regards Daniel -- http://www.danielnaber.de

Re: documentation in fileformats.html

2004-10-13 Thread Daniel Naber
On Monday 11 October 2004 23:10, Bernhard Messer wrote: replace: Bits -- Byte Value -- String Currently only the low-order bit is used of Bits is used. It is one for tokenized fields, and zero for non-tokenized fields. The web page is updated now, could you please re-check if it's

Re: FuzzyQuery prefix length

2004-10-12 Thread Daniel Naber
On Tuesday 12 October 2004 17:22, Doug Cutting wrote: Which is worse: a person who searches for Photokopie~ in a 1000 document collection does not find documents containing Fotokopie; or a person who searches for Photokopie~ in a 1M document collection doesn't find anything because it takes

Re: FuzzyQuery prefix length

2004-10-11 Thread Daniel Naber
On Monday 11 October 2004 10:53, Christoph Goller wrote: Maybe the default should remain 0 and folks with big indices should decide by themselves to use a prefix. I agree that the default should stay 0, even for Lucene 2.0. Regards Daniel

Re: QueryParser and backwards-compatibility

2004-10-11 Thread Daniel Naber
On Monday 11 October 2004 10:31, Christoph Goller wrote: These things clearly could break existing applications. So the best solution would be to undo them. Some people might have adapted their software already, maybe the old methods (those with the analyzer argument) should be kept, but as

Re: documentation in fileformats.html

2004-10-11 Thread Daniel Naber
On Monday 11 October 2004 18:21, Bernhard Messer wrote: Currently only the low-order bit is used of Bits is used. It is one for tokenized fields, and zero for non-tokenized fields. is outdated now and should be updated. Any idea how to proceed ? Is that the only sentence? If so, you could

Re: FuzzyQuery prefix length

2004-10-11 Thread Daniel Naber
On Monday 11 October 2004 18:20, Doug Cutting wrote: However 2.0 is our opportunity to make incompatible changes. What is the best default for this, that will work well for the most applications? I see the following problems with a default 0: -It is the only change so far that we cannot

Re: DateTools tests failed

2004-10-11 Thread Daniel Naber
On Monday 11 October 2004 10:44, Anatol Pomozov wrote: I've updated source tree from CVS and tried to run unit tests. They are failed. See log below. Could you try again? I just committed a supposed fix. Regards Daniel -- http://www.danielnaber.de

Re: Lucene 1.4.2?

2004-10-02 Thread Daniel Naber
On Saturday 02 October 2004 13:45, Christoph Goller wrote: Too bad I found the first bug already: the CHANGES file describes the new fuzzy syntax (e.g. fuzzy~0.7), but this isn't part of the release (i.e. the QueryParser part of that feature is missing). Any chance to fix this (i.e.

Re: Lucene 1.4.2?

2004-10-02 Thread Daniel Naber
On Friday 01 October 2004 23:57, Doug Cutting wrote: It is not mirrored yet. Erik's the only one who has ever done that. Erik, do you have time to mirror 1.4.2? Thanks. BTW, the release on the official download pages is still 1.4-final: http://jakarta.apache.org/site/sourceindex.cgi

Re: Lucene 1.4.2?

2004-10-01 Thread Daniel Naber
On Friday 01 October 2004 23:57, Doug Cutting wrote: The new release is up at http://jakarta.apache.org/lucene/. Thanks, I will submit it to freshmeat.net tomorrow. Should I also announce it on the mailing lists, as suggested at http://nagoya.apache.org/wiki/apachewiki.cgi?ReleaseManager or

Re: Lucene 1.4.2?

2004-09-30 Thread Daniel Naber
On Wednesday 29 September 2004 23:17, Doug Cutting wrote: I just made the memory leak patch in this branch, but I've not yet updated CHANGES.txt. I just did that. I also backported Christoph's change to SortComparator.java and the wrong cast in FieldCacheImpl.java (that's not documented in

Re: Lucene 1.4.2?

2004-09-29 Thread Daniel Naber
On Wednesday 29 September 2004 22:14, Doug Cutting wrote: Now that we have a patch for the memory leak problem, should we start a 1.4.2 branch? +1 I can try to do some of the work, but I'd need detailed instructions for branching and tagging. It's probably easier/better if you do those

Re: cvs commit: jakarta-lucene build.xml

2004-09-21 Thread Daniel Naber
On Monday 20 September 2004 20:16, [EMAIL PROTECTED] wrote: Add GCJ target. With ant gcj I get this error: [exec] gcj -O3 -ffast-math -fno-bounds-check -fno-store-check -c -I ../../build/classes/java -I ../../build/gcj -o ../../build/gcj/lucene-gcj.a

Re: Lucene 1.4.2?

2004-09-20 Thread Daniel Naber
On Sunday 19 September 2004 21:13, Otis Gospodnetic wrote: It would be good to take care of that memory leak issue that comes up when people use sorting. Dave Spencer found one Comparator or Map or something that looked suspicious. Yes, the Comparator WeakHashMap in FieldSortedHitQueue

Re: Lucene 1.4.2?

2004-09-20 Thread Daniel Naber
On Monday 20 September 2004 18:49, Doug Cutting wrote: To be clear, you are proposing that we branch from the 1.4.1 tag in CVS and re-apply the patches below? Yes, exactly. I also agree with Otis that we should wait until we have a good solution to the Sort-related memory leak issue before

Lucene 1.4.2?

2004-09-19 Thread Daniel Naber
lead to incorrect results (documents missing, others duplicated) if the sort keys were not unique and there were more than 100 matches. (Daniel Naber) -There was a compile problem with StandardTokenizer.jj because of a missing import. Opinions? Regards Daniel -- http

Re: [VOTE] David Spencer as Lucene Sandbox committer

2004-09-15 Thread Daniel Naber
On Wednesday 15 September 2004 01:44, Otis Gospodnetic wrote: And +1 from me. Dave has submitted several nice pieces of code over the years. +1

Re: cvs commit: jakarta-lucene/src/test/org/apache/lucene/queryParser TestQueryParser.java

2004-09-15 Thread Daniel Naber
On Tuesday 14 September 2004 15:46, [EMAIL PROTECTED] wrote: QueryParser can now handle minimumSimilarity parameter of FuzzyQuery A query like term~1.5 currently throws an IllegalArgumentException. What about throwing a ParseException instead? Regards Daniel

Re: FSDirectory.makeLock() declared final

2004-09-07 Thread Daniel Naber
On Thursday 02 September 2004 16:21, Mike Hearn wrote: You are probably likely to do this sooner than I am. Feel free. Here you go. Is it OK? I think so (besides one small error with removing the final from a local variable which I fixed). I just committed your patch. Regards Daniel --

Re: CJK Support for HTMLParser.jj

2004-09-07 Thread Daniel Naber
On Monday 23 August 2004 13:46, Joey Lawrance wrote: I've attached the HTMLParser.jj file that successfully parses Japanese HTML for indexing. Joey, thanks for the patch. When I compile it with ant javacc-HTMLParser I get this warning: Warning: Line 364, Column 3: Non-ASCII characters used

new DateTools class

2004-09-05 Thread Daniel Naber
Hi, some random notes about the newly added DateTools class: -it's named DateTools because it contains static methods only, so I think that name is okay -the public round() method can be used to limit date resolution without using the new readable format (which is slightly longer than that of

Re: API cleanup for Field

2004-09-01 Thread Daniel Naber
On Wednesday 01 September 2004 21:55, Doug Cutting wrote: That's right. In particular I think we'll need: public Field(String, Reader, Index); // Reader is never stored Actually you'll also get an exception when you try to index the field UN_TOKENIZED. I didn't check what exactly happens in

Re: API cleanup for Field

2004-08-30 Thread Daniel Naber
On Monday 30 August 2004 18:34, Doug Cutting wrote: If they're confusing and have a less-confusing alternative then we should eventually remove them from the API, so we should deprecate them now. We should move entirely to the new enumeration-based constructors. Everything else should be

Re: RemoteSearchable will not work anylonger, due to changes in BooleanClause

2004-08-29 Thread Daniel Naber
On Saturday 28 August 2004 17:16, Bernhard Messer wrote: Just change the constructor to: public static final class Occur implements java.io.Serializable { ... Thanks, I did that. However, TestRemoteSearchable doesn't work for me and it didn't even work before my changes. In Eclipse I get a

API cleanup for Field

2004-08-28 Thread Daniel Naber
Hi, here's a patch to clean up the API as described by Doug: http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg08479.html The Field constructor with three booleans is deprecated because it's too easy to mix up the order of those parameters. Also, one variation of Field.Text() is

Re: API cleanup: BooleanQuery.add()

2004-08-25 Thread Daniel Naber
On Monday 23 August 2004 19:38, Doug Cutting wrote: Does this mean we're limited in the changes that we can apply? So is it okay to deprecate (and later remove) the current constructor that takes two booleans and add one that takes the Occur object? Yes, I don't see a problem with that.

Re: API cleanup: BooleanQuery.add()

2004-08-21 Thread Daniel Naber
On Friday 20 August 2004 23:48, Doug Cutting wrote: I still prefer Occur.MUST, Occur.SHOULD and Occur.MUST_NOT. Okay. I noticed that my suggested patch only changes BooleanQuery, but BooleanClause is public, too. So it should be changed as well. However, it implements Serializable and I don't

API cleanup: BooleanQuery.add()

2004-08-19 Thread Daniel Naber
BooleanQuery.add() currently takes two boolean values. That's difficult to use as these two parameters can easily be confused. Also, there's a runtime error if one uses true for both parameters. Thus this method's API should be redesigned. It has been discussed here already:
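
The problem described in this post can be sketched without Lucene. Everything below is hypothetical: the parameter names required and prohibited are assumed, and a modern Java enum stands in for whatever typesafe constant class the project chose; only the names MUST, SHOULD and MUST_NOT come from the thread.

```java
public class OccurSketch {

    // Names taken from the discussion; the enum itself is an illustration.
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Old style: two flags whose order is easy to swap at the call site,
    // and one combination (both true) that can only fail at runtime.
    static Occur fromFlags(boolean required, boolean prohibited) {
        if (required && prohibited) {
            throw new IllegalArgumentException(
                "a clause cannot be both required and prohibited");
        }
        if (required) {
            return Occur.MUST;
        }
        return prohibited ? Occur.MUST_NOT : Occur.SHOULD;
    }

    public static void main(String[] args) {
        // With flags the reader has to remember which boolean is which.
        System.out.println(fromFlags(true, false));
        System.out.println(fromFlags(false, true));
        // With a single typed constant the invalid state cannot be written.
        System.out.println(Occur.SHOULD);
    }
}
```

A single typed argument moves the "both true" error from runtime to something the compiler rules out, which is the main argument for the redesign.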

Re: moving the analyzers into sandbox

2004-08-17 Thread Daniel Naber
On Tuesday 17 August 2004 14:45, Erik Hatcher wrote: src/java/org/apache/lucene/analysis/fr/FrenchStemmer.java:571: duplicate case label [javac] case '?': [javac] ^ ant compile works for me, probably because my system defaults to utf-8. But when I explicitly open

the future of DateField

2004-08-17 Thread Daniel Naber
Hi, as we all know, DateField currently leads to problems with range queries and prefix queries as it saves dates with millisecond precision. I suggest the following changes: -Deprecate everything currently in DateField -Add these methods to DateField: public static String
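
The idea of storing dates at a coarser resolution can be sketched with the JDK alone. This is not the method proposed in the post, whose signature is cut off above; the class name, the method name toDayString and the yyyyMMdd format are assumptions chosen for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DayResolutionDate {

    // Formats a date as a fixed-width, lexicographically sortable string
    // with day resolution, so all times within one day share one term.
    static String toDayString(Date date) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("GMT"));
        return format.format(date);
    }

    public static void main(String[] args) {
        System.out.println(toDayString(new Date(0L)));          // start of the epoch
        System.out.println(toDayString(new Date(86_399_999L))); // last millisecond of that day
        System.out.println(toDayString(new Date(86_400_000L))); // first millisecond of the next day
    }
}
```

Because every timestamp within a day collapses to the same string, a range over dates touches far fewer distinct terms than one over millisecond values.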

Re: the future of DateField

2004-08-17 Thread Daniel Naber
On Tuesday 17 August 2004 19:34, Damian Gajda wrote: That is why I had to move from dates represented the way Daniel suggests - to decimal integer numbers. This creates very ugly looking date strings but needs only 4bytes per term while sorting. That IS a memory advantage. What about using

Re: moving the analyzers into sandbox

2004-08-16 Thread Daniel Naber
On Sunday 15 August 2004 13:42, Otis Gospodnetic wrote: +1 for releasing at least some Sandbox components. Analyzers, Snowball Analyzers and Highlighter at least. Is this something you can/want to do, Daniel? The analyzers are now in the sandbox, I'll leave it to our ant expert to

Re: moving the analyzers into sandbox

2004-08-16 Thread Daniel Naber
On Monday 16 August 2004 23:16, Erik Hatcher wrote: What is desired that currently isn't available with the top-level contributions directory build.xml? The build packages each project into a .tar.gz including javadocs, and runs unit tests too. I see. Then probably nothing is missing.

Re: [Jakarta Lucene Wiki] Updated: Lucene2Whiteboard

2004-08-16 Thread Daniel Naber
On Monday 16 August 2004 23:40, Bernhard Messer wrote: just looked at your changes you made on the whiteboard. You moved the callback interface idea to Other Changes. I think that such an implementation would raise a change in the current api. Maybe we can make the new code backward

Re: moving the analyzers into sandbox

2004-08-15 Thread Daniel Naber
On Saturday 14 August 2004 22:44, Erik Hatcher wrote: +1 on moving those analyzers out. Technically this means to remove the files from lucene's core and re-add them in sandbox, doesn't it? Or is there some special case for moving (e.g. contact cvs admin so he can move files on the server)?

bitwise OR in BooleanScorer

2004-08-14 Thread Daniel Naber
Hi, BooleanScorer's next() method uses a bitwise OR in a while loop: ...} while (bucketTable.first != null | more); Is there any reason for this, couldn't this just be || ? BTW, I found this with FindBugs (http://findbugs.sourceforge.net/), which takes a jar file and gives some useful
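
For boolean operands, | and || produce the same value; they differ only in whether the right-hand side is evaluated when the left-hand side is already true. In the quoted loop, more looks like a plain variable, in which case the two operators would behave identically. The sketch below uses a hypothetical side-effecting method more() to make the difference visible; it is not taken from BooleanScorer.

```java
public class OrOperators {

    static int calls = 0;

    // Hypothetical right-hand operand with a visible side effect.
    static boolean more() {
        calls++;
        return false;
    }

    // Non-short-circuit OR: more() runs even though 'first' is true.
    static int callsWithBitwiseOr(boolean first) {
        calls = 0;
        boolean ignored = first | more();
        return calls;
    }

    // Short-circuit OR: more() is skipped when 'first' is true.
    static int callsWithLogicalOr(boolean first) {
        calls = 0;
        boolean ignored = first || more();
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWithBitwiseOr(true));
        System.out.println(callsWithLogicalOr(true));
    }
}
```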

moving the analyzers into sandbox

2004-08-14 Thread Daniel Naber
Hi, any objections against moving the German and Russian analyzers into the sandbox? If not, I'd like to do that, but I'm not sure if we already agreed on doing so. The current situation with analyzers both in lucene's core and in the sandbox doesn't seem to make sense. I suggest that we

Re: optimized disk usage when creating a compound index

2004-08-08 Thread Daniel Naber
On Sunday 08 August 2004 15:16, Christoph Goller wrote: If compound files are used, Lucene needs up to 3 times the disk space (during indexing) that is required by the final index. Talking about compound index, there's a variable open in CompoundFileReader with the comment reference count but

IndexWriter.getUseCompoundFile is confusing

2004-08-06 Thread Daniel Naber
Hi, I open an index with create=false so I can use addIndexes() on that index. I want to use the existing setting for useCompoundFile of that index. But getUseCompoundFile() will always return true, as it just returns what one has set with setUseCompoundFile() or the default. I suggest to

commit messages

2004-08-02 Thread Daniel Naber
Hi, I did some commits yesterday but the commit messages were lost, so here are the links to my changes: fixed web page: http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/xdocs/fileformats.xml?sortby=date fixed a test case:

Re: errors in file format description

2004-08-02 Thread Daniel Naber
On Monday 02 August 2004 15:37, Otis Gospodnetic wrote: Doug's the only one who updates the site, I believe. I've never done it (don't have account on the right machine, etc.) It's actually described here: http://www.apache.de/dev/committers.html#web My account works on that machine, too, but

Re: errors in file format description

2004-08-01 Thread Daniel Naber
On Monday 19 July 2004 19:54, Doug Cutting wrote: -A file named deletable contains the names of files that are no longer used by the index, but which could not be deleted. This is only generated on Win32, where a file may not be deleted while it is still open. -- Actually the file is also

declared exceptions that are never thrown

2004-08-01 Thread Daniel Naber
Hi, Eclipse 3.0 has a nice warning (off by default) that lets you know if a method has an exception in its throws clause which the method can never throw. This happens in several places in Lucene, for example: TokenStream: public void close() throws IOException {} CSInputStream: protected

Re: errors in file format description

2004-07-19 Thread Daniel Naber
On Monday 19 July 2004 19:54, Doug Cutting wrote: You are correct in all cases. Would you like to patch this? I'll fix it directly in CVS once my account has been created. Regards Daniel -- http://www.danielnaber.de

errors in file format description

2004-07-16 Thread Daniel Naber
Hi, I think there are some small inaccuracies on http://jakarta.apache.org/lucene/docs/fileformats.html: -The first sentence refers to Lucene 1.4, but the end of the second paragraph then refers to 1.3. -Term Vectors. For each field in each document, the term vector (sometimes called document

move analysis.de.WordlistLoader?

2004-07-10 Thread Daniel Naber
Hi, now that 1.4 is released, what about some small API cleanups? For example, analysis.de.WordlistLoader is misplaced, as it is not specific to German at all. It could be moved one level up (I can send a patch). Or should we rather make one big step and move all non-standard analyzers from

Re: build.xml test task

2004-04-07 Thread Daniel Naber
On Wednesday 07 April 2004 21:48, Erik Hatcher wrote: Do you have junit.jar in ANT_HOME/lib?  You should. What about this patch? With this, ant test complains with a useful message if junit is not found. Regards Daniel -- http://www.danielnaber.de Index: build.xml
