Re: How to order search results by Field value?

2004-03-26 Thread Erik Hatcher
On Mar 26, 2004, at 2:20 AM, Morus Walter wrote: Erik Hatcher writes: Why not do the unique sequential number replacement at index time rather than query time? how would you do that? This requires to know the ids that will be added in future. Let's say you start with strings 'a' and 'b'. Later you

Re: too many files open error

2004-03-26 Thread Erik Hatcher
. Would these recommendations work for this or should I upgrade to lucene 1.3. In doing so, I'm not sure if a rewrite of the docSearcher will be necessary or not. Daniel Naber wrote on 3/26/04: Try IndexWriter.setUseCompoundFile(true) to limit the number of files. Erik Hatcher 3/26/2004 2:32

Re: too many files open error

2004-03-26 Thread Erik Hatcher
On Mar 26, 2004, at 1:33 PM, Chad Small wrote: Is this :) serious? This is open-source. I'm only as serious as it would take for someone to push it through. I don't know what the timeline is, although lots of new features are available. Because we have a need/interest in the new field

Re: I downloaded latest 1.3 of lucene (lucene-1.3-final ), searched for setUse and only found setUseComp

2004-03-26 Thread Erik Hatcher
(true) to limit the number of files. Erik Hatcher 3/26/2004 2:32:16 AM If you are using Lucene 1.3, try using the index in compound format. You will have to rebuild (or convert) your index to this format. The handy utility Luke will convert an index easily. Erik On Mar 25, 2004, at 9:34 PM

Re: too many files open error

2004-03-26 Thread Erik Hatcher
On Mar 26, 2004, at 7:20 PM, Kevin A. Burton wrote: Chad Small wrote: Is this :) serious? Because we have a need/interest in the new field sorting capabilities URL to documentation for field sorting? Geez, you want documentation also? :) Try the JUnit test cases for starters. That is the

Re: Documentation and presentations

2004-03-26 Thread Erik Hatcher
So far so good, Stephane, on the wiki changes - looks good! As for our book - at this point, early summer seems like when it'll actually be on the shelves. By the end of April we should have mostly everything complete, reviewed, and entirely in the publishers hands. *ugh* - this process

Re: Documentation and presentations

2004-03-26 Thread Erik Hatcher
On Mar 26, 2004, at 8:16 PM, Stephane James Vaucher wrote: Erik, maybe Otis and yourself should slow down on development. You wouldn't want your book to discuss lucene-1.3 if you release a version 1.5 before it hits the stores... unless that's your master plan;) It will cover the new Lucene 1.4

Re: How to order search results by Field value?

2004-03-25 Thread Erik Hatcher
Why not do the unique sequential number replacement at index time rather than query time? Erik On Mar 25, 2004, at 6:26 PM, Eric Jain wrote: I will need to have a look at the code, but I assume that in principle it should be possible to replace the strings with sequential integers once the

Re: Zero hits for queries ending with a number

2004-03-24 Thread Erik Hatcher
On Mar 24, 2004, at 5:58 PM, Morris Mizrahi wrote: I think the custom analyzer I created is not properly doing what a KeywordAnalyzer would do. Erik, could you please post what KeywordAnalyzer should look like? It should simply tokenize the entire input as a single token. Incze Lajos posted a

Re: Query syntax on Keyword field question

2004-03-23 Thread Erik Hatcher
QueryParser and Field.Keyword fields are a strange mix. For some background, check the archives as this has been covered pretty extensively. A quick answer is yes you can use MFQP and QP with keyword fields, however you need to be careful which analyzer you use. PerFieldAnalyzerWrapper is a

Re: Final Hits

2004-03-22 Thread Erik Hatcher
How exactly would you take advantage of a subclassable Hits class? On Mar 21, 2004, at 6:01 AM, Terry Steichen wrote: Does anyone know why the Hits class is final (thus preventing it from being subclassed)? Regards, Terry -

Re: Final Hits

2004-03-22 Thread Erik Hatcher
removing the final attribute(s)? Regards, Terry - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, March 22, 2004 7:06 AM Subject: Re: Final Hits How exactly would you take advantage of a subclassable Hits class? On Mar 21, 2004

Re: CJK Analyzer indexing japanese word document

2004-03-16 Thread Erik Hatcher
On Mar 16, 2004, at 8:39 PM, [EMAIL PROTECTED] wrote: My experience tells me that CJKAnalyzer needs to be improved somehow For example, single word X* search works perfectly, however, multiple words wildcard XX* never works. Well, in this case it is QueryParser, not the analyzer, as the

Re: phrases

2004-03-16 Thread Erik Hatcher
Try setting the slop factor on your phrase query. This should accomplish what you want. Set it to something like 10 and see what you get. Erik On Mar 16, 2004, at 8:55 PM, Supun Edirisinghe wrote: I have a field called buisnessname and this field contains keywords like Georgian House

Re: UNIX command-line indexing script?

2004-03-15 Thread Erik Hatcher
Have a look at the Ant index task in the Lucene sandbox. You're on your own, currently, to build this and understand it, but I use it frequently. In fact, the sample index from our book is generated with this: index index=${build.dir}/index

Re: Date Range and proximity search

2004-03-14 Thread Erik Hatcher
To be honest, I'm way out of the loop on the demo, and it needs to be re-written. It is on my to-do list! But, date range and proximity searches most definitely work. Can you be more specific about what you index and how you searched? Perhaps even a working test case? Erik On Mar 14, 2004,

Re: Date Range and proximity search

2004-03-14 Thread Erik Hatcher
: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Sunday, March 14, 2004 4:00 PM Subject: Re: Date Range and proximity search To be honest, I'm way out of the loop on the demo, and it needs to be re-written. It is on my to-do list! But, date range and proximity searches most

Re: Zero hits for queries ending with a number

2004-03-13 Thread Erik Hatcher
On Mar 13, 2004, at 6:02 AM, Morus Walter wrote: Otis Gospodnetic writes: Field.Keyword is suitable for storing data like Url. Give that a try. Hmm. I don't think keyword fields can be used with query parser, which is probably one of the problems here. He did try keyword fields. Look in the

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-11 Thread Erik Hatcher
, 2004, at 12:04 PM, Doug Cutting wrote: Erik Hatcher wrote: Yes, I saw it. But is there a reason not to just expose HashSet given that it is the data structure that is most efficient? I bought into Kevin's arguments that it made sense to just expose HashSet. Just the general principle that one

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-11 Thread Erik Hatcher
is important. Erik On Mar 11, 2004, at 5:22 PM, Kevin A. Burton wrote: Erik Hatcher wrote: I will refactor again using Set with no copying this time (except for the String[] and Hashtable) constructors. This was my original preference, but I got caught up in the arguments by Kevin and lost my

Re: Retrieving sections of a document

2004-03-11 Thread Erik Hatcher
I would think your best bet is to index each section as a separate Document, with a field that refers to the HTML file itself somehow. Erik On Mar 11, 2004, at 7:43 PM, Ashwin Shripathi Raj wrote: Hi, I have a large HTML document broken up into sections. On a search, I need to retrieve only

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 9, 2004, at 10:23 PM, Kevin A. Burton wrote: You need to make it a HashSet: table = new HashSet( stopTable.keySet() ); Done. Also... while you're at it... the private variable name is 'table' which this HashSet certainly is *not* ;) Well, depends on your definition of 'table' I suppose
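The conversion quoted in this message, copying the keys of a Hashtable into a HashSet, can be sketched with plain java.util classes. This is only an illustration under stated assumptions: the class name StopSetSketch and both helper methods are hypothetical, the word-to-word shape of the legacy table is assumed, and none of this is the code that was actually committed to StopFilter.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Hypothetical illustration class; it is not Lucene's StopFilter.
final class StopSetSketch {

    // Copies only the keys of a legacy Hashtable into an unsynchronized HashSet.
    static Set<String> fromHashtable(Hashtable<String, String> stopTable) {
        return new HashSet<String>(stopTable.keySet());
    }

    // Builds a small word -> word Hashtable (an assumed shape for the legacy table).
    static Hashtable<String, String> makeTable(String[] words) {
        Hashtable<String, String> table = new Hashtable<String, String>();
        for (String w : words) {
            table.put(w, w);
        }
        return table;
    }

    public static void main(String[] args) {
        Set<String> stopSet = fromHashtable(makeTable(new String[] {"a", "the", "of"}));
        System.out.println(stopSet.contains("the"));    // membership test for a stop word
        System.out.println(stopSet.contains("lucene")); // membership test for a non-stop word
    }
}
```

The copy is made once, when the set is built, so later membership checks do not pay for Hashtable's synchronization. Generics are used here for readability; code from 2004 would have used raw types.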

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 2:59 PM, Kevin A. Burton wrote: I refuse to expose HashSet... sorry! :) But I did wrap what is passed in, like above, in a HashSet in my latest commit. Hm... You're doing this EVEN if the caller passes a HashSet directly?! Well it was in the ctor. But I guess I'm not seeing

Re: 1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Erik Hatcher
It means we screwed up the timing somehow and changed the build file version after we built the binary version, is my guess. We'll be more careful with the 1.4 release and make sure this doesn't happen then. Erik On Mar 10, 2004, at 8:34 PM, Jeff Wong wrote: Hello, I noticed that Lucene

Re: 1.3-final builds as 1.4-rc1-dev?

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 9:45 PM, Doug Cutting wrote: Jeff Wong wrote: I noticed that Lucene 1.3-final source builds a JAR file whose version number is 1.4-rc1-dev. What does this mean? Will 1.4-final build as 1.5-rc1-dev? Probably. If you modify the sources of a 1.3-final release, and build them,

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-10 Thread Erik Hatcher
On Mar 10, 2004, at 10:28 PM, Doug Cutting wrote: Erik Hatcher wrote: Also... your HashSet constructor has to copy values from the original HashSet into the new HashSet ... not very clean and this can just be removed by forcing the caller to use a HashSet (which they should). I've caved

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-09 Thread Erik Hatcher
not recompile your own source code against a new Lucene JAR so I will simply provide another signature too. Erik On Mar 9, 2004, at 4:15 AM, Kevin A. Burton wrote: Erik Hatcher wrote: I don't see any reason for this to be a Hashtable. It seems an acceptable alternative to not share analyzer

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-09 Thread Erik Hatcher
Kevin - I've made this change and committed it, using a Set. Let me know if there are any issues with what I've committed - I believe I've faithfully preserved backwards compatibility. Erik p.s. ... On Mar 9, 2004, at 2:00 PM, Kevin A. Burton wrote: public StopFilter(TokenStream in,

Re: Caching and paging search results

2004-03-08 Thread Erik Hatcher
In the RealWorld... many applications actually just re-run a search and jump to the appropriate page within the hits; searching is generally plenty fast enough to alleviate concerns of caching. However, if you need to cache Hits, you need to be sure to keep around the originating

Re: Filtering out duplicate documents...

2004-03-08 Thread Erik Hatcher
My impression is the new term vector support should at least make this type of comparison feasible in some manner. I'd be interested to see what you come up with if you give this a try. You will need the latest CVS codebase. Erik On Mar 8, 2004, at 4:37 PM, Michael Giles wrote: I'm

Re: DocumentWriter, StopFilter should use HashMap... (patch)

2004-03-08 Thread Erik Hatcher
I don't see any reason for this to be a Hashtable. It seems an acceptable alternative to not share analyzer/filter instances across threads - they don't really take up much space, so is there a reason to share them? Or I'm guessing you're sharing it implicitly through an IndexWriter, huh?

Re: Lucene Taglib

2004-03-08 Thread Erik Hatcher
find quite enjoyable and refreshing. Your taglib is nicely done. Erik Regards, Iskandar - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, March 08, 2004 7:48 PM Subject: Re: Lucene Taglib On Mar 8, 2004, at 3:46 AM

Re: Storing numbers

2004-03-07 Thread Erik Hatcher
On Mar 7, 2004, at 6:27 AM, [EMAIL PROTECTED] wrote: On Fri, 5 Mar 2004 19:18:04 -0500, Erik Hatcher [EMAIL PROTECTED] wrote: Thanks for the idea for a good example for the upcoming Lucene in Action book... it's been added! Thanks for mentioning me in the book ;) Well, I actually already had

Re: using lucene to search in a 1 huge file. (aka grep -n)

2004-03-06 Thread Erik Hatcher
On Mar 6, 2004, at 1:17 AM, prasen wrote: Any tutorial/samples on how to use indices, and use them in your search ? Sure, tons. See the articles/resources section of the Lucene website. Otis has written several. I've written a few articles at java.net on Lucene. And there are a handful of

Re: Lucene Search Taglib

2004-03-06 Thread Erik Hatcher
I, too, gave up on the sandbox taglib. I apologize for even committing it without giving it more of a workout. I gave a good effort to fix it up a couple of months ago, but there was more work to do than I was willing to put in. I have not heard from the original contributor, and I

Re: Query: A ? B

2004-03-05 Thread Erik Hatcher
Actually a slop of 1 does guarantee order... it is either an exact match or 1 term off. It takes a slop of 2 or greater for reverse order matches. But it is not exactly 1 term off, which is what Jochen wants. *shrug* Erik On Mar 4, 2004, at 6:22 PM, Otis Gospodnetic wrote: Ah, sorry, I

Re: Query validation in web app

2004-03-05 Thread Erik Hatcher
Kelvin, In what scenarios does QueryParser fail without throwing a ParseException? I think we should fix those cases to ensure a ParseException is thrown. Erik On Mar 5, 2004, at 3:21 AM, Kelvin Tan wrote: Lucene reacts pretty badly to non-wellformed queries, not throwing a

Re: Query validation in web app

2004-03-05 Thread Erik Hatcher
:18:29 -0500, Erik Hatcher said: Kelvin, In what scenarios does QueryParser fail without throwing a ParseException? I think we should fix those cases to ensure a ParseException is thrown. Erik Sorry, my bad. Was it ever throwing Errors? Probably not, but somehow I had the impression

Re: Storing numbers

2004-03-05 Thread Erik Hatcher
Terms in Lucene are text. If you want to deal with number ranges, you need to pad them. 0001 for example. Be sure all numbers have the same width and are zero padded. Lucene uses lexicographical ordering, so you must be sure things collate in this way. Erik On Mar 5, 2004, at 11:46
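The padding advice in this message can be illustrated without any Lucene classes, since it rests only on how strings compare. The following is a minimal sketch, assuming non-negative int values and a fixed width of 10 digits; the class name NumberPadding, the pad method, and the WIDTH constant are hypothetical and are not part of Lucene or of the code later posted to the list.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: pads non-negative ints to a fixed width
// so that lexicographic (string) order agrees with numeric order.
final class NumberPadding {

    // Assumed width: 10 digits is enough for any non-negative int (max 2147483647).
    static final int WIDTH = 10;

    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers need a different encoding");
        }
        String digits = Integer.toString(n);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0'); // left-fill with zeros up to the fixed width
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        String[] raw = {"7", "42", "100"};
        String[] padded = {pad(7), pad(42), pad(100)};
        Arrays.sort(raw);    // plain string sort: "100" sorts before "42" and "7"
        Arrays.sort(padded); // padded sort follows numeric order
        System.out.println(Arrays.toString(raw));
        System.out.println(Arrays.toString(padded));
    }
}
```

The same function would have to be applied both when indexing and when building range queries, so that the terms in the index and the range bounds collate the same way.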

Re: Storing numbers

2004-03-05 Thread Erik Hatcher
(oh, say through a common function :). In fact, this is a great example for LIA. I'll add it! And I'll post the code back here in a day or so after I write it. Erik On Mar 5, 2004, at 12:34 PM, [EMAIL PROTECTED] wrote: On Friday 05 March 2004 18:01, Erik Hatcher wrote: 0001

Re: Storing numbers

2004-03-05 Thread Erik Hatcher
On Mar 5, 2004, at 4:16 PM, Erik Hatcher wrote: Another quite cool option is to subclass QueryParser, and override getRangeQuery. Do the padding there. This will allow users to type in normal looking numbers, and the padding happens automatically. You'll need to be sure that numbers padded

Re: Query: A ? B

2004-03-04 Thread Erik Hatcher
Right Otis was confused by what you were asking. Google supports what you are asking for, I believe, although I don't recall if an '*' indicates one or more or just one. As far as I know, there is no easy way to do the exact distance like you desire. You could always clone the PhraseQuery

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Erik Hatcher
On Mar 3, 2004, at 4:25 PM, hui wrote: Another similar issue. If we could have a parameter to control the max number of the files within the index, that is going to avoid the problem of running out of file handles. When the file number within one index reaches the limit, optimization is

Re: FuzzyQuery info

2004-03-02 Thread Erik Hatcher
On Mar 2, 2004, at 1:23 PM, Supun Edirisinghe wrote: now, one more question: what are the big performance hits from using a FuzzyQuery. what are some bad cases to use it(eg. many words in the phrase? long strings? ) would it be better to read up on the Levenshtein algorithm or to get into the

Re: FuzzyQuery info

2004-03-01 Thread Erik Hatcher
On Mar 1, 2004, at 7:05 PM, Supun Edirisinghe wrote: is there any documentation on FuzzyQuery or articles written on it? ( I mean besides the API pages.) I cover it a little in this article: http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

Re: Indexing multiple instances of the same field for each document

2004-02-29 Thread Erik Hatcher
What you are doing is really the job of an Analyzer. You are doing pre-analysis, when instead you could do all of this within the context of a custom analyzer and avoid many of these issues altogether. Do you use the XML only during indexing? If so, you could bypass the whole conversion to

Lucene Wiki

2004-02-28 Thread Erik Hatcher
Lucene's wiki has been migrated to: http://wiki.apache.org/jakarta-lucene The old content was migrated to http://wiki.apache.org/jakarta-lucene/LuceneProjectPages Doug has gotten on the wiki bandwagon with Nutch also: http://www.nutch.org/cgi-bin/twiki/view/Main/Nutch I've started PoweredBy

Re: Indexing multiple instances of the same field for each document

2004-02-28 Thread Erik Hatcher
On Feb 28, 2004, at 5:38 PM, Moray McConnachie (OA) wrote: - Original Message - I guess the best way to handle this problem, other than getting the application to transform values prior to query or indexing, is actually to tokenize the field after all, but use the same KeywordAnalyzer to

Re: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
On Feb 27, 2004, at 5:16 AM, Moray McConnachie wrote: I note from previous entries on the mailing list and my own experiments that you can add many entries to the same field for each document. Example: a given document belongs to more than one product, ergo I index the product field with values

Re: CJK Analyzer in lucene 1.3 final

2004-02-27 Thread Erik Hatcher
On Feb 27, 2004, at 7:12 AM, Ankur Goel wrote: Hi, In the lucene-1.3-final version's CHANGES.txt it is written that Fix StandardTokenizer's handling of CJK characters (Chinese, Japanese and Korean ideograms). Does it mean that for CJK characters we now do not need to use any separate analyzer,

Re: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
On Feb 27, 2004, at 10:00 AM, Moray McConnachie wrote: Are you using QueryParser? Try using a TermQuery(product, PROD_A) when indexing as a Keyword and see what you get. If that finds it, then you are suffering from analysis paralysis. QueryParser, Keyword fields, and analyzers are a very

Re: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
Roy, On Feb 27, 2004, at 12:12 PM, Roy Klein wrote: Document doc = new Document(); doc.add(Field.Text(contents, the)); Changing these to Field.Keyword gets it to work. I'm delving a little bit to understand why, but it seems if you are adding words individually anyway you'd

Re: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
of troubleshooting but haven't figured it out yet. Something in DocumentWriter I presume. Erik Roy -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Friday, February 27, 2004 2:12 PM To: Lucene Users List Subject: Re: Indexing multiple instances of the same field

Re: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
On Feb 27, 2004, at 6:17 PM, Doug Cutting wrote: I think it's document.add(). Fields are pushed onto the front, rather than added to the end. Ah, ok DocumentFieldList/DocumentFieldEnumeration are the culprits. This is certainly a bug. With things going in reverse order as they are now, a

Re: Field boosting Was: Indexing multiple instances of the same field for each document

2004-02-27 Thread Erik Hatcher
in the phrase, the other document matches the phrase query. Roy -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Friday, February 27, 2004 4:34 PM To: Lucene Users List Subject: Re: Indexing multiple instances of the same field for each document On Feb 27, 2004

Re: segments question

2004-02-25 Thread Erik Hatcher
On Feb 25, 2004, at 4:01 PM, sam xia wrote: Or should I build the whole thing into one big segment and use the filter to do this. There is a DateFilter. Is there a way to implement a category filter? What is the best way to accomplish this? I'd recommend a pool of filters for each category.

Re: segments question

2004-02-25 Thread Erik Hatcher
On Feb 25, 2004, at 7:58 PM, sam xia wrote: I'd recommend a pool of filters for each category. Regenerate them when the index changes, otherwise leave the instances alive and reuse them for queries - this will speed things up pretty dramatically I'd guess. There is a QueryFilter you could use,

Re: Did you mean...

2004-02-17 Thread Erik Hatcher
On Feb 17, 2004, at 6:53 AM, [EMAIL PROTECTED] wrote: On Monday 16 February 2004 20:56, Erik Hatcher wrote: On Feb 16, 2004, at 9:50 AM, [EMAIL PROTECTED] wrote: TokenStream in = new WhitespaceAnalyzer().tokenStream(contents, new StringReader(doc.getField(contents).stringValue())); The field

Re: Did you mean...

2004-02-17 Thread Erik Hatcher
On Feb 17, 2004, at 9:58 AM, [EMAIL PROTECTED] wrote: On Tuesday 17 February 2004 15:18, Erik Hatcher wrote: You would do them separately. I'm not clear on what you are trying to do. The Analyzer does all this during indexing automatically for you, but it sounds like you are just trying

Re: Did you mean...

2004-02-17 Thread Erik Hatcher
On Feb 17, 2004, at 11:39 AM, [EMAIL PROTECTED] wrote: On Tuesday 17 February 2004 16:13, Erik Hatcher wrote: The words (or terms) are already in the index ready to be read very rapidly and accurately. IndexReader is what you want to investigate if your fields are indexed. Look

Re: Did you mean...

2004-02-16 Thread Erik Hatcher
On Feb 16, 2004, at 6:12 AM, [EMAIL PROTECTED] wrote: On Monday 16 February 2004 12:02, Viparthi, Kiran (AFIS) wrote: As mentioned I didn't use any information from index so I didn't uses any TokenStream but let me check it out. deprecated: String description =

Re: Did you mean...

2004-02-16 Thread Erik Hatcher
On Feb 16, 2004, at 7:59 AM, [EMAIL PROTECTED] wrote: On Monday 16 February 2004 12:40, Erik Hatcher wrote: On Feb 16, 2004, at 6:12 AM, [EMAIL PROTECTED] wrote: String description = doc.getField(contents).stringValue(); What is the value of description here? ? The value of the field contents

Re: Did you mean...

2004-02-16 Thread Erik Hatcher
On Feb 16, 2004, at 9:50 AM, [EMAIL PROTECTED] wrote: Can somebody explain tokenStream() to me? You are now venturing under the covers of Lucene's API. This is where I give the sage advice to get the Lucene source code and surf around it a bit. (It helps to have a nice IDE where you can click

Re: Did you mean...

2004-02-16 Thread Erik Hatcher
On Feb 16, 2004, at 10:34 AM, [EMAIL PROTECTED] wrote: On Monday 16 February 2004 15:16, Erik Hatcher wrote: And thus the nature of the problem. Try using the WhitespaceAnalyzer instead to see what you get. Can I chain multiple analyzer in order to filter common stop words? You cannot chain

Re: Word not in index

2004-02-16 Thread Erik Hatcher
Timo, You are asking a lot of good questions, but also questions for which answers already exist. Just dig a little deeper and you will see. Have a look at my java.net article (titled Lucene Intro) and you will find utility code that hilights how analyzers work. Tinker with that a bit,

Re: Field Reindex Question

2004-02-15 Thread Erik Hatcher
You must remove and re-add the entire document to perform an update. Such is the (current) nature of Lucene. Erik On Feb 15, 2004, at 10:25 PM, Tim Walters wrote: Hi, I'm thinking of using Lucene in an application that might change the field data without modifying the document. It would be

Re: Limiting hit count

2004-02-13 Thread Erik Hatcher
On Feb 13, 2004, at 7:02 AM, [EMAIL PROTECTED] wrote: On Friday 13 February 2004 12:18, Julien Nioche wrote: If you want to limit the set of Documents you're querying, you should consider using Filter objects and send it to the searcher along with your Query. Hm, hard to find information about

Re: Limiting hit count

2004-02-13 Thread Erik Hatcher
On Feb 13, 2004, at 9:12 AM, [EMAIL PROTECTED] wrote: On Friday 13 February 2004 15:02, Erik Hatcher wrote: Use a HitCollector and grab the first one that comes in, then bail out. That should do the trick for getting the first hit only. According to the API docs I ought to use HitCollector only

Re: spans directory in the CVS version

2004-02-11 Thread Erik Hatcher
On Feb 11, 2004, at 5:00 AM, Nicolas Maisonneuve wrote: hy, recently, there is a new subdirectory spans in the search directory. what is it and how use it ? Have a look at the test cases which use the new features, and also see the CHANGES file which mentions it. Erik

Re: ANNOUNCE: Plucene

2004-02-11 Thread Erik Hatcher
In this case, I'd recommend calling out to a Lucene, CLucene, or PLucene. Sam Ruby plugged it into his Perl-based blog like this: http://radio.weblogs.com/0101679/stories/2002/08/13/ luceneSearchFromBlosxom.html On Feb 11, 2004, at 6:23 PM, [EMAIL PROTECTED] wrote: Hi! Somewhat off-topic:

Re: Newbie: PerFieldAnalyzerWrapper or Build a dynamic BooleanQuery

2004-02-08 Thread Erik Hatcher
On Feb 8, 2004, at 11:13 AM, David Black wrote: Let's assume I have an object that is composed of the following fields... UID: 434 (Keyword/Stored) TITLE: Java For Dum Dums (Text/Stored) AUTHOR: Fred Smith - Text/Stored DESCRIPTION: This would be a big long field -

Re: Search Refinement Approaches

2004-02-07 Thread Erik Hatcher
On Feb 7, 2004, at 5:32 PM, Ramy Hardan wrote: Is there an efficient way to search refinement preferably without losing the Hits class? I'm not quite following your Filter questions, but QueryFilter seems to fit the bill for what you are trying to do. Just keep around the previous query, and

Re: Query question

2004-02-06 Thread Erik Hatcher
); System.out.println(Key : + result.get(value) + Desc: + result.get(name)) ; } System.out.println(Finished Search: +hits.length()); } Thanks in advance, Justin -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, February 05, 2004 6:34 PM

Re: Newbie Phrase Query question

2004-02-05 Thread Erik Hatcher
On Feb 5, 2004, at 8:19 PM, Scott Smith wrote: There is a minor issue I found that I think works as documented, but wonder why it's that way. If you enter a search string that's a hyphenated word such as fred-bill (w/o the quotes), the QueryParser generates a search string to find all documents

Re: [newbie] Hit quality rating

2004-02-04 Thread Erik Hatcher
On Feb 4, 2004, at 9:07 AM, [EMAIL PROTECTED] wrote: On Wednesday 04 February 2004 14:48, Otis Gospodnetic wrote: There is score. Oops, you are right Hits.score(). But it seems I have to implement a sorting iterator on my own :-\ Well, the original design is to have hits sorted by score you

Re: Lucene Book

2004-02-04 Thread Erik Hatcher
On Feb 4, 2004, at 12:21 PM, William W wrote: Hi Erik, How is the book ? ;) William. :) Otis and I are burning the midnight oil to get this thing done as soon as possible. We are probably 3/4 done with the manuscript. We've been through one review cycle. The bulk should be done by the end of

Lucene 1.3 FINALly mirrored

2004-02-03 Thread Erik Hatcher
Doug asked me to take care of the logistics of pushing the Lucene 1.3 FINAL release to the Apache site properly so that it is mirrored worldwide. Last weekend I said the right magic incantations and it looks like it has been successful. So, without further ado, Lucene 1.3 is now completely

Re: Lucene 1.3 FINALly mirrored

2004-02-03 Thread Erik Hatcher
On Feb 3, 2004, at 7:12 AM, Erik Hatcher wrote: Doug asked me to take care of the logistics of pushing the Lucene 1.3 FINAL release to the Apache site properly so that it is mirrored worldwide. Last weekend I said the right magic incantations and it looks like it has been successful. So

Re: Newbie Phrase Query question

2004-02-03 Thread Erik Hatcher
The best suggestion I have is to look at the code in my first java.net article (Intro Lucene) and borrow the Analyzer utility code to see what happens to a sample string as it is analyzed. Then pass that same string to QueryParser (along with the same analyzer) and see what the

Re: SQLDirectory

2004-02-01 Thread Erik Hatcher
On Feb 1, 2004, at 6:16 AM, [EMAIL PROTECTED] wrote: There was some third-party SQLDirectory for lucene 1.2 which was abandoned for a matter of performance. Well, why not loading the index into RAM? Is there some (official) SQLDirectory for 1.3? If you look back in the list archives a few weeks

Re: HTMLDocument

2004-02-01 Thread Erik Hatcher
On Feb 1, 2004, at 6:19 AM, [EMAIL PROTECTED] wrote: Hi! Is there any HTMLDocument out there? The one in the demo package of lucene does not handle non-wellformed HTML files (what about nekohtml?) and seems to have some other inabilities and bugs as well (and why isn't it part of the distro

Re: Date Range support

2004-02-01 Thread Erik Hatcher
On Jan 29, 2004, at 5:08 AM, tom wa wrote: I'm trying to create an index which can also be searched with date ranges. My first attempt using the Lucene date format ran in to trouble after my index grew and I couldn't search over more than a few days. I saw some other posts explaining why this

Re: selective-field and score

2004-01-30 Thread Erik Hatcher
If you know the document id, you can use IndexSearcher.explain() (you could do a TermQuery to find it to get the number, or get to it more directly through IndexReader perhaps). You are affecting the score by adding more to the query as the score is based on the query itself. Erik On Jan

Re: Japanese Analyzer

2004-01-30 Thread Erik Hatcher
On Jan 29, 2004, at 1:45 PM, Otis Gospodnetic wrote: --- Weir, Michael [EMAIL PROTECTED] wrote: Is the CJKAnalyzer the best to use for Japanese? If not, which is? If so, from where can I download it? There is also a ChineseTokenizer/Analyzer in the sandbox as well. It may have value for

Re: Performance difference between 1.2 and 1.3?

2004-01-29 Thread Erik Hatcher
On Jan 29, 2004, at 9:00 AM, Weir, Michael wrote: I am fairly new to Lucene and I have noticed a difference between Lucene 1.2RC1 (which came with our build of Cocoon) and the new Lucene 1.3Final. I am indexing about 400 very small documents, each in 10 languages. The document contents are

Re: Paid support for Lucene

2004-01-29 Thread Erik Hatcher
and eHatcher Solutions would be happy to as well :)) On Jan 29, 2004, at 12:16 PM, Ryan Ackley wrote: I know of two: http://superlinksoftware.com http://jboss.org - Original Message - From: Boris Goldowsky [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, January 29, 2004 12:04

Re: Paid support for Lucene

2004-01-29 Thread Erik Hatcher
On Jan 29, 2004, at 1:56 PM, Dror Matalon wrote: On Thu, Jan 29, 2004 at 01:46:12PM -0500, Erik Hatcher wrote: and eHatcher Solutions would be happy to as well :)) Recommended. Eric knows Lucene well and is very responsive. That should read very expensive :)) But we all know you get what you pay

Re: use Lucene LOCAL (looking for a frontend)

2004-01-28 Thread Erik Hatcher
Lucene is a Java API, and can be used within any type of Java program (command-line, web, etc). It is up to you as the developer embedding Lucene to put whatever kind of interface you want on it. To index local files leverage some of the code I have put in my java.net articles, or use the Ant

Re: use Lucene LOCAL (looking for a frontend)

2004-01-28 Thread Erik Hatcher
On Jan 28, 2004, at 9:01 AM, Sebastian Fey wrote: How you present the search results will be up to you and the needs of your project. ive NO experience with java. it would be nice to see an example of a webinterface, that implements lucene to have something to start with. No offense intended at

Re: QueryParser and escaped characters

2004-01-27 Thread Erik Hatcher
Your escape character *is* working to pass it through the parser into the analyzer. It is the analyzer that is splitting at the dash. Phrases get analyzed too. Erik p.s. I wish I had a nickel for every Lucene issue that boils down to QueryParser or Analyzer misunderstanding. :) The two

Re: arrays of values in a field

2004-01-27 Thread Erik Hatcher
On Jan 27, 2004, at 2:27 PM, Gabe wrote: If I have a group of documents and I want to filter on a category, it is fairly straightforward. I just create a Field that contains the category and filter on it. However, what if I want the field category to have multiple possible values? Is there a known

Re: Using Russian analyzer in Luke

2004-01-25 Thread Erik Hatcher
On Jan 24, 2004, at 6:44 PM, Pasha Bizhan wrote: Luke use default ctor for Analyser, but Russian Analyser doesn't contain it. And German Analyser too - try Luke and the error will be the same. You can add this code into RussianAnalyzer.java and enjoy: public RussianAnalyzer() { this.charset =

Re: Using Russian analyzer in Luke

2004-01-25 Thread Erik Hatcher
On Jan 25, 2004, at 2:53 PM, Pasha Bizhan wrote: Hi, I'm not sure that's rightly. Because Russian unicode charset, KOI charset and win1251 charset is equal in use. May be unicode charset is less common. I guess so Russian Analyser hasn't no-arg constructor. Pasha - my apologies, but I'm not

Re: Using Russian analyzer in Luke

2004-01-25 Thread Erik Hatcher
On Jan 25, 2004, at 5:36 PM, Pasha Bizhan wrote: My code is only example and RussianCharsets.RussianUnicode too. We use RussianCharsets.CP1251. But other people can use other charset. I think that Russian Analyser must not have a no-arg constructor. The choice of default charset is not evident. The

Re: HTML tagged terms boosting...

2004-01-21 Thread Erik Hatcher
It definitely cannot be done with custom token types. You're probably aiming for field-specific boosting, so you will need to parse the HTML into separate fields and use a multi-field search approach. I'm sure there are other tricks that could be used for boosting, like inserting the words

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote: 1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both iraq and clerics, but not

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote: But doesn't the query itself take this into account? If there are multiple matching terms then the overlap (coord) factor kicks in. TS==Except that I'd like to be able to choose to do this on a query-by-query basis. In other words, it's

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 4:21 PM, Terry Steichen wrote: PS: Is this in the docs? If not, maybe it should be mentioned. Depends on what you consider the docs. I looked at QueryParser.jj to see what it parses. Also, on http://jakarta.apache.org/lucene/docs/queryparsersyntax.html it has an example of

Re: difference in javadoc and faq similarity expression

2004-01-19 Thread Erik Hatcher
On Jan 19, 2004, at 5:03 AM, Nicolas Maisonneuve wrote: i have a report to write about lucene and i don't know what formula write in the paper and how explain it Ultimately the answer lies within the code itself - as we all know documentation and FAQ's can easily become out of sync from the
