RE: Optimize completely in memory with a FSDirectory?

2006-04-06 Thread Max Pfingsthorn
Hi, thanks for your suggestion. I thought about the same thing, but somehow it didn't seem like such a good idea... Now that I think about it, it would take the same I/O load (in terms of flushing many megabytes to disk) as optimizing in memory with the FSDirectory. Another weird thing we observed

nested phrase queries

2006-04-06 Thread Michael Dodson
Can phrase queries be nested the same way boolean queries can be nested? I want a user query to be translated into a boolean query (say, x AND (y OR z)), and I want those terms to be within a certain distance of each other (approximately within the same sentence, so the slop would be

Re: nested phrase queries

2006-04-06 Thread Erik Hatcher
On Apr 6, 2006, at 8:47 AM, Michael Dodson wrote: Can phrase queries be nested the same way boolean queries can be nested? Yes... using SpanNearQuery instead of PhraseQuery. I want a user query to be translated into a boolean query (say, x AND (y OR z)), and I want those terms to be
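A minimal sketch of the nesting Erik suggests, against the span API; the field name "contents", the literal terms x/y/z, and the slop of 10 are all placeholders:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanOrQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class NestedSpans {
        // x near (y OR z), within 10 positions of each other, in any order
        public static SpanQuery build() {
            SpanQuery x = new SpanTermQuery(new Term("contents", "x"));
            SpanQuery yOrZ = new SpanOrQuery(new SpanQuery[] {
                new SpanTermQuery(new Term("contents", "y")),
                new SpanTermQuery(new Term("contents", "z"))
            });
            return new SpanNearQuery(new SpanQuery[] { x, yOrZ }, 10, false);
        }
    }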

Re: nested phrase queries

2006-04-06 Thread mark harwood
The XMLQueryParser in the contrib section also handles Spans (as well as a few other Lucene queries/filters not represented by the standard QueryParser). Here's an example of a complex query from the JUnit test: <?xml version="1.0" encoding="UTF-8"?> <SpanOr fieldName="contents"> <SpanNear slop="8"

*easy* way to perform range searches on numeric values

2006-04-06 Thread Bill Snyder
Hello, How can I configure Lucene to handle numeric range searches? (This question has been asked 100 times, I'm sure.) I've tried the suggestions on the SearchNumericalFields wiki page. This seems to work for simple queries. Searching for line:[1 to 10] gives me lines 1 thru 10 of the
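For reference, the workaround usually suggested for this (and roughly what the SearchNumericalFields page describes, if memory serves) is to zero-pad the values so their lexicographic order matches numeric order, and to build the RangeQuery programmatically from padded terms. A rough sketch, with the field name "line" and the padding width chosen arbitrarily (negative values would need extra handling):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeQuery;

    public class PaddedRange {
        static final int WIDTH = 10;  // arbitrary width for this sketch

        // zero-pad so lexicographic order matches numeric order
        static String pad(long n) {
            StringBuffer buf = new StringBuffer(Long.toString(n));
            while (buf.length() < WIDTH) buf.insert(0, '0');
            return buf.toString();
        }

        // index time: one untokenized term per value
        static void addLineField(Document doc, long lineNumber) {
            doc.add(new Field("line", pad(lineNumber), Field.Store.YES, Field.Index.UN_TOKENIZED));
        }

        // search time: build the range from padded terms instead of relying on QueryParser
        static Query lineRange(long lo, long hi) {
            return new RangeQuery(new Term("line", pad(lo)), new Term("line", pad(hi)), true);
        }
    }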

Multiple Indexes Search

2006-04-06 Thread Yang Sun
Hi, Just wondering if there is any way to search two indexes with relations like in a relational database. For example, index1 has the fields pid and content; index2 has the fields cid, record, and pid. I want to search keyword1 in content and keyword2 in record, and they should
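Lucene itself has no joins, so a common workaround is a two-pass search: query the second index first, collect the matching pids, and turn them into a boolean clause for the first index. A sketch using the field names from the question (index paths and keywords are placeholders, and pid must be a stored field in index2); note that BooleanQuery's default limit of 1024 clauses makes this impractical when many pids match:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class TwoPassJoin {
        public static void main(String[] args) throws Exception {
            // pass 1: collect the pids whose "record" field matches keyword2
            IndexSearcher s2 = new IndexSearcher("path/to/index2");
            Hits h2 = s2.search(new TermQuery(new Term("record", "keyword2")));
            BooleanQuery pidClause = new BooleanQuery();
            for (int i = 0; i < h2.length(); i++) {
                pidClause.add(new TermQuery(new Term("pid", h2.doc(i).get("pid"))),
                              BooleanClause.Occur.SHOULD);
            }
            // pass 2: keyword1 in "content" AND any of the collected pids, against index1
            BooleanQuery join = new BooleanQuery();
            join.add(new TermQuery(new Term("content", "keyword1")), BooleanClause.Occur.MUST);
            join.add(pidClause, BooleanClause.Occur.MUST);
            Hits h1 = new IndexSearcher("path/to/index1").search(join);
            System.out.println(h1.length() + " joined hits");
        }
    }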

Re: StopAnalyzer and apostrophes

2006-04-06 Thread Marvin Humphrey
I wrote: It looks like StopAnalyzer tokenizes by letter, and doesn't handle apostrophes. So, the input "I don't know" produces these tokens: [don] [t] [know]. Is that right? It's not right. StopAnalyzer does tokenize letter by letter, but 't' is a stopword, so the tokens are:
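A quick way to check what an analyzer actually emits is to dump the token stream; a small sketch (the field name "contents" is arbitrary):

    import java.io.StringReader;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    public class DumpTokens {
        public static void main(String[] args) throws Exception {
            TokenStream ts = new StopAnalyzer().tokenStream("contents",
                    new StringReader("I don't know"));
            for (Token t = ts.next(); t != null; t = ts.next()) {
                System.out.println(t.termText());  // 't' should be missing from the output
            }
        }
    }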

DateField vs DateTools

2006-04-06 Thread John Smith
Hi We are in the process of upgrading Lucene from 1.2 to 1.9. There used to be 2 methods in DateField.java in 1.2 public static String MIN_DATE_STRING() public static String MAX_DATE_STRING() This basically gave the minimum and the maximum dates we could index

Question related to using FieldCacheImpl

2006-04-06 Thread John Smith
Hi, I need to access the min and max values of a particular field in the index as soon as a searcher is initialized. I don't need them later. Looking at old newsgroup mails, I found a few recommendations. One was to keep the min and max fields external to the index. But this will not work

RE: Data structure of a Lucene Index

2006-04-06 Thread Dmitry Goldenberg
Ideally, I'd love to see an article explaining both in detail: the index structure as well as the merge algorithm...

Re: DateField vs DateTools

2006-04-06 Thread Daniel Naber
On Thursday, 06 April 2006 19:50, John Smith wrote: I have not drilled down into the implementation details too much, but what was the reason for getting rid of these methods in Lucene 1.9? There is no limit on the given dates in DateTools (within the limits of what Java's Calendar/Date
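Since DateTools strings sort lexicographically, the old MIN/MAX constants can be replaced by encoding whatever sentinel dates the application needs. A small sketch; the 1970/2038 boundaries below are arbitrary placeholders:

    import java.util.Calendar;
    import java.util.GregorianCalendar;
    import org.apache.lucene.document.DateTools;

    public class DateBounds {
        public static void main(String[] args) {
            Calendar min = new GregorianCalendar(1970, Calendar.JANUARY, 1);
            Calendar max = new GregorianCalendar(2038, Calendar.JANUARY, 1);
            String minTerm = DateTools.dateToString(min.getTime(), DateTools.Resolution.DAY);
            String maxTerm = DateTools.dateToString(max.getTime(), DateTools.Resolution.DAY);
            // prints something like 19700101 .. 20380101 (exact values depend on time zone)
            System.out.println(minTerm + " .. " + maxTerm);
        }
    }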

RE: Distributed Lucene.. - clustering as a requirement

2006-04-06 Thread Dmitry Goldenberg
I firmly believe that clustering support should be a part of Lucene. We've tried implementing it ourselves and so far have been unsuccessful. We tried storing Lucene indices in a database that is the back-end repository for our app in a clustered environment and could not overcome the

Re: nested phrase queries

2006-04-06 Thread Erik Hatcher
Seeing this worries me: we'll see users creating XML strings, then parsing them to get the desired query. I've seen this a lot with QueryParser, but it would be even more gross to see folks do this with the XML syntax. So, here's my community service message for the day: if you're

Re: Question related to using FieldCacheImpl

2006-04-06 Thread John Smith
Thank you. JS --- Yonik Seeley [EMAIL PROTECTED] wrote: On 4/6/06, John Smith [EMAIL PROTECTED] wrote: // inherit javadocs public String[] getStrings(IndexReader reader, String field) The string array I get back, is it guaranteed that the first non-null value I encounter in
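For the original min/max question, the usual suggestion is to walk the term enumeration rather than the FieldCache, since terms for a field come back in sorted order. A sketch, with the index path and field name as placeholders:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class MinMax {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("path/to/index");
            String field = "price";
            String min = null, max = null;
            // terms() positions the enum at the first term of this field;
            // terms are sorted, so the first is the min and the last is the max
            TermEnum terms = reader.terms(new Term(field, ""));
            while (terms.term() != null && terms.term().field().equals(field)) {
                if (min == null) min = terms.term().text();
                max = terms.term().text();
                if (!terms.next()) break;
            }
            terms.close();
            reader.close();
            System.out.println("min=" + min + " max=" + max);
        }
    }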

Re: Distributed Lucene.. - clustering as a requirement

2006-04-06 Thread Chris Lamprecht
What about using lucene just for searching (i.e., no stored fields except maybe one ID primary key field), and using an RDBMS for storing the actual documents? This way you're using lucene for what lucene is best at, and using the database for what it's good at. At least up to a point -- RDBMSs
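A sketch of the split Chris describes: Lucene stores only an untokenized id plus the searchable text, and each hit's id is used to fetch the real row from the database. The JDBC URL, table, and column names here are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class SearchThenFetch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("path/to/index");
            Query q = new QueryParser("content", new StandardAnalyzer()).parse("some words");
            Hits hits = searcher.search(q);

            Connection db = DriverManager.getConnection("jdbc:...", "user", "pass");
            PreparedStatement ps = db.prepareStatement("SELECT body FROM documents WHERE id = ?");
            for (int i = 0; i < hits.length(); i++) {
                ps.setString(1, hits.doc(i).get("id"));  // only the id is stored in Lucene
                ResultSet rs = ps.executeQuery();
                if (rs.next()) {
                    // render rs.getString("body") however the application needs
                }
                rs.close();
            }
            ps.close();
            db.close();
            searcher.close();
        }
    }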

RE: Distributed Lucene.. - clustering as a requirement

2006-04-06 Thread Dmitry Goldenberg
I think it's a good idea. For an enterprise-level application, Lucene appears to be too file-system-centric and too byte-sequence-centric a technology. Just my opinion. The Directory API is just too low-level. I'd be OK with an RDBMS-based Directory implementation I could take and use. But generally, I

Question about Lucene's search algorithm

2006-04-06 Thread inge santoso
Hi all, I'm still new to Lucene. I'm in the last year of my bachelor's degree in Computer Science. My final thesis is about indexing and searching in Lucene 1.4.3. I've read the “Space Optimizations for Total Ranking” paper. My main question is: 1. What search

doc.get(contents)

2006-04-06 Thread miki sun
Dear all, I got a java.lang.NullPointerException at java.io.StringReader.<init>(StringReader.java:33) when processing the following code: for (int i = 0; i < theHits.length(); i++) { Document doc = theHits.doc(i); String contents = doc.get("contents"); TokenStream tokenStream =
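A guess at the cause, for what it's worth: doc.get("contents") returns null when the field was indexed but not stored, and passing that null to StringReader throws exactly this NullPointerException. A defensive rewrite of the loop (the analyzer variable is assumed from the elided part of the original code):

    for (int i = 0; i < theHits.length(); i++) {
        Document doc = theHits.doc(i);
        String contents = doc.get("contents");
        if (contents == null) {
            // field was indexed but not stored (Field.Store.NO at index time)
            continue;
        }
        // "analyzer" stands in for whatever Analyzer the original code uses
        TokenStream tokenStream = analyzer.tokenStream("contents", new StringReader(contents));
        // ... use tokenStream as before ...
    }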

Getting count of documents matching a query?

2006-04-06 Thread Tom Hill
Hi - Is there a fast way (not easy, but speedy) of getting the count of documents that match a query? I need the count, and don't need the docs at this point. If I had a simple query, (e.g. book) I can use docFreq(), and it's lightning fast. If I just run it as a query it's much slower. I'm

Re: Getting count of documents matching a query?

2006-04-06 Thread Chris Hostetter
: I need the count, and don't need the docs at this point. If I had a : simple query, (e.g. book) I can use docFreq(), and it's lightning : fast. If I just run it as a query it's much slower. I'm just : wondering if I did a custom scorer / similarity / hitcollector, how : much faster than a query
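For reference, one way to get a bare count without building Hits is a HitCollector that only increments a counter (a sketch against the 1.9-era API; whether it beats docFreq() for single terms is another matter):

    import org.apache.lucene.search.HitCollector;

    public class CountCollector extends HitCollector {
        public int count = 0;
        public void collect(int doc, float score) {
            count++;  // ignore the score, just tally matching docs
        }
    }

    // usage:
    // CountCollector counter = new CountCollector();
    // searcher.search(query, counter);
    // int numHits = counter.count;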

Re: highlighting - fuzzy search

2006-04-06 Thread Daniel Noll
Fisheye wrote: HashSet terms = new HashSet(); query.rewrite(reader).extractTerms(terms); Ok, but this delivers every term, not just a list of words the Levenshtein algorithm produced with similarity. I asked a similar thing in the past about term highlighting in general,
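One way to narrow this down is to rewrite only the fuzzy clause rather than the whole query, and extract terms from that; a sketch, where fuzzyQuery and reader stand for the FuzzyQuery and IndexReader already in use:

    // assumes: import java.util.*; import org.apache.lucene.index.Term;
    Query expanded = fuzzyQuery.rewrite(reader);  // a BooleanQuery over the terms the expansion matched
    Set terms = new HashSet();
    expanded.extractTerms(terms);
    for (Iterator it = terms.iterator(); it.hasNext();) {
        Term t = (Term) it.next();
        System.out.println(t.text());             // only words produced by the Levenshtein expansion
    }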

Re: StopAnalyzer and apostrophes

2006-04-06 Thread Daniel Noll
Marvin Humphrey wrote: I wrote: It looks like StopAnalyzer tokenizes by letter, and doesn't handle apostrophes. So, the input "I don't know" produces these tokens: [don] [t] [know]. Is that right? It's not right. StopAnalyzer does tokenize letter by letter, but 't' is a stopword, so

Re: StopAnalyzer and apostrophes

2006-04-06 Thread Marvin Humphrey
On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote: Marvin Humphrey wrote: I wrote: It looks like StopAnalyzer tokenizes by letter, and doesn't handle apostrophes. So, the input "I don't know" produces these tokens: [don] [t] [know]. Is that right? It's not right. StopAnalyzer does

Calling addDocument twice for the same document

2006-04-06 Thread Daniel Noll
Hi all. I have a situation where a Document is constructed with a bunch of strings and a couple of readers. An error may occur while reading from the readers, and in these situations, we want to remove the reader and then try to index the same document again. I've made a test case which
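A sketch of the retry idea: a Reader-backed field can only be consumed once, so the retry has to rebuild the Document with fresh readers (or without the reader fields). buildDocument and openReaders below are hypothetical helpers standing in for however the Document is assembled in the real code:

    // assumes: IndexWriter writer; import java.io.IOException;
    Document doc = buildDocument(strings, openReaders());
    try {
        writer.addDocument(doc);
    } catch (IOException e) {
        // the readers may be partially consumed or broken at this point, so
        // build a fresh Document and drop (or reopen) the reader-backed fields
        Document retry = buildDocument(strings, null);
        writer.addDocument(retry);
    }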