Re: question with spellchecker

2006-06-07 Thread eks dev
Try a query like ((ducted^1000 duct~2) +tape), or maybe (duct* +tape). Even better, you could try some stemming (the Porter stemmer should get rid of these -ed suffixes) combined with some of the above. If this does not help, have a look at the LingPipe spellChecker class, as this looks like exactly what

RE: Compound / non-compound index files and SIGKILL

2006-06-07 Thread Rob Staveley (Tom)
(1)... threaddump It hadn't occurred to me that I'd be able to do that, Chris. It took me a little while to figure out how, because I'm running the application as a daemon (i.e. using nohup, teeing standard output to a log file and redirecting stdout and stderr to /dev/null), which counts out

Re: question with spellchecker

2006-06-07 Thread mark harwood
I think the problem in your particular example is that the suggestion software has no consideration of context. I've been playing with context-sensitive suggestions recently which take a bunch of validated (i.e. existing) words (e.g. tape) and use this to help shortlist alternatives for an unknown or

Re: Lucene and learning search

2006-06-07 Thread karl wettin
On Tue, 2006-06-06 at 22:23 +, michael turner wrote: Working on a project that requires a Search query similar to what is seen on amazon.com in that after searching for and displaying an item, the system shows: Users that have searched for A AND B have also searched for .
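A rough sketch of how "users who searched for X also searched for Y" could be tracked, using only the Java standard library. It is an assumption-laden simplification, not something proposed in the thread: the class name CoSearchCounter and its methods are hypothetical, each session is assumed to arrive as a set of distinct query strings, and co-occurrence is counted per single query rather than per "A AND B" pair.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: counts which queries share a session with which other queries. */
public class CoSearchCounter {
    // query -> (other query seen in the same session -> number of sessions)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one user session, given as the set of distinct queries issued in it. */
    public void recordSession(Set<String> queries) {
        for (String a : queries) {
            for (String b : queries) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    /** Returns up to 'limit' queries that most often shared a session with 'query'. */
    public List<String> alsoSearched(String query, int limit) {
        Map<String, Integer> related = counts.getOrDefault(query, Collections.emptyMap());
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(related.entrySet());
        // Highest count first; ties broken alphabetically so the order is stable.
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() >= limit) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```

In a real system the counts would presumably live in a database or in a Lucene index of their own rather than in memory.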

java.io.IOException: Cannot delete _1d.fdt

2006-06-07 Thread Kiran Joisher
Hi, I am working on a Struts application that uses Lucene to index a MySQL database. Whenever I rebuild the application, deploy it in Tomcat and try to rebuild the index from scratch, I have to shut down Tomcat and then restart it. If I don't do this, I get an IOException while creating

Different scoring mechanism

2006-06-07 Thread Trieschnigg, R.B. (Dolf)
Hi, I am trying to implement an alternative scoring mechanism in Lucene. A query of multiple terms is represented as a BooleanQuery with one or more Occur.SHOULD clauses. The scoring for a document is very simple: - Assign a score for each query term. ! If a document does not contain a

Converting SQL statement to Lucene query

2006-06-07 Thread George Aroush
Hi folks, Has anyone written, or do you know of, an API library that will take SQL statements and convert them to Lucene queries? I know not every SQL statement can become a Lucene Query, but that's OK as long as the library will highlight them. Thanks! -- George

Re: MMapDirectory vs RAMDirectory

2006-06-07 Thread Peter Keegan
I was able to improve the behavior by setting the mapped ByteBuffer to null in the close method of MMapIndexInput. This seems to be a strong enough 'suggestion' to the gc, as I can see the references go away with process explorer, and the index files can be deleted, usually. Occasionally, a

Re: Converting SQL statement to Lucene query

2006-06-07 Thread Paul . Illingworth
You could take a look at Apache's Jackrabbit - it does this sort of thing. It's not exactly a library but it might give you some pointers. My understanding is that it uses an SQL-like syntax for defining queries that are converted into an abstract syntax tree which it can then convert into any

Re: spring lucene

2006-06-07 Thread Karel Tejnora
Not closing explicitly can mean that old files stay on the disk on Linux, especially when the JVM is allowed a lot of memory but only a small amount is used. A solution is to use a ReentrantReadWriteLock, where the re-open method opens a new IndexReader at ThreadLocal, acquires the write lock, saves the old reference
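A minimal sketch of the lock-and-swap idea described above, using only the Java standard library. The class SwappableResource and its methods are hypothetical, and the type parameter R stands in for an index reader or any other Closeable. The ThreadLocal part of the message is not modelled here, since the snippet is cut off before it explains it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical holder; R stands in for an index reader or any Closeable resource. */
public class SwappableResource<R extends Closeable> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private R current;

    public SwappableResource(R initial) {
        this.current = initial;
    }

    /** Runs an action under the read lock so the resource cannot be swapped out mid-use. */
    public <T> T use(Function<R, T> action) {
        lock.readLock().lock();
        try {
            return action.apply(current);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Swaps in a freshly opened resource, then explicitly closes the old one. */
    public void reopen(R fresh) throws IOException {
        R old;
        lock.writeLock().lock(); // waits until every in-flight use() has finished
        try {
            old = current;
            current = fresh;
        } finally {
            lock.writeLock().unlock();
        }
        // Safe as long as callers of use() do not keep the reference beyond the action.
        old.close();
    }
}
```

Closing the old instance explicitly, rather than leaving it to the garbage collector, is what releases the old files promptly.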

RE: Compound / non-compound index files and SIGKILL

2006-06-07 Thread Chris Hostetter
: However, I'm not sure what to make of:
: 8
: Thread 3740: (state = BLOCKED)
: - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
: - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)
: Error occurred during stack walking:
: java.lang.NullPointerException
: at

IndexWriter.addIndexes optimization

2006-06-07 Thread Benjamin Stein
I have a very large corpus that I am storing in many indexes: 200 indexes * ~500MB each, with 10^6 very tiny documents in each. (I could look into optimizing this later, of course, but seems ok for now) During indexing, I have been using a RAMDirectory to store many thousands of documents in

Re: IndexWriter.addIndexes optimization

2006-06-07 Thread Benjamin Stein
On 6/7/06, Benjamin Stein wrote: During indexing, I have been using a RAMDirectory to store many thousands of documents in memory before flushing the buffer to disk using IndexWriter.addIndexes. For the most part this works very well, except that performance degrades

Re: IndexWriter.addIndexes optimization

2006-06-07 Thread Grant Ingersoll
My understanding of the IndexWriter code is that it more or less manages this for you. It has an internal RAMDirectory which it uses to index in memory and then periodically flushes to disk based on your merge factor settings (amongst other settings). So I am not sure if the extra work you

Re: IndexWriter.addIndexes optimization

2006-06-07 Thread Dan Armbrust
Benjamin Stein wrote: I could probably store the little RAMDirectories to disk as many FSDirectories, and then addIndexes() of *all* the FSDirectories at the end instead of every time. That would probably be smart. Glad I asked myself! That was what I was going to suggest - you may also

Re: duplicate results MultiFieldQueryParser

2006-06-07 Thread varun sood
Yeah, you are right. I was talking about going through the large index and discovering where else the problem has occurred, and how. But thanks for your tips. I have used Luke before, and it certainly helped me this time as well. I found the problem. It's not great Lucene. It's me.. error in

RE: Compound / non-compound index files and SIGKILL

2006-06-07 Thread Rob Staveley (Tom)
I'm not sure what exactly your process method is doing In essence it gets text from the content's input stream and writes it to the PipedWriter and hence to the PipedReader passed to the Field constructor. The process method for a plain text content handler simply copies from the input stream
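The pipe arrangement described above can be sketched with the standard library alone. This is an illustration under assumptions, not the poster's code: PipedCopyDemo, startProducer, drain and roundTrip are hypothetical names, and drain merely stands in for whatever consumes the Reader handed to the Field constructor.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringReader;

public class PipedCopyDemo {
    /** Copies everything from 'source' into the pipe on its own thread, then closes the writer. */
    static Thread startProducer(Reader source, PipedWriter sink) {
        Thread t = new Thread(() -> {
            // Closing the writer is what lets the reading side see end-of-stream.
            try (PipedWriter out = sink) {
                char[] buf = new char[1024];
                int n;
                while ((n = source.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    /** Drains a reader into a String; a stand-in for the consumer of the piped text. */
    static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Pushes 'text' through a PipedWriter/PipedReader pair and returns what came out. */
    static String roundTrip(String text) throws Exception {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);
        Thread producer = startProducer(new StringReader(text), writer);
        String result = drain(reader); // reads on the calling thread while the producer writes
        producer.join();
        return result;
    }
}
```

The writer and the reader run on different threads here because a pipe's buffer is small, so writing and reading from a single thread can block once the buffer fills.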

Re: Lucene and learning search

2006-06-07 Thread karl wettin
On Tue, 2006-06-06 at 22:23 +, michael turner wrote: Users that have searched for A AND B have also searched for . Something just hit me. Perhaps it would be interesting for you to track sessions that search for the same thing but don't seem to find what they are looking