Loading FS index into RAM index

2005-07-08 Thread [EMAIL PROTECTED]
I was reading the Lucene book and came across the part where the author detailed how to write an index to a RAM directory and then flush it to a file directory. That turned on a light bulb, and I want to do bidirectional index loading: build the index in RAM, store it in a file for backup. Then if I have to shut down the p

Index Partitioning ( was Re: Search deadlocking under load)

2005-07-08 Thread Paul Smith
Nathan, first apologies for somewhat hijacking your thread, but I believe my question to be very related. Nathan's Scenario 1 is the one we're effectively employing (or in the process of setting up). Rather than 1 Index To Rule Them All, I have decided to partition the index structure. Us

Re: search that span over consecutive documents

2005-07-08 Thread Erik Hatcher
On Jul 8, 2005, at 2:57 AM, Daniel Moldovan wrote: My application must index a lot of books that are stored in XML files. Each XML file represents a page of the book, and this way each page becomes a Lucene Document. Each page is organized in different sections and finally each section conta

Re: How to get the un-stemed word

2005-07-08 Thread Marvin Humphrey
On Jul 8, 2005, at 8:44 AM, mark harwood wrote: You can get the unstemmed word by re-analysing the (hopefully stored somewhere) text. Look at the tokens emitted from the TokenStream and when you get to the one that matches the stemmed form you can use the token offset info to retrieve the unste

Re: Lucene faster on JDK 1.5?

2005-07-08 Thread roy-lucene-user
This might be a good time to ask another question. Are there any advantages to Lucene using the java.nio package? Roy On 7/8/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Nothing significant, but I've been using 1.5 on Simpy.com (lots of Lucene behind it) for over

Re: Loading large index into RAM

2005-07-08 Thread Otis Gospodnetic
Yes, you could mount your memory as a RAM disk (ramfs type under Linux) partition. This was mentioned on this list 1+ years ago. Otis --- Cheolgoo Kang <[EMAIL PROTECTED]> wrote: > How about using a RAM disk and FSDirectory? It would not be as fast as RAMDirectory, but will be fast enough.

Re: Search deadlocking under load

2005-07-08 Thread Otis Gospodnetic
Nathan, 3) is the recommended usage. Your index is on an NFS share, which means you are searching it over the network. Make it local, and you should see performance improvements. Local or remote, it makes sense that searches take longer to execute, and the load goes up. Yes, it shouldn't deadlo

Re: FileNotFoundException segments

2005-07-08 Thread Muetze303
OK, your directory exists. if ((indexFile = new File(indexDir)).exists() && indexFile.isDirectory()) { exists = false; System.out.println("Index does not exist"); } So exists == false at this point: writer = new IndexWriter(indexFile, new StandardAnalyzer(), exists); exists is still fals
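The flag logic discussed in this message can be sketched in plain Java without Lucene on the classpath. This is only an illustration, not the poster's code: the class name IndexCreateFlag and the method shouldCreate are hypothetical, and the test for an existing index (a file named "segments" inside the directory) is an assumption taken from the thread subject, so it may not hold for every Lucene version. The returned value is meant to be passed as the third ("create") argument of the IndexWriter constructor shown in the message.

```java
import java.io.File;

// Hypothetical helper: decides whether a new index must be created.
class IndexCreateFlag {

    // Returns true when no usable index is found in indexDir,
    // i.e. the value to pass as IndexWriter's "create" argument.
    static boolean shouldCreate(File indexDir) {
        // The directory itself must exist and really be a directory.
        boolean dirExists = indexDir.exists() && indexDir.isDirectory();
        // Assumption: an existing index is recognised by its "segments" file.
        boolean hasSegments = dirExists && new File(indexDir, "segments").isFile();
        // Create a fresh index only when there is nothing to open.
        return !hasSegments;
    }
}
```

With a helper like this, an existing but empty directory yields true, so the writer would create the index instead of trying to open a missing segments file.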

Re: How to get the un-stemed word

2005-07-08 Thread mark harwood
You can get the unstemmed word by re-analysing the (hopefully stored somewhere) text. Look at the tokens emitted from the TokenStream and when you get to the one that matches the stemmed form you can use the token offset info to retrieve the unstemmed form from the original text. Another option w
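The offset-based approach described in this message can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene code: toyStem is a made-up suffix stripper standing in for the real stemmer, tokenize is a simple letter-run splitter standing in for the analyzer's TokenStream, and the class and method names are hypothetical. The idea shown is the one from the message: re-tokenize the stored text, stem each token, and when the stem matches, use the token's start and end offsets to cut the original surface form out of the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class UnstemmedLookup {

    // Character offsets of one token in the original text.
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    // Toy stand-in for a real stemmer: lowercases and strips one common suffix.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Splits the text into runs of letters and records each run's offsets.
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetter(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) i++;
            if (i > start) spans.add(new Span(start, i));
        }
        return spans;
    }

    // Returns every original word whose stem equals the given stemmed term.
    static List<String> unstemmedForms(String text, String stemmed) {
        List<String> forms = new ArrayList<>();
        for (Span s : tokenize(text)) {
            String surface = text.substring(s.start, s.end);
            if (toyStem(surface).equals(stemmed)) forms.add(surface);
        }
        return forms;
    }
}
```

This only works when the original text is stored somewhere and is re-analysed with the same rules that produced the indexed terms.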

Re: How to get the un-stemed word

2005-07-08 Thread Erik Hatcher
On Jul 8, 2005, at 11:13 AM, Andrew Boyd wrote: It would be cool to have the type in the index. Imagine if you had different types like person, place, event or even subject, predicate, object. It would greatly enhance the search capabilities of Lucene. I completely concur. It is for the

Re: How to get the un-stemed word

2005-07-08 Thread Andrew Boyd
Thanks for the reply. It would be cool to have the type in the index. Imagine if you had different types like person, place, event or even subject, predicate, object. It would greatly enhance the search capabilities of Lucene. Andrew -----Original Message----- From: Erik Hatcher <[EMAIL PROT

Re: Help on the ParallelMultiSearcher.rewrite(Query) method

2005-07-08 Thread Erik Hatcher
On Jul 8, 2005, at 10:32 AM, Terence Lai wrote: I have implemented it the way you described, and it is now working properly. However, I have a concern about the performance of my implementation. Since I am using ParallelMultiSearcher to perform the search, I have no idea which index direc

Search deadlocking under load

2005-07-08 Thread Nathan Brackett
Hey all, We're looking to use Lucene as the back end to our website and we're running into an unusual deadlocking problem. For testing purposes, we're just running one web server (threaded environment) against an index mounted on an NFS share. This machine performs searches only against this inde

Re: How to get the un-stemed word

2005-07-08 Thread Erik Hatcher
On Jul 8, 2005, at 9:08 AM, Andrew Boyd wrote: Hi all, I am using the snowball stemmer and for all my searches that works fine. However, I have a need to display the un-stemmed word after doing some term vector analysis. I was thinking that I might insert the real word at the same po

Re: Search

2005-07-08 Thread Erik Hatcher
On Jul 8, 2005, at 8:48 AM, christopher may wrote: Is there a simple way for me to add a browse-by-letter setup on Lucene's main page? If anybody knows of any documents on this I would greatly appreciate it. Thanks. I don't understand what you mean. Are you talking about changing Lucene's

RE: Re: Help on the ParallelMultiSearcher.rewrite(Query) method

2005-07-08 Thread Terence Lai
Hi Erik, I have implemented it the way you described, and it is now working properly. However, I have a concern about the performance of my implementation. Since I am using ParallelMultiSearcher to perform the search, I have no idea which index directory corresponds to the document that I

Re: Loading large index into RAM

2005-07-08 Thread Cheolgoo Kang
How about using a RAM disk and FSDirectory? It would not be as fast as RAMDirectory, but will be fast enough. On 7/8/05, Chris Lamprecht <[EMAIL PROTECTED]> wrote: > If you're on an x86_64 machine (AMD Opteron, for instance), you may be able to set your JVM heap this large. But if you have 6GB

How to get the un-stemed word

2005-07-08 Thread Andrew Boyd
Hi all, I am using the Snowball stemmer and for all my searches that works fine. However, I have a need to display the un-stemmed word after doing some term vector analysis. I was thinking that I might insert the real word at the same position as the stemmed word but give the real word a type

Search

2005-07-08 Thread christopher may
Is there a simple way for me to add a browse-by-letter setup on Lucene's main page? If anybody knows of any documents on this I would greatly appreciate it. Thanks

Bertrand VENZAL/CER31/REC is out of the office.

2005-07-08 Thread Bertrand VENZAL
I will be out of the office from 08/07/2005, returning on 02/08/2005. I will reply to your message when I return.

Re: Loading large index into RAM

2005-07-08 Thread Chris Lamprecht
If you're on an x86_64 machine (AMD Opteron, for instance), you may be able to set your JVM heap this large. But if you have 6GB RAM, you might try keeping your JVM small (under 1GB), and letting Linux's filesystem cache do the work. Lucene searches are often CPU-bound (during the search phase