AnalyZer HELP Please

2004-08-18 Thread Karthik N S
Hi guys, Finally, after a lot of experimentation, I found that a word such as 'new' that is already present in the Analyzer will not return any hits [even when enclosed in quotes "\""], such as "New Year". That's really interesting :( Thx, Karthik -Orig

RE: OutOfMemoryError

2004-08-18 Thread John Moylan
Terence, This may help: http://issues.apache.org/bugzilla/show_bug.cgi?id=30628 I had the problem above, but I managed to resolve it by not closing the IndexSearcher. Instead, I now reuse the same IndexSearcher all of the time within my JSP code as an application variable. GC keeps memory in che
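
A minimal sketch of the reuse pattern described above, assuming one shared searcher per application. SharedHolder and its Supplier factory are hypothetical names used only for illustration; in a real application the factory would open the Lucene IndexSearcher, which is not shown here.

```java
import java.util.function.Supplier;

/** Holds one lazily created instance that every request reuses. */
final class SharedHolder<T> {
    private final Supplier<T> factory; // hypothetical: how the searcher gets opened
    private T instance;                // created once, then handed out again

    SharedHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use only. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

Every caller goes through get(), so the expensive open happens once rather than once per request.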

Re: OutOfMemoryError

2004-08-18 Thread Otis Gospodnetic
Reuse your IndexSearcher! :) Also, I think somebody has written some EJB stuff to work with Lucene. The project is on SF.net. Otis --- Terence Lai <[EMAIL PROTECTED]> wrote: > Hi All, > > I am getting an OutOfMemoryError when I deploy my EJB application. To > debug the problem, I wrote the fol

Re: AnalyZer HELP Please

2004-08-18 Thread Erik Hatcher
On Aug 18, 2004, at 3:41 AM, Karthik N S wrote: Hi guys, Finally, after a lot of experimentation, I found that a word such as 'new' that is already present in the Analyzer will not return any hits [even when enclosed in quotes "\""], such as "New Year". That's really interesting.

RE: Restoring a corrupt index

2004-08-18 Thread Honey George
Looks like the problem is not with the hex editor; even in UltraEdit (I had access to a Windows box) I am seeing the same display. The problem is that I am not able to identify where a record starts with just 1 record in the file. Need to try some alternate approach. Thanks, George --- [EMAIL PROTEC

Re: Restoring a corrupt index

2004-08-18 Thread Erik Hatcher
The details of the segments file (and all the others) are freely available here: http://jakarta.apache.org/lucene/docs/fileformats.html Also, there is Java code in Lucene, of course, that manipulates the segments file which could be leveraged (although probably package scoped and not eas

Re: Restoring a corrupt index

2004-08-18 Thread Honey George
Thanks Erik, that worked. I was able to remove the corrupt index and now it looks like the index is OK. I was able to view the number of documents in the index. Before that I was getting the error java.io.IOException: read past EOF. I am yet to find out how my index got corrupted. There is another

RE: Restoring a corrupt index

2004-08-18 Thread Karthik N S
Hi guys, In our situation we would be indexing millions and millions of information documents, with many gigabytes of data indexed, which would finally be put into a merged index, categorized accordingly. There may be a possibility of corruption, so please do post the code references.

RE: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi Otis, The reason why I ran into this problem is that I partition my search documents into multiple index directories ordered by document modified date. My application only returns the latest 500 documents that match the criteria. By partitioning the documents into different directories, w

Lucene Search Applet

2004-08-18 Thread Simon mcIlwaine
I'm developing a Lucene CD-ROM based search which will search HTML pages on CD-ROM, using an applet as the UI. I know that there's a problem with lock files and also security restrictions on applets, so I am using the RAMDirectory. I have it working in a Swing application; however when I put it into

Re: Lucene Search Applet

2004-08-18 Thread Terry Steichen
I suspect it has to do with the security restrictions of the applet, 'cause it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable the locking stuff just for purposes like yours (read-only index). Regards, Terry - Original Message

RE: AnalyZer HELP Please

2004-08-18 Thread Tate Avery
That is interesting. I went to look up the cases for this (on Google). Here are my 4 queries and the results: a) of the from it - 25,500,000 matches containing 'of' and 'the' and 'from' and 'it' - i.e. stop list NOT used if query is only stopwords b) "of the from it"

NegativeArraySizeException when creating a new IndexSearcher

2004-08-18 Thread Sven
Hi! I have a problem porting a Lucene-based knowledge base from Windows to Linux. On Windows it works fine, whereas I get a NegativeArraySizeException on Linux when I try to initialise a new IndexSearcher to search the index. Deleting and rebuilding the index didn't help. I checked permissions, file

Re: AnalyZer HELP Please

2004-08-18 Thread Erik Hatcher
Thanks for doing the legwork. My favorite example is "to be or not to be" with and without quotes. The top hit without quotes is quite funny. So, Google doesn't throw away stop words, but they do special query processing to keep you from doing silly things like "show me all documents with 't

What's the return order when the scores for two doc are exactly the same

2004-08-18 Thread Ching-Pei Hsing
Hi, What is the order returned by Lucene when the scores for two result documents are exactly the same? I know this rarely happens in full-text search, but it happened in our case. We build an index over IDs of structured data and try to search with IDs. Thanks, Ching-pei --

RE: AnalyZer HELP Please

2004-08-18 Thread Tate Avery
Basically, Google uses its stop lists selectively. To me, the 2 rules appear to be: 1) Do not use stop list for items in quotes (i.e. exact phrase) 2) Do not use stop list if the query is ONLY stop words Furthermore, they DO let you do silly things like find out there are approximately 5,760,0
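
The two rules listed above could be sketched roughly as follows. This is only an illustration of the observed behaviour, not Google's or Lucene's actual query processing; the class name, the method name, and the tiny stop list are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SelectiveStopList {
    // Illustrative stop list only (an assumption, not a real engine's list).
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("of", "the", "from", "it", "to", "be", "or", "not"));

    /** Returns the query terms that survive selective stop-word removal. */
    static List<String> keptTerms(String query) {
        String text = query.trim();
        boolean quoted = text.length() >= 2
                && text.startsWith("\"") && text.endsWith("\"");
        if (quoted) {
            text = text.substring(1, text.length() - 1).trim();
        }
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        if (quoted) {
            return terms; // rule 1: exact phrases keep their stop words
        }
        List<String> filtered = new ArrayList<>();
        for (String term : terms) {
            if (!STOP_WORDS.contains(term)) {
                filtered.add(term);
            }
        }
        // rule 2: a query made ONLY of stop words is left untouched
        return filtered.isEmpty() ? terms : filtered;
    }
}
```

For example, keptTerms("the new year") drops only "the", while keptTerms("of the from it") keeps all four words because nothing else would remain.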

Re: AnalyZer HELP Please

2004-08-18 Thread Erik Hatcher
On Aug 18, 2004, at 2:39 PM, Tate Avery wrote: Anyway, that is how I interpreted my Google tests. And, as an observation, one would need to be a bit creative to get the same behaviour with Lucene given the current analyzer setup (IMO). Again, see Nutch for the creative part of this, since it is

Re: What's the return order when the scores for two doc are exactly the same

2004-08-18 Thread Erik Hatcher
The index order is the "secondary" sort order. You can change this by using the new sorting facility if desired. Erik On Aug 18, 2004, at 2:24 PM, Ching-Pei Hsing wrote: Hi, What is the order returned by Lucene when the scores for two result documents are exactly the same? I know this ra
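
The tie-breaking rule described above can be pictured with a small sketch: sort by score, highest first, and fall back to index order when scores are equal. The Hit class and the use of an integer docId to stand for index order are assumptions made for the example; this is not Lucene's own code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    /** Hypothetical hit: a position in the index plus a relevance score. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns a copy ordered by score descending, then docId ascending. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()                       // higher scores first
                .thenComparingInt(h -> h.docId)); // ties: earlier doc first
        return sorted;
    }
}
```

With hits (docId 5, score 1.0), (docId 2, score 1.0) and (docId 9, score 2.0), the ranked order is 9, 2, 5: the two tied documents come out in index order.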

RE: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi, I tried to reuse the IndexSearcher, but I have another question. What happens if an application server unloads the class after it is idle for a while, and then re-instantiates the object when it receives a new request? Every time the server re-instantiates the class, a new IndexSearcher i

Index Size

2004-08-18 Thread Rob Jose
Hello I have indexed several thousand (52 to be exact) text files and I keep running out of disk space to store the indexes. The size of the documents I have indexed is around 2.5 GB. The size of the Lucene indexes is around 287 GB. Does this seem correct? I am not storing the contents of th

Re: Index Size

2004-08-18 Thread Stephane James Vaucher
From: Doug Cutting http://www.mail-archive.com/[EMAIL PROTECTED]/msg08757.html > An index typically requires around 35% of the plain text size. I think it's a little big. sv On Wed, 18 Aug 2004, Rob Jose wrote: > Hello > I have indexed several thousand (52 to be exact) text files and I keep >
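
As a rough check of the figures quoted in this thread, the 35% rule of thumb can be applied to the reported 2.5 GB of text and compared with the reported 287 GB index. The class and method names below are made up for the example; only the three numbers come from the messages.

```java
final class IndexSizeEstimate {
    // Rule of thumb quoted above: index is around 35% of plain text size.
    static final double TYPICAL_RATIO = 0.35;

    /** Expected index size in GB for the given amount of plain text. */
    static double expectedIndexGb(double textGb) {
        return textGb * TYPICAL_RATIO;
    }

    /** How many times larger than the text the index actually is. */
    static double observedRatio(double indexGb, double textGb) {
        return indexGb / textGb;
    }
}
```

For 2.5 GB of text the rule of thumb gives roughly 0.875 GB, whereas 287 GB is about 115 times the text size, which supports the suspicion that something is wrong with the index.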

Re: Re: OutOfMemoryError

2004-08-18 Thread David Sitsky
> I tried to reuse the IndexSearcher, but I have another question. What > happens if an application server unloads the class after it is idle for a > while, and then re-instantiates the object when it receives a new > request? The EJB spec takes this into account, as there are hook methods you

RE: Re: Re: OutOfMemoryError

2004-08-18 Thread Terence Lai
Hi David, In my test program, I invoke the IndexSearcher.close() method at the end of the loop. However, it doesn't seem to release the memory. My concern is that even though I put the IndexSearcher.close() statement in the hook methods, it may not release all the memory until the application