Re: Size + memory restrictions

2006-02-15 Thread Leon Chaddock
Hi Greg, Thanks. We are actually running against 4 segments of 4 GB, so about 20 million docs. We can't merge the segments as there seem to be problems with our Linux box having files over about 4 GB. Not sure why that is. If I were to upgrade to 8 GB of RAM, does it seem likely this will dou

Re: Relevance Feedback Lucene+Algorithms

2006-02-15 Thread Dave Kor
You might also want to look at the LucQE project (http://sourceforge.net/projects/lucene-qe/), which implements a couple of automated relevance feedback methods including Rocchio's formula. On 2/15/06, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: > Please check Grant Ingersoll's presentation at A

Re: QueryParser behaviour ..

2006-02-15 Thread sergiu gordea
Chris Hostetter wrote: : Exactly this is my question, why the QueryParser creates a Phrase query : when he gets several tokens from analyzer : and not a BooleanQuery? Because if it did that, there would be no way to write phrase queries :) I'm not very sure about this ... QueryParser only

Re: Relevance Feedback Lucene+Algorithms

2006-02-15 Thread Grant Ingersoll
URL is http://www.cnlp.org/apachecon2005/ Koji Sekiguchi wrote: Please check Grant Ingersoll's presentation at ApacheCon 2005. He put out great demo programs for the relevance feedback using Lucene. Thank you, Koji -Original Message- From: varun sood [mailto:[EMAIL PROTECTED] Sent

Re: QueryParser behaviour ..

2006-02-15 Thread Yonik Seeley
> From the user's point of view I think it will make sense to > build a phrase query only when the quotes are found in the search string. You make an interesting point Sergiu. Your proposal would increase the expressive power of the QueryParser by allowing the construction of either phrase querie

Re: Help with mass delete from large index

2006-02-15 Thread Chandramohan
> perform such a cull again, you might make several > distinct indexes (one per > day, per week, per whatever) during that reindexing > so the next time will be > much easier. How would you search and consolidate the results across multiple indexes? Hits from each index will have independent sc

Re: Size + memory restrictions

2006-02-15 Thread Leon Chaddock
Looking into the memory problems further I read "Every time you open an IndexSearcher/IndexReader, resources are used which take up memory. For an application pointed at a static index, you only ever need one IndexReader/IndexSearcher that can be shared among multiple threads issuing queries. if

Re: Help with mass delete from large index

2006-02-15 Thread Michael D. Curtin
Chandramohan wrote: perform such a cull again, you might make several distinct indexes (one per day, per week, per whatever) during that reindexing so the next time will be much easier. How would you search and consolidate the results across multiple indexes? Hits from each index will have

Performance Issues

2006-02-15 Thread Urvashi Gadi
Hi All, My system requires traversing Hits (search results) and extracting some data from them. If the result set is very large my system becomes very slow. Is there a way to increase performance? Is there a way I can limit the number of most relevant documents returned? Best regards, Urvashi

RE: index merging

2006-02-15 Thread Omar Didi
I have tried to use the isCurrent() method of IndexReader to figure out if an index is merging, but since I have to do this every time I need to add a document, the performance got so slow. Here is what I am doing: I create 4 indexes and I am running with 4 threads. I do a round robin on the ind

Iterating hits

2006-02-15 Thread Daniel Cortes
Hi Lucene users, I have a strange error and I don't know what to do. My logs say this: java.lang.ArrayIndexOutOfBoundsException: 100 >= 100 at java.util.Vector.elementAt(Vector.java:431) at org.apache.lucene.search.Hits.hitDoc(Hits.java:127) at org.apache.lucene.search.Hits.doc(Hits

Re: Iterating hits

2006-02-15 Thread Yonik Seeley
Try using a different reader to delete the documents. Hits can re-execute a query, and if the searcher you are using is sharing the reader you are deleting with, it's like changing a list you are iterating over (fewer hits will be found the next time the query is executed). -Yonik On 2/15/06, Dan

Re: Size + memory restrictions

2006-02-15 Thread Chris Hostetter
: We may have many different segments of our index, and it seems below we are : using one : IndexSearcher per segment. Could this explain why we run out of memory when : using more than 2/3 segments? : Anyone else have any comments on the below? terminology is a big issue here .. when you use the

Re: Size + memory restrictions

2006-02-15 Thread Leon Chaddock
Hi Chris, Thanks, when I quoted segment I meant index file. So if we have 10 separate index files, are you saying we should have one IndexSearcher for the index collectively, or one per index file? Thanks Leon - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED]> To: Sent

Re: Size + memory restrictions

2006-02-15 Thread Otis Gospodnetic
Leon, An index is typically a directory on disk with files (commonly called "index files") in it. Each index can have 1 or more segments. Each segment consists of several index files. If you are using the compound index format, then the situation is a bit different (fewer index files). Otis P.

Re: Relevance Feedback Lucene+Algorithms

2006-02-15 Thread varun sood
Hi, Thanks for replying. I read your ppt. It is good. But the code for the basic relevance feedback is not explained there. Actually I am not familiar with JSP, JUnit, Maven, etc. I guess it will take me a lot of time to actually discover how things work in the demo program because I have to learn all

Re: index merging

2006-02-15 Thread Daniel Noll
Omar Didi wrote: I have tried to use the isCurrent() method of IndexReader to figure out if an index is merging, but since I have to do this every time I need to add a document, the performance got so slow. Here is what I am doing: I create 4 indexes and I am running with 4 threads. I do a round r

Re: Relevance Feedback Lucene+Algorithms

2006-02-15 Thread Grant Ingersoll
In the example code, take a look at the SearchServlet.java code and the performFeedback and getTopTerms() methods, which demonstrate the use of the term vectors. It is fairly well commented. You don't need maven, JSP or JUnit for this. On the indexing side, look at the TVHTMLDocument for how

Hardware Requirements for a large index?

2006-02-15 Thread Chun Wei Ho
Hi, I am in the process of deciding specs for a crawling machine and a searching machine (two machines), which will support merging/indexing and searching operations on a single Lucene index that may scale to about several million pages (at which it would be about 2-10 GB, assuming linear growth w

How to index numeric fields

2006-02-15 Thread Shivani Sawhney
Hi, What is the best way to index numeric decimal fields, like experience, when I want to use a range search on this field? Thanks in advance. Regards, Shivani

Re: How to index numeric fields

2006-02-15 Thread Otis Gospodnetic
Here are a few bits: http://www.lucenebook.com/search?query=indexing+numbers The Wiki and the FAQ also have some information about indexing numbers/dates. Basically, you want them small (ints, faster sorting, if you need sorting), and you don't want them too fine, if you'll be expanding them into
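
A rough sketch of the usual trick for range-searchable numbers, assuming terms are compared as strings: scale the decimal to a whole number and left-pad it with zeros so that string order matches numeric order. The class PaddedNumbers and the method encode are hypothetical names for illustration, not part of Lucene, and negative values are deliberately not handled here.

```java
public class PaddedNumbers {

    /**
     * Hypothetical helper (not a Lucene API): turns a non-negative decimal
     * into a fixed-width string so that lexicographic order equals numeric order.
     *
     * @param value    the number to encode, e.g. years of experience
     * @param decimals how many decimal places to keep
     * @param width    total number of digits in the encoded term
     */
    public static String encode(double value, int decimals, int width) {
        long scaled = Math.round(value * Math.pow(10, decimals));
        if (scaled < 0) {
            throw new IllegalArgumentException("negative values need a different scheme");
        }
        String digits = Long.toString(scaled);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }

    public static void main(String[] args) {
        // Unpadded strings sort "12.0" before "3.5"; padded terms sort numerically.
        System.out.println("3.5".compareTo("12.0") > 0);
        System.out.println(encode(3.5, 1, 5).compareTo(encode(12.0, 1, 5)) < 0);
    }
}
```

The same encoding would have to be applied to both the indexed value and the two ends of the range in the query. Keeping fewer decimal places gives fewer distinct terms, which fits the advice about not making the values too fine.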

Re: ArrayIndexOutOfBoundsException while closing the index writer

2006-02-15 Thread Otis Gospodnetic
Who knows what else the app is doing. However, I can quickly suggest that you add a finally block and close your writer in there if writer != null. Otis - Original Message From: Shivani Sawhney <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, February 15, 2006 11:31:
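
The finally-block suggestion could look roughly like the sketch below. StubWriter is a hypothetical stand-in for the real index writer so the example runs without Lucene on the classpath; only the try/finally shape with the null check is the point.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseInFinally {

    /** Hypothetical stand-in for an index writer; not a Lucene class. */
    public static class StubWriter implements Closeable {
        public boolean closed = false;

        public void addDocument(String doc) throws IOException {
            if (doc == null) {
                throw new IOException("cannot index a null document");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Indexes all documents and closes the writer even if indexing fails. */
    public static StubWriter indexAll(String[] docs) {
        StubWriter writer = null;
        try {
            writer = new StubWriter();
            for (String doc : docs) {
                writer.addDocument(doc);
            }
        } catch (IOException e) {
            System.err.println("indexing failed: " + e.getMessage());
        } finally {
            // Runs on both the success path and the failure path.
            if (writer != null) {
                writer.close();
            }
        }
        return writer;
    }

    public static void main(String[] args) {
        // The second document triggers the failure path; the writer is still closed.
        StubWriter writer = indexAll(new String[] {"first document", null});
        System.out.println("closed: " + writer.closed);
    }
}
```

This only guarantees the writer is released; it would not by itself explain an exception thrown from close(), which matches the follow-up in the thread.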

ArrayIndexOutOfBoundsException while closing the index writer

2006-02-15 Thread Shivani Sawhney
Hi, I have used Lucene in my application and am just indexing and searching on some documents. The code that indexes the documents was working fine till yesterday and suddenly stopped working. I get an error when I am trying to close the index writer. The code is as follows: .

RE: ArrayIndexOutOfBoundsException while closing the index writer

2006-02-15 Thread Shivani Sawhney
Hi Otis, Thanks for such a quick reply. I tried using finally, but it didn't help. I guess if I explain the integration of Lucene with my app in a little detail then you can probably help me better. I allow users to upload documents, which are then indexed, and search on them. Now I am getting thi