Re: Full disk space during indexing process with 120 gb of free disk space

2006-12-07 Dan Armbrust
Ariel Isaac Romero Cartaya wrote: Hi everybody, I have a problem during the indexing process. I am indexing large amounts of text, most of it in PDF format, using PDFBox version 0.6. Before the indexing process begins, the free hard disk space is around 120 GB, but incredibly even

Reading Performance

2006-12-07 Aigner, Thomas
Howdy all, I have a question about reading many documents and how long it takes. I have a loop over the Hits object that reads a record and then writes it to a file. When there is only one user on the IndexSearcher, reading, say, 100,000 records takes around 3 seconds. This is slow, but can

Re: Reading Performance

2006-12-07 Grant Ingersoll
Have you done any profiling to identify hotspots in Lucene versus your application? You might look into the FieldSelector code (used in IndexReader) in the trunk version of Lucene, which could be used to load only the fields you are interested in when getting the document from disk. This can be us
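A minimal, self-contained sketch of the idea Grant describes: decide per stored field whether to load it, and copy out only the fields the caller needs. The `Selector` interface, `Decision` enum and `SelectiveLoadDemo` class are hypothetical stand-ins loosely modelled on Lucene's `FieldSelector`; they are not the Lucene API. In Lucene releases that include it, the real counterpart is passing a `FieldSelector` (for example `MapFieldSelector`) to `IndexReader.document(int, FieldSelector)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of selective field loading. Selector and Decision are
 * hypothetical stand-ins loosely modelled on Lucene's FieldSelector idea;
 * they are not the Lucene API.
 */
class SelectiveLoadDemo {

    /** Hypothetical per-field decision. */
    enum Decision { LOAD, NO_LOAD }

    /** Hypothetical selector: decides, by field name, whether to load a field. */
    interface Selector {
        Decision accept(String fieldName);
    }

    /** Builds a selector that loads only the named fields. */
    static Selector only(Set<String> wanted) {
        return name -> wanted.contains(name) ? Decision.LOAD : Decision.NO_LOAD;
    }

    /** Copies only the accepted fields out of a "stored" document, in stored order. */
    static Map<String, String> load(Map<String, String> stored, Selector selector) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : stored.entrySet()) {
            if (selector.accept(field.getKey()) == Decision.LOAD) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "42");
        stored.put("title", "Reading Performance");
        stored.put("body", "a very large stored field");
        Set<String> wanted = new HashSet<>(Arrays.asList("id", "title"));
        System.out.println(load(stored, only(wanted)));
        // prints {id=42, title=Reading Performance}
    }
}
```

The saving in this model comes from never copying the large `body` field; how much real I/O a selector saves in Lucene depends on the version and on field sizes.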

Re: BooleanQuery

2006-12-07 Marcelo Ohashi
Thanks for your help, guys. I'll test that query parser. Marcelo On Dec 6, 2006, at 11:37 PM, Renaud Waldura wrote: Read my own complaints about QueryParser here: http://marc.theaimsgroup.com/?l=lucene-user&m=116069469827270&w=2 You're in for a surprise. As alluded to by Erick, the stock QP doesn

Re: Reading Performance

2006-12-07 Erick Erickson
Well, the performance isn't bad considering you're executing the *search* around 1,000 times... One of the characteristics of a Hits object is that it's optimized for getting the top 100 docs or so. To get the next 100 docs, it re-executes the query. Repeatedly. I'd try using a HitCollector o
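The cost difference Erick points out can be sketched with a deliberately simplified model (an assumption, not Lucene's actual `Hits` code): a paged reader re-runs the query for every page of 100 results, while a collector-style reader runs it once and receives each match through a callback. `Collector`, `FakeSearcher` and `CollectVsPage` are hypothetical; `Collector.collect(int doc, float score)` only mirrors the shape of Lucene's `HitCollector.collect`. With Lucene itself, the counterpart is passing a `HitCollector` subclass to `Searcher.search(Query, HitCollector)`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified cost model (an assumption, not Lucene's code): a "paged" reader
 * re-runs the query for every page of results, while a collector-style reader
 * runs the query once and receives every match through a callback.
 */
class CollectVsPage {

    /** Hypothetical callback, shaped like Lucene's HitCollector.collect(int, float). */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Fake searcher: "matches" docs 0..numMatches-1 and counts query executions. */
    static final class FakeSearcher {
        final int numMatches;
        int executions = 0;

        FakeSearcher(int numMatches) { this.numMatches = numMatches; }

        /** Runs the query once, pushing every match to the collector. */
        void search(Collector collector) {
            executions++;
            for (int doc = 0; doc < numMatches; doc++) {
                collector.collect(doc, 1.0f); // constant score keeps the model simple
            }
        }
    }

    /** Reads all matches page by page, re-running the query for each page. */
    static int pagedExecutions(int numMatches, int pageSize) {
        FakeSearcher searcher = new FakeSearcher(numMatches);
        List<Integer> read = new ArrayList<>();
        while (read.size() < numMatches) {
            int from = read.size();
            int to = Math.min(from + pageSize, numMatches);
            // Re-execute the whole query, but keep only the next page of docs.
            searcher.search((doc, score) -> {
                if (doc >= from && doc < to) read.add(doc);
            });
        }
        return searcher.executions;
    }

    /** Reads all matches with a single collector pass. */
    static int collectorExecutions(int numMatches) {
        FakeSearcher searcher = new FakeSearcher(numMatches);
        List<Integer> read = new ArrayList<>();
        searcher.search((doc, score) -> read.add(doc));
        return searcher.executions;
    }

    public static void main(String[] args) {
        System.out.println("paged:     " + pagedExecutions(100_000, 100)); // 1000
        System.out.println("collector: " + collectorExecutions(100_000)); // 1
    }
}
```

With 100,000 matches and pages of 100, the model executes the query 1,000 times versus once, the same arithmetic as in the message above.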

RE: Reading Performance

2006-12-07 Aigner, Thomas
Thanks, Grant and Erick, for your suggestions. I will try both of them and let you know if I see a marked increase in speed. Tom -----Original Message----- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Thursday, December 07, 2006 1:24 PM To: java-user@lucene.apache.org Subject: Re: Readin

Get scores per field.

2006-12-07 Sunil Kumar PK
Hi All, Is it possible to get a score per field in the result document, instead of a score per document? If this feature does not exist, what are the possible ways to implement it? Thanks, Sunil
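One possible way to approach Sunil's question is to score each field separately rather than the document as a whole. This sketch is a toy: it scores a single term against each field using only the tf (`sqrt(freq)`) and lengthNorm (`1/sqrt(numTerms)`) factors of Lucene's `DefaultSimilarity`, with simple whitespace tokenization, and it ignores idf, boosts, coord and queryNorm. `PerFieldScores` and its methods are hypothetical names. In Lucene itself, running one query per field, or inspecting the per-clause breakdown returned by `Searcher.explain(Query, int)`, are ways to get at similar information.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy per-field scoring: score one query term separately against each field.
 * Only the tf and lengthNorm factors of Lucene's DefaultSimilarity are
 * modelled; idf, boosts, coord and queryNorm are ignored.
 */
class PerFieldScores {

    /** sqrt(freq) * 1/sqrt(numTerms), using whitespace tokenization (an assumption). */
    static float fieldScore(String fieldText, String term) {
        String[] tokens = fieldText.toLowerCase().split("\\s+");
        int freq = 0;
        for (String token : tokens) {
            if (token.equals(term)) freq++;
        }
        if (freq == 0) return 0f;
        float tf = (float) Math.sqrt(freq);
        float lengthNorm = (float) (1.0 / Math.sqrt(tokens.length));
        return tf * lengthNorm;
    }

    /** Returns one score per field instead of a single score per document. */
    static Map<String, Float> scoresPerField(Map<String, String> doc, String term) {
        Map<String, Float> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            scores.put(field.getKey(), fieldScore(field.getValue(), term));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene in Action");
        doc.put("body", "a search library written in java called lucene lucene");
        // The short title outscores the longer body despite a lower term frequency.
        System.out.println(scoresPerField(doc, "lucene"));
    }
}
```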

Optimizing search speed & performance for a 10G Index.

2006-12-07 Chun Wei Ho
Hi, We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index has approximately 2 million documents, and its physical size is about 10 GB. We run it as a Tomcat web application on a Fedora Core 4 server with dual Xeon 3.2 GHz processors and 4 GB RAM. We receive about 46500 web sear

Re: How to set query time scoring

2006-12-07 Xiaocheng Luan
Try playing with the Similarity class and its subclasses; it might help. For example, you may adjust coord to increase the chance (though perhaps not guarantee) that ORed results will come after the ANDed results, adjust the sloppy factor to favor phrases, etc. Xiaocheng Sajid Khan <[EMAIL PROTECTED]> wro
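To make Xiaocheng's suggestion concrete, here is a self-contained comparison of the default coord and sloppy-phrase factors with steeper, hypothetical alternatives. The defaults follow `DefaultSimilarity` (`coord = overlap / maxOverlap`, `sloppyFreq = 1 / (distance + 1)`); the squared versions are illustrative assumptions, and `SimilarityTweaks` is not a Lucene class. In Lucene, such methods would be overridden in a `DefaultSimilarity` subclass installed with `Searcher.setSimilarity(...)`.

```java
/**
 * Side-by-side comparison of DefaultSimilarity-style factors and steeper,
 * hypothetical alternatives. Method names mirror Lucene's
 * Similarity.coord(int, int) and Similarity.sloppyFreq(int), but this class
 * is self-contained and does not extend Lucene.
 */
class SimilarityTweaks {

    // Default coord: fraction of the query clauses the document matched.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical steeper coord: squaring penalizes partial (OR-only) matches harder.
    static float steepCoord(int overlap, int maxOverlap) {
        float fraction = overlap / (float) maxOverlap;
        return fraction * fraction;
    }

    // Default sloppy-phrase weight: 1 / (distance + 1).
    static float defaultSloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Hypothetical steeper decay: near-exact phrases win by a wider margin.
    static float steepSloppyFreq(int distance) {
        float f = 1.0f / (distance + 1);
        return f * f;
    }

    public static void main(String[] args) {
        // Raw (pre-coord) score sums are made-up numbers for illustration.
        float andDoc = 2.0f; // matched both of 2 OR'ed terms
        float orDoc = 5.0f;  // matched only 1 of 2 terms, but strongly
        System.out.println("default: " + andDoc * defaultCoord(2, 2) + " vs " + orDoc * defaultCoord(1, 2));
        // prints "default: 2.0 vs 2.5"
        System.out.println("steep:   " + andDoc * steepCoord(2, 2) + " vs " + orDoc * steepCoord(1, 2));
        // prints "steep:   2.0 vs 1.25"
        System.out.println("sloppy, distance 3: " + defaultSloppyFreq(3) + " vs " + steepSloppyFreq(3));
        // prints "sloppy, distance 3: 0.25 vs 0.0625"
    }
}
```

With the steeper coord, the partial match drops below the full match in this example, but a large enough raw score would still outrank it, which is why the message says this raises the chance rather than guaranteeing the order.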