Re: about pdf search

2011-03-07 Thread Ian Lea
Please don't cross-post to multiple lists. Look back through the lucene user email archive and you'll find people talking about this or use your favourite search engine to find hits for something like "lucene pdf highlighting". If you don't find an answer, post again to the most appropriate list

performance issues in multivalued fields

2011-03-07 Thread suman.holani
Hello, I am facing an issue with multivalued fields in Lucene. I am generating a Lucene doc where page is multivalued, so each doc can have more than n fields (possibly more than 1500) for the page attribute. Example

Re: performance issues in multivalued fields

2011-03-07 Thread Erick Erickson
You have to describe in detail what "taking a huge performance hit" means; there's not much to go on here... But in general, adding N elements to a multi-valued field isn't a problem at all. This bit of code: Document D = searcher.doc(hits[i].doc); is very suspicious. Does your cLucene version h

Re: FastVectorHighlighter and field compression

2011-03-07 Thread Koji Sekiguchi
(11/03/07 1:16), Joel Halbert wrote: Hi, I'm using FastVectorHighlighter for highlighting, 3.0.3. At the moment this is highlighting a field which is stored, but not compressed. It all works perfectly. I'd like to compress the field that is being highlighted, but it seems like the new way to c

RE: performance issues in multivalued fields

2011-03-07 Thread suman.holani
Thanks for the prompt reply. I am not using compression or lazy loading in either CLucene or Lucene, since I need to get the data from Lucene for all searched docs for further processing. In CLucene it takes 15 ms; in Lucene it takes 100 ms+ for the same search :( Number of hits is around 1000 d

Altering tf for a single field

2011-03-07 Thread Aaron Lav
I'm trying to figure out how to achieve the effect of per-field modifications to tf for lucene 2.9.x. (Specifically, I'd like to cap it for a single field which is subject to keyword stuffing, while still allowing phrase searches in that field to work.) It looks to me as if providing my own varia

Re: about pdf search

2011-03-07 Thread James Wilson
Cescy wrote: Hi, I am developing a PDF search engine, just for use on a local computer to search massive PDF documents. I used pdfbox+lucene to index and search, and then I have to display the context to the user in the PDF file in the user interface. HOW CAN I ACHIEVE THIS??? I have completed a projec

Re: about pdf search

2011-03-07 Thread Bill Janssen
James Wilson wrote: > I have completed a project to do the exact same thing. I put the pdf > text in XML files. Then after I do a Lucene search I read the text from > the XML files. I do not store the text in the Lucene index. That would > bloat the index and slow down my searches. FYI -- I