Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi, I am currently indexing uploaded documents (PDF, MS Word, etc.). These documents can be searched, and what the search returns to the user are summaries of the documents. Currently the summaries are extracted when indexing the file (summary constructed by taking the first 10 lines of the

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Michael McCandless
You should look at contrib/highlighter, which does exactly this. Mike
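To make the idea of a query-dependent ("dynamic") summary concrete, here is a minimal plain-Java sketch. It is not the contrib/highlighter implementation and uses no Lucene classes; the class name `DynamicSummarySketch`, the method `summarize`, and the `context` parameter are hypothetical names chosen for illustration. It only shows the contrast with a static "first N lines" summary: the excerpt is cut around the matched term rather than from the top of the document.

```java
// Toy illustration only: NOT how contrib/highlighter works internally.
final class DynamicSummarySketch {

    // Returns a window of text around the first case-insensitive occurrence
    // of term, with the match wrapped in <B>...</B>. Returns null when the
    // term does not occur, which mirrors "no summary generated".
    static String summarize(String text, String term, int context) {
        int hit = -1;
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                hit = i;
                break;
            }
        }
        if (hit < 0) {
            return null;
        }
        int matchEnd = hit + term.length();
        int start = Math.max(0, hit - context);
        int end = Math.min(text.length(), matchEnd + context);
        return text.substring(start, hit)
                + "<B>" + text.substring(hit, matchEnd) + "</B>"
                + text.substring(matchEnd, end);
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. Highlighting shows matches in context.";
        System.out.println(summarize(text, "highlighting", 10));
    }
}
```

A real highlighter works on analyzed tokens and scores several candidate fragments; this sketch only does a substring search.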

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher
With the caveat that if you're not storing the text you want highlighted, you'll have to retrieve it somehow and send it into the Highlighter yourself. Erik

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi, that's what I was thinking about. I would need to get the file, extract the text again, and then pass it through the highlighter. The other option is storing the content in the index, the downside being that the index is going to be large. Which would be the recommended approach? Cheers Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Erik Hatcher
It depends :) It's a trade-off. If storing is not prohibitive, I recommend that as it makes life easier for highlighting. Erik

RE: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Uwe Schindler

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman

RE: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Uwe Schindler
> Cool. I will use compression and store in index. Is there anything special I need to do for decompressing the text? I presume I can just do doc.get("content")? Thanks for your advice all!
No, just use Field.Store.COMPRESS when adding to index and Document.get() when fetching. The decompress
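The storage trade-off discussed here (a larger index versus re-extracting text at search time) can be illustrated with a small plain-Java sketch that compresses text on write and transparently decompresses it on read. This uses `java.util.zip` from the JDK only; it makes no claim about how Lucene stores compressed fields internally, and the class and method names (`CompressedStoreSketch`, `compress`, `decompress`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy illustration of "compress when storing, decompress when fetching".
final class CompressedStoreSketch {

    // Compresses the UTF-8 bytes of text with zlib/deflate.
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original text; the caller never sees the compressed form.
    static String decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("incomplete compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("extracted document text ");
        }
        String original = sb.toString();
        byte[] stored = compress(original);
        System.out.println("original chars: " + original.length()
                + ", stored bytes: " + stored.length);
        System.out.println("round trip ok: " + decompress(stored).equals(original));
    }
}
```

How much space compression saves depends on the content; highly repetitive text like the sample above compresses far better than typical documents.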

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Thanks! The final piece that I needed to do for the project! Cheers Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi Got it working! Thanks again for your help! Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-09 Thread Amin Mohammed-Coleman
Hi I am seeing some strange behaviour with the highlighter and I'm wondering if anyone else is experiencing this. In certain instances I don't get a summary being generated. I perform the search and the search returns the correct document. I can see that the lucene document contains the text in

Re: Lucene Highlighting and Dynamic Summaries

2009-03-11 Thread Amin Mohammed-Coleman
Hi, apologies for resending this mail. Just wondering if anyone has experienced the below. I'm not sure if this could happen due to the nature of the document. It does seem strange that a search on one term returns a summary while a search on another does not, even though the same document is being returned. I'm asking this so

Re: Lucene Highlighting and Dynamic Summaries

2009-03-11 Thread markharw00d
If you can supply a JUnit test that recreates the problem I think we can start to make progress on this.

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, please find attached a test case plus a document. Just to mention, this sometimes occurs for other files too. Cheers Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread mark harwood
The attachment didn't make it through here. Can you add it as an attachment to a new JIRA issue? Thanks, Mark

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, I have found that it is not an issue with POI. I extracted the text using POI, but differently, and the term is extracted properly. When I store the text and retrieve it, the term exists. However, running the text through the highlighter doesn't work. I will post a test case with a plain text file on JIR

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
JIRA updated. Includes new testcase which shows highlighter not working as expected.

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
I did the following: highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE); which works.
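The symptom reported in this thread (the document is found and its stored text contains the term, yet no summary comes back until the analysis limit is raised) is consistent with a highlighter that only inspects a leading portion of the text. The plain-Java sketch below reproduces that effect in isolation. It is not Lucene code; `AnalysisLimitSketch`, `termSeen`, and `maxCharsToAnalyze` are hypothetical names used only for this illustration.

```java
// Toy illustration of a "max characters to analyze" cut-off.
final class AnalysisLimitSketch {

    // Looks for term only within the first maxCharsToAnalyze characters,
    // so a match that occurs later in the text is silently missed.
    static boolean termSeen(String text, String term, int maxCharsToAnalyze) {
        int limit = Math.min(text.length(), maxCharsToAnalyze);
        return text.substring(0, limit).contains(term);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("filler ");
        }
        sb.append("needle");
        String text = sb.toString();

        // With a small limit the term near the end is never examined.
        System.out.println("limit 100: " + termSeen(text, "needle", 100));
        // With the limit lifted the whole text is examined.
        System.out.println("no limit:  " + termSeen(text, "needle", Integer.MAX_VALUE));
    }
}
```

Lifting the limit trades speed for completeness: the whole text is scanned for every highlighted hit, which matters for very large documents.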

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Michael McCandless
IndexWriter has such behavior too, and because it was such a common trap (developers could not understand why their content was being truncated), we made that setting explicit, up front so you were aware of it. I think this in general is a reasonable approach for settings that "lose" stuff

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi, I think that would be good. Probably a silly thing to ask, but I guess there is a performance implication in setting it to the max value. Is there a general setting that other developers use? Cheers Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Michael McCandless
Amin Mohammed-Coleman wrote:
> I think that would be good.
I'll open an issue.
> Probably a silly thing to ask but I guess there is a performance implication by setting it to max value.
Right. And it's tough choosing a default in situations like this -- performance vs losing stuff. Howe

Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
Sweet! When will this highlighter be available? Can I use this now? Cheers!

Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Michael McCandless
Well, it's not yet committed. You can use it now by pulling the patch attached to the issue & testing it yourself. If you do so, please report back! This is how Lucene improves. I'm hoping we can include it in 2.9... Mike

Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
Absolutely! I have received considerable help from the community and there are so many more things I want to ask! Cheers! Amin

Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
OK. I tried to apply the patch(es) and completely messed it up (user error). Is there a full example of the highlighter available that I can apply and test? Cheers Amin