Hi
I am currently indexing uploaded documents (PDF, MS Word, etc.). These
documents can be searched, and the search returns summaries of the
documents to the user. Currently the summaries are extracted when
indexing the file (the summary is constructed by taking the first 10 lines of the…
You should look at contrib/highlighter, which does exactly this.
Mike
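The difference between the two approaches is that a static summary is fixed at index time, while a highlighter builds a query-dependent snippet at search time. The toy, plain-Java sketch below illustrates only that difference. SummaryDemo and its methods are hypothetical names, not the contrib/highlighter API. The real Highlighter tokenizes the text with an Analyzer and scores fragments against the query; this sketch just does a case-insensitive substring search.

```java
/**
 * Toy comparison of a static summary (fixed at index time) with a
 * query-dependent snippet. Hypothetical names; not the Lucene Highlighter API.
 */
final class SummaryDemo {

    /** Static summary: the first maxLines lines, independent of the query. */
    static String staticSummary(String text, int maxLines) {
        String[] lines = text.split("\n");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(maxLines, lines.length); i++) {
            if (i > 0) sb.append('\n');
            sb.append(lines[i]);
        }
        return sb.toString();
    }

    /** Offset of the first case-insensitive occurrence of term, or -1. */
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) return i;
        }
        return -1;
    }

    /**
     * Dynamic summary: up to `radius` chars on either side of the first match,
     * with the match wrapped in <b>...</b>. Returns null if the term is absent.
     */
    static String dynamicSummary(String text, String term, int radius) {
        int pos = indexOfIgnoreCase(text, term);
        if (pos < 0) return null;
        int end = pos + term.length();
        int from = Math.max(0, pos - radius);
        int to = Math.min(text.length(), end + radius);
        return text.substring(from, pos) + "<b>" + text.substring(pos, end) + "</b>"
                + text.substring(end, to);
    }

    public static void main(String[] args) {
        String doc = "Quarterly report\nSales were flat.\nThe new highlighter shipped.";
        System.out.println(staticSummary(doc, 1));                 // Quarterly report
        System.out.println(dynamicSummary(doc, "highlighter", 8)); // The new <b>highlighter</b> shipped
    }
}
```

The static summary is the same for every query. The dynamic one changes with the search term, which is what makes it useful in a result list.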
With the caveat that if you're not storing the text you want
highlighted, you'll have to retrieve it somehow and send it into the
Highlighter yourself.
Erik
On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:
Hi
That's what I was thinking about. I would need to get the file, extract
the text again and then pass it through the highlighter. The other option is
storing the content in the index, the downside being that the index is going
to be large. Which would be the recommended approach?
Cheers
Amin
It depends :)
It's a trade-off. If storing is not prohibitive, I recommend that as
it makes life easier for highlighting.
Erik
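Erik's trade-off can be pictured as a choice of where the highlighter's input text comes from. This is a hypothetical sketch, not Lucene code. HighlightTextSource, the docId keys and the reExtractFromFile function are illustrative assumptions standing in for the index's stored field and for your own PDF/Word extraction.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Toy sketch of the storage trade-off: use the stored text when the index has
 * it, otherwise fall back to re-extracting it from the original upload.
 * Hypothetical names; the extractor stands in for whatever PDF/Word parsing you use.
 */
final class HighlightTextSource {
    private final Map<String, String> storedContent;          // docId -> stored text (entries may be absent)
    private final Function<String, String> reExtractFromFile;  // docId -> text parsed again from the original file

    HighlightTextSource(Map<String, String> storedContent,
                        Function<String, String> reExtractFromFile) {
        this.storedContent = Objects.requireNonNull(storedContent);
        this.reExtractFromFile = Objects.requireNonNull(reExtractFromFile);
    }

    /** Text to hand to the highlighter for this hit. */
    String textFor(String docId) {
        String stored = storedContent.get(docId);
        // Stored: a cheap lookup, paid for with a larger index.
        // Not stored: extraction is re-run for every hit that gets displayed.
        return stored != null ? stored : reExtractFromFile.apply(docId);
    }
}
```

If storage is not prohibitive, the first branch is the only one that ever runs, which is why storing makes highlighting simpler.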
On Mar 7, 2009, at 6:37 AM, Amin Mohammed-Coleman wrote:
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Highlighting and Dynamic Summaries
> > Sent: Saturday, March 07, 2009 12:46 PM
> Cool. I will use compression and store the text in the index. Is there
> anything special I need to do for decompressing the text? I presume I can
> just do doc.get("content")?
> Thanks for your advice, all!
No, just use Field.Store.COMPRESS when adding to the index and Document.get()
when fetching. The decompress…
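Uwe's answer means the reading side needs no special handling. Purely as an illustration of that idea, here is a toy stored-field model that deflates text on write and inflates it inside get(), using java.util.zip. CompressedFieldDemo and its methods are hypothetical, and the sketch makes no claim about Lucene's internal storage format. Note that later Lucene releases deprecated Field.Store.COMPRESS in favour of the CompressionTools helper, so check the Javadoc for the version you use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Toy model of a compressed stored field (hypothetical class, not Lucene code):
 * text is deflated when added and inflated transparently in get().
 */
final class CompressedFieldDemo {
    private final Map<String, byte[]> stored = new HashMap<>();

    /** Analogous in spirit to adding a field with Field.Store.COMPRESS. */
    void addCompressed(String field, String text) {
        stored.put(field, compress(text));
    }

    /** Analogous in spirit to Document.get(): the caller only ever sees plain text. */
    String get(String field) {
        byte[] data = stored.get(field);
        return data == null ? null : decompress(data);
    }

    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against looping forever on truncated or dictionary-based input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Compression trades CPU time on every read for a smaller index. That trade tends to pay off for large, repetitive text fields like extracted document content.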
Thanks! The final piece that I needed to do for the project!
Cheers
Amin
On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler wrote:
Hi
Got it working! Thanks again for your help!
Amin
On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman wrote:
Hi
I am seeing some strange behaviour with the highlighter and I'm wondering if
anyone else is experiencing this. In certain instances no summary is
generated. I perform the search and the search returns the correct
document. I can see that the Lucene document contains the text in…
Hi
Apologies for resending this mail. I'm just wondering if anyone has
experienced the issue below. I'm not sure whether this could happen due to
the nature of the document. It does seem strange that a search for one term
returns a summary while a search for another does not, even though the same
document is returned.
I'm asking this so…
If you can supply a JUnit test that recreates the problem, I think we can
start to make progress on this.
Hi
Please find attached a test case plus a document. Just to mention, this
sometimes occurs for other files as well.
Cheers
Amin
On Wed, Mar 11, 2009 at 6:11 PM, markharw00d wrote:
The attachment didn't make it through here. Can you add it as an attachment to
a new JIRA issue?
Thanks,
Mark
From: Amin Mohammed-Coleman
To: java-user@lucene.apache.org
Sent: Thursday, 12 March, 2009 7:47:20
Subject: Re: Lucene Highlighting and Dynamic Summaries
Hi
I have found that it is not an issue with POI. I extracted the text using POI,
but differently, and the term is extracted properly. When I store the text
and retrieve it, the term exists. However, running the text through the
highlighter doesn't work.
I will post a test case with a plain text file on JIRA.
JIRA updated. It includes a new test case which shows the highlighter not
working as expected.
On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman wrote:
I did the following, which works:
    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
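Why the setting helps: by default the highlighter only analyzes a bounded prefix of each text. A query term whose first occurrence lies beyond that prefix therefore produces no fragment and no summary, even though the document itself matched. That fits the symptom reported earlier, where one term yields a summary and another does not. The toy model below reproduces the effect and the fix. TruncatedHighlightDemo is a hypothetical class, not Lucene code, and its 10-character limit is arbitrary.

```java
/**
 * Toy model (hypothetical, not Lucene code) of a highlighter that only
 * analyzes the first maxCharsToAnalyze characters of the text.
 */
final class TruncatedHighlightDemo {
    private int maxCharsToAnalyze;

    TruncatedHighlightDemo(int maxCharsToAnalyze) {
        this.maxCharsToAnalyze = maxCharsToAnalyze;
    }

    void setMaxCharsToAnalyze(int maxCharsToAnalyze) {
        this.maxCharsToAnalyze = maxCharsToAnalyze;
    }

    /**
     * Returns the analyzed prefix with the first match of term wrapped in
     * <b>...</b>, or null when the term is not inside the analyzed prefix
     * (the "no summary" symptom), even if it occurs later in the text.
     */
    String highlight(String text, String term) {
        int limit = Math.min(text.length(), maxCharsToAnalyze);
        String analyzed = text.substring(0, limit);
        int pos = analyzed.indexOf(term);
        if (pos < 0) return null;
        int end = pos + term.length();
        return analyzed.substring(0, pos) + "<b>" + term + "</b>" + analyzed.substring(end);
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma delta";
        TruncatedHighlightDemo h = new TruncatedHighlightDemo(10); // analyzes "alpha beta"
        System.out.println(h.highlight(text, "beta"));   // alpha <b>beta</b>
        System.out.println(h.highlight(text, "delta"));  // null: the match lies past the limit
        h.setMaxCharsToAnalyze(Integer.MAX_VALUE);        // the fix from the thread, in toy form
        System.out.println(h.highlight(text, "delta"));  // alpha beta gamma <b>delta</b>
    }
}
```

Raising the limit to Integer.MAX_VALUE removes the truncation. The cost is that every highlighted hit is analyzed in full, which is the performance trade-off discussed later in the thread.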
On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman wrote:
IndexWriter has such behavior too, and because it was such a common trap
(developers could not understand why their content was being truncated),
we made that setting explicit and up front, so you'd be aware of it.
I think, in general, this is a reasonable approach for settings that
"lose" stuff.
Hi
I think that would be good. This is probably a silly thing to ask, but I guess
there is a performance implication in setting it to the max value.
Is there a general setting that other developers use?
Cheers
Amin
On 12 Mar 2009, at 22:03, Michael McCandless wrote:
Amin Mohammed-Coleman wrote:
> I think that would be good.
I'll open an issue.
> Probably a silly thing to ask but I guess there is a performance
> implication by setting it to max value.
Right. And it's tough choosing a default in situations like this:
performance vs. losing stuff.
However…
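The approach Mike describes avoids a silent default for a lossy limit and forces the performance-vs-loss choice into view. For IndexWriter, the non-deprecated constructors in Lucene 2.4 take an IndexWriter.MaxFieldLength argument (such as LIMITED or UNLIMITED); check the Javadoc for your version. The sketch below shows the same pattern in miniature. ExplicitLimitDemo and MaxChars are hypothetical names, not a proposed Highlighter API.

```java
import java.util.Objects;

/**
 * Toy sketch (hypothetical API) of making a lossy limit explicit: the only
 * constructor requires the caller to choose a limit, so truncation is a
 * visible decision rather than a hidden default.
 */
final class ExplicitLimitDemo {

    /** A named limit; UNLIMITED opts out of truncation entirely. */
    static final class MaxChars {
        static final MaxChars UNLIMITED = new MaxChars(Integer.MAX_VALUE);
        final int value;

        MaxChars(int value) {
            if (value <= 0) throw new IllegalArgumentException("limit must be positive");
            this.value = value;
        }
    }

    private final MaxChars maxChars;

    /** No no-arg constructor: the caller must pick a limit. */
    ExplicitLimitDemo(MaxChars maxChars) {
        this.maxChars = Objects.requireNonNull(maxChars, "choose a limit explicitly");
    }

    /** True if this limit would silently drop part of the given text. */
    boolean wouldTruncate(String text) {
        return text.length() > maxChars.value;
    }

    /** The portion of the text that would actually be processed. */
    String processedPortion(String text) {
        return wouldTruncate(text) ? text.substring(0, maxChars.value) : text;
    }
}
```

The design keeps the fast, bounded option available while making sure nobody picks it without knowing that content past the limit is ignored.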
Sweet! When will this highlighter be available? Can I use this now?
Cheers!
On Fri, Mar 13, 2009 at 10:10 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
Well, it's not yet committed.
You can use it now by pulling the patch attached to the issue &
testing it yourself. If you do so, please report back! This is how
Lucene improves.
I'm hoping we can include it in 2.9...
Mike
On Mar 13, 2009, at 6:35 AM, Amin Mohammed-Coleman wrote:
Absolutely! I have received considerable help from the community, and there
is so much more I want to ask!
Cheers!
Amin
On Fri, Mar 13, 2009 at 10:41 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
OK. I tried to apply the patch(es) and completely messed it up (user
error). Is there a full example of the highlighter available that I
can apply and test?
Cheers
Amin
On Fri, Mar 13, 2009 at 12:09 PM, Amin Mohammed-Coleman wrote: