RE: keyword-in-content for PDF document

Allison, Timothy B. Thu, 13 Apr 2017 09:17:05 -0700

If you don't care about sentence boundaries and just want a window around the
target terms, with concordance functionality (sort by the words before the
term, after it, etc.), you might check out LUCENE-5317, which is available as
a standalone jar on my GitHub site [1] and through Maven Central.
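
To make the "window around target terms" idea concrete, here is a minimal
sketch in plain Python. It is not the LUCENE-5317 API and does not touch the
index; it only illustrates the concept on a string of already-extracted text.
The names Hit, concordance, sort_before and sort_after are hypothetical and
chosen just for this illustration.

```python
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    left: str    # text before the target, at most `width` characters
    target: str  # the matched term exactly as it appears in the text
    right: str   # text after the target, at most `width` characters


def concordance(text: str, term: str, width: int = 30) -> List[Hit]:
    """Return one Hit per whole-word, case-insensitive match of `term`."""
    # Collapse line breaks and runs of spaces left over from PDF extraction.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        hits.append(Hit(left, m.group(0), right))
    return hits


def sort_after(hits: List[Hit]) -> List[Hit]:
    """Order hits by the text that follows the target."""
    return sorted(hits, key=lambda h: h.right.lower())


def sort_before(hits: List[Hit]) -> List[Hit]:
    """Order hits by the words preceding the target, nearest word first."""
    return sorted(hits, key=lambda h: h.left.lower().split()[::-1])


if __name__ == "__main__":
    sample = "Revenue growth was strong.\nGrowth in Asia slowed."
    for hit in sort_after(concordance(sample, "growth", width=10)):
        print(f"{hit.left!r:>14} [{hit.target}] {hit.right!r}")
```

The window here is measured in characters for simplicity; a token-based
window would be the closer analogue of what a search index works with.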


Using a highlighter, too, will get you close.
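
As a rough sketch of the highlighter route, the snippet below builds a select
URL that asks Solr for sentence-sized highlight fragments. It assumes Solr 6.4
or later (for hl.method=unified), a hypothetical core URL, and a stored text
field named "content"; adjust all of these to your schema. The function name
highlight_query_url is made up for this example.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, term: str,
                        field: str = "content", snippets: int = 20) -> str:
    """Build a Solr select URL requesting sentence-level highlights."""
    params = {
        "q": f"{field}:{term}",    # assumes the text lives in `field`
        "hl": "true",              # turn highlighting on
        "hl.fl": field,            # field to highlight
        "hl.method": "unified",    # unified highlighter
        "hl.bs.type": "SENTENCE",  # break fragments on sentence boundaries
        "hl.snippets": snippets,   # max fragments returned per document
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name.
    print(highlight_query_url("http://localhost:8983/solr/docs", "Growth"))
```

The fragments come back in the "highlighting" section of the response, keyed
by document id. Fragment size settings may still merge short sentences, so
treat the result as close to, not exactly, one sentence per hit.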

See a crummy image of LUCENE-5317 [2] or the full presentation [3]

[1] https://github.com/tballison/lucene-addons/tree/6.5-0.1
[2] https://twitter.com/_tallison/status/852492398793981952
[3] https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf
    (slide 23ff.)


-----Original Message-----
From: ankur [mailto:ankur.sancheti.netw...@gmail.com] 
Sent: Thursday, April 13, 2017 12:08 PM
To: solr-user@lucene.apache.org
Subject: Re: keyword-in-content for PDF document

Thanks Alex. Yes, I am using Tika, so to some extent it preserves the text
flow.

There is something interesting in your reply: "Or you could try using
highlighter to return only the sentence."

I didn't understand that bit. How do we use the highlighter to return the
sentence?

To be clear, I want to return all sentences where the word "Growth"
appears.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/keyword-in-context-for-PDF-document-tp4329754p4329794.html
Sent from the Solr - User mailing list archive at Nabble.com.
