How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
Hi, I'm trying to use Lucene in my Android project. To start with, I've created a small demo app. It works with .txt files, but I need to work with .pdf. Analyzing my code, I can see it will have some issues with PDFs due to memory management. However, the question I want to ask here is not

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Michael McCandless
I think you're asking for what Lucene calls "offsets", i.e. the character indices into the original indexed text, telling you where each hit occurred. All highlighters use offsets to find the matches in the original indexed text. One option, which both Highlighter and FastVectorHighlighter use, i

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
Like I said, I will work with PDF files, so I will draw the highlights myself over the rendered PDF (as far as I know, Lucene can't handle PDF by default). Yes, offsets are what I'm looking for. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-hits-coordinate

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Michael McCandless
OK. But the offsets refer to the plain text after you filtered the PDF document, not, e.g., to offsets in the original PDF content. Mike McCandless http://blog.mikemccandless.com
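Because the offsets point into the extracted plain text, drawing highlights on a rendered page needs a way back from a text offset to a page. Below is a minimal sketch, assuming the text was extracted page by page and the pages were concatenated in order with no separator before indexing. If you insert separators, add their length to the running position. `PageOffsetMap` and `locate` are hypothetical names, not Lucene API. Turning a page-relative character offset into screen coordinates depends on the PDF library and is not shown.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene): maps a character offset in the
 * concatenated plain text that was indexed back to a page index and an
 * offset within that page. Assumes pages were concatenated in order with
 * no separators between them.
 */
final class PageOffsetMap {
    private final int[] pageStarts; // pageStarts[i] = global offset where page i begins
    private final int totalLength;  // length of the whole concatenated text

    PageOffsetMap(List<String> pageTexts) {
        pageStarts = new int[pageTexts.size()];
        int pos = 0;
        for (int i = 0; i < pageTexts.size(); i++) {
            pageStarts[i] = pos;
            pos += pageTexts.get(i).length();
        }
        totalLength = pos;
    }

    /** Returns {pageIndex, offsetWithinPage} for a global character offset. */
    int[] locate(int offset) {
        if (offset < 0 || offset >= totalLength) {
            throw new IndexOutOfBoundsException("offset " + offset + " is outside the text");
        }
        // Binary search for the last page whose start is <= offset. Empty pages
        // share their start with the following page, so taking the last such
        // page always lands on the non-empty page that contains the offset.
        int lo = 0;
        int hi = pageStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (pageStarts[mid] <= offset) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return new int[] {lo, offset - pageStarts[lo]};
    }
}
```

Lucene end offsets are exclusive. For a hit spanning `start` to `end`, `locate(start)` and `locate(end - 1)` therefore give the first and last matched character. The two can fall on different pages.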

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-12 Thread Lingviston
I think that's OK for me. I just need to know the right way to get them. Note that queries must support boolean operators, *, ? and quotes.

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Michael McCandless
If you use PostingsHighlighter, then Passage.getMatchStarts/Ends gives you the offsets of each match. You'd need a custom PassageFormatter that takes these ints and saves them somewhere; or possibly the patch on LUCENE-4906 (allowing you to return custom objects, not just String) from your highlig
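The "save the ints somewhere" idea can be sketched without Lucene on the classpath. The names `MatchSpan` and `OffsetCollector` are hypothetical. The three arguments to `addPassage` mirror what `Passage.getMatchStarts()`, `Passage.getMatchEnds()` and `Passage.getNumMatches()` return. As I understand the `Passage` javadoc, the match arrays may be longer than the number of matches, so only the first `getNumMatches()` entries should be read. The offsets are absolute (into the field's text), not relative to the passage start.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One hit, as absolute character offsets into the indexed plain text. */
final class MatchSpan {
    final int start; // inclusive
    final int end;   // exclusive, like Lucene end offsets

    MatchSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/**
 * Hypothetical accumulator that a custom PassageFormatter could feed instead
 * of building a highlighted snippet string. Lucene types are deliberately not
 * referenced so the sketch compiles on its own.
 */
final class OffsetCollector {
    private final List<MatchSpan> spans = new ArrayList<>();

    /** Records the valid matches of one passage. */
    void addPassage(int[] matchStarts, int[] matchEnds, int numMatches) {
        // The arrays can be over-allocated; only the first numMatches entries are valid.
        for (int i = 0; i < numMatches; i++) {
            spans.add(new MatchSpan(matchStarts[i], matchEnds[i]));
        }
    }

    /** All spans recorded so far, in the order they were added. */
    List<MatchSpan> spans() {
        return Collections.unmodifiableList(spans);
    }
}
```

Inside a real formatter, the per-document format call would loop over its `Passage[]` and call `addPassage(p.getMatchStarts(), p.getMatchEnds(), p.getNumMatches())` for each passage. The exact formatter signature depends on the Lucene release (LUCENE-4906 is about returning objects other than `String`), so check it against the version you use.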

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Lingviston
I'm currently using this snippet (with older Highlighter): HitPositionCollector collector = new HitPositionCollector(); highlighter = new Highlighter(collector, scorer); highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer,

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-18 Thread Karl Wettin
On Aug 13, 2013, at 12:55 PM, Michael McCandless wrote: > I'm less familiar with the older highlighters but likely it's possible > to get the absolute offsets from them as well. Using the vector highlighter, I've achieved that by extending and cloning the code of ScoreOrderFragmentsBuilder#makeFrag

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Jon Stewart
Iterating over term matches is a recent need for me, too (experimenting with ranking matches/passages independently, across documents). I'm using the new PostingsHighlighter and giving it my own PassageFormatter. This does no formatting, but does store away the offsets from each Passage. One big p

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Michael McCandless
Hi Jon, Can you open an issue for this? We can explore how/whether to get the current docID to the formatter...

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-19 Thread Jon Stewart
Done. https://issues.apache.org/jira/browse/LUCENE-5181 Jon

Re: How to get hits coordinates in Lucene 4.4.0

2013-09-06 Thread Darren Hoffman
Lingviston, Can you tell me what IDE and process you are using to build your APK file? I am having issues with loading the Lucene42Codec and I see the code you are using is just like mine. However, when I try to run the app, I get an exception stating that it can't find the codec. I am using Int