Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-12, Timothy Rodriguez (BLOOMBERG/ 120 PARK)
...aps a default).

> From: david.w.smi...@gmail.com At: 01/11/17 13:19:37
> To: Timothy Rodriguez (BLOOMBERG/ 120 PARK), dawid.we...@gmail.com
> Cc: dev@lucene.apache.org
> Subject: Re: UnifiedHighlighter and extraction of exact hit offset ranges
> If the generics could be contained _instead of_ spreading to the ...

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, Dawid Weiss
> I'm guessing what you're seeing is from browsing the 6.3 code. The
> extensibility has been improved and committed for 6.4; see CHANGES.txt and
> LUCENE-7559 which did it. In particular, all Passage methods are now
> public.

Ah, surely I am!... Sorry about that -- I've been updating/modifying ...

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, David Smiley
If the generics could be contained _instead of_ spreading to the UH class itself (making UH typed), I think it could be nice. But given the possible per-field settings for formatting... that in particular makes balancing these concerns hard. I guess in the end Object isn't too bad since it's limited ...

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, David Smiley
Dawid, I'm guessing what you're seeing is from browsing the 6.3 code. The extensibility has been improved and committed for 6.4; see CHANGES.txt and LUCENE-7559 which did it. In particular, all Passage methods are now public. I agree that OffsetsEnum methods should be public so that someone could ...

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, Timothy Rodriguez (BLOOMBERG/ 120 PARK)
...while we were open-sourcing it. I had tried creating a patch to generify it, but the generics did wind up all over the place. Ultimately the UnifiedHighlighter would need to be generic itself so it can ensure the passage formatters, etc. are of the same type. (Or alternatively, generic passage formatters ...
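To make the trade-off concrete, here is a minimal sketch of the fully generic design Timothy describes. It uses hypothetical stand-in types (SimplePassage, TypedFormatter, TypedHighlighter), not Lucene's actual API, and a toy substring match in place of real query matching. Once the formatter's result type becomes a type parameter, the highlighter that returns that result has to carry the same parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a highlighter passage: a [start, end) span of the text.
final class SimplePassage {
    final int start;
    final int end;

    SimplePassage(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Hypothetical typed formatter: each implementation picks its own result type T.
interface TypedFormatter<T> {
    T format(List<SimplePassage> passages, String content);
}

// The highlighter returns whatever its formatter builds, so it must declare the
// same type parameter T: the generics spread from the formatter to this class.
final class TypedHighlighter<T> {
    private final TypedFormatter<T> formatter;

    TypedHighlighter(TypedFormatter<T> formatter) {
        this.formatter = formatter;
    }

    // Toy matching for illustration only: every occurrence of term becomes a passage.
    T highlight(String content, String term) {
        List<SimplePassage> passages = new ArrayList<>();
        if (!term.isEmpty()) {
            int idx = content.indexOf(term);
            while (idx >= 0) {
                passages.add(new SimplePassage(idx, idx + term.length()));
                idx = content.indexOf(term, idx + term.length());
            }
        }
        return formatter.format(passages, content);
    }
}
```

With this design each highlighter instance is fixed to a single result type, so a per-field formatter lookup would have to return TypedFormatter<T> for every field; that is the tension with per-field formatting settings raised elsewhere in the thread.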

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, Dawid Weiss
Thanks David! That's almost exactly what I ended up doing. I don't mind casting Object to my own type; you can always make it a covariant override in your subclass (which you have to do to access those expert-level methods anyway). I still kind of think startOffset/endOffset and other related methods ...
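A minimal sketch of the covariant-override approach Dawid mentions. ObjectFormatter and MatchedTextFormatter are hypothetical classes, not Lucene's; the base method is declared to return Object, in the spirit of a format() result that callers must cast. The subclass narrows the return type, so code holding the subclass type needs no cast, while code using the base type still sees an Object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base formatter whose declared result type is Object.
abstract class ObjectFormatter {
    abstract Object format(int[] matchStarts, int[] matchEnds, String content);
}

// Covariant override: Java lets an overriding method narrow its return type, so a
// caller holding a MatchedTextFormatter reference gets a List<String> without a cast.
final class MatchedTextFormatter extends ObjectFormatter {
    @Override
    List<String> format(int[] matchStarts, int[] matchEnds, String content) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < matchStarts.length; i++) {
            // Each match is the [start, end) slice of the field text.
            matches.add(content.substring(matchStarts[i], matchEnds[i]));
        }
        return matches;
    }
}
```

The cast does not disappear entirely; it moves to the one place that deliberately works with the base type.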

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, David Smiley
Hi Dawid,

You could write a trivial PassageFormatter that simply returns the Passage list instead of doing formatting. Passages contain offsets. And yes, WholeBreakIterator if you don't need passage fragmentation. Unless I'm missing some aspect of your requirements, this doesn't involve any inter...
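A minimal sketch of David's suggestion, using hypothetical stand-ins (PassageStub, ObjectPassageFormatter, PassthroughFormatter, PassageOffsets) rather than Lucene's Passage and PassageFormatter. The formatter does no markup and returns the passages themselves; the caller then reads the offsets straight off each passage.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a highlighter passage (not Lucene's class): the
// passage's own [startOffset, endOffset) span plus its number of matches.
final class PassageStub {
    final int startOffset;
    final int endOffset;
    final int numMatches;

    PassageStub(int startOffset, int endOffset, int numMatches) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.numMatches = numMatches;
    }
}

// Hypothetical formatter contract whose result type is Object.
interface ObjectPassageFormatter {
    Object format(PassageStub[] passages, String content);
}

// The "trivial" formatter: no markup at all, it just hands the passages back.
final class PassthroughFormatter implements ObjectPassageFormatter {
    @Override
    public Object format(PassageStub[] passages, String content) {
        return Arrays.asList(passages);
    }
}

final class PassageOffsets {
    // Caller side: cast the Object result back and read each passage's offsets.
    static String describe(Object formatted) {
        StringBuilder sb = new StringBuilder();
        for (Object o : (List<?>) formatted) {
            PassageStub p = (PassageStub) o;
            sb.append('[').append(p.startOffset).append(',').append(p.endOffset)
              .append(") matches=").append(p.numMatches).append('\n');
        }
        return sb.toString();
    }
}
```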

Re: UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, Dawid Weiss
To follow up: I hacked into the offsets by passing WholeBreakIterator and a custom PassageFormatter that just returns the matches from the singleton resulting passage. This is suboptimal, though, as there's still some complex logic going on in highlightOffsetsEnums that could be avoided.

Dawid
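The singleton-passage trick can be sketched as follows, again with hypothetical stand-in types (MatchPassage, OffsetRange, SingletonPassageHits) rather than Lucene classes. The sketch assumes that a whole-field break iterator yields at most one passage per field, and that a passage's match arrays may be longer than its match count, so only the first numMatches entries are read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a passage: match offsets as parallel arrays.
// Assumption: the arrays may be over-allocated, so only the first numMatches
// slots are meaningful.
final class MatchPassage {
    final int[] matchStarts;
    final int[] matchEnds;
    final int numMatches;

    MatchPassage(int[] matchStarts, int[] matchEnds, int numMatches) {
        this.matchStarts = matchStarts;
        this.matchEnds = matchEnds;
        this.numMatches = numMatches;
    }
}

// A single hit as a half-open [start, end) range of field offsets.
final class OffsetRange {
    final int start;
    final int end;

    OffsetRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }
}

final class SingletonPassageHits {
    // With a whole-field break iterator there is at most one passage, so the
    // hits are simply that passage's matches.
    static List<OffsetRange> hits(MatchPassage[] passages) {
        if (passages.length == 0) {
            return Collections.emptyList(); // no passage: no hits in this field
        }
        if (passages.length > 1) {
            throw new IllegalStateException(
                "expected a single whole-field passage, got " + passages.length);
        }
        MatchPassage p = passages[0];
        List<OffsetRange> ranges = new ArrayList<>(p.numMatches);
        for (int i = 0; i < p.numMatches; i++) {
            ranges.add(new OffsetRange(p.matchStarts[i], p.matchEnds[i]));
        }
        return ranges;
    }
}
```

Failing loudly on more than one passage makes it obvious if the whole-field assumption ever stops holding.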

UnifiedHighlighter and extraction of exact hit offset ranges

2017-01-11, Dawid Weiss
Can any of the folks who contributed to UnifiedHighlighter (David?) clarify my thinking here? I have a requirement to extract (for a set of search results) a list of exact "hit" ranges (field offsets, with support for multi-term queries and span queries). Obviously, I'm only talking about queries ...