aps a default).
From: david.w.smi...@gmail.com At: 01/11/17 13:19:37
To: Timothy Rodriguez (BLOOMBERG/ 120 PARK), dawid.we...@gmail.com
Cc: dev@lucene.apache.org
Subject: Re: UnifiedHighlighter and extraction of exact hit offset ranges
> I'm guessing what you're seeing is from browsing the 6.3 code. The
> extensibility has been improved and committed for 6.4; see CHANGES.txt and
> LUCENE-7559 which did it. In particular, all Passage methods are now
> public.
Ah, surely I am!... Sorry about that -- I've been updating/modifying
If the generics could be contained _instead of_ spreading to the UH class
itself (making UH typed), I think it could be nice. But given the
per-field possible settings for formatting... that in particular makes
balancing these concerns hard. I guess in the end Object isn't too bad
since it's limi
Dawid,
I'm guessing what you're seeing is from browsing the 6.3 code. The
extensibility has been improved and committed for 6.4; see CHANGES.txt and
LUCENE-7559 which did it. In particular, all Passage methods are now
public.
I agree that OffsetsEnum methods should be public so that someone coul
While we were open sourcing it, I had tried creating a patch to generify it,
but the generics did wind up all over the place. Ultimately the
UnifiedHighlighter would need to be generic itself so it can ensure the passage
formatters etc are of the same type. (Or alternatively, generic passage
fo
Thanks David!
That's almost exactly what I ended up doing. I don't mind casting
Object to my own type; you can always make it a covariant override in
your subclass (which you have to do to access those expert-level
methods anyway).
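
For readers unfamiliar with the covariant override mentioned above, here is a minimal sketch of the language feature. BaseFormatter and WordListFormatter are hypothetical stand-ins invented for illustration; they are not Lucene classes, and the only thing assumed from the thread is that the base method is declared to return Object.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class (not part of Lucene): format() is declared to
// return Object, so callers holding the base type have to cast the result.
abstract class BaseFormatter {
    abstract Object format(String content);
}

// The subclass narrows the declared return type. Java allows this because
// List<String> is a subtype of Object (a covariant return type).
class WordListFormatter extends BaseFormatter {
    @Override
    List<String> format(String content) {
        // Split on single spaces purely to have something typed to return.
        return Arrays.asList(content.split(" "));
    }
}
```

Code that holds a WordListFormatter reference gets a List<String> back with no cast, while code that only sees BaseFormatter still receives an Object.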
I still kind of think startOffset/endOffset and other related met
Hi Dawid,
You could write a trivial PassageFormatter that simply returns the Passage
list instead of doing formatting. Passages contain offsets. And yes,
WholeBreakIterator if you don't need passage fragmentation. Unless I'm
missing some aspect of your requirements, this doesn't involve any inter
To follow up: I hacked into the offsets by passing WholeBreakIterator
and a custom PassageFormatter that just returns the matches from the
singleton resulting passage. This is suboptimal though, as there's
still some complex logic going on in highlightOffsetsEnums that could
be avoided.
Dawid
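
The approach described in this message can be sketched roughly as follows. To keep the example runnable without Lucene on the classpath, SketchPassage and SketchFormatter are hypothetical stand-ins that only mimic the shape discussed in the thread (a passage carrying match offsets, and a formatter returning Object); the real Passage and PassageFormatter signatures should be checked against the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a highlighter passage (NOT Lucene's Passage):
// parallel arrays holding the start and end offset of each match.
final class SketchPassage {
    private final int[] matchStarts;
    private final int[] matchEnds;

    SketchPassage(int[] matchStarts, int[] matchEnds) {
        if (matchStarts.length != matchEnds.length) {
            throw new IllegalArgumentException("start/end arrays differ in length");
        }
        this.matchStarts = matchStarts.clone();
        this.matchEnds = matchEnds.clone();
    }

    int numMatches() { return matchStarts.length; }
    int matchStart(int i) { return matchStarts[i]; }
    int matchEnd(int i) { return matchEnds[i]; }
}

// Hypothetical stand-in for the formatter contract: returns Object.
interface SketchFormatter {
    Object format(SketchPassage[] passages, String content);
}

// Instead of producing highlighted text, hand back the raw offset ranges.
final class OffsetCollector implements SketchFormatter {
    @Override
    public Object format(SketchPassage[] passages, String content) {
        List<int[]> ranges = new ArrayList<>();
        for (SketchPassage passage : passages) {
            for (int i = 0; i < passage.numMatches(); i++) {
                // Each entry is {startOffset, endOffset} into the field text.
                ranges.add(new int[] {passage.matchStart(i), passage.matchEnd(i)});
            }
        }
        return ranges;
    }
}
```

With a break iterator that treats the whole field value as one passage, the array passed to format() would hold a single passage, so the returned list is simply every match range in the field.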
Can any of the folks who contributed to UnifiedHighlighter (David?)
clarify my thinking here?
I have a requirement to extract (for a set of search results) a list
of exact "hit" ranges (field offsets, with support for multi-term
queries and span queries). Obviously, I'm only talking about queries