-Original Message-
> > From: Karolina Bernat [mailto:karolina.ber...@googlemail.com]
> > Sent: Tuesday, January 25, 2011 1:45 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Preserving original HTML file offsets for highlighting
> >
> > Hi Uwe,
> >
>
http://www.thetaphi.de
eMail: u...@thetaphi.de
> Subject: Preserving original HTML file offsets for highlighting
>
> Hi all,
>
> I'm new to Lucene and have a question about indexing/highlighting of HTML
> files with Lucene.
>
> What I need to do is highlight the hits (terms) in the original HTML file
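> (or get the positions of the terms/tokens in the original file).
> This problem has already been described by Fred Toth in this thread in 2005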
(Preserving original HTML file offsets for highlighting, need
HTMLTokenizer?):
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3c6.2.1.2.2.20050530134630.063ae...@fast.synernet.com%3E
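Once the start/end character offsets of each hit in the original HTML are known, highlighting the whole file comes down to inserting markup at those offsets. Below is a minimal plain-Java sketch, not Lucene API. `OffsetHighlighter`, `Span` and `highlight` are hypothetical names. It assumes the offsets index into the unmodified HTML string, do not overlap, and never fall inside a tag. Inserting from the last span back to the first keeps the earlier offsets valid, so nothing has to be recomputed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OffsetHighlighter {

    /** Hypothetical hit span: half-open range [start, end) in the ORIGINAL HTML. */
    public record Span(int start, int end) {}

    /**
     * Wraps every span of the original HTML in preTag/postTag.
     * Assumes spans are non-overlapping and lie in text, not inside tags.
     * Spans are processed from the highest start offset down, so inserting
     * markup never shifts an offset that is still waiting to be used.
     */
    public static String highlight(String html, List<Span> spans,
                                   String preTag, String postTag) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(Span::start).reversed());
        StringBuilder sb = new StringBuilder(html);
        for (Span s : sorted) {
            sb.insert(s.end(), postTag);   // close first: end >= start
            sb.insert(s.start(), preTag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Lucene <i>highlights</i> hits</p>";
        // Offsets of "Lucene" and "hits" in the original string, counted by hand.
        List<Span> hits = List.of(new Span(3, 9), new Span(28, 32));
        System.out.println(highlight(html, hits, "<b>", "</b>"));
    }
}
```

Running `main` should print `<p><b>Lucene</b> <i>highlights</i> <b>hits</b></p>`. Because the original markup is never re-serialized, the rest of the file comes back byte-for-byte unchanged.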
I've searched the mailing list archives hoping for an answer,
Fred Toth wrote:
I'm thinking we need something like "HTMLTokenizer" which bridges the
gap between StandardAnalyzer and an external HTML parser. Since so
many of us are dealing with HTML, I would think this would be generally
useful for many problems. It could work this way:
Given this input:
H
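The "HTMLTokenizer" idea above can be sketched without Lucene. It skips markup while scanning but records each word's offsets against the original HTML rather than against the stripped text. `HtmlOffsetTokenizer` and `Token` are hypothetical names. The sketch assumes simple tags with no `>` inside attribute values. It does no entity decoding and gives no special handling to `<script>`/`<style>`. A real analyzer would report these offsets through its token stream instead of returning a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HtmlOffsetTokenizer {

    /** Hypothetical token: lowercased text plus [start, end) offsets in the ORIGINAL HTML. */
    public record Token(String text, int start, int end) {}

    /**
     * Splits the visible text of an HTML string into word tokens.
     * Anything from '<' to the next '>' is treated as markup and skipped.
     * A token is a maximal run of letters or digits; its offsets point into
     * the unmodified input, so they can be used to highlight the original file.
     */
    public static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int n = html.length();
        int i = 0;
        while (i < n) {
            char c = html.charAt(i);
            if (c == '<') {
                // Skip the whole tag; an unterminated tag swallows the rest of the input.
                int close = html.indexOf('>', i);
                i = (close < 0) ? n : close + 1;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < n && Character.isLetterOrDigit(html.charAt(i))) {
                    i++;
                }
                String text = html.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new Token(text, start, i));
            } else {
                i++; // whitespace or punctuation between words
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Lucene <i>highlights</i> hits</p>")) {
            System.out.println(t);
        }
    }
}
```

For the sample input this yields `lucene` at [3, 9), `highlights` at [13, 23) and `hits` at [28, 32). Those are positions in the HTML as stored on disk, which is exactly what a highlighter working on the original file needs.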
Hi all,
Those of you who have read and responded to my recent posts know
that we are working on highlighting the entire document after a search.
(Not fragments in a results list.)
It appears that one of the key tools to assist with this is the ability of
Lucene to store file offsets of terms as