Re: Preserving original HTML file offsets for highlighting

2011-01-26 Thread Karolina Bernat
> > -Original Message-
> > From: Karolina Bernat [mailto:karolina.ber...@googlemail.com]
> > Sent: Tuesday, January 25, 2011 1:45 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Preserving original HTML file offsets for highlighting
> >
> > Hi Uwe,

RE: Preserving original HTML file offsets for highlighting

2011-01-25 Thread Uwe Schindler
remen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Karolina Bernat [mailto:karolina.ber...@googlemail.com]
> Sent: Tuesday, January 25, 2011 1:45 PM
> To: java-user@lucene.apache.org
> Subject: Re: Preserving original HTML file offsets for

Re: Preserving original HTML file offsets for highlighting

2011-01-25 Thread Karolina Bernat
> ng of HTML files with Lucene.
>
> What I need to do is highlight the hits (terms) in the original HTML file (or get the positions of the terms/tokens in the original file).
> This problem has already been described by Fred Toth in this thread in 2005 (Preserving origina

RE: Preserving original HTML file offsets for highlighting

2011-01-24 Thread Uwe Schindler
> rg
> Subject: Preserving original HTML file offsets for highlighting
>
> Hi all,
>
> I'm new to Lucene and have a question about indexing/highlighting of HTML files with Lucene.
>
> What I need to do is highlight the hits (terms) in the original HTML file (or get the p

Preserving original HTML file offsets for highlighting

2011-01-24 Thread Karolina Bernat
th in this thread in 2005 (Preserving original HTML file offsets for highlighting, need HTMLTokenizer?): http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/%3c6.2.1.2.2.20050530134630.063ae...@fast.synernet.com%3E I've searched the mailing list archives hoping for an answer,

Re: Preserving original HTML file offsets for highlighting, need HTMLTokenizer?

2005-06-03 Thread Doug Cutting
Fred Toth wrote: I'm thinking we need something like "HTMLTokenizer" which bridges the gap between StandardAnalyzer and an external HTML parser. Since so many of us are dealing with HTML, I would think this would be generally useful for many problems. It could work this way: Given this input: H

Preserving original HTML file offsets for highlighting, need HTMLTokenizer?

2005-05-30 Thread Fred Toth
Hi all, Those of you who have read and responded to my recent posts know that we are working on highlighting the entire document after a search. (Not fragments in a results list.) It appears that one of the key tools to assist with this is the ability of Lucene to store file offsets of terms as
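
For readers following these threads, here is a minimal sketch of the underlying idea: tokenize only the text between tags, but record each token's start and end offsets in the original HTML string, so hits can later be marked up in the unmodified file. This is not Lucene's API and not the "HTMLTokenizer" proposed in the 2005 thread. The names `Token`, `tokenize_html` and `highlight` are hypothetical, and the tag detection is deliberately naive (any `<...>` run is treated as markup; entities, comments and script blocks are not handled).

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    text: str   # lower-cased term
    start: int  # offset of the first character in the ORIGINAL HTML string
    end: int    # offset one past the last character


# Assumption: any "<...>" run is markup. Real HTML needs a proper parser.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"\w+")


def _words(html: str, start: int, end: int) -> Iterator[Token]:
    # Search only inside html[start:end]; match offsets stay absolute.
    for m in WORD_RE.finditer(html, start, end):
        yield Token(m.group().lower(), m.start(), m.end())


def tokenize_html(html: str) -> Iterator[Token]:
    """Yield word tokens from the text between tags, with original offsets."""
    pos = 0
    for tag in TAG_RE.finditer(html):
        yield from _words(html, pos, tag.start())  # text before this tag
        pos = tag.end()                            # skip over the tag itself
    yield from _words(html, pos, len(html))        # text after the last tag


def highlight(html: str, terms: Iterable[str],
              open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap every matching term in the original HTML, leaving markup intact."""
    wanted = {t.lower() for t in terms}
    out = []
    last = 0
    for tok in tokenize_html(html):
        if tok.text in wanted:
            out.append(html[last:tok.start])
            out.append(open_tag + html[tok.start:tok.end] + close_tag)
            last = tok.end
    out.append(html[last:])
    return "".join(out)
```

For example, `highlight("<p>Hello <i>Lucene</i> world</p>", ["lucene"])` wraps only the word inside the `<i>` element and leaves the surrounding tags untouched. Because the offsets refer to the original string, the same token list could in principle be stored at index time and reused at display time.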