Hi Lance,

Yes, that is one solution, but wouldn't it stop people from searching for something like "<choice" in the first place? I mean, if I encode such characters at index time, one would have to write a query like "&lt;choice". Am I right?

Thanks,
Niraj

Lance Norskog wrote:
To display html-markup in an html page, it has to be in entity-encoded
form. So, encode the <> as entities in your input application, and
have it indexed and stored in this format. Then, the <b><u> are
inserted as normal. This gives you the html text displayable in an
html page, with all words highlightable. And add gt/lt etc. as
stopwords.

At this point you have the element names, attribute names and values,
and text parts searchable and highlightable. If you only want the HTML
syntax parts shown, the PatternReplaceFilter is your friend: with
regex patterns you can pull out those values and ignore the text
parts.

The analysis.jsp page will make it much much easier to debug this.

Good luck!

On Thu, Mar 25, 2010 at 8:21 AM, Niraj Aswani <n.asw...@dcs.shef.ac.uk> wrote:
Hi,

I am using the following two parameters to highlight the hits.

"hl.simple.pre=" + URLEncoder.encode("<b><u>", "UTF-8")
"hl.simple.post=" + URLEncoder.encode("</u></b>", "UTF-8")

This seems to work.  However, there is a bit of trouble when the text itself
contains html markup.

For example, I have indexed a document with the following text in it.
=======
something here...
<choice minOccurs="1" maxOccurs="unbounded">xyz</choice>
something here..
=======

When I search for the keyword choice, it inserts "<b><u>" just before the
word choice and "</u></b>" immediately after it. This results in something
like the following:

<<b><u>choice</u></b> minOccurs="1"
maxOccurs="unbounded">xyz</<b><u>choice</u></b>>


I would like it to be something like:

&lt;<b><u>choice</u></b> minOccurs="1"
maxOccurs="unbounded"&gt;xyz&lt;/<b><u>choice</u></b>&gt;

Is there any way to do it such that the highlight content is encoded as HTML
but the prefix and suffix are not?
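
One possible client-side approach, sketched here only as an illustration: pass plain-text placeholder markers as hl.simple.pre and hl.simple.post instead of the real tags, HTML-escape the returned fragment, and then swap the placeholders for the tags. The class name, method names, and the marker strings @@HL_START@@ and @@HL_END@@ are all hypothetical, and the sketch assumes the markers never occur in the indexed text.

```java
class HighlightEscaper {
    // Hypothetical markers, sent as hl.simple.pre / hl.simple.post instead of real tags.
    // Assumption: these strings never appear in the indexed text itself.
    static final String PRE_MARK = "@@HL_START@@";
    static final String POST_MARK = "@@HL_END@@";

    // Escape the characters that are significant in HTML element content.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escape the whole fragment first, then replace the markers with the real tags.
    // The markers contain no HTML-special characters, so escaping leaves them intact.
    static String toDisplayHtml(String fragment) {
        return escapeHtml(fragment)
                .replace(PRE_MARK, "<b><u>")
                .replace(POST_MARK, "</u></b>");
    }

    public static void main(String[] args) {
        // A fragment as the highlighter might return it when given the markers above.
        String fragment = "<" + PRE_MARK + "choice" + POST_MARK
                + " minOccurs=\"1\" maxOccurs=\"unbounded\">xyz</"
                + PRE_MARK + "choice" + POST_MARK + ">";
        System.out.println(toDisplayHtml(fragment));
    }
}
```

With this arrangement the index keeps the raw text, so a query for "<choice" would not need to be entity-encoded; only the display step changes.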

Thanks,
Niraj



When I issue a query, it returns all the corret




