Re: Indexing HTML

Michael Kimsal Mon, 27 Aug 2007 07:00:49 -0700

What's odd about this is that the error seems to indicate that I did escape them.

The full text (minus the stack trace) was


org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT
to read text (position: START_TAG seen ...&lt;field name="line"&gt;&lt;a
href="foobar"&gt;... @4:37)

Or is that just a byproduct of how SOLR reports the errors back - always
escaping them?
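
For reference, a minimal sketch of the escaping Erik describes in the message quoted below, written in Python. It assumes the update message is built with the standard library's xml.etree.ElementTree, which escapes &, < and > in element text when serializing. The helper name build_add_xml is hypothetical; the id and line values are the ones from the sample quoted below.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, line):
    """Build an <add><doc>...</doc></add> message (hypothetical helper).

    The raw HTML is assigned as element *text*, so the serializer escapes
    it instead of treating it as nested XML elements.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = str(doc_id)

    line_field = ET.SubElement(doc, "field", name="line")
    line_field.text = line  # escaped on serialization, not parsed as markup

    return ET.tostring(add, encoding="unicode")


raw_line = '<a href="foobar"><b><i>linktext</i></b></a>'
payload = build_add_xml(4, raw_line)
print(payload)
```

Parsing the payload back with an XML parser yields the original line unchanged, so the value the indexer receives should still be the raw HTML.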

Thanks guys - I'll have another crack at this tonight.


On 8/27/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> Michael,
>
> I think the issue is that you're not escaping the <field> values.
> Send something like this to Solr instead:
>
>   <field name="line">&lt;a href="foobar"&gt;&lt;b&gt;&lt;i&gt;linktext&lt;/i&gt;&lt;/b&gt;&lt;/a&gt;</field>
>
>         Erik
>
>
> On Aug 27, 2007, at 9:29 AM, Michael Kimsal wrote:
>
> > Hello
> >
> > I'm trying to index individual lines of an HTML file, and I'm hitting
> > this error:
> >
> > TEXT must be immediately followed by END_TAG and not START_TAG
> >
> > I've got something that looks like
> >
> > <add>
> > <doc>
> > <field name="id">4</field>
> > <field name="line"><a href="foobar"><b><i>linktext</i></b></a></field>
> > </doc>
> > </add>
> >
> > Actually, that sample code above, as its own data file POSTed to SOLR,
> > throws
> >
> > parser must be on START_TAG or TEXT to read text (position: START_TAG seen
> > ...&lt;field name="line"&gt;&lt;a href="foobar"&gt;... @4:37)
> >
> > as an error.
> >
> > Any clues as to how I can do this?  I'd like to keep the original copy of
> > each line intact in the index.
> >
> > Thanks!
> >
> > --
> > Michael Kimsal
> > http://webdevradio.com
>
>


-- 
Michael Kimsal
http://webdevradio.com
