You need to encode your HTML content so that it can be included as a
normal 'string' value in your XML element.

As far as I remember, the only unsafe characters you have to encode as
entities are:
<  -> &lt;
>  -> &gt;
"  -> &quot;
&  -> &amp;

(Google "XML entities" to be sure.)

I don't know what language you use, but in Perl, for instance, you can
use something like:
use HTML::Entities;
my $xmlString = encode_entities($rawHTML, '<>&"');
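
To show where the escaped string ends up, here is a minimal sketch that builds a Solr <add> message by hand. It uses only core Perl, so the escaping is done with a small hypothetical helper (xml_escape) that covers the same four characters as the encode_entities call above. The field names "id" and "content" are assumptions; use whatever your schema defines.

```perl
use strict;
use warnings;

# The four characters listed above and their XML entities.
my %ENTITY = (
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    '&' => '&amp;',
);

# Hypothetical helper: escape the unsafe characters in a single pass,
# so an '&' produced by one replacement is never escaped a second time.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([<>"&])/$ENTITY{$1}/g;
    return $text;
}

# Hypothetical helper: wrap one document in a Solr <add> message.
# "id" and "content" are assumed field names, not taken from any real schema.
sub solr_add_xml {
    my ($id, $raw_html) = @_;
    return '<add><doc>'
         . '<field name="id">' . xml_escape($id) . '</field>'
         . '<field name="content">' . xml_escape($raw_html) . '</field>'
         . '</doc></add>';
}

print solr_add_xml('doc1', '<p class="x">Fish & chips</p>'), "\n";
```

The printed message contains the HTML only as escaped text (&lt;p class=&quot;x&quot;&gt; and so on), so the XML parser sees a plain string inside the field element.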

You also need to make sure your HTML is encoded in UTF-8, to comply
with Solr's requirement for UTF-8 encoded XML.
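
If your HTML files are in some other encoding, one way to convert them is the core Encode module. This is only a sketch: to_utf8 is a hypothetical helper, and it assumes you already know the source encoding (ISO-8859-1 in the example).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw bytes in a known encoding into UTF-8 bytes.
sub to_utf8 {
    my ($bytes, $source_encoding) = @_;
    my $text = decode($source_encoding, $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $text);                 # characters -> UTF-8 bytes
}

my $latin1 = "caf\xE9";                            # "cafe" with e-acute, as ISO-8859-1 bytes
my $utf8   = to_utf8($latin1, 'ISO-8859-1');

# Show the resulting bytes in hex: the single byte E9 becomes the pair C3 A9.
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
```

Do the conversion before escaping and posting, so that the bytes you send to Solr are already UTF-8.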

I hope it helps.

J.

On 8/22/07, Ravish Bhagdev <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Sorry for stupid question.  I'm trying to index html file as one of
> the fields in Solr, I've setup appropriate analyzer in schema but I'm
> not sure how to add html content to Solr.  Encapsulating HTML content
> within field tag is obviously not valid.  How do I add html content?
> Hope the query is clear....
>
> Thanks,
> Ravi
>


-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/
