On Tue, 2007-09-25 at 12:06 +0100, Jérôme Etévé wrote:
> If I understand correctly, you want to keep the raw HTML code in Solr like this
> (in the XML file you post):
> 
> <field name="storyFullText">
>   <html></html>
> </field>
> 
> I think you should encode your content, escaping these XML special characters:
> <  ->  &lt;
> > -> &gt;
> " -> &quot;
> & -> &amp;
> 
> If you use Perl, have a look at HTML::Entities.
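For readers not using Perl, here is a minimal Python sketch of the same escaping using the standard library's `xml.sax.saxutils.escape`. That function replaces `&`, `<` and `>`, and an extra mapping makes it replace `"` as well. The helper name `build_add_xml` and the single-document `<add><doc>` layout are assumptions for illustration. Only the field name `storyFullText` comes from this thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_add_xml(field_name: str, html: str) -> str:
    """Hypothetical helper: wrap raw HTML in a Solr <add> message as escaped text."""
    # escape() replaces "&" first, then ">" and "<", so the "&" in the
    # entities it introduces is not escaped a second time; the extra
    # mapping also turns double quotes into &quot;.
    escaped = escape(html, {'"': "&quot;"})
    return (
        "<add><doc>"
        f'<field name="{field_name}">{escaped}</field>'
        "</doc></add>"
    )


snippet = '<div class="paragraph"><br/>text & more</div>'
message = build_add_xml("storyFullText", snippet)
print(message)

# Any XML parser reading the message sees the field value as plain text,
# i.e. the original HTML string, not as nested elements.
field = ET.fromstring(message).find("./doc/field")
print(field.text == snippet)  # True
```

Because the HTML travels as escaped text, the parser on the receiving side gets the original string back unchanged instead of trying to treat `<div>` as part of the add message.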

AFAIR you cannot store tags as such; they always get transformed into
entities. The solution is to apply an XSL transformation to the
response that turns the entities back into tags.

Have a look at the thread 
http://marc.info/?t=116775837900001&r=1&w=2
and especially at
http://marc.info/?l=solr-user&m=116782664828926&w=2
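The linked thread is about doing this with an XSL transformation. The following is a client-side alternative, not the XSL approach itself. It assumes you can post-process Solr's standard XML response in Python. An XML parser resolves the entities, so the text of the stored field is the original HTML. If that HTML is well-formed, it can be parsed again as markup. The sample response is hand-written for illustration. It assumes the `multiValued` field `storyFullText` is returned inside an `<arr>` element, and `stored_html` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hand-written sample of a Solr XML response; the stored HTML arrives
# entity-escaped inside the <str> element, as described above.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="storyFullText">
        <str>&lt;div class="paragraph"&gt;my para text&lt;br/&gt;&lt;/div&gt;</str>
      </arr>
    </doc>
  </result>
</response>"""


def stored_html(response_xml: str, field: str) -> List[str]:
    """Hypothetical helper: return the stored values of `field` as raw HTML strings."""
    root = ET.fromstring(response_xml)
    values: List[str] = []
    for doc in root.iter("doc"):
        # multiValued fields come back as <arr name="..."><str>...</str></arr>,
        # single-valued ones as <str name="...">...</str>.
        for node in doc:
            if node.get("name") != field:
                continue
            if node.tag == "arr":
                values.extend(child.text or "" for child in node)
            else:
                values.append(node.text or "")
    return values


html_values = stored_html(SAMPLE_RESPONSE, "storyFullText")
print(html_values[0])  # <div class="paragraph">my para text<br/></div>

# If the stored HTML happens to be well-formed, it can be turned back into elements.
fragment = ET.fromstring(html_values[0])
print(fragment.tag, fragment.get("class"))  # div paragraph
```

Real-world HTML is often not well-formed XML, so the second `ET.fromstring` call is only safe for content like the XHTML-style sample in this thread.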

HTH

salu2

> 
> 
> On 9/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I've got a problem with HTML code that is embedded in an XML file:
> >
> > Sample source:
> >
> > <content>
> >         <stories>
> >                 <div class="storyTitle">
> >                          Les débats
> >                 </div>
> >                 <div class="storyIntroductionText">
> >                         Le premier tour des élections fédérales se déroulera le 21
> >                         octobre prochain. D'ici là, La 1ère vous propose plusieurs
> >                         rendez-vous, dont plusieurs grands débats à l'enseigne de Forums.
> >                 </div>
> >                 <div class="paragraph">
> >                         <div class="paragraphTitle"/>
> >                         <div class="paragraphText">
> >                                 my para textehere
> >                                 <br/>
> >                                 <br/>
> >                                 Vous trouverez sur cette page toutes les dates et les heures de
> >                                 ces différents rendez-vous ainsi que le nom et les partis des
> >                                 débatteurs. De plus, vous pourrez également écouter ou réécouter
> >                                 l'ensemble de ces émissions.
> >                         </div>
> >                 </div>
> > ....
> > ---------
> > When I make a query on Solr, I get something like this in the
> > source of the XML result:
> >
> > <td xmlns="http://www.w3.org/1999/xhtml">
> > <span class="markup">&lt;</span>
> > <span class="start-tag">div</span>
> > <span class="attribute-name">class</span>
> > <span class="markup">=</span>
> > <span class="attribute-value">"paragraph"</span>
> > <span class="markup">&gt;</span><div class="expander-content">
> > <div class="indent"><span class="markup">&lt;</span>
> > <span class="start-tag">div</span>
> > <span class="attribute-name">class</span>
> > <span class="markup">=</span>
> > <span class="attribute-value">"paragraphTitle"</span>
> > <span class="markup">/&gt;</span></div><table><tr>
> > <td class="expander">−<div class="spacer"/>
> > </td><td><span class="markup">&lt;</span>
> > ...
> >
> > That is not what I want. I want to keep the HTML tags as they are,
> > without any formatting.
> >
> > The br and a tags come out well-formed in the XML and JSON results,
> > but the div tags are not kept.
> > ---------
> > In schema.xml I have this for the HTML content:
> >
> > <fieldType name="html" class="solr.TextField" />
> >
> >   <field name="storyFullText" type="html" indexed="true"
> > stored="true" multiValued="true"/>
> >
> > ---------
> >
> > Any help would be appreciated.
> >
> > Thanks in advance.
> >
> > S. Christin
> >
> 
> 
-- 
Thorsten Scherler                                 thorsten.at.apache.org
Open Source Java                      consulting, training and solutions
