The HTML strip char filter has no option to discard whitespace between elements, and it certainly doesn't know the semantics of an XML schema's "choice" element. You'll have to resolve those semantics before Solr ingestion, or write your own custom filter.
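A minimal pre-processing sketch of the first option, assuming the documents can be parsed with Python's standard `xml.etree.ElementTree` before being sent to Solr. It contrasts a rough stand-in for plain tag stripping (tags removed, all whitespace kept) with a choice-aware extraction that keeps only the `<reg>` reading inside `<choice>` and then collapses whitespace. The function names `naive_strip`, `extract_text` and `tokens` are hypothetical, `tokens` is only a crude approximation of a tokenizer, and the `dbp:hand` attribute was dropped from the sample because its namespace declaration is not part of the excerpt.

```python
import re
import xml.etree.ElementTree as ET

# Sample adapted from the question; dbp:hand removed (undeclared prefix).
SAMPLE = """<item type="fragment" n="3">
  <cit>
si les <hi rend="underline">ruches d’<term>abeilles</term>
      </hi> prouvent la
    monarchie, les fourmillières, les troupes d’éléphants ou
de <lb/>
      <choice>
        <orig>C</orig>
        <reg>c</reg>
      </choice>astors prouvent la
république.
      <bibl xml:id="b-7468-3"/>
  </cit>
</item>"""


def naive_strip(xml: str) -> str:
    """Rough stand-in for tag stripping: drop tags, keep all text and whitespace."""
    return re.sub(r"<[^>]+>", "", xml)


def tokens(text: str) -> list[str]:
    """Crude tokenizer stand-in: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def extract_text(root: ET.Element) -> str:
    """Join text nodes, keeping only <reg> inside <choice>, then collapse whitespace."""
    parts: list[str] = []

    def walk(elem: ET.Element) -> None:
        if elem.tag == "choice":
            # Emit the regularized reading only; <orig> and the
            # indentation between the alternatives are skipped.
            reg = elem.find("reg")
            if reg is not None:
                parts.append("".join(reg.itertext()).strip())
            return
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            # The tail is the text that follows the child inside elem.
            if child.tail:
                parts.append(child.tail)

    walk(root)
    # Collapsing runs of whitespace glues "c" + "astors" into "castors",
    # because nothing but the <choice> boundary separated them.
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(tokens(naive_strip(SAMPLE)))
    print(extract_text(root))
```

With plain stripping, the indentation inside `<choice>` leaves `c` and `astors` as separate tokens, which matches the symptom in the question; the extracted text yields a single `castors` token and can be sent to Solr in place of the raw XML. Note also that the field type quoted below has no lowercase or stemming filter, so a query for `castor` may still not match the token `castors`; that is an analyzer question, separate from the pre-processing step.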

-- Jack Krupansky

-----Original Message-----
From: Tigunn
Sent: Thursday, May 31, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Strip html

Hello,
I have a full-text index on XML files.
Example:
---------------------------------------
<item type="fragment" n="3">
                           <cit dbp:hand="GF-encre">

si les <hi rend="underline">ruches d’<term>abeilles</term>
                                    </hi> prouvent la
                  monarchie, les fourmillières, les troupes d’éléphants ou
de <lb/>
                                    <choice>
                                        <orig>C</orig>
                                        <reg>c</reg>
                                    </choice>astors prouvent la
république.
                               <bibl xml:id="b-7468-3"/>
                           </cit>
                       </item>
---------------------------------------
I use Solr 1.4.1 for full-text search with PHP. When I search for "castor",
I can't find this document. But if I search for "c astor", it works. That's the problem!

I wrote an XSLT transformation that returns:
---------------------------------------
si les ruches d’abeilles prouvent la
                 monarchie, les fourmillières, les troupes d’éléphants ou
de castors prouvent la république.
---------------------------------------
I put this text into Solr: $doc->addField('body_strip_html', $body_norm);

In schema.xml:
<fieldType name="text_strip_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

AND

<field name="body_strip_html" type="text_strip_html" indexed="true" stored="true"/>


But this doesn't work!
I want this XML file (see the example) to be returned when I search for "castor".

Can you help me, please?
Thanks.


--
View this message in context: http://lucene.472066.n3.nabble.com/Strip-html-tp3987051.html
Sent from the Solr - User mailing list archive at Nabble.com.
