If I'm not mistaken, that's TEI, and I suggest you consult with the
TEI community for strategies for document indexing, as there are a lot
of branching-style tags in TEI. My guess is that you'll hear that it's
best to perform some sort of term expansion on the document as a
preprocessing step.
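
A minimal sketch of what such a preprocessing step might look like, in Python rather than XSLT. This is only one possible reading of the suggestion, not an established TEI recipe: it keeps the <reg> reading of each <choice>, drops <orig> together with the indentation whitespace inside <choice>, and flattens everything else to plain text. The names SAMPLE, flatten, and index_text are hypothetical, and the fragment is a simplified stand-in for yours (namespaced attributes such as dbp:hand are left out so that it parses on its own).

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the TEI fragment in the question (assumption:
# namespaced attributes are omitted so the snippet parses standalone).
SAMPLE = """<cit>si les <hi rend="underline">ruches d’<term>abeilles</term>
    </hi> prouvent la monarchie, les troupes d’éléphants ou de <lb/>
    <choice>
        <orig>C</orig>
        <reg>c</reg>
    </choice>astors prouvent la république.</cit>"""


def flatten(elem):
    """Return the text content of elem, resolving <choice> to <reg>."""
    if elem.tag == "choice":
        reg = elem.find("reg")
        if reg is not None:
            # Keep only the regularized reading. The whitespace between
            # the children of <choice> is discarded, so "c" ends up
            # directly in front of the text that follows </choice>.
            return "".join(reg.itertext())
    parts = [elem.text or ""]
    for child in elem:
        parts.append(flatten(child))
        parts.append(child.tail or "")
    return "".join(parts)


def index_text(xml_fragment):
    """Flatten a fragment and collapse runs of whitespace to one space."""
    root = ET.fromstring(xml_fragment)
    return " ".join(flatten(root).split())


if __name__ == "__main__":
    print(index_text(SAMPLE))
```

With the sample above, "castors" comes out as a single word instead of "c astors", so the text can be sent to a plain text field without relying on the HTML strip filter to join the pieces.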

Michael Della Bitta

Appinions, Inc. -- Where Influence Isn’t a Game.

-----Original Message----- From: Tigunn
Sent: Thursday, May 31, 2012 11:30 AM
To: solr-user@lucene.apache.org
Subject: Strip html

I have a full-text index over XML files such as this one:
<item type="fragment" n="3">
  <cit dbp:hand="GF-encre">
    si les <hi rend="underline">ruches d’<term>abeilles</term>
    </hi> prouvent la
    monarchie, les fourmillières, les troupes d’éléphants ou de <lb/>
    <choice>
      <orig>C</orig>
      <reg>c</reg>
    </choice>astors prouvent la république.
    <bibl xml:id="b-7468-3"/>

I use Solr 1.4.1 for full-text search from PHP. When I search for "castor",
this document is not found. But if I search for "c astor", it is found,
and that is the problem.

I apply an XSLT transformation, which returns:

si les ruches d’abeilles prouvent la monarchie, les fourmillières, les
troupes d’éléphants ou de castors prouvent la république.

I put this text into Solr: $doc->addField('body_strip_html', $body_norm);

In schema.xml:
<fieldType name="text_strip_html" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="body_strip_html" type="text_strip_html" indexed="true"/>

But this doesn't work!
I want this XML file (see the example above) to be returned when I search
for "castor".

Can you help me, please?

