Are you going to use the values stored in Solr to display the data as HTML? For searching purposes I suggest removing all the HTML tags and indexing only the plain text. For this you could use the HTMLStripCharFilterFactory char filter, which "cleans" your content and passes only the actual text to the analyzer. In the end, that text is what you are going to search on.
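If the `<br />` tags are there only for presentation, one option is to send each value to Solr without markup and add the line breaks only when rendering the page. This could be done, for example, as separate values of a multivalued field, which is an assumption about your schema. Below is a minimal Python sketch of that display step. `render_lines` is a hypothetical helper, not part of any Solr client. It also escapes each value, because a token like `<s>` would otherwise be parsed as an HTML tag:

```python
import html
from typing import Iterable


def render_lines(values: Iterable[str]) -> str:
    """Hypothetical helper: join plain field values into an HTML fragment.

    The <br /> markup is added only here, at render time, so the values
    sent to Solr stay plain text. Each value is HTML-escaped first, so
    tokens such as "<s>" are shown literally instead of being parsed.
    """
    return "<br />\n".join(html.escape(v) for v in values)


if __name__ == "__main__":
    # Plain values as they might come back from a multivalued "w" field.
    words = ["_SILENCE_", "<s>", "HELLO", "HALLO", "_DELETE_"]
    print(render_lines(words))
```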
If you are going to use the Solr result to display the content in an HTML page, I would suggest keeping your index clean and indexing only the actual searchable text, without HTML. I actually use the recommended filter to strip HTML out of crawled HTML pages. But what does a Solr document mean to you? Is an entire conversation modeled as one Solr document? Have you considered putting each interaction of the conversation into its own document?

----- Original Message -----
From: "tomas.kalas" <kala...@email.cz>
To: solr-user@lucene.apache.org
Sent: Thursday, October 30, 2014 10:27:50 AM
Subject: Design optimal Solr Schema

Hello, I have a problem with the design of my Solr schema. I have a transcript of a telephone conversation in this format, and I parse it into individual fields. I have this schema:

<?xml version="1.0"?>
<add>
  <doc>
    <field name="id">01.cn</field>
    <field name="t">0<br /> 1<br /> 2<br /> 2 <br /> 3 <br /> ....</field>
    <field name="st">0.00<br /> 1.54<br /> 1.54<br /> 1.54 <br /> 1.57 <br /> ....</field>
    <field name="et">1.54<br /> 1.54<br /> 1.57<br /> 1.57 <br /> 1.7 <br /> ....</field>
    <field name="w">_SILENCE_<br /> <s><br /> HELLO<br /> HALLO <br /> _DELETE_ <br /> ....</field>
    <field name="p">0.000000<br /> 1<br /> 1<br /> 2.06115e-009 <br /> 1 <br /> ....</field>
    <field name="c">0<br /> 0<br /> 0<br /> 0 <br /> 0 <br /> ....</field>
  </doc>
</add>

I display it in an HTML document, which is why I used the <br /> tags. This is the original document:

T=0 ST=0.00 ET=1.54 W=_SILENCE_ P=0.000000 C=0
T=1 ST=1.54 ET=1.54 W=<s> P=1 C=0
T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0
T=2 ST=1.54 ET=1.57 W=HALLO P=2.06115e-009 C=0
T=3 ST=1.57 ET=1.70 W=_DELETE_ P=1 C=0
T=3 ST=1.57 ET=1.70 W=NO P=2.06115e-009 C=0
T=4 ST=1.70 ET=2.12 W=HOW P=1 C=0
T=5 ST=2.12 ET=2.18 W=ARE_ P=0.25 C=0
T=5 ST=2.12 ET=2.18 W=_DELETE_ P=0.25 C=0
..........................................
..........................................
Id = filename
T = segment
ST = start time
ET = end time
W = word
P = probability
C = channel

I want to search, for example, for a word that occurs up to time 1.57: (w:HeLLO) AND (t:[0 TO 1.57]). But because all the values of each field (t, st, et, ...) are in one field, this doesn't work: it finds every file that contains HELLO, even when HELLO occurs later than 1.57. Do you have any ideas how to make this work?

Thanks a lot for your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632.html
Sent from the Solr - User mailing list archive at Nabble.com.
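To illustrate the suggestion of indexing each interaction as its own document, here is a minimal Python sketch. It assumes the raw `T=... ST=... ET=... W=... P=... C=...` format shown above and uses hypothetical field names. In particular, `conversation` is not in the original schema. Each token becomes one document, so the `w` and `et` of the same token are stored together. A query like `w:HELLO AND et:[* TO 1.57]` can then match only when that word itself ends by 1.57. Note that the range in the original query uses `t`, which is the segment number; the times are in `st` and `et`. `words_before` only imitates that query locally so the logic can be checked without a running Solr instance.

```python
import re
from typing import Dict, List

# One token record of the raw transcript, e.g.
# "T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0". W and P are matched as \S+
# so values like "<s>", "_DELETE_" and "2.06115e-009" stay intact.
RECORD = re.compile(
    r"T=(\d+)\s+ST=([\d.]+)\s+ET=([\d.]+)\s+W=(\S+)\s+P=(\S+)\s+C=(\d+)"
)

# Hypothetical /select parameters once the token documents are indexed.
# Result grouping returns one group per conversation (file); it assumes
# "conversation" is a single-valued string field.
SOLR_PARAMS = {
    "q": "w:HELLO AND et:[* TO 1.57]",
    "group": "true",
    "group.field": "conversation",
}


def to_token_docs(conversation_id: str, transcript: str) -> List[Dict]:
    """Turn one transcript into one document (dict) per token."""
    docs = []
    for i, m in enumerate(RECORD.finditer(transcript)):
        t, st, et, w, p, c = m.groups()
        docs.append({
            "id": f"{conversation_id}-{i}",   # unique key per token
            "conversation": conversation_id,  # links the token to its file
            "t": int(t),
            "st": float(st),
            "et": float(et),
            "w": w,
            "p": float(p),
            "c": int(c),
        })
    return docs


def words_before(docs: List[Dict], word: str, max_time: float) -> List[str]:
    """Local stand-in for w:<word> AND et:[* TO max_time] (case-insensitive)."""
    return sorted({d["conversation"] for d in docs
                   if d["w"].upper() == word.upper() and d["et"] <= max_time})


if __name__ == "__main__":
    sample = ("T=0 ST=0.00 ET=1.54 W=_SILENCE_ P=0.000000 C=0 "
              "T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0 "
              "T=4 ST=1.70 ET=2.12 W=HOW P=1 C=0")
    docs = to_token_docs("01.cn", sample)
    print(words_before(docs, "HeLLO", 1.57), words_before(docs, "HOW", 1.57))
```

The resulting list of dicts has the shape that Solr's JSON update handler accepts. The trade-off is many small documents instead of one large document per file.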