DIH documents not indexed because of loss in XSL transformation.

jerome . dupont Tue, 10 Dec 2013 06:29:10 -0800

Hello

I'm indexing XML files with XPathEntityProcessor, and a few hundred
documents out of 12 million are not processed.


When I tried to index just one of the failing documents on its own, it was
not indexed either.
So the problem is not related to the large number of documents.

We tried running the XSLT transformation externally, capturing the
transformed XML and indexing it in Solr, and that worked.
So the document itself seems OK.
I looked at the document and it was big, so I commented out part of it;
after that it was indexed in Solr with the XSL transform.
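
Such an external check can be sketched roughly as follows. This is only a
minimal sketch using the standard JAXP API: the class name
ExternalXsltCheck, the tiny stylesheet and the sample record are
placeholders for illustration, not our real files. The result is written
into a StringWriter so that the complete output can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a stylesheet outside of Solr/DIH.
class ExternalXsltCheck {

    // Placeholder stylesheet that turns <record><id>..</id></record>
    // into a Solr-style <add><doc> document.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/record\">"
        + "<add><doc><field name=\"id\"><xsl:value-of select=\"id\"/></field></doc></add>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms the XML with the given stylesheet and returns the whole result.
    static String transform(String xml, String xslt) throws TransformerException {
        // Uses whatever implementation JAXP selects by default.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String result = transform("<record><id>42</id></record>", SAMPLE_XSLT);
        // The length makes it easy to compare against what DIH ends up with.
        System.out.println("result length: " + result.length());
        System.out.println(result);
    }
}
```

Comparing the length of this result with what DIH gets for the same
document should show whether data is lost during the transformation
itself.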


So I downloaded the DIH code and debugged the execution of these lines,
which launch the XSL transformation, to see exactly what was happening:

          SimpleCharArrayReader caw = new SimpleCharArrayReader();
          xslTransformer.transform(new StreamSource(data),
                  new StreamResult(caw));
          data = caw.getReader();

It turned out that caw was missing data, so the xslTransformer did not work
correctly.
Digging further into the TransformerImpl code, I can see the content of my
XML file in some buffer, but somewhere something goes wrong that I don't
understand (it's getting very tricky for me).

xslTransformer is from class
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl

Is there a way to change the XSLT transformer class,
or is there a known size limit in this transformer that can be
increased?
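
As far as I understand, standard JAXP lets the factory be overridden
through the javax.xml.transform.TransformerFactory system property. This
assumes DIH obtains its factory via TransformerFactory.newInstance(), and I
have not confirmed that the override is honored in my setup. A minimal
sketch to check which implementation is actually picked up (the class name
TransformerInfo is hypothetical):

```java
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which JAXP implementation is selected.
class TransformerInfo {

    // Standard JAXP system property consulted by TransformerFactory.newInstance().
    static final String FACTORY_PROPERTY = "javax.xml.transform.TransformerFactory";

    // Class name of the factory that JAXP selects in this JVM.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Class name of the transformer created by that factory.
    static String transformerClassName() throws TransformerConfigurationException {
        return TransformerFactory.newInstance().newTransformer().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("override property: " + System.getProperty(FACTORY_PROPERTY));
        System.out.println("factory:           " + factoryClassName());
        System.out.println("transformer:       " + transformerClassName());
    }
}
```

Running it with -Djavax.xml.transform.TransformerFactory set to another
factory class (with its jar on the classpath) should change what is
printed; which alternative implementation to use, if any, is still an open
question for me.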

I've worked with Solr 4.2 and then with Solr 4.6.

Thanks in advance

Regards
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
téléphone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
