Have you tried Solr Cell?  http://wiki.apache.org/solr/ExtractingRequestHandler



On Mar 13, 2009, at 2:49 AM, CIF Search wrote:

But these documents have to be converted to a particular format before being posted. An arbitrary XML document cannot be posted to Solr (with the XSLT handled by Solr internally).
DIH handles any XML format, but it operates in pull mode.
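
For illustration, the conversion step mentioned above could be done on the client before posting. This is only a sketch: the source layout (a record element with an id attribute and simple child elements) and the function name to_solr_add are hypothetical, and the resulting field names would still have to match the Solr schema.

```python
import xml.etree.ElementTree as ET


def to_solr_add(source_xml, record_tag, id_attr="id"):
    """Convert a simple record-oriented XML string into a Solr <add> message.

    Assumes (hypothetically) that every <record_tag> element carries its key
    in the attribute `id_attr` and its data in direct child elements.
    """
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for record in root.iter(record_tag):
        doc = ET.SubElement(add, "doc")
        if id_attr in record.attrib:
            ET.SubElement(doc, "field", name="id").text = record.attrib[id_attr]
        for child in record:
            text = (child.text or "").strip()
            if text:
                # Repeated child tags become repeated (multi-valued) fields.
                ET.SubElement(doc, "field", name=child.tag).text = text
    return ET.tostring(add, encoding="unicode")
```

The returned string is in the <add><doc><field name="..."> shape that the XML update handler accepts, so the transformation plays the role the XSLT would have played inside Solr.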


On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

On Fri, Mar 13, 2009 at 11:36 AM, CIF Search <cifsea...@gmail.com> wrote:

There is a fundamental problem with the 'pull' approach using DIH. Normally people want delta imports, which are done using a timestamp field. Now it may not always be possible for application servers to sync their timestamps (given protocol restrictions due to security reasons). Because of this, the Solr application is likely to miss a few records occasionally. Such a problem does not arise if applications themselves identify their records and post them. Should we not have such a feature in Solr, which will allow users to push data onto the index in whichever format they wish to? This will also facilitate plugging in Solr seamlessly with all kinds of applications.
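
To make the timestamp concern concrete, here is a toy simulation. It is not DIH code: the 5-second lag, the record layout, and the function names are all made up for illustration, and the only assumption carried over is that a delta import selects rows modified after the time of the previous import.

```python
APP_CLOCK_LAG = 5  # hypothetical: the app server clock runs 5 s behind the indexer


def stamp(true_time):
    """Timestamp the application server writes for a change made at true_time."""
    return true_time - APP_CLOCK_LAG


def delta_import(records, last_index_time):
    """Select records whose 'modified' stamp is newer than the previous import."""
    return [r for r in records if r["modified"] > last_index_time]


# First import: nothing indexed yet, so everything is picked up.
records = [{"id": "a", "modified": stamp(90)}]
first = delta_import(records, last_index_time=0)

# The indexer records its own clock (100) as the time of that import.
last_index_time = 100

# Record "b" is written 2 s after the import but stamped by the lagging clock.
records.append({"id": "b", "modified": stamp(102)})  # stamped 97

# Second import: 97 is not newer than 100, so "b" is never selected.
second = delta_import(records, last_index_time)
missed = [r["id"] for r in records if r not in first and r not in second]
```

Here missed ends up containing "b", which is the occasional lost record described above. When the application pushes its own records, no clock comparison is involved.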


You can of course push your documents to Solr using the XML/CSV update handlers (or using the SolrJ client). It's just that you can't push documents with DIH.

http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3
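
As a rough sketch of the XML push route, assuming a Solr instance whose XML update handler is reachable at http://localhost:8983/solr/update (the URL, the helper names, and the field names are assumptions, not taken from this thread):

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the XML update handler; adjust for your deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Build a Solr XML <add> message (as UTF-8 bytes) from a list of dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list or tuple becomes repeated <field> elements (multi-valued).
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="utf-8")


def post_xml(body, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example usage (needs a running Solr, so it is left commented out):
# post_xml(build_add_message([{"id": "doc-1", "title": "Hello Solr"}]))
# post_xml(b"<commit/>")  # make the added documents searchable
```

The application decides when to call post_xml, so no timestamp comparison between servers is needed.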

--
Regards,
Shalin Shekhar Mangar.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
