Have you tried Solr Cell? http://wiki.apache.org/solr/ExtractingRequestHandler
On Mar 13, 2009, at 2:49 AM, CIF Search wrote:
But these documents have to be converted to a particular format before
being posted. An arbitrary XML document cannot be posted to Solr (with
the XSLT handled by Solr internally).
DIH handles any XML format, but it operates in pull mode.
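To illustrate the conversion step mentioned above, here is a minimal Python sketch that turns one source record into Solr's XML update format on the client side, which is roughly the job an XSLT would do. The `<book>` record, the `FIELD_MAP` mapping, and the function name `to_solr_add` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to Solr field names.
FIELD_MAP = {"isbn": "id", "name": "title"}

def to_solr_add(source_xml, field_map=FIELD_MAP):
    """Convert one source record into a Solr <add> update message."""
    record = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for source_tag, solr_field in field_map.items():
        # findall keeps repeated elements, which become multi-valued fields.
        for element in record.findall(source_tag):
            field = ET.SubElement(doc, "field", name=solr_field)
            field.text = (element.text or "").strip()
    return ET.tostring(add, encoding="unicode")

# Example record in an assumed application-specific format.
solr_xml = to_solr_add("<book><isbn>123</isbn><name>Solr</name></book>")
```

The resulting string can then be posted to Solr's XML update handler like any other update message.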
On Fri, Mar 13, 2009 at 11:45 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
On Fri, Mar 13, 2009 at 11:36 AM, CIF Search <cifsea...@gmail.com> wrote:
There is a fundamental problem with the 'pull' approach used by DIH.
Normally people want delta imports, which are done using a timestamp
field. Now it may not always be possible for application servers to sync
their timestamps (given protocol restrictions due to security reasons).
Because of this, the Solr application is likely to miss a few records
occasionally. Such a problem does not arise if applications themselves
identify their records and post them. Should we not have such a feature
in Solr, which would allow users to push data onto the index in
whichever format they wish? This would also facilitate plugging Solr
seamlessly into all kinds of applications.
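The clock-skew concern above can be shown with a small Python simulation. It is only a sketch of the reasoning, not DIH's actual implementation: the record layout, the 30-second skew, and the `delta_import` function are assumptions made up for the example.

```python
from datetime import datetime, timedelta

def delta_import(records, last_index_time):
    """Return the records a timestamp-based delta import would pick up."""
    return [r for r in records if r["last_modified"] > last_index_time]

# Time of the previous import, as recorded on the Solr side.
last_index_time = datetime(2009, 3, 13, 12, 0, 0)

# Both records are really written 10 seconds after the previous import,
# but one hypothetical application server has a clock running 30 seconds slow.
true_write_time = last_index_time + timedelta(seconds=10)
skew = timedelta(seconds=30)

records = [
    {"id": "a", "last_modified": true_write_time},         # synced clock
    {"id": "b", "last_modified": true_write_time - skew},  # slow clock
]

picked = [r["id"] for r in delta_import(records, last_index_time)]
missed = [r["id"] for r in records if r["id"] not in picked]
```

Record "b" was written after the previous import, yet its timestamp falls before `last_index_time`, so the pull-based delta import skips it.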
You can of course push your documents to Solr using the XML/CSV update
(or using the SolrJ client). It's just that you can't push documents
with DIH.
http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3
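For reference, a minimal Python sketch of the push approach using the XML update format. The update URL `http://localhost:8983/solr/update` and the field names `id` and `title` are assumptions; adjust them to your installation and schema. The helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the XML update handler; change for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

def build_add_xml(docs):
    """Build an <add> message from a list of {field name: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def post_xml(xml_body, url=SOLR_UPDATE_URL):
    """POST one XML update message and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

def push(docs, url=SOLR_UPDATE_URL):
    """Send the documents, then a commit so they become searchable."""
    post_xml(build_add_xml(docs), url)
    post_xml("<commit/>", url)

# Building the message needs no running server; push() does.
message = build_add_xml([{"id": "1", "title": "a & b"}])
```

Calling `push([{"id": "1", "title": "a & b"}])` against a running Solr instance would send the add message followed by a commit. Since the application decides which records to send, no timestamp comparison is involved.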
--
Regards,
Shalin Shekhar Mangar.
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search