Perhaps the bin/post tool? See:
https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
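
If you can stage the transformed XML on the local filesystem, bin/post can
push a whole directory of <add><doc>...</doc></add> files to a core in one
shot. A rough sketch, assuming a core named "mycore" and the files pulled
down from HDFS to a local directory (both names are placeholders):

  bin/post -c mycore /local/path/to/transformed-xml/

As far as I know bin/post only reads local files, URLs, and stdin, so it
won't read straight out of HDFS. Since you also asked about SolrJ, I've put
a small sketch of that below your quoted message.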

On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth <mgrot...@gmail.com> wrote:
> Hi All,
>
> Is there a DIH for HDFS? I see this old feature request [0] that never
> seems to have gone anywhere. Google searches and searches on this list
> don't get me very far.
>
> Essentially my workflow is that I have many thousands of XML documents
> stored in HDFS. I run an XSLT transformation in Spark [1]. This transforms
> them into the expected Solr input of <add><doc><field ... /></doc></add>,
> which is then written back to HDFS. Now how do I get it back into Solr? I
> suppose I could move the data back to the local fs, but on the surface that
> feels like the wrong way.
>
> I don't need to store the documents in HDFS after the Spark transformation,
> so I wonder if I could write them directly using SolrJ. However, I am not
> really familiar with SolrJ. I am also running a single node, and most of
> the material I have read on spark-solr expects you to be running SolrCloud.
>
> Best,
> Matt
>
>
>
> [0] https://issues.apache.org/jira/browse/SOLR-2096
> [1] https://github.com/elsevierlabs-os/spark-xml-utils
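
On the SolrJ point: for a single node you don't need SolrCloud or the
spark-solr library; a plain HttpSolrClient pointed at the core's URL is
enough. A minimal sketch, assuming a core named "mycore" and made-up field
names (adjust both for your schema):

  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexOne {
      public static void main(String[] args) throws Exception {
          // Base URL includes the core name; change host/port/core as needed.
          SolrClient client =
              new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "doc-1");          // placeholder unique key
          doc.addField("title_txt", "example"); // placeholder field

          client.add(doc);    // send the document to Solr
          client.commit();    // make it searchable
          client.close();
      }
  }

You could call something like this from your Spark job (e.g. inside
foreachPartition) instead of writing the transformed XML back to HDFS at
all, though you'd want to batch documents and commit less often once the
volume gets large.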
