Perhaps the bin/post tool? See: https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
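
If copying the files to the local fs first feels wrong, note that bin/post is, as far as I know, essentially a wrapper that POSTs files to the core's /update handler, so you can send the <add>...</add> XML there yourself. This works against a single node; SolrCloud is not needed. Below is a minimal Python sketch of that idea, not a tested pipeline. The base URL, the core name, and both function names are made up for illustration; adjust them to your setup.

```python
import urllib.request


def build_update_request(solr_url, core, xml_bytes, commit=True):
    """Build a POST request for the XML update handler of one Solr core.

    solr_url and core are placeholders (assumptions), e.g.
    "http://localhost:8983/solr" and "mycore".
    """
    url = "%s/%s/update" % (solr_url.rstrip("/"), core)
    if commit:
        # Ask Solr to commit right after this request so the docs become searchable.
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_update(solr_url, core, xml_bytes, commit=True):
    """Send one <add>...</add> document to Solr and return the HTTP status code."""
    request = build_update_request(solr_url, core, xml_bytes, commit=commit)
    with urllib.request.urlopen(request) as response:
        return response.status
```

The bytes can come from anywhere, for example the output of `hdfs dfs -cat` on one of the transformed files, so nothing has to be written to local disk. For many thousands of files you would probably pass commit=False and issue a single commit at the end.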

On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth <mgrot...@gmail.com> wrote:

> Hi All,
>
> Is there a DIH for HDFS? I see this old feature request [0] that never
> seems to have gone anywhere. Google searches and searches on this list
> don't get me too far.
>
> Essentially, my workflow is that I have many thousands of XML documents
> stored in HDFS. I run an XSLT transformation in Spark [1]. This transforms
> them to the expected Solr input of <add><doc><field ... /></doc></add>,
> which is then written back to HDFS. Now how do I get it back to Solr? I
> suppose I could move the data back to the local fs, but on the surface
> that feels like the wrong way.
>
> I don't need to store the documents in HDFS after the Spark
> transformation, so I wonder if I can write them using SolrJ. However, I am
> not really familiar with SolrJ. I am also running a single node. Most of
> the material I have read on spark-solr expects you to be running
> SolrCloud.
>
> Best,
> Matt
>
> [0] https://issues.apache.org/jira/browse/SOLR-2096
> [1] https://github.com/elsevierlabs-os/spark-xml-utils