Thx Jan,

All I know is I've got a data set of 500k documents, Solr formatted, and
I want it to be as easy as possible to get them into Solr. I also want
to be able to show the benefit of multithreading. The outcome would
really be "make sure your code uses multiple threads to push to Solr"
rather than "use post.jar in production". I see post.jar as a
demonstration tool, rather than anything else, and am considering adding
another feature to enhance that.
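
To make the "use multiple threads" point concrete, here is a rough sketch of
what I mean, not a proposed patch. It uses only the JDK: a fixed thread pool
posts each XML file over HTTP. The class name ParallelPoster, the default
update URL and the thread count of four are all assumptions for illustration,
and no commit is sent at the end.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of Solr or SimplePostTool.
public class ParallelPoster {

    // Runs `poster` once per file on a fixed-size pool and returns how many
    // calls completed without throwing.
    public static int postAll(List<Path> files, int threads, Consumer<Path> poster)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files) {
                pending.add(pool.submit(() -> poster.accept(file)));
            }
            int ok = 0;
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for this post to finish
                    ok++;
                } catch (ExecutionException e) {
                    System.err.println("post failed: " + e.getCause());
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    // Sends one Solr-formatted XML file as the body of an HTTP POST.
    static void postXml(String updateUrl, Path file) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(updateUrl).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/xml");
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed URL; adjust to wherever your Solr update handler lives.
        String url = System.getProperty("url", "http://localhost:8983/solr/update");
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        int ok = postAll(files, 4, f -> postXml(url, f));
        System.out.println("posted " + ok + " of " + files.size() + " files");
    }
}
```

The posting step is passed in as a Consumer so the threading can be tried out
without a running Solr, and so the same pool could later drive SolrJ instead
of raw HTTP.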

However, I did stall once I started looking at the SimplePostTool
class, because it is losing its connection with the term 'Simple'.
Adding multithreading, however useful, correct, whatever, would
completely push it over the edge. Thus, I think the proper approach is
to refactor the tool into a number of classes, and only then think about
adding multithreading as a completely separate affair. I'm more than
happy to have a go at that refactoring, especially if you're prepared to
review it.

I guess the other thing that is much needed is a wiki page that details
the features of the tool, and also explains that its role is
educational, rather than anything else.

Upayavira

On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
> Hi,
> 
> Hmm, the tool is getting bloated for a one-class no-deps tool already :)
> Guess it would be useful too with real-life code examples using SolrJ and
> other libs as well (such as robots.txt lib, commons-cli etc), but whether
> that should be an extension of SimplePostTool or a totally new tool from
> scratch is something to discuss. Please bring on your ideas of how you
> plan to extend it, perhaps even simplifying the code in the process?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
> On 3 Feb 2013, at 17:19, Upayavira <u...@odoko.co.uk> wrote:
> 
> > I have a scenario in which I need to post 500,000 documents to Solr as a
> > test. I have these documents in XML files already formatted in Solr's
> > xml format.
> > 
> > Posting to Solr using post.jar takes 1m55s. With a bit of bash
> > jiggery-pokery, I was able to get this down to 1m08s by running four
> > concurrent post.jar instances, which strikes me as a significant
> > improvement.
> > 
> > I'm considering adding multithreaded capabilities to post.jar, but
> > before I go to that effort, I wanted to see if anyone else would
> > consider it a useful feature. Given that the SimplePostTool is becoming
> > far from simple, I wanted to see whether the feature is likely to be
> > accepted before I put in the effort. Also, I would need to consider
> > which parts of the tool to add that to. Currently I only want it for
> > posting XML docs, but there are also crawling capabilities in it.
> > 
> > Thoughts?
> > 
> > Upayavira
> 
