Upayavira, did you ever do this? Ha, look at my email from 20 days ago and this: https://github.com/javanna/elasticshell
Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Feb 6, 2013 at 2:38 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:

> Btw wouldn't this be a chance to create a solr cli tool, much like
> es2unix? Maybe with a shell? I'm off-line now, but I recently came across
> a java lib that makes this easy... jclam jsomething ...
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Feb 6, 2013 8:48 AM, "Jan Høydahl" <jan....@cominvent.com> wrote:
>
>> With dependencies I meant external jar dependencies. Perhaps extensions
>> could have deps while leaving the "core" compilable without?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 5 Feb 2013, at 17:10, Upayavira <u...@odoko.co.uk> wrote:
>>
>> > By dependencies, do you mean other java classes? I was thinking of
>> > splitting it out into a few classes, each of which is clearer in its
>> > purpose.
>> >
>> > Upayavira
>> >
>> > On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
>> >> Wiki page exists already: http://wiki.apache.org/solr/post.jar
>> >>
>> >> I'm happy to consider a refactoring, especially if it makes it SIMPLER
>> >> to read and interact with and doesn't add a ton of mandatory
>> >> dependencies. It should probably still be possible to say something like
>> >>
>> >>   javac org/apache/solr/util/SimplePostTool.java
>> >>   java -cp . org.apache.solr.util.SimplePostTool -h
>> >>
>> >> That's just how I've been thinking so far though. If other committers
>> >> are happy with abandoning the simple-ness and instead creating a
>> >> best-practices based feature-rich tool with dependencies, then I'll not
>> >> object.
>> >>
>> >> --
>> >> Jan Høydahl, search solution architect
>> >> Cominvent AS - www.cominvent.com
>> >> Solr Training - www.solrtraining.com
>> >>
>> >> On 5 Feb 2013, at 05:22, Upayavira <u...@odoko.co.uk> wrote:
>> >>
>> >>> Thx Jan,
>> >>>
>> >>> All I know is I've got a data set of 500k documents, Solr formatted,
>> >>> and I want it to be as easy as possible to get them into Solr. I also
>> >>> want to be able to show the benefit of multithreading. The outcome
>> >>> would really be "make sure your code uses multiple threads to push to
>> >>> Solr" rather than "use post.jar in production". I see post.jar as a
>> >>> demonstration tool, rather than anything else, and am considering
>> >>> adding another feature to enhance that.
>> >>>
>> >>> However, I did stall once I started looking at the SimplePostTool
>> >>> class, because it is losing its connection with the term 'Simple'.
>> >>> Adding multithreading, however useful, correct, whatever, would
>> >>> completely push it over the edge. Thus, I think the proper approach is
>> >>> to refactor the tool into a number of classes, and only then think
>> >>> about adding multithreading as a completely separate affair. I'm more
>> >>> than happy to have a go at that refactoring, especially if you're
>> >>> prepared to review it.
>> >>>
>> >>> I guess the other thing that is much needed is a wiki page that
>> >>> details the features of the tool, and also explains that its role is
>> >>> educational, rather than anything else.
>> >>>
>> >>> Upayavira
>> >>>
>> >>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
>> >>>> Hi,
>> >>>>
>> >>>> Hmm, the tool is getting bloated for a one-class no-deps tool
>> >>>> already :) Guess it would be useful too with real-life code examples
>> >>>> using SolrJ and other libs as well (such as robots.txt lib,
>> >>>> commons-cli etc), but whether that should be an extension of
>> >>>> SimplePostTool or a totally new tool from scratch is something to
>> >>>> discuss. Please bring on your ideas of how you plan to extend it,
>> >>>> perhaps even simplifying the code in the process?
>> >>>>
>> >>>> --
>> >>>> Jan Høydahl, search solution architect
>> >>>> Cominvent AS - www.cominvent.com
>> >>>> Solr Training - www.solrtraining.com
>> >>>>
>> >>>> On 3 Feb 2013, at 17:19, Upayavira <u...@odoko.co.uk> wrote:
>> >>>>
>> >>>>> I have a scenario in which I need to post 500,000 documents to Solr
>> >>>>> as a test. I have these documents in XML files already formatted in
>> >>>>> Solr's xml format.
>> >>>>>
>> >>>>> Posting to Solr using post.jar takes 1m55s. With a bit of bash
>> >>>>> jiggery-pokery, I was able to get this down to 1m08s by running four
>> >>>>> concurrent post.jar instances, which strikes me as a significant
>> >>>>> improvement.
>> >>>>>
>> >>>>> I'm considering adding multithreaded capabilities to post.jar, but
>> >>>>> before I go to that effort, I wanted to see if anyone else would
>> >>>>> consider it a useful feature. Given that the SimplePostTool is
>> >>>>> becoming far from simple, I wanted to see whether the feature is
>> >>>>> likely to be accepted before I put in the effort. Also, I would need
>> >>>>> to consider which parts of the tool to add that to. Currently I only
>> >>>>> want it for posting XML docs, but there are also crawling
>> >>>>> capabilities in it.
>> >>>>>
>> >>>>> Thoughts?
>> >>>>>
>> >>>>> Upayavira
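
For readers following the thread, here is a rough sketch of what "use multiple threads to push to Solr" might look like as standalone Java with no external jars. It is not code from SimplePostTool and not anything proposed in the thread: the class name `ParallelPoster`, the `Poster` hook, and the method names are hypothetical, and the update URL `http://localhost:8983/solr/update` is an assumption about a default local install. It only uses the JDK's `ExecutorService` and `HttpURLConnection`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not part of Solr or SimplePostTool. */
class ParallelPoster {

    /** Hypothetical hook so the transport can be swapped out (e.g. for a dry run). */
    interface Poster {
        void post(Path file) throws IOException;
    }

    /** Posts each file on a fixed pool of worker threads; returns how many succeeded. */
    static int postAll(List<Path> files, int threads, Poster poster)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Boolean>> results = new ArrayList<>();
        for (Path file : files) {
            // One task per file; the pool limits how many run concurrently.
            results.add(pool.submit(() -> {
                poster.post(file);
                return true;
            }));
        }
        pool.shutdown(); // no new tasks; already queued tasks still run
        int ok = 0;
        for (Future<Boolean> result : results) {
            try {
                if (result.get()) ok++;
            } catch (ExecutionException e) {
                // A failed file is reported and skipped, not retried.
                System.err.println("Post failed: " + e.getCause());
            }
        }
        return ok;
    }

    /** Plain HTTP POST of one XML file to the given update URL. */
    static Poster httpPoster(String updateUrl) {
        return file -> {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/xml");
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out);
            }
            int status = conn.getResponseCode();
            conn.disconnect();
            if (status != 200) {
                throw new IOException("HTTP " + status + " for " + file);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) files.add(Path.of(arg));
        // Assumed URL; adjust to the actual Solr instance and core.
        int ok = postAll(files, 4, httpPoster("http://localhost:8983/solr/update"));
        System.out.println(ok + " of " + files.size() + " files posted");
    }
}
```

The pool size of four simply mirrors the "four concurrent post.jar instances" experiment described above; it is not a tuned value. The sketch sends no commit, so documents would only become searchable once a commit is issued separately or autocommit is configured.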