Hello - I am not a Nutch 2.x user, but a seed file that small should not take so
long. There is nothing wrong with a large seed file either, even one with a few
million URLs. Are your mappers/reducers slow? Is your HDFS slow? Or is it Accumulo?
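
If you want to take Nutch and Gora out of the picture for a moment, one rough
check is to time raw writes against HDFS and against an Accumulo BatchWriter
directly. The sketch below is untested and only illustrative - the instance
name, zookeepers, credentials, paths and table name are placeholders for your
setup - and it needs the Hadoop and Accumulo client libraries on the classpath.
It simply writes 10,000 small records to each layer and prints how long that took:

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;

public class WriteSpeedCheck {
  public static void main(String[] args) throws Exception {
    // Time 10,000 small line writes straight into HDFS.
    FileSystem fs = FileSystem.get(new Configuration());
    long start = System.currentTimeMillis();
    FSDataOutputStream out = fs.create(new Path("/tmp/write_speed_check"), true);
    for (int i = 0; i < 10000; i++) {
      out.writeBytes("http://example.com/page/" + i + "\n");
    }
    out.close();
    System.out.println("HDFS:     " + (System.currentTimeMillis() - start) + " ms");

    // Time 10,000 small mutations through an Accumulo BatchWriter.
    // Instance name, zookeepers, user, password and table are placeholders.
    ZooKeeperInstance instance = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181");
    Connector conn = instance.getConnector("root", new PasswordToken("secret"));
    if (!conn.tableOperations().exists("write_speed_check")) {
      conn.tableOperations().create("write_speed_check");
    }
    start = System.currentTimeMillis();
    BatchWriter writer = conn.createBatchWriter("write_speed_check", new BatchWriterConfig());
    for (int i = 0; i < 10000; i++) {
      Mutation m = new Mutation(new Text("row" + i));
      m.put(new Text("f"), new Text("url"),
          new Value(("http://example.com/page/" + i).getBytes()));
      writer.addMutation(m);
    }
    writer.close(); // flushes any buffered mutations
    System.out.println("Accumulo: " + (System.currentTimeMillis() - start) + " ms");
  }
}

If both of those finish in seconds, the time is probably being spent in the
inject job itself, so look at the map task times in the JobHistory UI; if the
BatchWriter part is slow, look at Accumulo rather than Nutch.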
Markus
-----Original message-----
> From:Luis Magaña <l...@euphorica.com>
> Sent: Wednesday 9th March 2016 22:07
> To: user@nutch.apache.org
> Subject: Large seed Inject Slow to Accumulo
>
> Hello,
>
> I've set up a small sample Hadoop cluster of 6 servers with HDFS, ZooKeeper,
> Solr and Accumulo.
>
> I am running Nutch on top of the Hadoop cluster and injecting 10,000
> URLs from the seed.txt file.
>
> Everything works as it should, nothing breaks, everything indexes, etc.,
> and the crawl job finishes OK. However, the inject stage of those
> 10,000 URLs takes up to 50 minutes.
>
> I wonder whether that is a normal time for an inject, whether I should be
> looking for a possible problem (maybe in the Gora Accumulo module?), or whether
> I am simply being naive and my seed.txt should not be so large to begin with.
>
> A bit more information about my setup:
>
> Hadoop 2.7.2
> Accumulo 1.5.1
> Solr 4.10.3
>
> Accumulo currently has about 500 tables with some 200 million entries
> (not sure whether that matters). The Accumulo logs show no major errors,
> warnings, or Java exceptions, and neither do the MapReduce logs in Hadoop.
>
> Thank you very much for your help and your excellent crawler.
>
>
> --
> Luis Magaña
> www.euphorica.com
>