Re: Nutch-Selenium in Nutch 1.10

2015-02-19 Thread Jaydeep Bagrecha
Update: the latest Selenium (2.44.0) doesn't seem to work with the latest Firefox (35), so I installed Firefox version 29 and it's crawling properly now.
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha wrote:
> Thanks Jiaxin! I again repeated the entire installation
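
When the browser and the Selenium bindings drift out of step like this, a quick check of what is actually installed saves guessing. A minimal sketch, assuming the stock Firefox.app location on OS X and the default Ivy cache used by the Nutch build; paths on your machine may differ:

    # print the Firefox version the selenium plugin will launch
    /Applications/Firefox.app/Contents/MacOS/firefox --version
    # list the selenium jars the build actually resolved
    ls ~/.ivy2/cache/org.seleniumhq.selenium/selenium-java/jars/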

Re: Nutch-Selenium in Nutch 1.10

2015-02-18 Thread Jaydeep Bagrecha
r.Fetcher$FetcherThread.run(Fetcher.java:722)
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQ
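
The repeating line above is the Fetcher's periodic status report: one thread is still active while the queues are empty, which usually means a single fetch has not returned. A simple way to follow it while the crawl runs, assuming the default local log file of a Nutch 1.x checkout:

    # follow the fetcher status lines as they are written
    tail -f logs/hadoop.log | grep activeThreads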

Re: Nutch-Selenium in Nutch 1.10

2015-02-17 Thread Jaydeep Bagrecha
Hi all, I am trying to install and build Selenium with Nutch 1.10 on Mac OS X Yosemite. I am getting the following error after downloading the Selenium patch (https://issues.apache.org/jira/browse/NUTCH-1933) and running the "ant runtime" command (as ment
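
For reference, the usual sequence for trying a JIRA patch against a source checkout is sketched below. The patch file name and the -p level are assumptions that depend on what was attached to NUTCH-1933:

    cd apache-nutch-1.10
    # apply the patch attached to the JIRA issue (file name is a placeholder)
    patch -p0 < NUTCH-1933.patch
    # rebuild the runtime; a clean build avoids stale plugin output
    ant clean runtime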

Any suggestions for avoiding this fetch failure error?

2015-02-16 Thread Jaydeep Bagrecha
Hi all, I am using Nutch 1.10 (with Apache Ant, Ivy, and Tika) to crawl a few repositories and am getting this error for the majority of URLs: fetch of "url_name" failed with: java.net.ConnectException: Connection refused. Could you suggest some ways to avoid this error? (have set th
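
"Connection refused" means the TCP connection itself was rejected, so the first thing to rule out is whether the crawl machine can reach the server at all, independently of Nutch. A minimal check, with url_name standing in for one of the failing URLs:

    # request only the headers; a refusal here points at the network or the
    # server rather than at any Nutch setting
    curl -I "http://url_name/"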

How to know whether politeness policy is well set?

2015-02-15 Thread Jaydeep Bagrecha
t? Thanks, Jaydeep Bagrecha
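
One rough way to answer the question in the subject is to compare the configured per-host delay with the timestamps of consecutive fetches of the same host. A sketch, assuming the Nutch 1.x defaults (fetcher.server.delay defined in conf/nutch-default.xml, local logging to logs/hadoop.log):

    # the delay Nutch is supposed to respect between requests to one host
    grep -A 2 "fetcher.server.delay" conf/nutch-default.xml
    # timestamps of individual fetches; gaps between same-host URLs should be
    # at least that delay
    grep "fetching http" logs/hadoop.log | head -20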

Re: 572: Crawl statistics for each repository?

2015-02-08 Thread Jaydeep Bagrecha
statistics for each one individually.
> OR
> Do we have to crawl each repo separately (include the domain name of only 1 repo in regex-urlfilter.txt) and get its statistics from the corresponding crawldb?
Thanks, Jaydeep Bagrecha
> On Feb 8, 2015, at 6:24 PM, Mattmann, Chris A (398
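
For the "one repository per crawl" option mentioned above, the restriction goes into conf/regex-urlfilter.txt. A sketch with a placeholder domain; note that the stock file ends with a "+." catch-all rule that has to be removed or commented out before a domain restriction can take effect:

    # accept only URLs under the first repository's domain (placeholder name);
    # remove the default "+." line at the end of the file first
    echo '+^https?://([a-z0-9.-]+\.)?repo1\.example\.org/' >> conf/regex-urlfilter.txt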

572: Crawl statistics for each repository?

2015-02-08 Thread Jaydeep Bagrecha
Is there a way to crawl all 3 repositories together and get statistics for each one individually? Or do we have to crawl each repository separately and get its statistics from the corresponding crawldb? Thanks, Jaydeep
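
For the aggregate numbers, the crawldb stats command is the usual starting point; a per-domain breakdown needs a separate tool. A minimal sketch, assuming the crawl directory layout below; the availability and argument order of the domainstats command in 1.10 are an assumption:

    # overall counts (fetched, unfetched, gone, ...) for the whole crawldb
    bin/nutch readdb crawl/crawldb -stats
    # per-domain breakdown, if this build ships the domainstats tool
    bin/nutch domainstats crawl/crawldb stats_out domain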