Re: Indexing nutch crawled data in “Bluemix” solr

2016-06-16 Thread shakiba davari
Thanks so much, Lewis. It really helped me. At least now I know that there is a way to make it work. I did use the command as you said: bin/nutch index -D solr.server.url=" https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/CLUSTER-ID/solr/admin/collections -D

RE: [E] Re: Newbie Question, hadoop error?

2016-06-16 Thread Jamal, Sarfaraz
Hi Lewis, Here is an update (I spoke to one of our Java guys) - $ set classpath = C:\\apache-nutch-1.11\\lib $ $classpath /cygdrive/c/apache-nutch-1.11/lib $ ../bin/crawl -i urls/ TestCrawl 2 Injecting seed URLs /cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl crawldb urls/ Exception

Number of crawled links from seed page

2016-06-16 Thread Jigal van Hemert | alterNET internet BV
Hi, On the seed page there are a few hundred links (approx. 400) in a large list of items that must be indexed. I already made sure that the limits on inbound and outbound links in the settings are large enough (1 and 5000), but unfortunately only the first 182 links are fetched for crawling

Re: [VOTE] Release Apache Nutch 1.12

2016-06-16 Thread Mattmann, Chris A (3980)
+1 from me, great job Lewis and team! SIGS pass, CHECKSUMS pass: LMC-053601:apache-nutch-1.12-rc1 mattmann$ $HOME/bin/stage_apache_rc apache-nutch 1.12-bin https://dist.apache.org/repos/dist/dev/nutch/1.12/ % Total % Received % Xferd Average Speed Time Time Time Current