Thanks so much, Lewis. It really helped me; at least now I know that there
is a way to make it work.
I used the command as you suggested:
bin/nutch index -D solr.server.url="
https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/CLUSTER-ID/solr/admin/collections
-D
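The flags after the URL are cut off above; for context, the general shape of a Nutch 1.x Solr indexing invocation is roughly as follows. This is a hedged sketch: the collection path, the `TestCrawl/*` paths, and the option set are placeholders to be checked against the usage output of `bin/nutch index`, not the exact command from this thread.

```
# Hedged sketch of a Nutch 1.x index job pointed at a Solr endpoint.
# CLUSTER-ID and the TestCrawl/* paths are placeholders.
bin/nutch index \
  -D solr.server.url="https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters/CLUSTER-ID/solr/COLLECTION-NAME" \
  TestCrawl/crawldb -linkdb TestCrawl/linkdb TestCrawl/segments/*
```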
Hi Lewis,
Here is an update
(I spoke to one of our Java guys) -
$ set classpath = C:\\apache-nutch-1.11\\lib
$ $classpath
/cygdrive/c/apache-nutch-1.11/lib
$ ../bin/crawl -i urls/ TestCrawl 2
Injecting seed URLs
/cygdrive/c/apache-nutch-1.11/bin/nutch inject TestCrawl crawldb urls/
Exception
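One thing worth noting from the session above: `set classpath = ...` is csh syntax, so under Cygwin's default bash shell it does not export an environment variable. A hedged sketch of the bash equivalent (the path is taken from the transcript; the upper-case variable name is an assumption):

```shell
# In bash, environment variables are set with export; `set classpath = ...`
# (csh syntax) creates nothing here. Path copied from the session above.
export CLASSPATH='C:\apache-nutch-1.11\lib'
echo "$CLASSPATH"
```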
Hi,
On the seed page there are a few hundred links (approx. 400) in a large
list of items that must be indexed. I have already made sure that the limits
on inbound and outbound links in the settings are large enough (1 and
5000), but unfortunately only the first 182 links are fetched for crawling.
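For anyone hitting the same cap, these limits live in `conf/nutch-site.xml`. The snippet below is a hedged sketch: the property names should be verified against `nutch-default.xml` for your Nutch version, and `http.content.limit` is included because a truncated download can also cut the parsed link list short.

```xml
<!-- conf/nutch-site.xml: property names assumed; verify in nutch-default.xml -->
<property>
  <name>db.max.outlinks.per.page</name>
  <value>-1</value><!-- -1 removes the per-page outlink cap -->
</property>
<property>
  <name>http.content.limit</name>
  <value>-1</value><!-- -1 disables content truncation during fetch -->
</property>
```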
+1 from me, great job Lewis and team!
SIGS pass, CHECKSUMS pass:
LMC-053601:apache-nutch-1.12-rc1 mattmann$ $HOME/bin/stage_apache_rc
apache-nutch 1.12-bin https://dist.apache.org/repos/dist/dev/nutch/1.12/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
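For anyone reproducing the check by hand without the `stage_apache_rc` helper, the mechanics amount to fetching the artifact and its checksum from dist.apache.org and verifying them. The sketch below is self-contained: a local placeholder file stands in for the real release artifact, and the filenames are illustrative.

```shell
# Self-contained sketch of the CHECKSUMS step; a placeholder file stands in
# for the real artifact normally fetched from dist.apache.org.
printf 'release-artifact' > apache-nutch-1.12-bin.tar.gz
sha256sum apache-nutch-1.12-bin.tar.gz > apache-nutch-1.12-bin.tar.gz.sha256
sha256sum -c apache-nutch-1.12-bin.tar.gz.sha256   # prints "...: OK" on a match
```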