bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
I've run that command before and it worked... that's why I asked. Grab Nutch from trunk and run bin/nutch, and you'll see that it is in fact an option. It looks like Hadoop is the culprit now, and I am at a loss on how to fix it. Thanks for the feedback.

Adam

On Mon, Dec 20, 2010 at 4:21 PM, Anurag <anurag.it.jo...@gmail.com> wrote:
>
> Why are you using solrindex in the argument? It is used when we need to
> index the crawled data in Solr.
> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>
> Also, for Nutch-Solr integration, this is a very useful blog:
> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
> I integrated Nutch and Solr and it works well.
>
> Thanks
>
> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] wrote:
>
> > All,
> >
> > I have a couple of websites that I need to crawl, and the following
> > command line used to work, I think. Solr is up and running and everything
> > is fine there, and I can go through and index the site, but I really need
> > the results added to Solr after the crawl. Does anyone have any idea how
> > to make that happen, or what I'm doing wrong? These errors are being
> > thrown from Hadoop, which I am not using at all.
> >
> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 \
> >     -solrindex http://localhost:8983/solr
> > crawl started in: crawl
> > rootUrlDir = http://localhost:8983/solr
> > threads = 10
> > depth = 100
> > indexer=lucene
> > topN = 50
> > Injector: starting at 2010-12-20 15:23:25
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: http://localhost:8983/solr
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
>
> --
> Kumar Anurag
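The Injector log above offers a hint: "rootUrlDir = http://localhost:8983/solr" shows that the Crawl class took the Solr URL as its seed-URL directory, which would explain Hadoop's "No FileSystem for scheme: http" even without a Hadoop cluster, since Nutch runs its jobs on Hadoop's local job runner. A workaround consistent with the NutchTutorial wiki linked above is to crawl first and index into Solr as a separate step. A minimal sketch, assuming a Nutch 1.x checkout and the default layout that -dir crawl produces (crawl/crawldb, crawl/linkdb, crawl/segments):

$ # Crawl with no Solr argument; the data lands under ./crawl
$ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50

$ # Then push the crawled data into Solr with the separate solrindex command
$ bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb \
    crawl/segments/*

If the trunk build's crawl command really does accept an inline Solr option, the exact flag name and placement are worth checking against the usage string that bin/nutch crawl prints when run with no arguments.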