bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr

I've run that command before and it worked...that's why I asked.

Grab Nutch from trunk and run bin/nutch, and you'll see that it is in fact an
option. It looks like Hadoop is the culprit now, and I am at a loss on how to
fix it.
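
That said, one thing I do notice in the output below is rootUrlDir =
http://localhost:8983/solr, i.e. the Solr URL appears to be parsed as the
seed-URL directory, and Hadoop then fails trying to open an http:// path as a
filesystem. If the flag in this build is actually -solr rather than
-solrindex, a command along these lines might work (an untested sketch,
reusing the same paths):

# -solr flag name assumed from Nutch 1.x crawl usage; not verified against trunk
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solr http://localhost:8983/solr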

Thanks for the feedback.
Adam

On Mon, Dec 20, 2010 at 4:21 PM, Anurag <anurag.it.jo...@gmail.com> wrote:

>
> Why are you using solrindex in the arguments? It is used when we need to
> index the crawled data in Solr.
> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>
> Also, for Nutch-Solr integration this blog is very useful:
> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
> I integrated Nutch and Solr and it works well.
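> For example, once a crawl has finished you can index it into Solr as a
> separate step. This is just a sketch; the paths assume the default layout
> produced by "-dir crawl" above:
>
> # Nutch 1.x usage: solrindex <solr url> <crawldb> <linkdb> <segment ...>
> bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*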
>
> Thanks
>
> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <
> ml-node+2122347-622655030-146...@n3.nabble.com> wrote:
>
> > All,
> >
> > I have a couple of websites that I need to crawl, and the following
> > command line used to work, I think. Solr is up and running and everything
> > is fine there, and I can go through and index the site, but I really need
> > the results added to Solr after the crawl. Does anyone have any idea how
> > to make that happen, or what I'm doing wrong? These errors are being
> > thrown from Hadoop, which I am not using at all.
> >
> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50
> > -solrindex http://localhost:8983/solr
> > crawl started in: crawl
> > rootUrlDir = http://localhost:8983/solr
> > threads = 10
> > depth = 100
> > indexer=lucene
> > topN = 50
> > Injector: starting at 2010-12-20 15:23:25
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: http://localhost:8983/solr
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
> >
> >
>
>
>
> --
> Kumar Anurag
>
