Re: Isn't there redundant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread lewis john mcgibbney
Hi Gabriele, At first this seems like a plausible argument; however, my question concerns what Nutch would do if we wished to change the Solr core that we index to. If we removed this functionality from the crawldb, there would be no way to determine what Nutch was to fetch and what it wasn't.
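The point above can be illustrated with a toy model. This is only a sketch under stated assumptions: `CrawlDb`, `index_into`, and `fetch_list_from_index` are hypothetical names invented for illustration, the "cores" are plain Python sets, and none of this reflects Nutch's or Solr's real data structures or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlDb:
    """Toy stand-in for crawl state (URL -> status); not Nutch's real model."""
    status: dict = field(default_factory=dict)

    def inject(self, url):
        # A newly injected URL starts out unfetched.
        self.status.setdefault(url, "unfetched")

    def mark_fetched(self, url):
        self.status[url] = "fetched"

    def generate(self):
        """Return the URLs that still need to be fetched."""
        return sorted(u for u, s in self.status.items() if s == "unfetched")


def index_into(core, crawldb):
    """Copy fetched URLs into a hypothetical index core (a plain set here)."""
    core.update(u for u, s in crawldb.status.items() if s == "fetched")


def fetch_list_from_index(seeds, core):
    """Fetch list we would get if the index were the only record of state."""
    return sorted(u for u in seeds if u not in core)


seeds = ["http://a.example/", "http://b.example/"]
db = CrawlDb()
for url in seeds:
    db.inject(url)
db.mark_fetched("http://a.example/")

old_core = set()
index_into(old_core, db)

# Switch to a fresh, empty core: the separate crawl state is unaffected,
# while state derived from the index alone forgets what was fetched.
new_core = set()
still_to_fetch = db.generate()
derived_from_index = fetch_list_from_index(seeds, new_core)
```

In this sketch `still_to_fetch` contains only the URL that was never fetched, whereas `derived_from_index` lists both seeds again, because the new core carries no fetch history.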

Re: Isn't there redundant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread Gabriele Kahlout
On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi Gabriele, At first this seems like a plausible argument,
Indeed, I think it could be a FAQ. Shall I add it to the Nutch wiki?
> however my question concerns what Nutch would do if we wished to change

Re: Isn't there redundant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread lewis john mcgibbney
Please feel free to add this to the wiki, as it is a question that will undoubtedly arise in the future.
Lewis
On Sat, Jul 16, 2011 at 12:37 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
> On Sat, Jul 16, 2011 at 1:29 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:
>> Hi

Re: Isn't there redundant/wasteful duplication between nutch crawldb and solr index?

2011-07-16 Thread Julien Nioche
Gabriele, what you are describing could be done with Nutch 2.0 by adding a SOLR backend to GORA. SOLR would be used to store the webtable and, provided that you set up the schema accordingly, you could index the appropriate fields for searching. I think there were plans to add SOLR as a GORA backend.
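As a rough illustration of the idea of one store serving both purposes, here is a minimal sketch. Everything in it is an assumption made for illustration: the `WebTable` class, the `SCHEMA` layout, and the field names are hypothetical and are not the GORA or SOLR APIs. It only shows a single table in which every field is stored but only some are searchable.

```python
# Hypothetical schema: every field is stored, only some are searchable.
SCHEMA = {
    "status": {"indexed": False},      # crawl state, stored only
    "fetch_time": {"indexed": False},  # crawl state, stored only
    "title": {"indexed": True},        # searchable content
    "text": {"indexed": True},         # searchable content
}


class WebTable:
    """Toy single store holding crawl state and searchable fields together."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = {}

    def put(self, url, **fields):
        # Reject fields the schema does not declare.
        unknown = set(fields) - set(self.schema)
        if unknown:
            raise KeyError(f"fields not in schema: {sorted(unknown)}")
        self.rows.setdefault(url, {}).update(fields)

    def unfetched(self):
        """Crawl-side view: URLs whose status is not 'fetched'."""
        return sorted(
            u for u, r in self.rows.items() if r.get("status") != "fetched"
        )

    def search(self, term):
        """Search-side view: case-insensitive match on indexed fields only."""
        term = term.lower()
        hits = []
        for url, row in self.rows.items():
            for name, value in row.items():
                if self.schema[name]["indexed"] and term in str(value).lower():
                    hits.append(url)
                    break
        return sorted(hits)


table = WebTable(SCHEMA)
table.put("http://a.example/", status="fetched",
          title="Nutch and Solr", text="crawling")
table.put("http://b.example/", status="unfetched")
```

Here `table.unfetched()` answers the crawler's question and `table.search("solr")` answers the searcher's, both from the same rows; a search for a crawl-state value such as "fetched" finds nothing because `status` is not marked as indexed.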