Re: Real-time Solr integration

2011-07-14 Thread Matthew Painter
p://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote: > Hi Matthew,

HTTPS support

2011-07-14 Thread Matthew Painter
I see on the wiki that HTTPS is supported by protocol-httpclient but not by protocol-http. However, protocol-httpclient is not recommended for use (https://issues.apache.org/jira/browse/NUTCH-990). Is there a plan for supporting HTTPS? Happy to help implement if possible :) Thanks, Matt

Re: Real-time Solr integration

2011-07-12 Thread Matthew Painter
mmands (as opposed to calling 'nutch crawl') and index at the end of a generate-fetch-parse-update-linkdb sequence. You don't need any plugins for that. HTH, Julien On 12 July 2011 13:35, Matthew Painter
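
For readers following the thread, the sequence Julien describes could be sketched roughly as below. This is only an illustration, not a script from the thread: the paths (`crawl/crawldb`, `crawl/linkdb`, `crawl/segments`), the Solr URL and the helper names are all assumptions, and the `solrindex` argument order follows the Nutch 1.3-era usage, which changed in later releases. Check `bin/nutch` on your own install before relying on it.

```python
import os
import subprocess
from typing import List

NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def generate_command(crawldb: str = "crawl/crawldb",
                     segments_dir: str = "crawl/segments") -> List[str]:
    """Command that creates a new segment (fetch list) from the crawldb."""
    return [NUTCH, "generate", crawldb, segments_dir]


def newest_segment(segments_dir: str) -> str:
    """Segment directories are named by timestamp, so the largest name is the newest."""
    return os.path.join(segments_dir, sorted(os.listdir(segments_dir))[-1])


def post_generate_commands(segment: str,
                           crawldb: str = "crawl/crawldb",
                           linkdb: str = "crawl/linkdb",
                           segments_dir: str = "crawl/segments",
                           solr_url: str = "http://localhost:8983/solr/") -> List[List[str]]:
    """Fetch, parse, update the crawldb, invert links, then index once at the end."""
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "parse", segment],
        [NUTCH, "updatedb", crawldb, segment],
        [NUTCH, "invertlinks", linkdb, "-dir", segments_dir],
        # Assumption: 1.3-style positional arguments (solr url, crawldb, linkdb, segment).
        [NUTCH, "solrindex", solr_url, crawldb, linkdb, segment],
    ]


def run_round(crawldb: str = "crawl/crawldb",
              segments_dir: str = "crawl/segments") -> None:
    """Run one full round; stops at the first command that fails."""
    subprocess.run(generate_command(crawldb, segments_dir), check=True)
    segment = newest_segment(segments_dir)
    for cmd in post_generate_commands(segment, crawldb=crawldb,
                                      segments_dir=segments_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_round()` repeatedly gives an incremental crawl in which each round's segment reaches Solr as soon as that round finishes, without any plugin.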

Realtime Solr Indexing

2011-07-12 Thread Matthew Painter
Hi all, I was wondering about the feasibility of creating a plugin for Nutch that creates a Solr update command and adds it to a queue for indexing right after a page is first parsed, rather than when crawling has finished. This would allow "real-time" indexing while crawling. Drawbacks: N

Real-time Solr integration

2011-07-12 Thread Matthew Painter
Hi all, I was wondering about the feasibility of creating a plugin for Nutch that creates a Solr update command and adds it to a queue for indexing right after a page is first parsed, rather than when crawling has finished. This would allow "real-time" indexing while crawling. Drawbacks: N
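
To make the idea in this message concrete, here is a minimal sketch of the queueing pattern. It is not a Nutch plugin (those are written in Java) and it is not taken from the thread. The function names, the field names (`id`, `title`, `content`) and the Solr update URL are all assumptions for illustration. The only Solr behaviour relied on is the standard XML update format (`<add><doc>...</doc></add>` followed by `<commit/>`).

```python
import queue
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def solr_add_command(fields: Dict[str, str]) -> bytes:
    """Build a Solr XML <add> message for a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="utf-8")


def enqueue_parsed_page(q: "queue.Queue[bytes]", url: str,
                        title: str, content: str) -> None:
    """Called right after a page is parsed: queue its update command."""
    # Field names are hypothetical; use whatever your Solr schema defines.
    q.put(solr_add_command({"id": url, "title": title, "content": content}))


def post_to_solr(body: bytes,
                 update_url: str = "http://localhost:8983/solr/update") -> None:
    """Send one XML update message to Solr (URL is an assumed default)."""
    req = urllib.request.Request(
        update_url, data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(req) as resp:
        resp.read()


def drain(q: "queue.Queue[bytes]", send: Callable[[bytes], None]) -> int:
    """Send every queued command, then commit once if anything was sent."""
    sent = 0
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            break
        send(body)
        sent += 1
    if sent:
        send(b"<commit/>")
    return sent
```

A background thread could call `drain(q, post_to_solr)` periodically while the crawl keeps parsing. Committing once per drain rather than once per document keeps the commit overhead in Solr down.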