p://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script
>> > >
>> > > On Tue, Jul 12, 2011 at 2:15 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
>> > >> Hi Matthew,
I see on the wiki that HTTPS is supported by protocol-httpclient but not
protocol-http.
However, protocol-httpclient is not recommended for use (see
https://issues.apache.org/jira/browse/NUTCH-990).
Is there a plan to support HTTPS in protocol-http? Happy to help implement it if possible :)
Thanks
Matt
mmands (as opposed to calling 'nutch crawl') and index at the end of
>> a generate-fetch-parse-update-linkdb sequence. You don't need any plugins
>> for that
>>
>> HTH
>>
>> Julien
>>
>>
>> On 12 July 2011 13:35, Matthew Painter
Hi all,
I was wondering about the feasibility of creating a plugin for Nutch that
creates a Solr update command and adds it to a queue for indexing as soon as
a page is first parsed, rather than when crawling has finished.
This would allow you to do "real-time" indexing while crawling.
Drawbacks: N
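
To make the idea above concrete, here is a minimal sketch of the queueing part only, written in Python rather than as an actual Nutch plugin. Everything in it is an assumption for illustration: the `IndexQueue` class, its `add` and `close` methods, and the `send_batch` callable are hypothetical names, and `send_batch` stands in for whatever would actually send the update to Solr. It only shows pages being queued as they are parsed and flushed in small batches by a background worker.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to flush and exit


class IndexQueue:
    """Buffers parsed pages and hands them to `send_batch` in small batches.

    `send_batch` is a hypothetical callable supplied by the caller; in a real
    setup it would send the documents to the search index.
    """

    def __init__(self, send_batch, batch_size=2):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, url, content):
        # Called right after a page is parsed, while the crawl is still running.
        self._queue.put({"id": url, "content": content})

    def close(self):
        # Ask the worker to send whatever is left, then wait for it to finish.
        self._queue.put(_STOP)
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._send_batch(batch)
                batch = []  # start a fresh list; the sent one is left untouched
        if batch:
            self._send_batch(batch)
```

Batching in a separate worker keeps the parsing side from waiting on the index, at the cost of a short delay before a page becomes searchable.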