Hey Lewis,

That's because Crawler doesn't make HTTP connections. PushPull is the component where that occurs. We deliberately made Crawler handle only local data and refactored the protocol layer/functionality into PushPull; the two cooperate through a shared directory structure (a 'staging' dir) and through Crawler preconditions and Actions.
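The handoff described above can be sketched roughly like this. This is a hypothetical illustration of the pattern, not OODT's actual API: a PushPull-like component performs the remote transfer into a shared staging directory, and a Crawler-like component only ever sees local files there, applying a precondition before "ingesting":

```python
import os
import shutil
import tempfile

def push_pull_fetch(remote_files, staging_dir):
    """Stand-in for PushPull: 'downloads' remote content into the staging dir.
    (Here the remote side is faked with an in-memory dict.)"""
    for name, content in remote_files.items():
        with open(os.path.join(staging_dir, name), "w") as f:
            f.write(content)

def crawler_ingest(staging_dir, precondition):
    """Stand-in for ProductCrawler: walks the LOCAL staging dir only --
    no network I/O happens here, mirroring the real division of labor."""
    ingested = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if precondition(path):        # e.g. a non-empty-file check
            ingested.append(name)     # real crawler would ingest + run Actions
    return ingested

staging = tempfile.mkdtemp()
push_pull_fetch({"a.dat": "payload", "empty.dat": ""}, staging)
files = crawler_ingest(staging, lambda p: os.path.getsize(p) > 0)
shutil.rmtree(staging)
```

The point of the split is that all protocol/transport concerns live on the fetch side, while the crawl side stays purely local and testable.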
Scope out PushPull and then we can discuss. Thanks dude.

Cheers,
Chris

------------------------
Chris Mattmann
[email protected]

-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, May 1, 2014 10:35 AM
To: <[email protected]>
Subject: CAS Crawler Crawling Code

>Hi Folks,
>I'm jumping between ProductCrawler and StdIngester trying to pinpoint
>_exactly_ where product fetching actually happens.
>I'm aware of the triple-headed nature of crawler workflows, e.g.
>preIngestion, postIngestionSuccess and postIngestionFailure... I can see
>the logic within the ProductCrawler code... what I cannot locate is where
>HTTP/transport socket connections are created and used.
>
>Can anyone please point this out?
>Thanks
>Lewis
