:01 AM
To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
Subject: Re: CrawlerCommons & ManifoldCF
There is a link to the discussion group on the main page; becoming a member
of the group is pretty straightforward.
On 3 June 2011 00:36, Fuad Efendi wrote:
> I mean "joi
Hi,
We could reuse RobotsData indeed and refactor it a bit.
Ken, you said you'd be keen to contribute your code for robot parsing as
well - do you think it would be quicker than refactoring Manifold's code? Or
does it support additional features? What about Droids?
Julien
PS: Anyone attendin
> To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
> Subject: RE: CrawlerCommons & ManifoldCF
>
> I'd like to join this project but can't find "join" button :) Thanks!
>
> Fuad Efendi
> +1 416-993-2060
> http://www.linkedin.com/in/liferay
-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: June-02-11 7:05 PM
To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
Subject: RE: CrawlerCommons & ManifoldCF
I'd like to join this project but can't find "join" button :) Thanks!
Fua
...@gmail.com]
Sent: June-02-11 11:11 AM
To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
Subject: CrawlerCommons & ManifoldCF
Hi guys,
I'd just like to mention Crawler Commons, which is an effort between the
committers of various crawl-related projects (Nutch,
I don't think it would be hard to peel out the robots parser, although
obviously it would need refactoring to live in a more standard library
environment. If you want to look at it, it is in:
https://svn.apache.org/repos/asf/incubator/lcf/trunk/connectors/webcrawler/connector/src/main/java/org/ap
Hi Karl,
Maybe a good start would be to identify which parts of your crawler could be
shared and would not take too much effort to be made generic. I haven't
looked at the crawler's code in great detail, but do you think the
robots parser would be a good candidate?
Julien
On 2 June 2011 16:
Absolutely!
We're a bit thin on active committers at the moment, which will
probably limit our ability to take any highly active roles in your
development process. But we do have a pile of code which you might be
able to leverage, and once there is common functionality available I
think we'd all p
Hi guys,
I'd just like to mention Crawler Commons, which is an effort between the
committers of various crawl-related projects (Nutch, Bixo or Heritrix) to
pool some basic functionality. We currently have mostly a top-level
domain finder and a sitemap parser, but are definitely planning t