:01 AM
To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
Subject: Re: CrawlerCommons & ManifoldCF
There is a link to the discussion group on the main page; becoming a member
of the group is pretty straightforward.
On 3 June 2011 00:36, Fuad Efendi wrote:
> I mean "joi
Hi,
We could reuse RobotsData indeed and refactor it a bit.
Ken, you said you'd be keen to contribute your code for robot parsing as
well - do you think it would be quicker than refactoring Manifold's code? Or
does it support additional features? What about Droids?
Julien
PS: Anyone attendin
onnectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
> Subject: RE: CrawlerCommons & ManifoldCF
>
> I'd like to join this project but can't find "join" button :) Thanks!
>
> Fuad Efendi
> +1 416-993-2060
> http://www.linkedin.com/in/liferay
-----Original Message-----
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: June-02-11 7:05 PM
To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com
Subject: RE: CrawlerCommons & ManifoldCF
I'd like to join this project but can't find "join" button :) Thanks!
Fua
I'd like to join this project but can't find "join" button :)
Thanks!
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
Sent: J
I don't think it would be hard to peel out the robots parser, although
obviously it would need refactoring to live in a more standard library
environment. If you want to look at it, it is in:
https://svn.apache.org/repos/asf/incubator/lcf/trunk/connectors/webcrawler/connector/src/main/java/org/ap
Hi Karl,
Maybe a good start would be to identify which parts of your crawler could be
shared and would not take too much effort to make generic. I haven't
looked at the crawler's code in great detail, but do you think the
robots parser would be a good candidate?
Julien
On 2 June 2011 16:
Absolutely!
We're a bit thin on active committers at the moment, which will
probably limit our ability to take any highly active roles in your
development process. But we do have a pile of code which you might be
able to leverage, and once there is common functionality available I
think we'd all p