RE: CrawlerCommons & ManifoldCF

2011-06-03 Thread Fuad Efendi
:01 AM To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com Subject: Re: CrawlerCommons & ManifoldCF There is a link to the discussion group on the main page; becoming a member of the group is pretty straightforward. On 3 June 2011 00:36, Fuad Efendi wrote: > I mean "joi

Re: CrawlerCommons & ManifoldCF

2011-06-03 Thread Julien Nioche
Hi, We could reuse RobotsData indeed and refactor it a bit. Ken, you said you'd be keen to contribute your code for robot parsing as well - do you think it would be quicker than refactoring Manifold's code? Or does it support additional features? What about Droids? Julien PS: Anyone attendin

Re: CrawlerCommons & ManifoldCF

2011-06-03 Thread Julien Nioche
onnectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com > Subject: RE: CrawlerCommons & ManifoldCF > > I'd like to join this project but can't find "join" button :) Thanks! > > Fuad Efendi > +1 416-993-2060 > http://www.linkedin.com/in/liferay &

RE: CrawlerCommons & ManifoldCF

2011-06-02 Thread Fuad Efendi
inal Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: June-02-11 7:05 PM To: connectors-dev@incubator.apache.org; crawler-comm...@googlegroups.com Subject: RE: CrawlerCommons & ManifoldCF I'd like to join this project but can't find "join" button :) Thanks! Fua

RE: CrawlerCommons & ManifoldCF

2011-06-02 Thread Fuad Efendi
I'd like to join this project but can't find "join" button :) Thanks! Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search -Original Message- From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] Sent: J

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Karl Wright
I don't think it would be hard to peel out the robots parser, although obviously it would need refactoring to live in a more standard library environment. If you want to look at it, it is in: https://svn.apache.org/repos/asf/incubator/lcf/trunk/connectors/webcrawler/connector/src/main/java/org/ap

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Julien Nioche
Hi Karl, Maybe a good start would be to identify which parts of your crawler could be shared and would not take too much effort to be made generic. I haven't looked at the crawler's code in great detail, but do you think the robots parser would be a good candidate? Julien On 2 June 2011 16:

Re: CrawlerCommons & ManifoldCF

2011-06-02 Thread Karl Wright
Absolutely! We're a bit thin on active committers at the moment, which will probably limit our ability to take any highly active roles in your development process. But we do have a pile of code which you might be able to leverage, and once there is common functionality available I think we'd all p