Re: Incubator PMC/Board report for June 2011 (connectors-dev@incubator.apache.org)

2011-06-02 Thread Tommaso Teofili
it sounds good to me, any others? Tommaso 2011/6/1 Karl Wright daddy...@gmail.com Here's my proposed text: ManifoldCF --Description-- ManifoldCF is an incremental crawler framework and set of connectors designed to pull documents from various kinds of repositories into search engine

[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042655#comment-13042655 ] Karl Wright commented on CONNECTORS-110: HSQLDB is now also in roughly the

[jira] [Updated] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright updated CONNECTORS-110: Summary: Max activity and Max bandwidth reports don't work properly under Derby or

[jira] [Created] (CONNECTORS-204) Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to test it

2011-06-02 Thread Karl Wright (JIRA)
Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to test it Key: CONNECTORS-204 URL: https://issues.apache.org/jira/browse/CONNECTORS-204

[jira] [Commented] (CONNECTORS-203) Consider porting ManifoldCF to Java 1.5 code standards

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042724#comment-13042724 ] Karl Wright commented on CONNECTORS-203: I've merged in all the major

[RESULT][VOTE] Adopt Java 1.5 as the minimum Java release for ManifoldCF

2011-06-02 Thread Karl Wright
Although it hasn't quite been the required 3 days, this vote isn't binding anyway, so I'm going to declare it closed and commit the code. Karl On Mon, May 30, 2011 at 7:32 PM, Karl Wright daddy...@gmail.com wrote: Please have a look at CONNECTORS-203 and vote +1 if you think it's time to move

[jira] [Created] (CONNECTORS-205) Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB Key: CONNECTORS-205 URL:

CrawlerCommons ManifoldCF

2011-06-02 Thread Julien Nioche
Hi guys, I'd just like to mention Crawler Commons, which is an effort among the committers of various crawl-related projects (Nutch, Bixo, Heritrix) to share some basic functionality. We currently have mostly a top-level domain finder and a sitemap parser, but are definitely planning

Re: CrawlerCommons ManifoldCF

2011-06-02 Thread Karl Wright
Absolutely! We're a bit thin on active committers at the moment, which will probably limit our ability to take any highly active roles in your development process. But we do have a pile of code which you might be able to leverage, and once there is common functionality available I think we'd all

Re: CrawlerCommons ManifoldCF

2011-06-02 Thread Julien Nioche
Hi Karl, Maybe a good start would be to identify which parts of your crawler could be shared and would not take too much effort to make generic. I haven't looked at the code of the crawler in great detail, but do you think the robots parser would be a good candidate? Julien On 2 June 2011

[jira] [Commented] (CONNECTORS-110) Max activity and Max bandwidth reports don't work properly under Derby or HSQLDB

2011-06-02 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042861#comment-13042861 ] Karl Wright commented on CONNECTORS-110: r1130644 implements this for HSQLDB.

Re: CrawlerCommons ManifoldCF

2011-06-02 Thread Karl Wright
I don't think it would be hard to peel out the robots parser, although obviously it would need refactoring to live in a more standard library environment. If you want to look at it, it is in:

RE: CrawlerCommons ManifoldCF

2011-06-02 Thread Fuad Efendi
I'd like to join this project but can't find the join button :) Thanks! Fuad Efendi +1 416-993-2060 http://www.linkedin.com/in/liferay Tokenizer Inc. http://www.tokenizer.ca/ Data Mining, Vertical Search -Original Message- From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] Sent:

RE: CrawlerCommons ManifoldCF

2011-06-02 Thread Fuad Efendi
I mean the join button at http://code.google.com/p/crawler-commons/ I am very familiar with BIXO and Droids; it will be hard to make minor changes in ManifoldCF... although it's possible (without the crawler part, only the robots rules parser)... -Fuad -Original Message- From: Fuad Efendi