It sounds good to me. Any others?
Tommaso
2011/6/1 Karl Wright daddy...@gmail.com
Here's my proposed text:
ManifoldCF
--Description--
ManifoldCF is an incremental crawler framework and set of connectors designed to pull documents from various kinds of repositories into
search engine
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042655#comment-13042655 ]
Karl Wright commented on CONNECTORS-110:
HSQLDB is now also in roughly the
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-110:
---
Summary: Max activity and Max bandwidth reports don't work properly under Derby or
Now that HSQLDB functions with ManifoldCF, write a test-hsqldb ant target to test it
Key: CONNECTORS-204
URL: https://issues.apache.org/jira/browse/CONNECTORS-204
[ https://issues.apache.org/jira/browse/CONNECTORS-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042724#comment-13042724 ]
Karl Wright commented on CONNECTORS-203:
I've merged in all the major
Although it hasn't quite been the required 3 days, this vote isn't binding anyway, so I'm going to declare it closed and commit the code.
Karl
On Mon, May 30, 2011 at 7:32 PM, Karl Wright daddy...@gmail.com wrote:
Please have a look at CONNECTORS-203 and vote +1 if you think it's time to move
Database DISTINCT ON abstraction needs to include ordering information in order to work for HSQLDB
--
Key: CONNECTORS-205
URL:
Hi guys,
I'd just like to mention Crawler Commons, which is an effort between the committers of various crawl-related projects (Nutch, Bixo, or Heritrix) to share some basic functionality. We currently have mostly a top-level domain finder and a sitemap parser, but are definitely planning
Absolutely!
We're a bit thin on active committers at the moment, which will probably limit our ability to take a highly active role in your development process. But we do have a pile of code that you might be able to leverage, and once there is common functionality available I think we'd all
Hi Karl,
Maybe a good start would be to identify which parts of your crawler could be shared and would not take too much effort to make generic. I haven't looked at the crawler's code in great detail, but do you think the robots parser would be a good candidate?
Julien
On 2 June 2011
[ https://issues.apache.org/jira/browse/CONNECTORS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042861#comment-13042861 ]
Karl Wright commented on CONNECTORS-110:
r1130644 implements this for HSQLDB.
I don't think it would be hard to peel out the robots parser, although it would obviously need refactoring to live in a more standard library environment. If you want to look at it, it is in:
I'd like to join this project but can't find the join button :)
Thanks!
Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay
Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search
-Original Message-
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
Sent:
I mean the join button at http://code.google.com/p/crawler-commons/
I am quite familiar with Bixo and Droids; it will be hard to make even minor changes in ManifoldCF... although it's possible (without the crawler part, only the robots rules parser)...
-Fuad
-Original Message-
From: Fuad Efendi