[jira] [Updated] (NUTCH-1534) cassandra/hector exception: InvalidRequestException(why:column name must not be empty)

2013-02-24 Thread Roland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roland updated NUTCH-1534. Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / gora-core 0.2.1 running fetch with parse=true f

[jira] [Commented] (NUTCH-1534) cassandra/hector exception: InvalidRequestException(why:column name must not be empty)

2013-02-24 Thread Roland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585688#comment-13585688 ] Roland commented on NUTCH-1534: Because of the ConcurrentModificationException I changed my

Re: Configuration improvements to GeneratorJob

2013-02-24 Thread feng lu
@Tejas +1. I think: Keep Property: generate.max.count (keep it because it is still used in GeneratorJob, Reducer), GENERATOR_MAX_COUNT. Deprecate Property: GENERATOR_MIN_SCORE, GENERATOR_COUNT_VALUE_IP. Add in nutch-default.xml -

Re: Configuration improvements to GeneratorJob

2013-02-24 Thread Tejas Patil
Hi Lewis, We have not come to a conclusion on this topic. Here is what I propose: 1. keep "generate.max.count" 2. GENERATOR_MIN_SCORE and GENERATOR_MAX_COUNT: once we know whether they were kept back in 2.x for some valid reason, then we can safely remove these params. These seem to do not

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585467#comment-13585467 ] Tejas Patil commented on NUTCH-1031: Hi Sebastian, Thanks for your time and suggesting