crawl.gen.delay

2013-08-15 Thread kaveh minooie
Is 'crawl.gen.delay' still being used anywhere? I can't find anything in the source code except for here: package org.apache.nutch.crawl; public class GeneratorJob extends NutchTool implements Tool { public static final String GENERATOR_TOP_N = "generate.topN"; public static final St

[jira] [Commented] (NUTCH-1598) ElasticSearchIndexer to read ImmutableSettings from config

2013-08-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741154#comment-13741154 ] Lewis John McGibbney commented on NUTCH-1598: - great work Markus. This is dyna

minor typo in "What is Apache Nutch?" section

2013-08-15 Thread Andrew Pennebaker
I started reading the Nutch docs and noticed a little typo. From http://nutch.apache.org/#What+is+Apache+Nutch%3F "Being pluggable and modular of course has it's benefits" should be "Being pluggable and modular of course has its benefits"

Re: Reading additional metadata field: mtdt:_hr_

2013-08-15 Thread Ahmet Emre Aladağ
My bad: I discovered that the manually entered keys had an extra http at the end, so Nutch wasn't able to see them as host keys and was skipping them. On 08/14/2013 11:23 PM, Ahmet Emre Aladağ wrote: Hi, I added additional mtdt:_hr_ records in HBase holding scores externally. To get the score s