@Tejas +1
I think:
Keep Property
-------------
- generate.max.count: keep it because it is still used by GeneratorJob and the Reducer.
- GENERATOR_MAX_COUNT
Deprecate Property
------------------
- GENERATOR_MIN_SCORE
- GENERATOR_COUNT_VALUE_IP
Add in nutch-default.xml
------------------------
Hi Lewis,
We have not come to a conclusion on this topic.
Here is what I propose:
1. keep "generate.max.count"
2. GENERATOR_MIN_SCORE and GENERATOR_MAX_COUNT: once we find out whether
they were kept back in 2.x for some valid reason, we can safely remove
these params. These do not seem to
Hi Lufeng,
On Wed, Feb 20, 2013 at 9:19 PM, feng lu wrote:
> Hi Tejas
>
> Yes, you are right. I misread the description of the property
> "generate.count.mode". I'm so sorry; I also did not find any information
> about why the IP-based counting mode of "generate.count.mode" was disabled.
>
> Yes, i s
Hi Tejas
Yes, you are right. I misread the description of the property
"generate.count.mode". I'm so sorry; I also did not find any information
about why the IP-based counting mode of "generate.count.mode" was disabled.
Yes, I see that the FetchEntryPartitioner class (combination
of URLPartitioner) is
Hi Lufeng,
On Wed, Feb 20, 2013 at 7:16 PM, feng lu wrote:
> Hi Lewis
>
> Sorry, I was wrong. The GeneratorJob is only used in Nutch 2.x, not 1.x.
>
> As for the GENERATOR_COUNT_VALUE_IP property, I think we can add a patch to
> GeneratorJob instead of deprecating it. The patch might look like this:
>
> if (G
Hi Lewis
Sorry, I was wrong. The GeneratorJob is only used in Nutch 2.x, not 1.x.
As for the GENERATOR_COUNT_VALUE_IP property, I think we can add a patch to
GeneratorJob instead of deprecating it. The patch might look like this:
if (GENERATOR_COUNT_VALUE_HOST.equalsIgnoreCase(mode)) {
getConf().set(URL
Hey Lewis,
On Wed, Feb 20, 2013 at 1:05 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi,
> Following on from a discussion on user@ I dived into the GeneratorJob
> code and have the following general comment based on my observation...
> Usage of configuration options is really un
Hi Lewis
I think generate.max.count is used by anyone who wants to limit the number
of URLs per domain (host). See
http://wiki.apache.org/nutch/Nutch2Crawling#Reducer
The generate.min.score property is already defined in nutch-default.xml.
The generate.(filter|normalise|topN) can be passed through
Hi,
Following on from a discussion on user@ I dived into the GeneratorJob code
and have the following general comment based on my observation... Usage of
configuration options is really unstructured and loosely applied. This
should not be the case. For example
Observations
===
nutch-defau