[ 
https://issues.apache.org/jira/browse/NUTCH-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doğacan Güney closed NUTCH-640.
-------------------------------

    Resolution: Fixed

Committed as of rev. 701052.

> confusing description "set it to Integer.MAX_VALUE"
> ---------------------------------------------------
>
>                 Key: NUTCH-640
>                 URL: https://issues.apache.org/jira/browse/NUTCH-640
>             Project: Nutch
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.9.0
>            Reporter: Stijn Vermeeren
>            Assignee: Doğacan Güney
>            Priority: Minor
>         Attachments: NUTCH-640.patch
>
>
> The property "indexer.max.tokens" has the following description in
> nutch-default.xml:
> " The maximum number of tokens that will be indexed for a single field
>   in a document. This limits the amount of memory required for
>   indexing, so that collections with very large files will not crash
>   the indexing process by running out of memory.
>   Note that this effectively truncates large documents, excluding
>   from the index tokens that occur further in the document. If you
>   know your source documents are large, be sure to set this value
>   high enough to accommodate the expected size. If you set it to
>   Integer.MAX_VALUE, then the only limit is your memory, but you
>   should anticipate an OutOfMemoryError."
> Apparently, "set it to Integer.MAX_VALUE" here means <<substitute the integer 
> value of Integer.MAX_VALUE>>, and not <<put the text "Integer.MAX_VALUE" 
> between the value tags>>. I think this is very confusing and the description 
> should be improved.
> I first put <value>Integer.MAX_VALUE</value> in my configuration, and it took 
> a long time to figure out what was wrong, especially since Nutch fell back 
> to the default value of 10000 instead of reporting an error.
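
The silent fallback described in the report can be illustrated with a minimal sketch. It assumes the configuration lookup returns the default whenever the stored string is not a decimal integer, which matches the reported behaviour; the class MaxTokensDemo and its getInt helper are hypothetical stand-ins written for this illustration, not Nutch or Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a lenient configuration lookup (not the Nutch/Hadoop API).
final class MaxTokensDemo {
    // Default reported in the issue for indexer.max.tokens.
    static final int DEFAULT_MAX_TOKENS = 10000;

    // Returns the parsed integer, or defaultValue when the property is
    // missing or its text is not a valid decimal integer.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String raw = conf.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Assumed behaviour: no error is surfaced, the default is used instead.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // The literal text "Integer.MAX_VALUE" is not a number, so the default wins.
        conf.put("indexer.max.tokens", "Integer.MAX_VALUE");
        System.out.println(getInt(conf, "indexer.max.tokens", DEFAULT_MAX_TOKENS));

        // The numeric value of Integer.MAX_VALUE parses as intended.
        conf.put("indexer.max.tokens", String.valueOf(Integer.MAX_VALUE));
        System.out.println(getInt(conf, "indexer.max.tokens", DEFAULT_MAX_TOKENS));
    }
}
```

Under these assumptions the first lookup yields 10000 and the second yields 2147483647, so the value tags need to contain the number itself rather than the constant's name.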

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
