[ 
https://issues.apache.org/jira/browse/IMPALA-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Russell resolved IMPALA-6028.
----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

> Document max_row_size for upgrade awareness
> -------------------------------------------
>
>                 Key: IMPALA-6028
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6028
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Docs
>    Affects Versions: Impala 2.10.0
>            Reporter: Balazs Jeszenszky
>            Assignee: John Russell
>            Priority: Blocker
>             Fix For: Impala 2.10.0
>
>
> Impala 2.10 introduced the max_row_size query option. It replaces the 
> earlier read_size startup option; both control the maximum width of a row 
> that Impala can process. The defaults differ: read_size defaulted to 8 MB, 
> while max_row_size defaults to 512 KB. There is a tradeoff between being 
> able to read large rows and memory usage. The 512 KB default is expected to 
> work well for most use cases, whereas an 8 MB default would reserve 
> unnecessarily large chunks of memory in the new buffer pool even when the 
> rows are smaller.
> We should advise users that, when upgrading to 2.10, they may have to add 
> the query option to their workflows or change the 512 KB default.
> Also, if users set --read_size to more than 8 MB to process larger rows, we 
> should recommend that they revert --read_size to the default and set the 
> max_row_size query option to the size of the largest rows they need to 
> process. This should greatly reduce the memory consumption of HDFS scans. 
> The query option gives them the flexibility to override the value per 
> query, per pool, or globally.
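
As an illustration of the per-query override described above, a minimal
sketch, assuming the largest row the workload must process is about 4 MB.
The table name is hypothetical, and the 4 MB value is an assumption chosen
for the example, not a recommendation from this issue.

```sql
-- Assumption: the widest row this query needs to handle is roughly 4 MB.
-- Raise the limit for the current session only; other sessions keep the
-- 512 KB default.
SET MAX_ROW_SIZE=4mb;

-- Hypothetical table containing unusually wide rows.
SELECT * FROM wide_rows_table;
```

Setting the option only where wide rows are expected keeps the smaller
default in force for everything else. A cluster-wide value could instead be
supplied through the impalad default query options at startup.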



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
