[ 
https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398905#comment-17398905
 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit ecbd8700e49bf60f15a9d261f0de425f2ba5413e in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ecbd870 ]

KUDU-2671 key encoding for custom hash bucket schemas

This patch updates the PartitionSchema class to properly encode range
keys in the presence of range partitions with custom hash bucket schemas
in a Kudu table.  I added a few new test scenarios and updated the
existing ones accordingly to make it possible to send write operations
into appropriate tablets.

In addition, I'm sneaking in minor style-related updates in
  * src/kudu/tablet/tablet.cc
  * src/kudu/master/catalog_manager.cc

As one can see, the newly added method GetHashBucketSchemasForKey() is
also used in PartitionSchema even when a Partition object is at hand.
A follow-up changelist is going to change that, so there will be no need
to perform a dictionary lookup once the Partition class contains
information about its hash bucket schema.
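
The lookup described above can be illustrated with a small sketch. It is
not Kudu's implementation: the data layout, the names CUSTOM_RANGES,
TABLE_WIDE_BUCKETS, hash_buckets_for_key() and encode_partition_key(), and
the use of CRC32 as the hash function are all assumptions made for
illustration. The point is only that the hash bucket count has to be
resolved from the range key before the hash part of the key can be encoded.

```python
import zlib
from bisect import bisect_right

# Hypothetical model, not Kudu's data structures: one entry per range
# partition with a custom hash schema, as (lower, upper, num_hash_buckets),
# with bounds interpreted as [lower, upper) and entries sorted by lower.
CUSTOM_RANGES = [
    ("20181112", "20181113", 16),
    ("20181113", "20181114", 2),
]
TABLE_WIDE_BUCKETS = 4  # assumed table-wide default


def hash_buckets_for_key(range_key, custom_ranges=CUSTOM_RANGES,
                         default=TABLE_WIDE_BUCKETS):
    """Return the hash bucket count that applies to the given range key."""
    lowers = [lower for lower, _, _ in custom_ranges]
    idx = bisect_right(lowers, range_key) - 1
    if idx >= 0:
        lower, upper, buckets = custom_ranges[idx]
        if lower <= range_key < upper:
            return buckets
    # No custom range covers the key: fall back to the table-wide schema.
    return default


def encode_partition_key(hash_col_value, range_key):
    """Simplified encoding: 4-byte big-endian bucket index + range key."""
    buckets = hash_buckets_for_key(range_key)
    # CRC32 is a stand-in hash chosen for this sketch only.
    bucket = zlib.crc32(hash_col_value.encode("utf-8")) % buckets
    return bucket.to_bytes(4, "big") + range_key.encode("utf-8")
```

With this model, a row whose range key falls into the first custom range
is hashed into 16 buckets, while a key outside every custom range uses the
table-wide count.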

This is a follow-up to 586b7913258df2d0ee75470ddfb2b88d472ba235.

Change-Id: I81aa1c10998d88a1bd5314fc3132037b1c8bbe4b
Reviewed-on: http://gerrit.cloudera.org:8080/17769
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, Kudu's schema design isn't flexible enough.
> We create our tables with a range partition per day, such as dt='20181112',
> as in a Hive table. But our data size changes a lot from day to day: one day
> it may be 50G, and another day it may be 500G. In this case, it is hard to
> set the hash schema. If the hash number is too big, it is wasteful in most
> cases. If it is too small, there is a performance problem when the amount of
> data is large.
>
> So we suggest a solution in which the hash number can be changed based on
> the historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size by day range.
>  # We create each new day range partition according to the collected day size.
> We have used this feature for half a year, and it works well. We hope this
> feature will be useful for the community. The solution may not be complete.
> Please help us make it better.
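
The three steps in the quoted description could be sketched roughly as
follows. This is only an illustration of the idea, not the reporter's
code: the function name choose_hash_buckets(), the target of 10G per
bucket, and the lower and upper limits are all assumed values.

```python
import math


def choose_hash_buckets(expected_gb, target_gb_per_bucket=10.0,
                        min_buckets=2, max_buckets=64):
    """Pick a hash bucket count for one day's range partition.

    expected_gb is the data size collected or estimated for that day.
    All defaults are assumptions for this sketch.
    """
    wanted = math.ceil(expected_gb / target_gb_per_bucket)
    # Clamp so that small days still get a valid count and large days
    # do not create an excessive number of tablets.
    return max(min_buckets, min(max_buckets, wanted))


# Sizes from the description: a 50G day and a 500G day.
small_day = choose_hash_buckets(50)
large_day = choose_hash_buckets(500)
```

Under these assumed defaults, the 50G day gets 5 buckets and the 500G day
gets 50, so each new day range partition is sized from the collected data
rather than from a single table-wide estimate.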



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
