Re: mixing range and hash partitioning

2017-02-24 Thread Paul Brannan
nique_ptr own(T * raw_ptr) { std::unique_ptr p(raw_ptr); return p; } } int main() { sp::shared_ptr client; check_ok(KuduClientBuilder() .add_master_server_addr("localhost") .Build(&client)); KuduSchema schema; KuduSchemaBuilder b; b.AddColumn("date")-&g

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
Hi Paul, I can't reproduce the behavior you are describing. I always get a single unbounded range partition when creating the table without specifying range bounds or splits (regardless of hash partitioning). I searched and couldn't find a unit test for this behavior, so I wrote one - you might co

Re: mixing range and hash partitioning

2017-02-24 Thread Paul Brannan
I can verify that dropping the unbounded range partition allows me to later add bounded partitions. If I only have range partitioning (by commenting out the call to add_hash_partitions), adding a bounded partition succeeds, regardless of whether I first drop the unbounded partition. This seems su

Re: kudu table design question

2017-02-24 Thread tenny susanto
On my Impala Parquet table, each day partition is about 500MB - 1GB. Using range partitioning by day, query time went down from 123 sec to 35 sec. A query against the Impala table takes 2 seconds. On Fri, Feb 24, 2017 at 1:34 PM, Dan Burkert wrote: > Hi Tenny, > > 1000 partitions is on the uppe

Re: File descriptor limit for WAL

2017-02-24 Thread Todd Lipcon
On Fri, Feb 24, 2017 at 12:39 PM, Adar Dembo wrote: > It's definitely safe to increase the ulimit for open files; we > typically test with higher values (like 32K or 64K). We don't use > select(2) directly; any fd polling in Kudu is done via libev which I > believe uses epoll(2) under the hood. T
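Before raising the ulimit, it can help to confirm what limit a process actually sees. The following is a minimal sketch using the POSIX getrlimit(2) call; the helper name read_open_file_limit is hypothetical and is not part of Kudu.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)

// Hypothetical helper, not part of Kudu.
// Reads the calling process's open-file limits into `out`:
// out.rlim_cur is the soft limit, out.rlim_max is the hard limit.
// Returns true on success, false if getrlimit(2) fails.
bool read_open_file_limit(struct rlimit& out) {
    return getrlimit(RLIMIT_NOFILE, &out) == 0;
}
```

The soft limit is the value that `ulimit -n` reports in a shell; a process can raise its own soft limit up to the hard limit without extra privileges.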

Re: kudu table design question

2017-02-24 Thread Dan Burkert
Hi Tenny, 1000 partitions is on the upper end of what I'd recommend - with 3x replication that's 125 tablet replicas per tablet server (something more like 20 or 30 would be ideal depending on hardware). How much data does each day have? I would aim for tablet size on the order of 50GiB, so if i
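The 125 figure is consistent with 1000 tablets at 3x replication spread over the 24 tablet servers mentioned elsewhere in this thread. A minimal sketch of that arithmetic follows, assuming replicas are balanced evenly across tablet servers; the function name is hypothetical and is not part of the Kudu client API.

```cpp
// Hypothetical helper, not part of the Kudu client API.
// Assumes tablet replicas are balanced evenly across tablet servers.
constexpr long replicas_per_tserver(long tablets, long replication, long tservers) {
    // Round up so that a partial share counts as a whole replica.
    return (tablets * replication + tservers - 1) / tservers;
}

// 1000 tablets, 3x replication, 24 tablet servers.
static_assert(replicas_per_tserver(1000, 3, 24) == 125,
              "matches the figure quoted in the message");
```

By the same arithmetic, about 240 tablets at 3x replication would give 30 replicas per tablet server on 24 servers, the top of the 20 to 30 range suggested in the message.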

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
Hi Paul, I think the issue you are running into is that if you don't add a range partition explicitly during table creation (by calling add_range_partition or inserting a split with add_range_partition_split), Kudu will default to creating 1 unbounded range partition. So your two options are to a

Re: kudu table design question

2017-02-24 Thread tenny susanto
I have 24 tablet servers. I added an id column because I needed a unique column to be the primary key, since Kudu requires a primary key to be specified. My original table actually has 20 columns with no single primary key column. I concatenated 5 of them to build a unique id column which I made it as

Re: File descriptor limit for WAL

2017-02-24 Thread Adar Dembo
I think range partitioning is a fine solution for your use case, though you should know that we're not recommending more than 4 TB of total data (post-encoding/compression) per tserver at the moment. I don't expect anything to break outright if you exceed that, but startup will get slower and slowe

mixing range and hash partitioning

2017-02-24 Thread Paul Brannan
I'm trying to create a table with one column range-partitioned and another column hash-partitioned. Documentation for add_hash_partitions and set_range_partition_columns suggests this should be possible ("Tables must be created with either range, hash, or range and hash partitioning"). I have a sc

Re: File descriptor limit for WAL

2017-02-24 Thread Paul Brannan
I'm using the debs from the cloudera-kudu ppa with little change to the default configuration, so one master and one tablet server. I set num_replicas(1) when creating each table. I used range partitioning with (if I understand correctly) one large open-ended range. So that should have 334 table

Re: Delete row by partial key

2017-02-24 Thread Paul Brannan
I think this makes sense for isset_bitmap_ and owned_strings_bitmap_. I have the source checked out from github and will try to build and run the regression tests so I can play with this idea. In general though it seems I only want to copy the primary keys (any given RowPtr might have other colum