Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

Matope Ono Fri, 14 Oct 2016 23:46:42 -0700

Please disregard that part of my sentence.
To be more precise, maybe I should have said something like "He could compact 10
sstables, each of which has a 15GB partition".
What I wanted to say is that we can store many more rows (and columns) in a
partition than before 3.6.


2016-10-15 15:34 GMT+09:00 Kant Kodali <k...@peernova.com>:

> "Robert said he could treat safely 10 15GB partitions at his presentation"
> This sounds like there is a row limit too, not only a column limit?
>
> If I am reading this correctly, 10 15GB partitions means 10 partitions
> (i.e. 10 row keys, that's too small) with each partition of size 15GB
> (that's about 15 million columns where each column holds 1KB of data).
>
> On Fri, Oct 14, 2016 at 11:30 PM, Kant Kodali <k...@peernova.com> wrote:
>
>> "Robert said he could treat safely 10 15GB partitions at his presentation"
>> This sounds like there is a row limit too, not only a column limit?
>>
>> If I am reading this correctly, 10 15GB partitions means 10 partitions
>> (i.e. 10 row keys, that's too small) with each partition of size 15GB
>> (that's about 10 million columns where each column holds 1KB of data).
>>
>>
>>
>>
>>
>> On Fri, Oct 14, 2016 at 9:54 PM, Matope Ono <matope....@gmail.com> wrote:
>>
>>> Thanks to CASSANDRA-11206, I think we can have much larger partitions
>>> than before 3.6.
>>> (Robert said he could treat safely 10 15GB partitions at his
>>> presentation. https://www.youtube.com/watch?v=N3mGxgnUiRY)
>>>
>>> But is there still a 2B column limit in the Cassandra code?
>>> If so, out of curiosity, I'd like to know where the bottleneck is. Could
>>> anyone let me know about it?
>>>
>>> Thanks Yasuharu.
>>>
>>>
>>> 2016-10-13 1:11 GMT+09:00 Edward Capriolo <edlinuxg...@gmail.com>:
>>>
>>>> The "2 billion column limit" is press-clipping "puffery". This statement
>>>> seemingly became popular because of a highly trafficked story, in which
>>>> a tech reporter embellished a statement to make a splashy article.
>>>>
>>>> The effect is something like this:
>>>> http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/
>>>>
>>>> Iced tea does not cause kidney stones! Cassandra does not store rows
>>>> with 2 billion columns! It is just not true.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali <k...@peernova.com> wrote:
>>>>
>>>>> Well, 1) I have not sent it to the postgresql mailing lists, and 2) I
>>>>> thought this was an open-ended question, as it can involve ideas from
>>>>> everywhere, including the Cassandra Java driver mailing lists. So sorry
>>>>> if that bothered you for some reason.
>>>>>
>>>>> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha <dorian.ho...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Also, I'm not sure, but I don't think it's "cool" to write to
>>>>>> multiple lists in the same message (based on the postgresql mailing
>>>>>> list rules).
>>>>>> For example, I'm not subscribed to those, and now the messages are
>>>>>> separated.
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha <
>>>>>> dorian.ho...@gmail.com> wrote:
>>>>>>
>>>>>>> There are some issues working with larger partitions.
>>>>>>> HBase doesn't do what you say! You also have to be careful in HBase
>>>>>>> not to create large rows! But since they are globally sorted, you can
>>>>>>> easily sort between them and create small rows.
>>>>>>>
>>>>>>> In my opinion, Cassandra people are wrong when they say
>>>>>>> "globally sorted is the devil!" while all of fb/google/etc. actually
>>>>>>> use globally sorted most of the time! You have to be careful though
>>>>>>> (just like with random partitioning).
>>>>>>>
>>>>>>> Can you tell what rowkey1, page1, col(x) actually are? Maybe there
>>>>>>> is a way.
>>>>>>> The most "recent" means there's a timestamp in there?
>>>>>>>
>>>>>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I understand Cassandra can have a maximum of 2B rows per partition,
>>>>>>>> but in practice some people seem to suggest the magic number is 100K.
>>>>>>>> Why not create another partition/rowkey automatically (whenever we
>>>>>>>> reach a safe limit that we consider efficient), with an
>>>>>>>> auto-increment bigint as a suffix appended to the new rowkey, so that
>>>>>>>> the driver can return the new rowkey indicating that there is a new
>>>>>>>> partition, and so on? Now, I understand this would involve allowing
>>>>>>>> partial row key searches, which Cassandra currently doesn't do (but I
>>>>>>>> believe HBase does), and thinking about token ranges and potentially
>>>>>>>> many other things.
>>>>>>>>
>>>>>>>> My current problem is this:
>>>>>>>>
>>>>>>>> I have a row key followed by a bunch of columns (this is not time
>>>>>>>> series data), and these columns can grow to any number. Since I have
>>>>>>>> a 100K limit (or whatever the number is; say some limit), I want to
>>>>>>>> break the partition into levels/pages:
>>>>>>>>
>>>>>>>> rowkey1, page1->col1, col2, col3......
>>>>>>>> rowkey1, page2->col1, col2, col3......
>>>>>>>>
>>>>>>>> Now say my Cassandra db is populated with data, my application has
>>>>>>>> just booted up, and I want the most recent value of a certain
>>>>>>>> partition, but I don't know which page it belongs to. How do I solve
>>>>>>>> this in the most efficient way possible in Cassandra today? I
>>>>>>>> understand I can create an MV or other tables that hold auxiliary
>>>>>>>> data, such as the number of pages per partition, but that involves
>>>>>>>> the maintenance cost of that other table, which I cannot really
>>>>>>>> afford because I already have MVs and secondary indexes for other
>>>>>>>> good reasons. So it would be great if someone could explain the best
>>>>>>>> way possible as of today with Cassandra. By best way I mean: is it
>>>>>>>> possible with one request? If yes, then how? If not, then what is the
>>>>>>>> next best way to solve this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> kant
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
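
For anyone who wants to experiment with the "rowkey1, page1 / rowkey1, page2" idea from the original question, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not something proposed in the thread and not a Cassandra API: the dict stands in for a table whose partition key is (rowkey, page), the names append, page_exists and find_latest_page are hypothetical, pages are assumed to be numbered 0, 1, 2, ... with no gaps, and the 100K default is just the "magic number" mentioned above. It finds the newest page without an auxiliary table by probing pages with a doubling search followed by a binary search, so it costs a logarithmic number of point reads rather than one request.

```python
from typing import Dict, List, Tuple

# Hypothetical per-partition cap, taken from the "100K" figure in the thread.
PAGE_SIZE = 100_000

# In-memory stand-in for a table keyed by (rowkey, page); not a Cassandra API.
Store = Dict[Tuple[str, int], List[str]]


def append(store: Store, rowkey: str, latest_page: int, value: str,
           page_size: int = PAGE_SIZE) -> int:
    """Append value to the latest page, rolling over to a new page when full.

    Returns the page number that now holds the newest value.
    """
    cells = store.setdefault((rowkey, latest_page), [])
    if len(cells) >= page_size:
        latest_page += 1
        cells = store.setdefault((rowkey, latest_page), [])
    cells.append(value)
    return latest_page


def page_exists(store: Store, rowkey: str, page: int) -> bool:
    # Stands in for one point read against partition (rowkey, page).
    return (rowkey, page) in store


def find_latest_page(store: Store, rowkey: str) -> int:
    """Return the highest existing page for rowkey, or -1 if there is none.

    Assumes pages are contiguous starting at 0.
    """
    if not page_exists(store, rowkey, 0):
        return -1
    lo, hi = 0, 1
    # Doubling phase: grow hi until it points past the last page.
    while page_exists(store, rowkey, hi):
        lo, hi = hi, hi * 2
    # Binary phase: page lo exists, page hi does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_exists(store, rowkey, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

After a restart, an application could call find_latest_page once per rowkey and cache the result, then pass it to append from then on. Whether the extra probe reads are acceptable compared with maintaining a small "pages per partition" table depends on the workload.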
