On Sat, Apr 21, 2012 at 1:05 AM, Jake Luciani <jak...@gmail.com> wrote:

> What other solutions are you considering?  Any OLTP style access of 200TB
> of data will require substantial IO.


We currently use an in-house database, written because when we first started
our system there was nothing that handled our problem economically. We would
like to move to something more off-the-shelf to reduce maintenance and
development costs.

We've been looking at Hadoop for the computational component. However, it
looks like HDFS doesn't map well to our storage patterns, as the latency is
quite high. In addition, the resilience model of the NameNode is a concern
in our environment.

We were thinking through whether using Cassandra as the data store for Hadoop
is viable for us; however, we've come to the conclusion that it doesn't map
well in this case.


>
> Do you know how big your working dataset will be?
>

The system is batch; jobs could range from very small up to a moderate
percentage of the data set. It's even possible that we could need to read
the entire data set. How much of it we keep resident is a cost/performance
trade-off we need to make.
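
For reference, here is the back-of-envelope sizing arithmetic from earlier in
the thread as a small Python sketch. The 200TB raw size and RF=3 come from the
original question; the 600GB-per-node figure is just the rule of thumb we have
been using, so treat it as an assumption to tune rather than a hard limit:

    # Rough cluster sizing: nodes = raw data * replication factor / per-node density
    raw_tb = 200        # raw data set size in TB (from the original question)
    rf = 3              # Cassandra replication factor
    per_node_gb = 600   # assumed usable storage per node (rule of thumb, not measured)

    total_gb = raw_tb * 1000 * rf       # 600,000 GB once replicated
    nodes = total_gb / per_node_gb      # ~1000 nodes at this density
    print(nodes)

Doubling the per-node density roughly halves the node count, which is exactly
the cost/performance knob we're weighing.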

cheers


>
> -Jake
>
>
> On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter <franc.car...@sirca.org.au> wrote:
>
>> On Fri, Apr 20, 2012 at 6:27 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> Couple of ideas:
>>>
>>> * take a look at compression in 1.X
>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
>>> * is there repetition in the binary data? Can you save space by
>>> implementing content addressable storage?
>>>
>>
>> The data is already very highly space-optimised. We've come to the
>> conclusion that Cassandra is probably not the right fit for the use case
>> this time.
>>
>> cheers
>>
>>
>>>
>>> Cheers
>>>
>>>
>>>   -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 20/04/2012, at 12:55 AM, Dave Brosius wrote:
>>>
>>> I think your math is 'relatively' correct. If that node count is
>>> prohibitive, it would seem to me you should focus on reducing the amount
>>> of storage you are using per item, if at all possible.
>>>
>>> On 04/19/2012 07:12 AM, Franc Carter wrote:
>>>
>>>
>>>  Hi,
>>>
>>>  One of the projects I am working on is going to need to store about
>>> 200TB of data - generally in manageable binary chunks. However, after doing
>>> some rough calculations based on rules of thumb I have seen for how much
>>> storage should be on each node, I'm worried.
>>>
>>>    200TB with RF=3 is 600TB = 600,000GB
>>>   Which is 1000 nodes at 600GB per node
>>>
>>>  I'm hoping I've missed something as 1000 nodes is not viable for us.
>>>
>>>  cheers
>>>
>>
>>
>
>
> --
> http://twitter.com/tjake
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 9236 9118
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
