This is a difficult question to answer for a variety of reasons, but I'll
give it a try. Maybe it will be helpful, maybe not.

The most obvious problem with this is that Thrift is buffer based, not
streaming. That means that whatever the size of your chunk, it needs to
be received, deserialized, and processed by Cassandra within a timeframe
that we call the rpc_timeout (by default this is 10 seconds).
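
As a rough back-of-the-envelope check, the sketch below estimates how long a
single buffered value spends in transfer and deserialization and compares that
with the timeout. The function name and the throughput figures are made-up
assumptions for illustration, not measured Cassandra numbers, and server-side
processing time is not modelled at all. Only the 10 second default comes from
the paragraph above.

```python
RPC_TIMEOUT_S = 10.0  # the default rpc_timeout mentioned above

def fits_in_timeout(value_bytes, network_bytes_per_s, deserialize_bytes_per_s,
                    timeout_s=RPC_TIMEOUT_S):
    """Return (estimated_seconds, fits) for one buffered request.

    The whole buffer has to arrive before it can be deserialized, so the
    two phases are modelled as sequential rather than overlapped.
    """
    transfer_s = value_bytes / network_bytes_per_s
    deserialize_s = value_bytes / deserialize_bytes_per_s
    total_s = transfer_s + deserialize_s
    return total_s, total_s < timeout_s

# Hypothetical rates: 10 MB/s effective network, 50 MB/s deserialization.
MB = 1024 * 1024
for size_mb in (1, 16, 64, 256):
    total, ok = fits_in_timeout(size_mb * MB, 10 * MB, 50 * MB)
    print(f"{size_mb:>4} MB -> {total:6.2f} s  {'ok' if ok else 'exceeds timeout'}")
```

Plugging in rates measured on your own cluster gives a first guess at where
the ceiling is, before any real load testing.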

Bigger buffers mean larger allocations, and larger allocations mean that the
JVM works harder and the heap is more prone to fragmentation.

With mixed workloads (lots of high-latency, large requests and many very
small, low-latency requests) larger buffers can also, over time, clog up the
thread pool so that your shorter queries have to wait for your longer-running
queries to complete (to free up worker threads), making everything slow. This
isn't a problem unique to Cassandra; everything that uses worker queues runs
into some variant of it.
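
To make the queueing effect concrete, here is a minimal simulation of a
first-come, first-served queue in front of a fixed pool of workers. It is a toy
model, not how Cassandra schedules requests: the function name, the pool size
of two, and the arrival and service times are all assumptions chosen for
illustration.

```python
import heapq

def simulate_pool(requests, workers):
    """Simulate a FIFO queue served by a fixed pool of workers.

    requests: list of (arrival_time_s, service_time_s), sorted by arrival.
    Returns, for each request in order, the time it spent waiting for a
    free worker before it could start.
    """
    free_at = [0.0] * workers  # min-heap: when each worker next becomes free
    heapq.heapify(free_at)
    waits = []
    for arrival, service in requests:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits

# Hypothetical workload: two 5 s requests arrive first, then three 10 ms ones.
large = [(0.0, 5.0), (0.0, 5.0)]
small = [(0.1, 0.01), (0.2, 0.01), (0.3, 0.01)]

print("small only:   ", simulate_pool(small, workers=2))
print("large + small:", simulate_pool(large + small, workers=2))
```

On their own the small requests never wait. Once both workers are busy with
the large requests, each small request sits in the queue for several seconds
even though it needs only 10 ms of work.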

As with everything else, you'll probably need to test your specific use
case to see what 'too big' is for you.

On Mon, Apr 2, 2012 at 9:23 AM, Franc Carter <franc.car...@sirca.org.au> wrote:

>
> Hi,
>
> We are in the early stages of thinking about a project that needs to store
> data that will be accessed by Hadoop. One of the concerns we have is around
> the latency of HDFS, as our use case is not for reading all the data and
> hence we will need custom RecordReaders etc.
>
> I've seen a couple of comments that you shouldn't put large chunks into a
> value - however 'large' is not well defined for the range of people using
> these solutions ;-)
>
> Does anyone have a rough rule of thumb for how big a single value can be
> before we are outside sanity?
>
> thanks
>
> --
>
> *Franc Carter* | Systems architect | Sirca Ltd
>  <marc.zianideferra...@sirca.org.au>
>
> franc.car...@sirca.org.au | www.sirca.org.au
>
> Tel: +61 2 9236 9118
>
> Level 9, 80 Clarence St, Sydney NSW 2000
>
> PO Box H58, Australia Square, Sydney NSW 1215
>
>


-- 
Ben Coverston
DataStax -- The Apache Cassandra Company
