Re: Largest 'sensible' value

2012-04-03 Thread Jonathan Ellis
We use 2MB chunks for our CFS implementation of HDFS: http://www.datastax.com/dev/blog/cassandra-file-system-design On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are in the early stages of thinking about a project that needs to store data that will be

Re: Largest 'sensible' value

2012-04-03 Thread Franc Carter
On Wed, Apr 4, 2012 at 8:56 AM, Jonathan Ellis jbel...@gmail.com wrote: We use 2MB chunks for our CFS implementation of HDFS: http://www.datastax.com/dev/blog/cassandra-file-system-design thanks On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We

Largest 'sensible' value

2012-04-02 Thread Franc Carter
Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the latency of HDFS, as our use case is not for reading all the data and hence we will need custom RecordReaders etc. I've seen a couple of

Re: Largest 'sensible' value

2012-04-02 Thread Ben Coverston
This is a difficult question to answer for a variety of reasons, but I'll give it a try, maybe it will be helpful, maybe not. The most obvious problem with this is that Thrift is buffer based, not streaming. That means that whatever the size of your chunk it needs to be received, deserialized,
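
To illustrate the point about buffer-based transport: if each value must be held in memory in full, a large object is usually split into fixed-size pieces before it is stored. The following is a minimal sketch in Python, assuming the 2MB chunk size mentioned elsewhere in this thread. The function names are hypothetical and this is not the CFS implementation; it only shows the splitting and reassembly idea.

```python
# Hypothetical helper names; only the 2MB figure comes from the thread.
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB per chunk (assumed to mean 2 MiB here)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a blob into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks) -> bytes:
    """Concatenate chunks, in order, back into the original blob."""
    return b"".join(chunks)


if __name__ == "__main__":
    blob = bytes(5 * 1024 * 1024 + 3)  # a little over 5 MiB of zero bytes
    pieces = split_into_chunks(blob)
    print(len(pieces), [len(p) for p in pieces])
```

Each piece could then be written as its own column or row value, so that no single read or write has to buffer more than one chunk at a time.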

Re: Largest 'sensible' value

2012-04-02 Thread Franc Carter
On Tue, Apr 3, 2012 at 4:18 AM, Ben Coverston ben.covers...@datastax.com wrote: This is a difficult question to answer for a variety of reasons, but I'll give it a try, maybe it will be helpful, maybe not. The most obvious problem with this is that Thrift is buffer based, not streaming. That