On Fri, Aug 26, 2011 at 12:18 AM, Eric Evans <eev...@acunu.com> wrote:
> On Thu, Aug 25, 2011 at 6:31 AM, Ruby Stevenson <ruby...@gmail.com> wrote:
> > - Although Cassandra (and other decentralized NoSQL data store) has
> > been reported to handle very large data in total, my preliminary
> > understanding is the individual "column value" is quite limited. I
> > have read some posts saying you shouldn't store file this big in
> > Cassandra for example, use a path instead and let file system handle
> > it. Is this true?
>
> http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu

It is important to note that even distributed and network storage solutions like GlusterFS, NFS, and iSCSI are not as fast as local disk either. The reason is simple. In the best case on a local file system, the file lives in the VFS cache, and reading a block may take on the order of microseconds or even nanoseconds. Even if the block is not in the VFS cache, you are bounded only by bus and disk speeds. Network disk solutions like iSCSI require dedicated, expensive InfiniBand or Ethernet networks to work well. Also note that when using something like iSCSI, your system still gets to leverage its local VFS cache, so not all of the read traffic has to cross the network.

The best case for Cassandra would be a block of data living in the row cache on a node. That data still has to traverse the network, which is going to be slower than a local file. However, depending on what you are doing, storing file data in Cassandra could be a big win.
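
If you do decide to keep file data in Cassandra, one common approach is to split the file into fixed-size chunks and store each chunk as its own column value, so that no single value gets too large. Below is a rough sketch of just the chunking step, not a definitive implementation. The names `split_into_chunks`, `join_chunks`, and `CHUNK_SIZE` are made up for illustration, the 1 MiB chunk size is an assumption rather than a Cassandra limit, and a plain dict stands in for a row. No real Cassandra client API is used here.

```python
CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB per column value; not a Cassandra limit


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Return {column_name: chunk}; names are zero-padded so they sort in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        f"{offset // chunk_size:08d}": data[offset:offset + chunk_size]
        for offset in range(0, len(data), chunk_size)
    }


def join_chunks(columns: dict) -> bytes:
    """Reassemble the file by concatenating chunks in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # stand-in for real file contents
    row = split_into_chunks(payload, chunk_size=1000)  # dict stands in for a row
    print(len(row), join_chunks(row) == payload)
```

With a real client you would write each dict entry as a column under the file's row key and read the columns back in name order. Every chunk read still crosses the network, so the latency points above apply per chunk.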