The answer to this question depends very much on the throughput, desired latency, and access pattern (R/W or R/O). In general, what I have seen work in high-throughput environments is to put the blobs in a distributed file system like Ceph/Gluster or an object store like S3, and keep only a pointer in the NoSQL database (Cassandra, Dynamo, etc.). NoSQL DBs are mostly log-structured and require frequent compaction, which in a high-throughput environment proves quite devastating.
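A minimal sketch of that pointer pattern, assuming boto3 and the DataStax Python driver; the "media" keyspace, the blobs table, and the bucket name are my own hypothetical placeholders, not anything standard:

import uuid

import boto3
from cassandra.cluster import Cluster

# Hypothetical schema (an assumption, not from this thread):
#   CREATE TABLE media.blobs (id uuid PRIMARY KEY, bucket text, s3_key text, size bigint);

s3 = boto3.client("s3")
session = Cluster(["127.0.0.1"]).connect("media")

def store_blob(bucket, payload):
    """Write the bytes to S3; keep only a small pointer row in C*."""
    blob_id = uuid.uuid4()
    s3_key = str(blob_id)
    s3.put_object(Bucket=bucket, Key=s3_key, Body=payload)
    session.execute(
        "INSERT INTO blobs (id, bucket, s3_key, size) VALUES (%s, %s, %s, %s)",
        (blob_id, bucket, s3_key, len(payload)),
    )
    return blob_id

def fetch_blob(blob_id):
    """Look up the pointer in C*, then pull the bytes from S3."""
    row = session.execute(
        "SELECT bucket, s3_key FROM blobs WHERE id = %s", (blob_id,)).one()
    return s3.get_object(Bucket=row.bucket, Key=row.s3_key)["Body"].read()

The point of the split is that the C* rows stay tiny, so compaction cost scales with the number of pointers rather than the volume of blob bytes.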
On Wed, Jan 20, 2016 at 9:59 AM, Kevin Burton <bur...@spinn3r.com> wrote:

> There's also the 'support' issue... C* is hard enough as it is... maybe you
> can bring in another system like ES or HDFS, but the more you bring in, the
> more your complexity REALLY goes through the roof.
>
> Better to keep things simple.
>
> I really like the chunking idea for C*... seems like an easy way to store
> tons of data.
>
> On Tue, Jan 19, 2016 at 4:13 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Tue, Jan 19, 2016 at 2:07 PM, Richard L. Burton III <mrbur...@gmail.com> wrote:
>>
>>> I would ask why do this over, say, HDFS, S3, etc. It seems like this
>>> problem has been solved by other solutions that are specifically
>>> designed for blob storage?
>>
>> HDFS's default block size is 64 MB. If you are storing objects smaller
>> than this, that might be bad! It also doesn't have an HTTP transport,
>> which other things do.
>>
>> Etc..
>>
>> =Rob
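For anyone who wants to experiment with the chunking idea Kevin mentions, a rough sketch, again assuming the DataStax Python driver; the blobs_chunked table and the 1 MB chunk size are assumptions of mine, not something from this thread:

import uuid
from cassandra.cluster import Cluster

# Hypothetical schema (an assumption, not from this thread):
#   CREATE TABLE media.blobs_chunked (
#       blob_id uuid, chunk_index int, data blob,
#       PRIMARY KEY (blob_id, chunk_index));

CHUNK_SIZE = 1 << 20  # 1 MB per chunk (an assumption; keep cells small)

session = Cluster(["127.0.0.1"]).connect("media")

def write_chunked(payload):
    """Split one blob across many rows of a single partition."""
    blob_id = uuid.uuid4()
    for i in range(0, len(payload), CHUNK_SIZE):
        session.execute(
            "INSERT INTO blobs_chunked (blob_id, chunk_index, data) VALUES (%s, %s, %s)",
            (blob_id, i // CHUNK_SIZE, payload[i:i + CHUNK_SIZE]),
        )
    return blob_id

def read_chunked(blob_id):
    """Chunks come back in clustering order (chunk_index ascending)."""
    rows = session.execute(
        "SELECT data FROM blobs_chunked WHERE blob_id = %s", (blob_id,))
    return b"".join(r.data for r in rows)

This keeps everything inside C*, at the cost of the compaction pressure described above; the pointer-to-object-store approach trades that for one more system to operate.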