If I were to do this, I would have two tables, file_chunks and chunks:

    CREATE TABLE file_chunks (
        filename text,
        chunk int,
        size int, // Optional if you want to query the total size of a file.
        PRIMARY KEY (filename, chunk)
    );

    CREATE TABLE chunks (
        filename text,
        chunk int,
        data blob,
        PRIMARY KEY ((filename, chunk))
    );

By keeping the data chunks in a separate table, you make sure the data is spread more evenly across the cluster. If your files vary a lot in size, this is a much better approach. Also, using `(filename, chunk)` as the partition key in the `chunks` table makes it possible to have a background process that deletes rows in `chunks` that no longer exist in `file_chunks`.

Jens

On Friday, October 21, 2016, jason zhao yang <zhaoyangsingap...@gmail.com> wrote:

> 1. usually before storing object, serialization is needed, so we can know
> the size.
> 2. add "chunk id" as last clustering key.
>
> Vikas Jaiman <er.vikasjai...@gmail.com> wrote on Friday, October 21, 2016
> at 11:46 PM:
>
>> Thanks for your answer but I am just curious about:
>>
>> i) How do you identify the size of the object which you are going to chunk?
>>
>> ii) While reading or updating how it is going to read all those chunks?
>>
>> Vikas
>>
>> On Thu, Oct 20, 2016 at 9:25 PM, Justin Cameron <jus...@instaclustr.com>
>> wrote:
>>
>>> You can, but it is not really very efficient or cost-effective. You may
>>> encounter issues with streaming, repairs and compaction if you have very
>>> large blobs (100MB+), so try to keep them under 10MB if possible.
>>>
>>> I'd suggest storing blobs in something like Amazon S3 and keeping just
>>> the bucket name & blob id in Cassandra.
>>>
>>> On Thu, 20 Oct 2016 at 12:03 Vikas Jaiman <er.vikasjai...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Normally people would like to store smaller values in Cassandra. Is
>>>> there anyone using it to store larger values (e.g. 500KB or more) and if
>>>> so what are the issues you are facing? I would like to know the tweaks
>>>> also which you are considering.
>>>>
>>>> Thanks,
>>>> Vikas
>>>>
>>> --
>>> Justin Cameron
>>> Senior Software Engineer | Instaclustr
>>>
>>
>> --
>>
>

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se>
Linkedin <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
Twitter <https://twitter.com/tink>
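
For readers who want to see the two-table layout from the top of this message in motion, here is a minimal client-side sketch. It is only an illustration under stated assumptions: plain Python dicts stand in for the `file_chunks` and `chunks` tables, no Cassandra driver is involved, and the helper names (`split_into_chunks`, `write_file`, `read_file`, `delete_file`, `find_orphans`) and the 1 MiB chunk size are hypothetical choices, not something proposed in the thread.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed chunk size; the thread only suggests keeping blobs under 10MB.
CHUNK_SIZE = 1024 * 1024

# Stand-ins for the two tables (assumption: in-memory dicts, not Cassandra).
# file_chunks: filename -> {chunk index -> size}, like PRIMARY KEY (filename, chunk)
# chunks: (filename, chunk index) -> data, like PRIMARY KEY ((filename, chunk))
FileChunks = Dict[str, Dict[int, int]]
Chunks = Dict[Tuple[str, int], bytes]


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk index, chunk bytes) pairs that cover data in order."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        yield index, data[offset:offset + chunk_size]


def write_file(file_chunks: FileChunks, chunks: Chunks, filename: str,
               data: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """Store each chunk's data, then record its size in the index table."""
    for index, piece in split_into_chunks(data, chunk_size):
        chunks[(filename, index)] = piece
        file_chunks.setdefault(filename, {})[index] = len(piece)


def read_file(file_chunks: FileChunks, chunks: Chunks, filename: str) -> bytes:
    """Look up the chunk indexes for a file and concatenate them in order."""
    indexes = sorted(file_chunks.get(filename, {}))
    return b"".join(chunks[(filename, index)] for index in indexes)


def delete_file(file_chunks: FileChunks, filename: str) -> None:
    """Drop only the index rows; the data rows are left for the cleanup pass."""
    file_chunks.pop(filename, None)


def find_orphans(file_chunks: FileChunks, chunks: Chunks) -> List[Tuple[str, int]]:
    """Return data rows that no longer have a matching index row."""
    return [key for key in chunks
            if key[1] not in file_chunks.get(key[0], {})]
```

In this sketch the total file size is the sum of the `size` values for a filename, and the background process mentioned above would periodically delete whatever `find_orphans` returns. Against a real cluster each dict operation would become a query on the corresponding table.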