If I were to do this, I would have two tables, `file_chunks` and `chunks`:

CREATE TABLE file_chunks (
  filename text,
  chunk int,
  size int, // Optional; lets you query the total size of a file.
  PRIMARY KEY (filename, chunk) // Partition by filename, cluster by chunk.
);

CREATE TABLE chunks (
  filename text,
  chunk int,
  data blob,
  PRIMARY KEY ((filename, chunk)) // Each chunk is its own partition.
);
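
As a rough sketch of how a client might produce the rows for these two tables, here is the splitting and reassembly logic in plain Python, with no driver calls. The 1 MiB chunk size and the helper names `split_into_rows` and `reassemble` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB per chunk; tune for your cluster


def split_into_rows(filename: str, payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (file_chunks rows, chunks rows) for one serialized object."""
    index_rows: List[Tuple[str, int, int]] = []   # (filename, chunk, size)
    data_rows: List[Tuple[str, int, bytes]] = []  # (filename, chunk, data)
    for chunk_no, start in enumerate(range(0, len(payload), chunk_size)):
        piece = payload[start:start + chunk_size]
        index_rows.append((filename, chunk_no, len(piece)))
        data_rows.append((filename, chunk_no, piece))
    return index_rows, data_rows


def reassemble(index_rows, data_by_key: Dict[Tuple[str, int], bytes]) -> bytes:
    """Rebuild the object by walking the file_chunks rows in chunk order."""
    ordered = sorted(index_rows, key=lambda row: row[1])
    return b"".join(data_by_key[(name, chunk_no)] for name, chunk_no, _ in ordered)
```

Each tuple in `index_rows` would become one insert into `file_chunks`, and each tuple in `data_rows` one insert into `chunks`. Summing the `size` values of a file's index rows gives its total size without touching any blob.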

By keeping the data chunks in a separate table, you spread the data more
evenly across the cluster. If file sizes vary a lot, this is a much better
approach. Also, using `(filename, chunk)` as the partition key in the `chunks`
table makes it possible to have a background process that deletes rows in
`chunks` that no longer exist in `file_chunks`.
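
The core of such a cleanup job could look roughly like the sketch below. It only shows the comparison step on in-memory keys; the name `find_orphans` is hypothetical, and a real job would probably page through `chunks` and check `file_chunks` per key instead of loading every key at once.

```python
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, int]  # (filename, chunk)


def find_orphans(chunk_keys: Iterable[Key], index_keys: Iterable[Key]) -> List[Key]:
    """Return keys present in `chunks` but missing from `file_chunks`."""
    live: Set[Key] = set(index_keys)
    return sorted(key for key in set(chunk_keys) if key not in live)
```

Because `(filename, chunk)` is the whole partition key of `chunks`, every key returned here maps to a single-partition delete.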

Jens

On Friday, October 21, 2016, jason zhao yang <zhaoyangsingap...@gmail.com>
wrote:

> 1. Objects usually need to be serialized before they are stored, so the
> size is known at that point.
> 2. Add "chunk id" as the last clustering key.
>
> On Friday, October 21, 2016 at 11:46 PM, Vikas Jaiman
> <er.vikasjai...@gmail.com> wrote:
>
>> Thanks for your answer but I am just curious about:
>>
>> i) How do you identify the size of the object which you are going to chunk?
>>
>> ii) When reading or updating, how are all those chunks read back?
>>
>> Vikas
>>
>> On Thu, Oct 20, 2016 at 9:25 PM, Justin Cameron <jus...@instaclustr.com>
>> wrote:
>>
>>> You can, but it is not really very efficient or cost-effective. You may
>>> encounter issues with streaming, repairs and compaction if you have very
>>> large blobs (100MB+), so try to keep them under 10MB if possible.
>>>
>>> I'd suggest storing blobs in something like Amazon S3 and keeping just
>>> the bucket name & blob id in Cassandra.
>>>
>>> On Thu, 20 Oct 2016 at 12:03 Vikas Jaiman <er.vikasjai...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Normally people store smaller values in Cassandra. Is anyone using it
>>>> to store larger values (e.g. 500KB or more), and if so, what issues are
>>>> you facing? I would also like to know which tweaks you are
>>>> considering.
>>>>
>>>> Thanks,
>>>> Vikas
>>>>
>>> --
>>>
>>> Justin Cameron
>>>
>>> Senior Software Engineer | Instaclustr

-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
