Michael, Your approach solves the problem, thanks for the solution. I was thinking of another approach as well where in I would create another column family say file_archive, so whenever an update is made to the File table, I will create a new version in the File and move the old version to the new file_archive table. Please let me know if the second approach is fine.
Regards, Dawood On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael <michael.la...@nytimes.com>wrote: > I use the technique described in my previous message to handle millions of > messages and their versions. > > Actually, I use timeuuid's instead of timestamps, as they have more > 'uniqueness'. Also I index my maps by a timeuuid that is the complement > (based on a future date) of a current timeuuid. Since maps are kept sorted > by key, this means I can just pop off the first one to get the most recent. > > The downside of this approach is that you get more stuff returned to you > from Cassandra than you need. To mitigate that I queue a job to examine and > correct the situation if, upon doing a read, the number of versions for a > particular key is higher than some threshold, e.g. 50. There are many ways > to approach this problem. > > Our actual implementation proceeds to another level, as we also have > replicas of versions. This happens because we process important > transactions in parallel and can expect up to 9 replicas of each version. > We journal them all and use them for reporting latencies in our processing > pipelines as well as for replay when we need to recover application state. > > Regards, > > Michael > > > On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael > <michael.la...@nytimes.com>wrote: > >> try the following. -ml >> >> -- put this in <file> and run using 'cqlsh -f <file> >> >> DROP KEYSPACE latest; >> >> CREATE KEYSPACE latest WITH replication = { >> 'class': 'SimpleStrategy', >> 'replication_factor' : 1 >> }; >> >> USE latest; >> >> CREATE TABLE file ( >> parentid text, -- row_key, same for each version >> id text, -- column_key, same for each version >> contenttype map<timestamp, text>, -- differs by version, version is >> the key to the map >> PRIMARY KEY (parentid, id) >> ); >> >> update file set contenttype = contenttype + {'2011-03-04':'pdf1'} where >> parentid = 'd1' and id = 'f1'; >> update file set contenttype = contenttype + {'2011-03-05':'pdf2'} where >> parentid = 'd1' and id = 'f1'; >> update file set contenttype = contenttype + {'2011-03-04':'pdf3'} where >> parentid = 'd1' and id = 'f2'; >> update file set contenttype = contenttype + {'2011-03-05':'pdf4'} where >> parentid = 'd1' and id = 'f2'; >> >> select * from file where parentid = 'd1'; >> >> -- returns: >> >> -- parentid | id | contenttype >> >> ------------+----+-------------------------------------------------------------------------- >> -- d1 | f1 | {'2011-03-04 00:00:00-0500': 'pdf1', '2011-03-05 >> 00:00:00-0500': 'pdf2'} >> -- d1 | f2 | {'2011-03-04 00:00:00-0500': 'pdf3', '2011-03-05 >> 00:00:00-0500': 'pdf4'} >> >> -- use an app to pop off the latest version from the map >> >> -- map other varying fields using the same technique as used for >> contenttype >> >> >> >> On Tue, Sep 3, 2013 at 2:31 PM, Vivek Mishra <mishra.v...@gmail.com>wrote: >> >>> create table file(id text , parentid text,contenttype text,version >>> timestamp, descr text, name text, PRIMARY KEY(id,version) ) WITH CLUSTERING >>> ORDER BY (version DESC); >>> >>> insert into file (id, parentid, version, contenttype, descr, name) >>> values ('f2', 'd1', '2011-03-06', 'pdf', 'f2 file', 'file1'); >>> insert into file (id, parentid, version, contenttype, descr, name) >>> values ('f2', 'd1', '2011-03-05', 'pdf', 'f2 file', 'file1'); >>> insert into file (id, parentid, version, contenttype, descr, name) >>> values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1'); >>> insert into file (id, parentid, version, contenttype, descr, name) >>> values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1'); >>> create index on file(parentid); >>> >>> >>> select * from file where id='f1' and parentid='d1' limit 1; >>> >>> select * from file where parentid='d1' limit 1; >>> >>> >>> Will it work for you? >>> >>> -Vivek >>> >>> >>> >>> >>> On Tue, Sep 3, 2013 at 11:29 PM, Vivek Mishra <mishra.v...@gmail.com>wrote: >>> >>>> My bad. I did miss out to read "latest version" part. >>>> >>>> -Vivek >>>> >>>> >>>> On Tue, Sep 3, 2013 at 11:20 PM, dawood abdullah < >>>> muhammed.daw...@gmail.com> wrote: >>>> >>>>> I have tried with both the options creating secondary index and also >>>>> tried adding parentid to primary key, but I am getting all the files with >>>>> parentid 'yyy', what I want is the latest version of file with the >>>>> combination of parentid, fileid. Say below are the records inserted in the >>>>> file table: >>>>> >>>>> insert into file (id, parentid, version, contenttype, description, >>>>> name) values ('f1', 'd1', '2011-03-04', 'pdf', 'f1 file', 'file1'); >>>>> insert into file (id, parentid, version, contenttype, description, >>>>> name) values ('f1', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1'); >>>>> insert into file (id, parentid, version, contenttype, description, >>>>> name) values ('f2', 'd1', '2011-03-05', 'pdf', 'f1 file', 'file1'); >>>>> insert into file (id, parentid, version, contenttype, description, >>>>> name) values ('f2', 'd1', '2011-03-06', 'pdf', 'f1 file', 'file1'); >>>>> >>>>> I want to write a query which returns me second and last record and >>>>> not the first and third record, because for the first and third record >>>>> there exists a latest version, for the combination of id and parentid. >>>>> >>>>> I am confused If at all this is achievable, please suggest. >>>>> >>>>> Dawood >>>>> >>>>> >>>>> >>>>> On Tue, Sep 3, 2013 at 10:58 PM, Vivek Mishra >>>>> <mishra.v...@gmail.com>wrote: >>>>> >>>>>> create secondary index over parentid. >>>>>> OR >>>>>> make it part of clustering key >>>>>> >>>>>> -Vivek >>>>>> >>>>>> >>>>>> On Tue, Sep 3, 2013 at 10:42 PM, dawood abdullah < >>>>>> muhammed.daw...@gmail.com> wrote: >>>>>> >>>>>>> Jan, >>>>>>> >>>>>>> The solution you gave works spot on, but there is one more >>>>>>> requirement I forgot to mention. Following is my table structure >>>>>>> >>>>>>> CREATE TABLE file ( >>>>>>> id text, >>>>>>> contenttype text, >>>>>>> createdby text, >>>>>>> createdtime timestamp, >>>>>>> description text, >>>>>>> name text, >>>>>>> parentid text, >>>>>>> version timestamp, >>>>>>> PRIMARY KEY (id, version) >>>>>>> >>>>>>> ) WITH CLUSTERING ORDER BY (version DESC); >>>>>>> >>>>>>> >>>>>>> The query (select * from file where id = 'xxx' limit 1;) provided >>>>>>> solves the problem of finding the latest version file. But I have one >>>>>>> more >>>>>>> requirement of finding all the latest version files having parentid say >>>>>>> 'yyy'. >>>>>>> >>>>>>> Please suggest how can this query be achieved. >>>>>>> >>>>>>> Dawood >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 3, 2013 at 12:43 AM, dawood abdullah < >>>>>>> muhammed.daw...@gmail.com> wrote: >>>>>>> >>>>>>>> In my case version can be timestamp as well. What do you suggest >>>>>>>> version number to be, do you see any problems if I keep version as >>>>>>>> counter >>>>>>>> / timestamp ? >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Sep 3, 2013 at 12:22 AM, Jan Algermissen < >>>>>>>> jan.algermis...@nordsc.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On 02.09.2013, at 20:44, dawood abdullah < >>>>>>>>> muhammed.daw...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> > Requirement is like I have a column family say File >>>>>>>>> > >>>>>>>>> > create table file(id text primary key, fname text, version int, >>>>>>>>> mimetype text, content text); >>>>>>>>> > >>>>>>>>> > Say, I have few records inserted, when I modify an existing >>>>>>>>> record (content is modified) a new version needs to be created. As I >>>>>>>>> need >>>>>>>>> to have provision to revert to back any old version whenever required. >>>>>>>>> > >>>>>>>>> >>>>>>>>> So, can version be a timestamp? Or does it need to be an integer? >>>>>>>>> >>>>>>>>> In the former case, make use of C*'s ordering like so: >>>>>>>>> >>>>>>>>> CREATE TABLE file ( >>>>>>>>> file_id text, >>>>>>>>> version timestamp, >>>>>>>>> fname text, >>>>>>>>> .... >>>>>>>>> PRIMARY KEY (file_id,version) >>>>>>>>> ) WITH CLUSTERING ORDER BY (version DESC); >>>>>>>>> >>>>>>>>> Get the latest file version with >>>>>>>>> >>>>>>>>> select * from file where file_id = 'xxx' limit 1; >>>>>>>>> >>>>>>>>> If it has to be an integer, use counter columns. >>>>>>>>> >>>>>>>>> Jan >>>>>>>>> >>>>>>>>> >>>>>>>>> > Regards, >>>>>>>>> > Dawood >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > On Mon, Sep 2, 2013 at 10:47 PM, Jan Algermissen < >>>>>>>>> jan.algermis...@nordsc.com> wrote: >>>>>>>>> > Hi Dawood, >>>>>>>>> > >>>>>>>>> > On 02.09.2013, at 16:36, dawood abdullah < >>>>>>>>> muhammed.daw...@gmail.com> wrote: >>>>>>>>> > >>>>>>>>> > > Hi >>>>>>>>> > > I have a requirement of versioning to be done in Cassandra. >>>>>>>>> > > >>>>>>>>> > > Following is my column family definition >>>>>>>>> > > >>>>>>>>> > > create table file_details(id text primary key, fname text, >>>>>>>>> version int, mimetype text); >>>>>>>>> > > >>>>>>>>> > > I have a secondary index created on fname column. >>>>>>>>> > > >>>>>>>>> > > Whenever I do an insert for the same 'fname', the version >>>>>>>>> should be incremented. And when I retrieve a row with fname it should >>>>>>>>> return me the latest version row. >>>>>>>>> > > >>>>>>>>> > > Is there a better way to do in Cassandra? Please suggest what >>>>>>>>> approach needs to be taken. >>>>>>>>> > >>>>>>>>> > Can you explain more about your use case? >>>>>>>>> > >>>>>>>>> > If the version need not be a small number, but could be a >>>>>>>>> timestamp, you could make use of C*'s ordering feature , have the >>>>>>>>> database >>>>>>>>> set the new version as a timestamp and retrieve the latest one with a >>>>>>>>> simple LIMIT 1 query. (I'll explain more when this is an option for >>>>>>>>> you). >>>>>>>>> > >>>>>>>>> > Jan >>>>>>>>> > >>>>>>>>> > P.S. Me being a REST/HTTP head, an alarm rings when I see >>>>>>>>> 'version' next to 'mimetype' :-) What exactly are you versioning here? >>>>>>>>> Maybe we can even change the situation from a functional POV? >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > > >>>>>>>>> > > Regards, >>>>>>>>> > > >>>>>>>>> > > Dawood >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >