I appreciate both your answers. I'll use them soon. Thanks!
On Mon, Jul 4, 2011 at 11:48 AM, Silvère Lestang <silvere.lest...@gmail.com> wrote:
> We do pretty much the same thing here: dynamic columns with a timestamp for
> the column name and a different value type for each row. We use the
> serialization/deserialization classes provided with Hector and store the
> type of the value in the key of the row. Example row keys:
> "b6c8a1e7281761e62230ea76daa3d841#INT" => all values are Integer
> "7f30a6a2bbb1b921afc8216d8c5d9257#DOUBLE" => all values are Double
> ....
> If I had to do it again, I would try to use (Dynamic)CompositeType for the
> value, or an equivalent mechanism as suggested by Roland.
>
> On 3 July 2011 15:07, Roland Gude <roland.g...@yoochoose.com> wrote:
>>
>> You could do the serialization for all your supported datatypes yourself
>> (many serialization libraries are available, and a pretty thorough
>> benchmark of them can be found here:
>> https://github.com/eishay/jvm-serializers/wiki) and prepend the serialized
>> bytes with an identifier for your datatype.
>> This would not avoid casting, but it would still perform better
>> than serializing to strings as is done in your example.
>> Prepending the values with the id seems better to me, because you
>> can be sure that a new insertion to some field overwrites the correct column
>> even if it changed the type.
>>
>> -----Original Message-----
>> From: osishkin osishkin [mailto:osish...@gmail.com]
>> Sent: Sunday, 3 July 2011 13:52
>> To: user@cassandra.apache.org
>> Subject: Multi-type column values in single CF
>>
>> Hi all,
>>
>> I need to store column values that are of various data types in a
>> single column family, i.e. I have column values that are integers,
>> others that are strings, and maybe more later. All column names are
>> strings (no comparator problem for me).
>> The thing is I need to store unstructured data - I do not have fixed
>> and known-in-advance column names, so I cannot use a fixed static map
>> for casting the values back to their original type on retrieval from
>> Cassandra.
>>
>> My immediate naive thought is to simply prefix every column name with
>> the type the value needs to be cast back to.
>> For example, I'll do the following conversion to the columns of some key -
>> {'attr1': 'val1','attr2': 100} ~> {'str_attr1' : 'val1', 'int_attr2' :
>> '100'}
>> and only then send it to Cassandra. This way I know what to cast it
>> back to.
>>
>> But all this casting back and forth on the client side seems to me to
>> be very bad for performance.
>> Another option is to split the columns across dedicated column families
>> with matching validation types - a column family for integer values,
>> one for string, one for timestamp etc.
>> But that does not seem very efficient either (and worse for any
>> rollback mechanism), since now I have to perform several get calls on
>> multiple CFs where once I had only one.
>>
>> I thought perhaps someone has encountered a similar situation in the
>> past, and can offer some advice on the best course of action.
>>
>> Thank you,
>> Osi
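
For reference, Roland's suggestion of prepending a type identifier to the
serialized bytes could look roughly like the sketch below. It is only an
illustration under stated assumptions: the one-byte tag values, the names
encode_value/decode_value, and the use of Python's struct module are all
hypothetical choices, not anything taken from Hector or from this thread.

```python
import struct

# Hypothetical one-byte type tags (assumption: the thread prescribes no values).
TAG_INT = b"\x01"
TAG_DOUBLE = b"\x02"
TAG_STR = b"\x03"


def encode_value(value):
    """Serialize a value to bytes, prefixed with a one-byte type tag."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to keep the sketch simple.
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        return TAG_INT + struct.pack(">q", value)  # 64-bit signed, big-endian
    if isinstance(value, float):
        return TAG_DOUBLE + struct.pack(">d", value)  # IEEE 754 double
    if isinstance(value, str):
        return TAG_STR + value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode_value(data):
    """Read the type tag and restore the original value."""
    tag, payload = data[:1], data[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_DOUBLE:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_STR:
        return payload.decode("utf-8")
    raise ValueError(f"unknown type tag: {tag!r}")


# Column names stay untouched; only the stored values carry the type tag.
row = {"attr1": "val1", "attr2": 100}
stored = {name: encode_value(value) for name, value in row.items()}
restored = {name: decode_value(raw) for name, raw in stored.items()}
```

Because the column name is unchanged, writing a string to 'attr2' later
would overwrite the same column instead of creating a second one under a
different prefix, which is the advantage Roland points out.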