My company uses an RDBMS to store time-series data. The application was
developed before Cassandra and NoSQL existed. I'd like to move to C*, but ...

The application ingests data from multiple models of devices. Because the
data varies so much between models, the main table holding device data
defines only a few core columns. The rest are generic: one set of numeric
columns and one set of character columns, whose meaning is defined entirely
in application code. Column 'numeric_1' might hold a millisecond timestamp
for one device and a fault code for another. This appears to have been done
to avoid modifying the schema whenever a new device model was introduced,
and they rolled their own DB interface to support this mess.
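
To make that concrete, the current table looks roughly like this (column
names invented for the example, but the shape is accurate):

    -- Roughly what the existing RDBMS table looks like (names invented).
    CREATE TABLE device_data (
        device_id    INTEGER     NOT NULL,  -- core column
        recorded_at  TIMESTAMP   NOT NULL,  -- core column
        device_model VARCHAR(32) NOT NULL,  -- core column
        numeric_1    NUMERIC,     -- meaning depends on device model
        numeric_2    NUMERIC,     -- (e.g. millisecond time vs. fault code)
        char_1       VARCHAR(64), -- likewise overloaded per model
        char_2       VARCHAR(64),
        PRIMARY KEY (device_id, recorded_at)
    );

The real table has more generic columns of each type, and only the
application code knows what numeric_1 means for a given device_model.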

Now, we could just use C* like an RDBMS, defining column families that
mimic the existing tables. But that just pushes a bad design from one
platform to another.

Clearly a code rewrite is needed. But what suggestions does anyone have for
making this shift to C*?

Would you just lay out all of the columns used across the different
devices, naming them for their actual use, and accept jagged rows (as in
the sketch below)? Or is there some other way to approach this?
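
For example (a rough CQL sketch, with made-up device-specific column
names), each row would populate only the columns its device model actually
reports:

    -- Union-of-all-columns approach: every device-specific field gets a
    -- properly named column; rows stay sparse ("jagged") per device model.
    CREATE TABLE device_data (
        device_id     text,
        recorded_at   timestamp,
        sample_ms     bigint,   -- only some models report this (invented name)
        fault_code    int,      -- only other models report this (invented name)
        engine_temp_c double,   -- etc.
        PRIMARY KEY ((device_id), recorded_at)
    );

My understanding is that Cassandra stores nothing for unset columns, so
sparse rows like this are cheap; what I don't know is whether a wide
union-of-all-devices table is idiomatic, or whether some other layout would
serve better.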

Of course, the data miners already have scripts and methods for accessing
the data in its current user-unfriendly form in the RDBMS. That would have
to be addressed as well, but until I know how to store the data, worrying
about mining it is getting ahead of things.

Thanks.

Les
