My company uses an RDBMS to store time-series data. The application was developed before Cassandra and NoSQL existed. I'd like to move to C*, but ...
The application supports data coming from multiple models of devices. Because there is so much variability in the data, the main table holding the device data has only a handful of core columns defined. The rest are generic: one set of numeric columns and one set of character columns. What each generic column means is defined in the application code, so 'numeric_1' might hold a millisecond timestamp for one device and a fault code for another. This appears to have been done to avoid modifying the schema every time a new device model was introduced, and they rolled their own DB interface to support this mess.

Now, we could just use C* like an RDBMS, defining CFs that mimic the tables, but that just pushes a bad design from one platform to another. Clearly there needs to be a code rewrite. So what suggestions does anyone have on how to make this shift to C*? Would you just lay out all of the columns used by the different devices, naming them by their actual use and accepting jagged rows? Or is there some other way to approach this? (I've sketched a couple of possible C* layouts below.)

Of course, the data miners already have scripts and methods for pulling the data out of the RDBMS in its current, user-unfriendly form. That would have to be addressed as well, but until I know how to store the data, mining it gets ahead of things.

Thanks.
Les
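
P.S. To make the question concrete, here's a rough sketch of the kind of layouts I've been turning over. This assumes a recent Cassandra (2.0+, CQL 3) and the DataStax Python driver; all keyspace, table, and column names (telemetry, raw_readings, device_id, metric, etc.) are made up for illustration, not a recommendation.

```python
# Sketch only: assumes Cassandra 2.0+ with CQL 3 and the DataStax Python
# driver (pip install cassandra-driver). Names are hypothetical.
from datetime import datetime, timezone

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS telemetry
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('telemetry')

# Option A: one partition per device per day, one clustering row per reading.
# The device-specific fields go into maps instead of numeric_1/char_1 columns,
# so new device models don't require schema changes.
session.execute("""
    CREATE TABLE IF NOT EXISTS raw_readings (
        device_id    text,
        day          text,                -- e.g. '2013-06-01', bounds partition size
        event_time   timestamp,
        numeric_vals map<text, double>,   -- e.g. {'latency_ms': 12.5}
        text_vals    map<text, text>,     -- e.g. {'status': 'OK'}
        PRIMARY KEY ((device_id, day), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

insert = session.prepare("""
    INSERT INTO raw_readings (device_id, day, event_time, numeric_vals, text_vals)
    VALUES (?, ?, ?, ?, ?)
""")
now = datetime.now(timezone.utc)
session.execute(insert, ('meter-0042', now.strftime('%Y-%m-%d'), now,
                         {'latency_ms': 12.5, 'fault_code': 0.0},
                         {'status': 'OK'}))

# Option B: fully "jagged" -- one partition per (device, metric), one row per
# timestamp, so each device only stores the metrics it actually reports.
# Queries slice by metric name and time range instead of picking columns.
session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_metric (
        device_id  text,
        metric     text,
        event_time timestamp,
        value      text,
        PRIMARY KEY ((device_id, metric), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

# Example read: the latest readings for one device on one day (option A).
rows = session.execute(
    "SELECT event_time, numeric_vals FROM raw_readings "
    "WHERE device_id = %s AND day = %s LIMIT 10",
    ('meter-0042', now.strftime('%Y-%m-%d')))
for row in rows:
    print(row.event_time, row.numeric_vals)
```

Option A keeps one row per reading and pushes the device-specific fields into maps; option B goes fully jagged with one partition per (device, metric) pair. I'd be curious whether people would lean toward one of these, or do something else entirely.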