Thanks for the inputs. My next question is how the application should populate the duplicated data. For example, if I have an employee record (with his FN, LN, ...) for April and later insert the same employee's record for May (with the salary changed), should my application first read the corresponding April row and then re-insert it with the modifications for May?

Regards,
Seenu.

On Tue, Jul 7, 2015 at 11:32 AM, Peer, Oded <oded.p...@rsa.com> wrote:

> The data model suggested isn’t optimal for the “end of month” query you
> want to run since you are not querying by partition key.
>
> The query would look like “select EmpID, FN, LN, basic from salaries where
> month = 1” which requires filtering and has unpredictable performance.
>
> For this type of query to be fast you can use the “month” column as the
> partition key and the “EmpID” as the clustering column.
>
> This approach also has drawbacks:
>
> 1. This data model creates a wide row. Depending on the number of
> employees this partition might be very large. You should limit partition
> sizes to 25MB
>
> 2. Distributing data according to month means that only a small number of
> nodes will hold all of the salary data for a specific month which might
> cause hotspots on those nodes.
>
> Choose the approach that works best for you.
>
>
> *From:* Carlos Alonso [mailto:i...@mrcalonso.com]
> *Sent:* Monday, July 06, 2015 7:04 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Example Data Modelling
>
> Hi Srinivasa,
>
> I think you're right. In Cassandra you should favor denormalisation when
> in RDBMS you find a relationship like this.
>
> I'd suggest a cf like this
>
> CREATE TABLE salaries (
>   EmpID varchar,
>   FN varchar,
>   LN varchar,
>   Phone varchar,
>   Address varchar,
>   month int,
>   basic int,
>   flexible_allowance float,
>   PRIMARY KEY(EmpID, month)
> )
>
> That way the salaries will be partitioned by EmpID and clustered by month,
> which I guess is the natural sorting you want.
>
> Hope it helps,
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 6 July 2015 at 13:01, Srinivasa T N <seen...@gmail.com> wrote:
>
> Hi,
>
> I have basic doubt: I have an RDBMS with the following two tables:
>
> Emp - EmpID, FN, LN, Phone, Address
> Sal - Month, Empid, Basic, Flexible Allowance
>
> My use case is to print the Salary slip at the end of each month and
> the slip contains emp name and his other details.
>
> Now, if I want to have the same in cassandra, I will have a single cf
> with emp personal details and his salary details. Is this the right
> approach? Should we have the employee personal details duplicated each
> month?
>
> Regards,
> Seenu.
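
For reference, the alternative layout described in Oded's reply (the "month" column as the partition key and "EmpID" as the clustering column) might look roughly like the sketch below. The table name salaries_by_month is only an assumption for illustration and does not appear in the thread; the columns are taken from Carlos's suggested table, and the query is the one quoted in Oded's message.

```cql
-- Hypothetical table name; columns copied from the "salaries" table above.
-- month is the partition key, EmpID the clustering column.
CREATE TABLE salaries_by_month (
    month int,
    EmpID varchar,
    FN varchar,
    LN varchar,
    Phone varchar,
    Address varchar,
    basic int,
    flexible_allowance float,
    PRIMARY KEY (month, EmpID)
);

-- The "end of month" query now restricts the partition key,
-- so it reads a single partition instead of filtering.
SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;
```

With this layout all rows for one month live in one partition, which is what makes the query direct but also what causes the partition-size and hotspot drawbacks listed in the reply.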