Thanks for the inputs.

Now my question is how the app should populate the duplicated data. For
example, if I have an employee record (along with his FN, LN, ...) for the
month of April and later I populate the same record for the month of May
(with the salary changed), should my application first read the
corresponding data for April and re-insert it with the changes for May?
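
To make the question concrete, here is a minimal sketch of the
read-then-reinsert flow in CQL, against the salaries table quoted further
down in this thread. The employee id 'E001' and every literal value are
hypothetical placeholders. If the application already holds the personal
details from some other source, the SELECT step would presumably not be
needed.

```cql
-- Step 1: read the personal details stored with the April row
-- ('E001' is a hypothetical employee id).
SELECT FN, LN, Phone, Address
FROM salaries
WHERE EmpID = 'E001' AND month = 4;

-- Step 2: write the May row, repeating those details with the new salary
-- (all values below are placeholders, not real data).
INSERT INTO salaries (EmpID, FN, LN, Phone, Address, month, basic, flexible_allowance)
VALUES ('E001', 'John', 'Doe', '555-0100', '1 Example Street', 5, 52000, 1500.0);
```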

Regards,
Seenu.

On Tue, Jul 7, 2015 at 11:32 AM, Peer, Oded <oded.p...@rsa.com> wrote:

>  The data model suggested isn’t optimal for the “end of month” query you
> want to run since you are not querying by partition key.
>
> The query would look like “select EmpID, FN, LN, basic from salaries where
> month = 1” which requires filtering and has unpredictable performance.
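>
> As a rough sketch of what that means in practice: with the table keyed
> by (EmpID, month), Cassandra is expected to reject the month-only query
> unless ALLOW FILTERING is appended. The statement below is illustrative
> only and assumes the salaries table from the earlier reply.
>
> ```cql
> -- Restricts only the clustering column (month), not the partition key
> -- (EmpID), so it needs ALLOW FILTERING and may scan every partition.
> SELECT EmpID, FN, LN, basic
> FROM salaries
> WHERE month = 1
> ALLOW FILTERING;
> ```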
>
>
>
> For this type of query to be fast you can use the “month” column as the
> partition key and “EmpID” as the clustering column.
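>
> A layout with month as the partition key and EmpID as the clustering
> column could be sketched roughly as follows. The table name
> salaries_by_month is a hypothetical choice for illustration; the columns
> are the ones from the table suggested earlier in the thread.
>
> ```cql
> CREATE TABLE salaries_by_month (
>   month int,
>   EmpID varchar,
>   FN varchar,
>   LN varchar,
>   Phone varchar,
>   Address varchar,
>   basic int,
>   flexible_allowance float,
>   PRIMARY KEY (month, EmpID)  -- partition key month, clustering column EmpID
> );
>
> -- The end-of-month query now restricts the partition key,
> -- so no filtering is required.
> SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;
> ```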
>
> This approach also has drawbacks:
>
> 1. This data model creates a wide row. Depending on the number of
> employees this partition might be very large. You should limit partition
> sizes to 25MB.
>
> 2. Distributing data according to month means that only a small number of
> nodes will hold all of the salary data for a specific month which might
> cause hotspots on those nodes.
>
>
>
> Choose the approach that works best for you.
>
>
>
>
>
> *From:* Carlos Alonso [mailto:i...@mrcalonso.com]
> *Sent:* Monday, July 06, 2015 7:04 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Example Data Modelling
>
>
>
> Hi Srinivasa,
>
>
>
> I think you're right. In Cassandra you should favor denormalisation when
> you would find a relationship like this in an RDBMS.
>
>
>
> I'd suggest a cf like this:
>
> CREATE TABLE salaries (
>   EmpID varchar,
>   FN varchar,
>   LN varchar,
>   Phone varchar,
>   Address varchar,
>   month int,                  -- CQL's 32-bit integer type is "int"
>   basic int,
>   flexible_allowance float,
>   PRIMARY KEY (EmpID, month)  -- partition key EmpID, clustering column month
> );
>
>
>
> That way the salaries will be partitioned by EmpID and clustered by month,
> which I guess is the natural sorting you want.
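>
> As an illustration of how that layout would be read, the queries below
> restrict the partition key, so each one stays within a single partition.
> The employee id 'E001' is a hypothetical placeholder.
>
> ```cql
> -- All salary rows for one employee; within the partition, rows come back
> -- ordered by the clustering column month (ascending by default).
> SELECT month, basic, flexible_allowance
> FROM salaries
> WHERE EmpID = 'E001';
>
> -- A single month for that employee.
> SELECT FN, LN, basic, flexible_allowance
> FROM salaries
> WHERE EmpID = 'E001' AND month = 4;
> ```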
>
>
>
> Hope it helps,
>
> Cheers!
>
>
>   Carlos Alonso | Software Engineer | @calonso
> <https://twitter.com/calonso>
>
>
>
> On 6 July 2015 at 13:01, Srinivasa T N <seen...@gmail.com> wrote:
>
> Hi,
>
>    I have a basic doubt: I have an RDBMS with the following two tables:
>
>    Emp - EmpID, FN, LN, Phone, Address
>    Sal - Month, Empid, Basic, Flexible Allowance
>
>    My use case is to print the Salary slip at the end of each month and
> the slip contains emp name and his other details.
>
>    Now, if I want to have the same in Cassandra, I will have a single cf
> with emp personal details and his salary details.  Is this the right
> approach?  Should we have the employee personal details duplicated each
> month?
>
> Regards,
> Seenu.
>
>
>
