I was wondering if I could get a bit more insight into why we are seeing
different insertion times between regular column families and super columns.
We have a group object (with its name) that may have a series of attributes
(name/value). There can be up to a million group objects, and different groups
can share several attributes. In our first design we used a super column
family, with the column path
ColumnPath ("Index", [attribute value], [group name]) and the attribute name
as the row key. The value we insert is an empty byte array.
In the second design we simplified our model and used the column path
ColumnPath ("Index", null, [group name]), where the row key is simply the
attribute name concatenated with the attribute value. The value inserted
is again an empty byte array.
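To make the two layouts concrete, here is a minimal sketch that models them with plain Python dictionaries. It does not use the Cassandra client API; the function names, the sample groups, and the ":" separator in the concatenated row key are assumptions made only for illustration.

```python
from collections import defaultdict

EMPTY = b""  # both designs store an empty byte array as the column value


def insert_super(store, attr_name, attr_value, group_name):
    """Design 1: row key = attribute name,
    super column = attribute value, column = group name."""
    store[attr_name][attr_value][group_name] = EMPTY


def insert_standard(store, attr_name, attr_value, group_name, sep=":"):
    """Design 2: row key = attribute name + attribute value, column = group name.
    The separator is hypothetical; we did not state how the two parts are joined."""
    store[attr_name + sep + attr_value][group_name] = EMPTY


# row key -> super column -> column -> value
super_index = defaultdict(lambda: defaultdict(dict))
# row key -> column -> value
standard_index = defaultdict(dict)

# Hypothetical sample data: group name -> {attribute name: attribute value}
groups = {
    "group-a": {"color": "red", "size": "large"},
    "group-b": {"color": "red"},
    "group-c": {"color": "blue"},
}

for group_name, attrs in groups.items():
    for name, value in attrs.items():
        insert_super(super_index, name, value, group_name)
        insert_standard(standard_index, name, value, group_name)
```

In this sketch the first design puts every value of an attribute into one row (one row per attribute name), while the second design spreads the same data over one row per name/value pair.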
In the first case, inserting 250K groups took about 1.5 hours; in the
second case it took 45 minutes. In both tests, we started Cassandra with no
data, using OPP on two nodes (each with 16 cores and 64 GB).
We are wondering why we get lower insert performance when using super
columns.
Thanks,
Carlos