Hi,
I have a general question about Cassandra. I have already read two
books, but I am more and more confused about whether Cassandra is the
right technology for my case. My goal is to store business data from a
workflow engine in Cassandra. I want to use Cassandra as a kind of
archive service because of its fault-tolerant and decentralized
approach.
But two things confuse me. On the one hand, the project claims that a
single column value can be up to 2 GB (1 MB is recommended). On the
other hand, people explain that a partition should not be larger than
100 MB.
I plan only one single simple table:
CREATE TABLE documents (
created text,
id text,
data text,
PRIMARY KEY (created,id)
);
'created' is the partition key, holding the date in ISO format
(YYYY-MM-DD). 'id' is a clustering key and is unique.
But my 'data' column holds an XML document with business data. This cell
contains a lot of unstructured data and also media data. A data cell
will usually be between 1 and 10 MB, BUT in some cases it can be larger
than 100 MB (though always less than 2 GB).
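To make my concern concrete, here is a rough back-of-the-envelope
calculation as a small Java sketch. The documents-per-day values and the
5 MB average are numbers I made up for illustration, and the estimate
ignores any storage overhead Cassandra adds itself. The 100 MB limit is
the partition guideline mentioned above.

```java
public class PartitionEstimate {
    // Partition size guideline quoted above, in MB.
    static final long PARTITION_GUIDELINE_MB = 100;

    // Rough partition size for one day: document count times average size.
    // Ignores per-row and per-cell overhead added by the database.
    static long estimatePartitionMb(long docsPerDay, long avgDocMb) {
        return docsPerDay * avgDocMb;
    }

    // True if the estimated daily partition exceeds the guideline.
    static boolean exceedsGuideline(long docsPerDay, long avgDocMb) {
        return estimatePartitionMb(docsPerDay, avgDocMb) > PARTITION_GUIDELINE_MB;
    }

    public static void main(String[] args) {
        long[] docsPerDay = {10, 50, 500}; // hypothetical daily volumes
        long avgDocMb = 5;                 // assumed average within the 1-10 MB range
        for (long n : docsPerDay) {
            System.out.printf("%d docs/day x %d MB = %d MB (over guideline: %b)%n",
                    n, avgDocMb, estimatePartitionMb(n, avgDocMb),
                    exceedsGuideline(n, avgDocMb));
        }
    }
}
```

Under these assumptions, a daily partition already passes 100 MB at just
over 20 documents, even without a single oversized document.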
Is Cassandra able to handle this kind of table? Or is Cassandra, in the
end, not recommended for this kind of data?
For example, I would like to ask whether data for a specific date is
available:
SELECT created, id FROM documents WHERE created = '2018-06-10';
I select without the data column and just ask whether data exists. Is
the performance automatically poor only because the data cell (not part
of the primary key) of some rows is greater than 100 MB? Or will
Cassandra run out of heap space in any case? It is perfectly clear that
it makes no sense to select multiple cells which each contain over
100 MB of data in one single query. But this is a fundamental problem
and has nothing to do with Cassandra. My Java application running in
WildFly would also not be able to handle a result with multiple GB of
data. But I would expect that I can select a set of keys just to decide
whether to load one single data cell.
Cassandra seems like a great system. But many people seem to claim that
it is only suitable for something like a user status list à la Facebook.
Is this true? Thanks in advance for your comments.
===
Ralph