Re: General questions about Cassandra

Chris Gerken Fri, 17 Feb 2012 08:07:36 -0800

Don,

That's a good idea, but you have to be careful not to preclude the use of 
dynamic column families (e.g. CFs with time-series-like schemas), which is what 
Cassandra is best at.  The right approach is to build your own "ORM"/persistence 
layer (or generate one with some tools) that can hide the API differences 
between static and dynamic CFs.  Once you're there, Hadoop and Pig both come 
very close to what you're asking for.


In other words, you should be asking for a means to apply a Java method to 
selected objects (not rows) that are persisted in a Cassandra column family.
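
To make that concrete, here is a minimal sketch of such a layer. Everything in
it is hypothetical: the names Login, ObjectStore, StaticLoginStore and
DynamicLoginStore are made up for illustration, and plain in-memory maps stand
in for the column families, so no real Cassandra client API is used. It only
shows how the caller can apply a function to selected objects without knowing
whether they are stored one per row (static CF) or one per column in a wide
row (dynamic CF).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical domain object: one normalized login event. */
final class Login {
    final String user; final long timestamp; final String ip; final String protocol;
    Login(String user, long timestamp, String ip, String protocol) {
        this.user = user; this.timestamp = timestamp; this.ip = ip; this.protocol = protocol;
    }
}

/** Hypothetical persistence interface that hides the column family layout. */
interface ObjectStore<T> {
    void save(T obj);
    /** Applies fn to every stored object accepted by filter. */
    <R> List<R> apply(Predicate<T> filter, Function<T, R> fn);
}

/** Static layout (in-memory stand-in): one row per object, fixed column names. */
final class StaticLoginStore implements ObjectStore<Login> {
    // row key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    @Override public void save(Login l) {
        Map<String, String> cols = new TreeMap<>();
        cols.put("user", l.user);
        cols.put("ts", Long.toString(l.timestamp));
        cols.put("ip", l.ip);
        cols.put("protocol", l.protocol);
        rows.put(l.user + ":" + l.timestamp, cols);
    }

    @Override public <R> List<R> apply(Predicate<Login> filter, Function<Login, R> fn) {
        List<R> out = new ArrayList<>();
        for (Map<String, String> c : rows.values()) {
            Login l = new Login(c.get("user"), Long.parseLong(c.get("ts")),
                                c.get("ip"), c.get("protocol"));
            if (filter.test(l)) out.add(fn.apply(l));
        }
        return out;
    }
}

/** Dynamic layout (in-memory stand-in): one wide row per user, one column per event. */
final class DynamicLoginStore implements ObjectStore<Login> {
    // row key (user) -> (column name = timestamp -> "ip,protocol")
    private final Map<String, TreeMap<Long, String>> rows = new TreeMap<>();

    @Override public void save(Login l) {
        rows.computeIfAbsent(l.user, k -> new TreeMap<>())
            .put(l.timestamp, l.ip + "," + l.protocol);
    }

    @Override public <R> List<R> apply(Predicate<Login> filter, Function<Login, R> fn) {
        List<R> out = new ArrayList<>();
        for (Map.Entry<String, TreeMap<Long, String>> row : rows.entrySet()) {
            for (Map.Entry<Long, String> col : row.getValue().entrySet()) {
                String[] v = col.getValue().split(",", 2);
                Login l = new Login(row.getKey(), col.getKey(), v[0], v[1]);
                if (filter.test(l)) out.add(fn.apply(l));
            }
        }
        return out;
    }
}

final class ObjectStoreDemo {
    /** Caller code is the same for both layouts: select by user and time range. */
    static List<String> ipsFor(ObjectStore<Login> store, String user, long from, long to) {
        return store.apply(
            l -> l.user.equals(user) && l.timestamp >= from && l.timestamp < to,
            l -> l.ip);
    }

    /** Made-up sample data, used only to exercise the two stores. */
    static void seed(ObjectStore<Login> store) {
        store.save(new Login("alice", 100L, "10.0.0.1", "imap"));
        store.save(new Login("alice", 200L, "10.0.0.2", "pop3"));
        store.save(new Login("alice", 300L, "10.0.0.3", "imap"));
        store.save(new Login("bob", 150L, "10.0.0.9", "smtp"));
    }

    public static void main(String[] args) {
        ObjectStore<Login> staticStore = new StaticLoginStore();
        ObjectStore<Login> dynamicStore = new DynamicLoginStore();
        seed(staticStore);
        seed(dynamicStore);
        System.out.println(ipsFor(staticStore, "alice", 100L, 300L));
        System.out.println(ipsFor(dynamicStore, "alice", 100L, 300L));
    }
}
```

Both calls to ipsFor return the same list even though the two stores lay the
data out differently. In a real deployment the apply method is where a Hadoop
or Pig job (or some other parallel executor) would plug in; this sketch just
loops in a single JVM.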

thx

- Chris
 
Chris Gerken

chrisger...@mindspring.com
512.587.5261
http://www.linkedin.com/in/chgerken



On Feb 17, 2012, at 9:35 AM, Don Smith wrote:

> Are there plans to build some sort of map-reduce framework into Cassandra 
> and CQL?  It seems that users should be able to apply a Java method to 
> selected rows in parallel on the distributed Cassandra JVMs.  I believe 
> Solandra uses such an integration.
> 
> Don
> ________________________________________
> From: Alessio Cecchi [ales...@skye.it]
> Sent: Friday, February 17, 2012 4:42 AM
> To: user@cassandra.apache.org
> Subject: General questions about Cassandra
> 
> Hi,
> 
> we have developed software that stores logs from mail servers in MySQL,
> but for huge environments we are developing a version that stores this
> data in HBase. Once a day, the raw logs are first normalized, so the output
> looks like this:
> 
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> [...]
> 
> and then inserted into the database.
> 
> As I was saying, for huge installations (from 1 to 10 million logins
> per day, kept for 12 months) we are working with HBase, but I would also
> consider Cassandra.
> 
> The advantage of HBase is MapReduce, which makes searching the logs very
> fast by running the "query" concurrently on multiple hosts.
> 
> Queries will be launched from a web interface (only a few requests per
> day), and the search keys are user and time range.
> 
> But Cassandra seems less complex to manage and simpler to run, so I want
> to evaluate it as an alternative to HBase.
> 
> My question is: can Cassandra also split a "query" across the cluster like
> MapReduce? From what I have read online, Cassandra seems fast at inserting
> data but slower than HBase at querying. Is that really so?
> 
> We do not want to install Hadoop on top of Cassandra.
> 
> Any suggestions are welcome :-)
> 
> --
> Alessio Cecchi is:
> @ ILS ->  http://www.linux.it/~alessice/
> on LinkedIn ->  http://www.linkedin.com/in/alessice
> GNU/Linux Systems Support ->  http://www.cecchi.biz/
> @ PLUG ->  former President, now senator for life, http://www.prato.linux.it
> @ LOLUG ->  Member http://www.lolug.net
>
