Having participated in the design of a few of the systems mentioned here,
I'll chime in and point out that the combination of Flume and Hive makes
CDH3 very useful for log processing. That use case is squarely in the
system's wheelhouse, especially for large collections of log files
(as search logs tend to be).

On Wed, Jul 28, 2010 at 2:59 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> > "As a result, we designed and built Flume...
> > (I wonder if this could deliver into Cassandra :) )
>
>
> Yes - apparently it's pretty easy to do - I was thinking of doing it but
> haven't found the time yet.
>
> https://issues.cloudera.org//browse/FLUME-20
>
> On Jul 28, 2010, at 4:30 PM, Aaron Morton wrote:
>
> >
> >> If you are looking to store web logs and then do ad hoc queries you
> might/should be using Hadoop (depending on how big your logs are)
> >
> > I agree, take a look at Cloudera's Hadoop distribution, CDH3; it includes an app
> called Flume for moving data...
> >
> > "As a result, we designed and built Flume. Flume is a distributed service
> that makes it very easy to collect and aggregate your data into a persistent
> store such as HDFS. Flume can read data from almost any source – log files,
> Syslog packets, the standard output of any Unix process – and can deliver it
> to a batch processing system like Hadoop or a real-time data store like
> HBase. All this can be configured dynamically from a single, central
> location – no more tedious configuration file editing and process
> restarting. Flume will collect the data from wherever existing applications
> are storing it, and whisk it away for further analysis and processing."
> >
> > (I wonder if this could deliver into Cassandra :) )
> >
> > If it's straight log file processing, Hadoop may be a better fit.
> >
> > Aaron
>
>
