Steve - I appreciate your time on this... Yes, I want to use Flume to copy .xml (or .whatever) files from a server outside the cluster to HDFS. That server does have Flume installed on it.
I'd like the same behavior as the "spooling directory" source, but from a remote machine --> to HDFS. From all my reading, Flume looks like it is designed mainly for streaming "live" logs and program output... It doesn't seem to be known for acting as a file watcher that grabs files as they show up, then ships and writes them to HDFS. Or can it?

OK, I can see fragmentation being a problem with individual "small" files, but doesn't the "spooling directory" behavior face the same issue?

I've done quite a bit of reading, but one can easily get into the weeds :) - all I need to do is this one simple task. I've pasted a rough sketch of what I was imagining at the bottom of this mail.

Thanks

On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin <[email protected]> wrote:

> So you want 1 to 1 replication of the logs to HDFS?
>
> As a footnote, people usually don't do this because the log files are often
> too small (think fragmentation), which causes performance problems when used
> on Hadoop.
>
> On Feb 2, 2015, at 13:30, Bob Metelsky <[email protected]> wrote:
>
> Hi - I have a simple requirement.
>
> On server1 (NOT in the cluster, but with Flume installed)
> I have a process that constantly generates xml files in a known directory.
>
> I need to transfer them to server2 (IN the Hadoop cluster)
> and into HDFS as xml files.
>
> From what I'm reading, Avro, Thrift RPC, et al. are designed for other uses.
>
> Is there a way to have Flume just copy over plain files? txt, xml...
> I'm thinking there should be, but I can't find it.
>
> The closest I see is the "spooling directory" source, but that seems to assume
> the files are already inside the cluster.
>
> Can Flume do this? Is there an example? I've read the Flume documentation
> and nothing is jumping out.
>
> Thanks!
>
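PS - from the Flume user guide, here is roughly what I had in mind: a two-agent setup, with a spooling-directory source and an Avro sink on server1, feeding an Avro source and an HDFS sink on server2. This is only an untested sketch - the hostnames, port, directories and roll settings are placeholders, and I understand the spooling source reads each file line by line as events rather than copying the file whole. Does this look like the right direction?

# agent "a1" running on server1 (outside the cluster)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# watch the directory where the xml files land
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/xml-outbox
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# forward events over Avro RPC to the agent on server2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = server2
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# agent "a2" running on server2 (inside the cluster)
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# receive events from the Avro sink on server1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545
a2.sources.r1.channels = c1

a2.channels.c1.type = memory
a2.channels.c1.capacity = 10000

# write the events into HDFS as plain (uncompressed) text files
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://namenode:8020/data/xml
a2.sinks.k1.hdfs.fileType = DataStream
a2.sinks.k1.hdfs.filePrefix = xml-events
a2.sinks.k1.hdfs.fileSuffix = .xml
a2.sinks.k1.hdfs.rollInterval = 300
a2.sinks.k1.channel = c1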
