Steve - I appreciate your time on this... Yes, I want to use Flume to copy .xml (or .whatever) files from a server outside the cluster to HDFS. That server does have Flume installed on it.
I'd like the same behavior as the "spooling directory" source, but from a remote machine --> to HDFS. From all my reading, Flume looks like it is designed mainly for streaming "live" logs and program output... It doesn't seem to be known for acting as a file watcher that grabs files as they show up, then ships and writes them to HDFS. Or can it?

OK, I can see fragmentation being a problem with individual "small" files, but doesn't the "spooling directory" behavior face the same issue?

I've done quite a bit of reading, but one can easily get into the weeds :) - all I need to do is this one simple task. I've pasted a rough sketch of what I was imagining at the bottom of this mail.

Thanks

On Mon, Feb 2, 2015 at 5:17 PM, Steve Morin <[email protected]> wrote:

> So you want 1 to 1 replication of the logs to HDFS?
>
> As a footnote, people usually don't do this because the log files are often
> too small (think fragmentation), which causes performance problems when used
> on Hadoop.
>
> On Feb 2, 2015, at 13:30, Bob Metelsky <[email protected]> wrote:
>
> Hi - I have a simple requirement.
>
> On server1 (NOT in the cluster, but with Flume installed)
> I have a process that constantly generates xml files in a known directory.
>
> I need to transfer them to server2 (IN the Hadoop cluster)
> and into HDFS as xml files.
>
> From what I'm reading, Avro, Thrift RPC, et al. are designed for other uses.
>
> Is there a way to have Flume just copy over plain files? txt, xml...
> I'm thinking there should be, but I can't find it.
>
> The closest I see is the "spooling directory" source, but that seems to assume
> the files are already inside the cluster.
>
> Can Flume do this? Is there an example? I've read the Flume documentation
> and nothing is jumping out.
>
> Thanks!
>
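PS - from the Flume user guide, here is roughly what I had in mind: a two-agent setup, with a spooling-directory source and an Avro sink on server1, feeding an Avro source and an HDFS sink on server2. This is only an untested sketch - the hostnames, port, directories and roll settings are placeholders, and I understand the spooling source reads each file line by line as events rather than copying the file whole. Does this look like the right direction?

# agent "a1" running on server1 (outside the cluster)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# watch the directory where the xml files land
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /data/xml-outbox
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# forward events over Avro RPC to the agent on server2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = server2
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# agent "a2" running on server2 (inside the cluster)
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# receive events from the Avro sink on server1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545
a2.sources.r1.channels = c1

a2.channels.c1.type = memory
a2.channels.c1.capacity = 10000

# write the events into HDFS as plain (uncompressed) text files
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://namenode:8020/data/xml
a2.sinks.k1.hdfs.fileType = DataStream
a2.sinks.k1.hdfs.filePrefix = xml-events
a2.sinks.k1.hdfs.fileSuffix = .xml
a2.sinks.k1.hdfs.rollInterval = 300
a2.sinks.k1.channel = c1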
