If you can NFS mount that directory onto the local machine running Flume, it sounds like what you've listed out would work well.
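For what it's worth, here's a rough sketch of what that could look like (the export path, mount point, and agent/source names below are made up for illustration, and the mount options will depend on your environment):

    # On the machine running the Flume agent, mount the remote log
    # directory read-only over NFS (assumes the remote host exports
    # /var/log/logdir):
    sudo mkdir -p /mnt/remote-logs
    sudo mount -t nfs -o ro remotehost:/var/log/logdir /mnt/remote-logs

    # Then point the exec source at the mounted file, just like your
    # local tail -F setup:
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /mnt/remote-logs/xyz.log
    a1.sources.r1.channels = c1

One thing to watch with tail -F over NFS is client-side attribute caching, which can delay how quickly newly written lines become visible on the mounted copy.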
On Thu, Apr 17, 2014 at 2:54 AM, Something Something <[email protected]> wrote:

> If I am going to 'rsync' a file from a remote host & copy it to HDFS via
> Flume, then why use Flume? I can rsync & then just do a 'hadoop fs -put',
> no? I must be missing something. I guess the only benefit of using Flume
> is that I can add Interceptors if I want to. Current requirements don't
> need that. We just want to copy data as is.
>
> Here's the real use case: an application is writing to xyz.log. Once this
> file gets over a certain size it gets rolled over to xyz1.log and so on,
> much like Log4j. What we really want is that as soon as a line gets
> written to xyz.log, it should go to HDFS via Flume.
>
> Can I do something like this?
>
> 1) Share the log directory under Linux.
> 2) Use
>    test1.sources.mylog.type = exec
>    test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>
> I believe this will work, but is this the right way? Thanks for your help.
>
>
> On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <[email protected]> wrote:
>
>> Agreed with Jeff. Rsync + cron (if it needs to be regular) is probably
>> your best bet to ingest files from a remote machine that you only have
>> read access to. But then again, you're sort of stepping outside the use
>> case of Flume at some level here, since rsync is now basically a part of
>> your Flume topology. However, if you just need to back-fill old log data
>> then this is perfect! In fact, it's what I do myself.
>>
>>
>> On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <[email protected]> wrote:
>>
>>> The spooling directory source runs as part of the agent.
>>> The source also needs write access to the files, as it renames them upon
>>> completion of ingest. Perhaps you could use rsync to copy the files
>>> somewhere that you have write access to?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 5:26 PM, Something Something <[email protected]> wrote:
>>>
>>>> Thanks Jeff. This is useful. Can the spoolDir be on a different
>>>> machine? We may have to set up a different process to copy files into
>>>> 'spoolDir', right? Note: we have 'read only' access to these files.
>>>> Any recommendations about this?
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <[email protected]> wrote:
>>>>
>>>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 5:14 PM, Something Something <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Needless to say, I am a newbie to Flume, but I've got a basic flow
>>>>>> working in which I am importing a log file from my Linux box to HDFS.
>>>>>> I am using
>>>>>>
>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>
>>>>>> which is working like a stream of messages. This is good!
>>>>>>
>>>>>> Now what I want to do is copy log files from a directory on a remote
>>>>>> machine on a regular basis. For example:
>>>>>>
>>>>>> username@machinename:/var/log/logdir/<multiple files>
>>>>>>
>>>>>> One way to do it is to simply 'scp' files from the remote directory
>>>>>> into my box on a regular basis, but what's the best way to do this in
>>>>>> Flume? Please let me know.
>>>>>>
>>>>>> Thanks for the help.
>>
>> --
>> Laurance George

--
Laurance George
