Using the exec source with tail -f is not considered a production solution; it mainly exists for testing purposes. The exec source offers no delivery guarantees: if the channel fills up or the agent dies and restarts, whatever tail has already read (and anything written while the agent is down) is simply lost.
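For a more production-oriented setup, the usual suggestion is the spooling directory source feeding an HDFS sink. A rough sketch of what that could look like (the agent name, the file channel, and every path below are placeholders, not anything taken from this thread):

    a1.sources  = src1
    a1.channels = ch1
    a1.sinks    = k1

    # Spooling Directory Source: picks up completed files dropped into spoolDir
    a1.sources.src1.type = spooldir
    a1.sources.src1.spoolDir = /var/flume/spool
    a1.sources.src1.channels = ch1

    # File channel for durability (a memory channel is fine for testing)
    a1.channels.ch1.type = file

    # HDFS sink
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = ch1

Note that the spooldir source only handles files that are complete and never modified after they land in spoolDir, so it fits the rolled-over xyz1.log, xyz2.log, ... files rather than the live xyz.log.
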
On Thu, Apr 17, 2014 at 7:03 AM, Laurance George <[email protected]> wrote:

> If you can NFS mount that directory to your local machine with flume it
> sounds like what you've listed out would work well.
>
>
> On Thu, Apr 17, 2014 at 2:54 AM, Something Something <[email protected]> wrote:
>
>> If I am going to 'rsync' a file from remote host & copy it to hdfs via
>> Flume, then why use Flume? I can rsync & then just do a 'hadoop fs -put',
>> no? I must be missing something. I guess, the only benefit of using Flume
>> is that I can add Interceptors if I want to. Current requirements don't
>> need that. We just want to copy data as is.
>>
>> Here's the real use case: An application is writing to xyz.log file.
>> Once this file gets over certain size it gets rolled over to xyz1.log & so
>> on. Kinda like Log4j. What we really want is as soon as a line gets
>> written to xyz.log, it should go to HDFS via Flume.
>>
>> Can I do something like this?
>>
>> 1) Share the log directory under Linux.
>> 2) Use
>>    test1.sources.mylog.type = exec
>>    test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>>
>> I believe this will work, but is this the right way? Thanks for your
>> help.
>>
>>
>> On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <[email protected]> wrote:
>>
>>> Agreed with Jeff. Rsync + cron (if it needs to be regular) is probably
>>> your best bet to ingest files from a remote machine that you only have
>>> read access to. But then again you're sorta stepping outside of the use
>>> case of flume at some level here as rsync is now basically a part of your
>>> flume topology. However, if you just need to back-fill old log data then
>>> this is perfect! In fact, it's what I do myself.
>>>
>>>
>>> On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <[email protected]> wrote:
>>>
>>>> The spooling directory source runs as part of the agent.
>>>> The source also needs write access to the files as it renames them upon
>>>> completion of ingest. Perhaps you could use rsync to copy the files
>>>> somewhere that you have write access to?
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 5:26 PM, Something Something <[email protected]> wrote:
>>>>
>>>>> Thanks Jeff. This is useful. Can the spoolDir be on a different
>>>>> machine? We may have to setup a different process to copy files into
>>>>> 'spoolDir', right? Note: We have 'read only' access to these files.
>>>>> Any recommendations about this?
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <[email protected]> wrote:
>>>>>
>>>>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 16, 2014 at 5:14 PM, Something Something <[email protected]> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Needless to say I am newbie to Flume, but I've got a basic flow
>>>>>>> working in which I am importing a log file from my linux box to
>>>>>>> hdfs. I am using
>>>>>>>
>>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>>
>>>>>>> which is working like a stream of messages. This is good!
>>>>>>>
>>>>>>> Now what I want to do is copy log files from a directory on a remote
>>>>>>> machine on a regular basis. For example:
>>>>>>>
>>>>>>> username@machinename:/var/log/logdir/<multiple files>
>>>>>>>
>>>>>>> One way to do it is to simply 'scp' files from the remote directory
>>>>>>> into my box on a regular basis, but what's the best way to do this
>>>>>>> in Flume? Please let me know.
>>>>>>>
>>>>>>> Thanks for the help.
>>>
>>> --
>>> Laurance George
>
> --
> Laurance George
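
To make the rsync + cron idea from the quoted thread concrete, here is one possible sketch. It assumes the staging and spooling directories sit on the same filesystem (so ln can hard-link) and that the spooldir source keeps its default .COMPLETED suffix; the host, user, file pattern, and local paths are placeholders, not anything mentioned in the thread:

    #!/bin/sh
    # Hypothetical paths; adjust for your environment.
    STAGING=/var/flume/staging   # rsync target, keeps a local mirror
    SPOOL=/var/flume/spool       # directory the Flume spooldir source watches

    # Mirror only the rolled-over logs (xyz1.log, xyz2.log, ...); these no
    # longer change, which matches the spooldir source's requirement that
    # files stay immutable once they arrive.
    rsync -av --include='xyz[0-9]*.log' --exclude='*' \
          username@machinename:/var/log/logdir/ "$STAGING"/

    # Hand newly mirrored files to Flume. Hard-linking keeps the copy in
    # STAGING, so the next rsync run does not re-transfer files that Flume
    # has already renamed to *.COMPLETED inside SPOOL.
    for f in "$STAGING"/xyz*.log; do
        [ -e "$f" ] || continue
        [ -e "$SPOOL/$(basename "$f")" ] || \
        [ -e "$SPOOL/$(basename "$f").COMPLETED" ] || \
            ln "$f" "$SPOOL/"
    done

Run it from cron on whatever schedule fits (e.g. */15 * * * * /usr/local/bin/sync-logs.sh, a hypothetical path). Keeping the rsync mirror in a separate staging directory avoids re-copying files after the spooldir source renames them, and it also keeps rsync's temporary dot-files out of the directory the agent is watching.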
