If you can NFS mount that directory onto the local machine running Flume, it sounds like what you've listed out would work well.
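For what it's worth, here's a rough sketch of what that could look like (the export path, mount point, and agent/source names below are made up for illustration, and the mount options will depend on your environment):

    # On the machine running the Flume agent, mount the remote log
    # directory read-only over NFS (assumes the remote host exports
    # /var/log/logdir):
    sudo mkdir -p /mnt/remote-logs
    sudo mount -t nfs -o ro remotehost:/var/log/logdir /mnt/remote-logs

    # Then point the exec source at the mounted file, just like your
    # local tail -F setup:
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /mnt/remote-logs/xyz.log
    a1.sources.r1.channels = c1

One thing to watch with tail -F over NFS is client-side attribute caching, which can delay how quickly newly written lines become visible on the mounted copy.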
On Thu, Apr 17, 2014 at 2:54 AM, Something Something <[email protected]> wrote:

> If I am going to 'rsync' a file from a remote host & copy it to HDFS via
> Flume, then why use Flume? I can rsync & then just do a 'hadoop fs -put',
> no? I must be missing something. I guess the only benefit of using Flume
> is that I can add Interceptors if I want to. Current requirements don't
> need that. We just want to copy data as is.
>
> Here's the real use case: an application is writing to xyz.log. Once this
> file gets over a certain size it gets rolled over to xyz1.log and so on,
> much like Log4j. What we really want is that as soon as a line gets
> written to xyz.log, it should go to HDFS via Flume.
>
> Can I do something like this?
>
> 1) Share the log directory under Linux.
> 2) Use
>    test1.sources.mylog.type = exec
>    test1.sources.mylog.command = tail -F /home/user1/shares/logs/xyz.log
>
> I believe this will work, but is this the right way? Thanks for your help.
>
>
> On Wed, Apr 16, 2014 at 5:51 PM, Laurance George <[email protected]> wrote:
>
>> Agreed with Jeff. Rsync + cron (if it needs to be regular) is probably
>> your best bet to ingest files from a remote machine that you only have
>> read access to. But then again, you're sort of stepping outside the use
>> case of Flume at some level here, since rsync is now basically a part of
>> your Flume topology. However, if you just need to back-fill old log data
>> then this is perfect! In fact, it's what I do myself.
>>
>>
>> On Wed, Apr 16, 2014 at 8:46 PM, Jeff Lord <[email protected]> wrote:
>>
>>> The spooling directory source runs as part of the agent.
>>> The source also needs write access to the files, as it renames them upon
>>> completion of ingest. Perhaps you could use rsync to copy the files
>>> somewhere that you have write access to?
>>>
>>>
>>> On Wed, Apr 16, 2014 at 5:26 PM, Something Something <[email protected]> wrote:
>>>
>>>> Thanks Jeff. This is useful. Can the spoolDir be on a different
>>>> machine? We may have to set up a different process to copy files into
>>>> 'spoolDir', right? Note: we have 'read only' access to these files.
>>>> Any recommendations about this?
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 5:16 PM, Jeff Lord <[email protected]> wrote:
>>>>
>>>>> http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 5:14 PM, Something Something <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Needless to say, I am a newbie to Flume, but I've got a basic flow
>>>>>> working in which I am importing a log file from my Linux box to HDFS.
>>>>>> I am using
>>>>>>
>>>>>> a1.sources.r1.command = tail -F /var/log/xyz.log
>>>>>>
>>>>>> which is working like a stream of messages. This is good!
>>>>>>
>>>>>> Now what I want to do is copy log files from a directory on a remote
>>>>>> machine on a regular basis. For example:
>>>>>>
>>>>>> username@machinename:/var/log/logdir/<multiple files>
>>>>>>
>>>>>> One way to do it is to simply 'scp' files from the remote directory
>>>>>> into my box on a regular basis, but what's the best way to do this in
>>>>>> Flume? Please let me know.
>>>>>>
>>>>>> Thanks for the help.
>>
>> --
>> Laurance George

--
Laurance George
