Thanks, Ahmed Vila. I will take the suggestions you mentioned into account when I design the Flume agent.

Thanks & Regards,
Shiva Ram
Website: http://datamaking.com
Facebook Page: www.facebook.com/datamaking

On Fri, Oct 2, 2015 at 3:12 PM, Ahmed Vila <[email protected]> wrote:

> Hi Shiva,
>
> If your files are immutable (once a file is placed in the directory, it
> will never be changed afterwards), then the best source to use is the
> spooling directory source.
> If the files are mutable, avoid the spooling directory source: Flume
> will throw an exception and shut the source down, so you'll have to
> restart it.
>
> You can put Flume on a different server than the one where the files
> reside and have that folder mounted as a local folder via NFS or similar.
> That isn't an option if you would have to mount the source folder across
> a firewall, between two networks, or over the internet.
>
> With the exec source it's hard to achieve cross-node execution, as it
> would have to execute the bash command you provide on a remote node.
> Even if you manage it, it will be very slow due to constant SSH
> negotiation.
>
> Either way, I would definitely recommend putting Flume on the same node
> where the source folder is, or at least as close to the source as
> possible, for example in the same network.
> That way you minimize the impact of network jitter and dropouts on the
> source. All sources that pull data will fail ungracefully if they
> encounter an error fetching data, and you'll end up restarting Flume.
>
> If HDFS is in a different network or across the internet, I would suggest
> chaining two Flume agents, one on each side of the wire, via AvroSink on
> the source node and AvroSource on the destination node. They support the
> fundamentals needed for such a harsh transport environment: serialization,
> compression, SSL security over a single TCP connection, the need to open
> only one port, etc.
> Then you configure Flume on the destination to drain into HDFS via
> HdfsSink.
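
For reference, the two-agent layout described above could be sketched roughly as follows. This is only an illustration under assumptions, not a tested setup: the agent names (edge, collector), the host collector.example.com, port 4545, the directory /data/incoming and the HDFS path are all hypothetical placeholders. SSL is left out because it needs keystore and truststore settings that depend on the environment.

```properties
# ---- Agent "edge": runs on the server where the files land ----
edge.sources = spool
edge.channels = ch
edge.sinks = toCollector

# Spooling directory source: files must be immutable once placed here.
edge.sources.spool.type = spooldir
# Hypothetical local directory
edge.sources.spool.spoolDir = /data/incoming
edge.sources.spool.channels = ch

# File channel buffers events on disk across restarts and network dropouts.
edge.channels.ch.type = file

# Avro sink: a single TCP connection to the remote agent.
edge.sinks.toCollector.type = avro
# Hypothetical host and port
edge.sinks.toCollector.hostname = collector.example.com
edge.sinks.toCollector.port = 4545
edge.sinks.toCollector.compression-type = deflate
edge.sinks.toCollector.channel = ch

# ---- Agent "collector": runs close to the Hadoop cluster ----
collector.sources = fromEdge
collector.channels = ch
collector.sinks = toHdfs

# Avro source: only this one port has to be reachable from the edge agent.
collector.sources.fromEdge.type = avro
collector.sources.fromEdge.bind = 0.0.0.0
collector.sources.fromEdge.port = 4545
# Must match the compression-type set on the Avro sink.
collector.sources.fromEdge.compression-type = deflate
collector.sources.fromEdge.channels = ch

collector.channels.ch.type = file

# HDFS sink drains the channel into HDFS.
collector.sinks.toHdfs.type = hdfs
# Hypothetical target path
collector.sinks.toHdfs.hdfs.path = hdfs://namenode.example.com/flume/incoming
collector.sinks.toHdfs.hdfs.fileType = DataStream
collector.sinks.toHdfs.channel = ch
```

Each agent would be started on its own host with something like "flume-ng agent --conf conf --conf-file <file> --name edge" (or "--name collector"), where the file name is whatever the configuration above was saved as.
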
>
> On Fri, Oct 2, 2015 at 7:08 AM, Shiva Ram <[email protected]>
> wrote:
>
>> A set of files is placed on a remote server (not a Hadoop cluster node).
>> Which source type is suitable for collecting these files from the remote
>> server into HDFS using Flume? From my initial study of Flume, I learned
>> that the "Exec" and "Spooling Directory" source types can be used to
>> collect these files. I want to know whether the Flume service should run
>> on the remote server (the source system from which I want to get the
>> data). Thanks.
>>
>> Thanks & Regards,
>> Shiva Ram
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail : [email protected]
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. This email contains confidential information. It should
> not be copied, disclosed to, retained or used by, any party other than the
> intended recipient. Any unauthorised distribution, dissemination or copying
> of this E-mail or its attachments, and/or any use of any information
> contained in them, is strictly prohibited and may be illegal. If you are
> not an intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender directly via email. Any
> emails that you send to us may be monitored by systems or persons other
> than the named communicant for the purposes of ascertaining whether the
> communication complies with the law and company policies.
