I don't have much experience with the HDFS sink; I'll let more experienced users answer this for you.
Hari/Johny - expert advice please.

On Mon, Mar 9, 2015 at 8:26 AM, Lin Ma <[email protected]> wrote:
> Thanks Ashish,
>
> One further question on the HDFS sink. If I configure the destination
> directory on HDFS with a Year/Month/Day/Hour pattern, will Flume
> automatically put each event it receives into the matching directory,
> creating new directories as time passes? Or do I have to set some
> key/value headers on the event so that the HDFS sink can recognize the
> event time and place it in the appropriate time-based folder?
>
> regards,
> Lin
>
> On Sun, Mar 8, 2015 at 6:32 PM, Ashish <[email protected]> wrote:
>>
>> Your understanding is correct :)
>>
>> On Mon, Mar 9, 2015 at 6:54 AM, Lin Ma <[email protected]> wrote:
>> > Thanks Ashish,
>> >
>> > Following your guidance, I found the instructions below, about which
>> > I have further questions to confirm with you. It seems we need to
>> > close the files and never touch them again for Flume to process them
>> > correctly, so I am not sure whether this is good practice: (1) let
>> > the application write log files in the existing way, e.g. on an
>> > hourly or 5-minute rotation, then (2) close and move the files to
>> > another directory that serves as the input for a Flume agent
>> > configured with a Spooling Directory source?
>> >
>> > "This source will watch the specified directory for new files, and
>> > will parse events out of new files as they appear."
>> >
>> > "If a file is written to after being placed into the spooling
>> > directory, Flume will print an error to its log file and stop
>> > processing.
>> > If a file name is reused at a later time, Flume will print an error
>> > to its log file and stop processing.
>> > "
>> >
>> > regards,
>> > Lin
>> >
>> > On Sun, Mar 8, 2015 at 12:23 AM, Ashish <[email protected]> wrote:
>> >>
>> >> Please look at the following:
>> >> Spooling Directory Source
>> >> (http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source)
>> >> and
>> >> HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)
>> >>
>> >> The Spooling Directory Source needs immutable files, meaning files
>> >> must not be written to once they are being consumed. In short, your
>> >> application cannot write to a file that is being read by Flume.
>> >>
>> >> The log format is not an issue, as long as you don't need it to be
>> >> interpreted by Flume components. Since it's a log, I'm assuming a
>> >> single log entry per line, with a line separator at the end of each
>> >> line.
>> >>
>> >> You can also look at the Exec source
>> >> (http://flume.apache.org/FlumeUserGuide.html#exec-source) for
>> >> tailing a file being written by the application. The documentation
>> >> covers the details on all the links above.
>> >>
>> >> HTH!
>> >>
>> >> On Sun, Mar 8, 2015 at 12:32 PM, Lin Ma <[email protected]> wrote:
>> >> > Hi Flume masters,
>> >> >
>> >> > I want to install Flume on a box, consume a local log file as the
>> >> > source, and send it to a remote HDFS sink. The log format is
>> >> > private and plain text (not Avro or JSON).
>> >> >
>> >> > I am reading the Flume guide and many advanced source
>> >> > configurations. Are there any reference samples for a plain local
>> >> > log file source? Also, I am not sure whether Flume can consume a
>> >> > local file while the application is still writing to it? Thanks.
>> >> >
>> >> > regards,
>> >> > Lin

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
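For readers following the thread, the two points discussed above can be wired together in a single agent. Below is a minimal configuration sketch, assuming a hypothetical spool directory `/var/log/flume-spool` and namenode host `namenode` (both illustrative, as are the agent/component names `a1`, `r1`, `c1`, `k1`). Per the Flume User Guide, the `%Y/%m/%d/%H` escape sequences in `hdfs.path` are expanded per event, so new hour/day directories are created automatically as time passes; however, this requires each event to carry a `timestamp` header. Setting `hdfs.useLocalTimeStamp = true` (or adding a timestamp interceptor on the source) supplies that header without the application having to do anything.

```properties
# Sketch: Spooling Directory source -> file channel -> HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling Directory source: files must be fully written, closed, and
# moved here, then never touched again (Flume marks consumed files
# with a .COMPLETED suffix by default).
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
a1.sources.r1.channels = c1

a1.channels.c1.type = file

# HDFS sink: time escapes in hdfs.path create time-based directories
# per event. useLocalTimeStamp = true stamps each event with the local
# time so no explicit "timestamp" header is needed from the source.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d/%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.channel = c1
```

If you prefer the event time over the sink's local clock, an alternative is a `timestamp` interceptor on the source (`a1.sources.r1.interceptors = i1`, `a1.sources.r1.interceptors.i1.type = timestamp`), which adds the header when the event is ingested.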

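If rotating and moving closed files into a spool directory is not workable, the Exec source that Ashish mentions can tail the live file instead. A sketch, with an illustrative file path; note the caveat from the Flume documentation that the Exec source offers no delivery guarantee, so events can be lost if the agent or the tail process dies, which makes the spooling approach the more reliable choice:

```properties
# Exec source variant: tail the live log file instead of spooling.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1
```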