Hi,

Have you considered taking a snapshot of the files at close of business, comparing it with the next day's snapshot, and processing only the new ones? A simple shell script will do.
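As a rough illustration of the snapshot comparison, here is a minimal sketch in Python rather than shell. It assumes each snapshot is the saved text output of `hdfs dfs -ls -R <path>`, with the file path in the eighth whitespace-separated column. The function names `parse_listing` and `new_files` and the sample paths are hypothetical, made up for this example.

```python
def parse_listing(listing_text):
    """Return the set of file paths found in `hdfs dfs -ls -R` style output.

    Assumes eight columns per entry: permissions, replication, owner,
    group, size, date, time, path.
    """
    paths = set()
    for line in listing_text.splitlines():
        # Split into at most 8 fields so paths containing spaces stay whole.
        fields = line.split(None, 7)
        # Skip the "Found N items" header, blank lines, and directories.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        paths.add(fields[7])
    return paths


def new_files(previous_listing, current_listing):
    """Return, sorted, the paths present in the current snapshot only."""
    return sorted(parse_listing(current_listing) - parse_listing(previous_listing))


# Hypothetical snapshots taken at close of business on two consecutive days.
yesterday = """Found 2 items
drwxr-xr-x   - etl hadoop          0 2015-03-24 18:00 /data/in
-rw-r--r--   3 etl hadoop       1024 2015-03-24 17:30 /data/in/a.csv
-rw-r--r--   3 etl hadoop       2048 2015-03-24 17:45 /data/in/b.csv
"""
today = yesterday + (
    "-rw-r--r--   3 etl hadoop       4096 2015-03-25 09:10 /data/in/c.csv\n"
)

print(new_files(yesterday, today))  # ['/data/in/c.csv']
```

This compares paths only, so a file that is overwritten under the same name would not be reported as new. Comparing the size or modification-time columns as well would catch that case.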
HTH

-----Original Message-----
From: Vijaya Narayana Reddy Bhoomi Reddy <vijaya.bhoomire...@whishworks.com>
Date: Wed, 25 Mar 2015 09:55:57
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Identifying new files on HDFS

Hi,

We have a requirement to process only new files in HDFS on a daily basis. I am sure this is a common requirement in many ETL-style processing scenarios. Is there a way to identify new files that have been added to a path in HDFS?

For example, assume some files have already been present for some time, and I have added new files today. I want to process only those new files. What is the best way to achieve this?

Thanks & Regards
Vijay

Vijay Bhoomireddy, Big Data Architect