Agree with Shahab.

Warm Regards,
Tariq
cloudfront.blogspot.com
On Tue, May 14, 2013 at 12:32 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:

> The count file will be a very small file, right? Once it is generated on
> HDFS, you can automate its downloading or movement anywhere you want. This
> should not take much time.
>
> Regards,
> Shahab
>
> On Mon, May 13, 2013 at 2:58 PM, Mix Nin <pig.mi...@gmail.com> wrote:
>
>> Hi,
>>
>> The final count file should reside in a local directory, not in an HDFS
>> directory. The above scripts will store the text file in an HDFS
>> directory. The count file needs to be sent to another team who do not
>> work on HDFS.
>>
>> Thanks
>>
>> On Mon, May 13, 2013 at 11:36 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>
>>> If it is just counting the number of records in a file, then how about
>>> a short three-liner:
>>>
>>> LOGS = LOAD 'log';
>>> LOGS_GROUP = GROUP LOGS ALL;
>>> LOG_COUNT = FOREACH LOGS_GROUP GENERATE COUNT(LOGS);
>>>
>>> It did the trick for me.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>> On Mon, May 13, 2013 at 11:57 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
>>>
>>>> Not terribly efficient, but off the top of my head: GROUP ALL and then
>>>> do a COUNT (or COUNT(*)). You can implement this as a follow-up script,
>>>> or add it to the existing script once the file has been generated.
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>> On Mon, May 13, 2013 at 2:16 PM, Mix Nin <pig.mi...@gmail.com> wrote:
>>>>
>>>>> OK, let me restate my requirement. I should have specified it at the
>>>>> beginning:
>>>>>
>>>>> I need to get the count of records in an HDFS file created by a Pig
>>>>> script and then store the count in a text file. This should be done
>>>>> automatically on a daily basis, without manual intervention.
>>>>>
>>>>> On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:
>>>>>
>>>>>> How about the second approach: get the application/job ID that Pig
>>>>>> creates and submits to the cluster, and then find the job output
>>>>>> counter for that job from the JobTracker.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <pig.mi...@gmail.com> wrote:
>>>>>>
>>>>>>> It is a text file.
>>>>>>>
>>>>>>> If we want to use wc, we need to copy the file from HDFS and then
>>>>>>> use wc, and that may take time. Is there a way without copying the
>>>>>>> file from HDFS to a local directory?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:
>>>>>>>
>>>>>>>> A few pointers:
>>>>>>>>
>>>>>>>> What kind of files are we talking about? For text files you can
>>>>>>>> use wc; for Avro data files you can use avro-tools.
>>>>>>>>
>>>>>>>> Or get the job that Pig generates, and get the counters for that
>>>>>>>> job from the JobTracker of your Hadoop cluster.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>>
>>>>>>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <pig.mi...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> What is the best way to get the count of records in an HDFS file
>>>>>>>>> generated by a Pig script?
>>>>>>>>>
>>>>>>>>> Thanks
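[Editor's note] For later readers: the three-liner quoted above only defines the relation; to get the count onto disk it also needs a STORE (or DUMP). A minimal sketch, with hypothetical paths, using COUNT_STAR so that rows whose first field is null are still counted (Pig's COUNT skips them):

```pig
-- count_records.pig: sketch only; '/data/logs' and '/data/logs_count' are hypothetical paths
LOGS       = LOAD '/data/logs';         -- the HDFS file produced by the upstream Pig script
LOGS_GROUP = GROUP LOGS ALL;            -- one group holding every record
LOG_COUNT  = FOREACH LOGS_GROUP GENERATE COUNT_STAR(LOGS);  -- counts rows even when the first field is null
STORE LOG_COUNT INTO '/data/logs_count';  -- writes a tiny one-line result file on HDFS
```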
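[Editor's note] The remaining pieces of the thread (count the records without copying the data locally, land the small count file outside HDFS, run it daily) can be sketched as one shell script. The paths, file name, and schedule below are all hypothetical; only `hadoop fs -cat` and `wc -l` are standard commands, and the script assumes a configured Hadoop client:

```shell
#!/usr/bin/env bash
# daily_count.sh -- sketch: count records in an HDFS file and keep the count locally.
# All paths are hypothetical; assumes a configured Hadoop client on this machine.
set -u

HDFS_DATA='/data/logs/part-*'                 # data file(s) written by the Pig script (HDFS glob)
LOCAL_COUNT="${HOME}/counts/$(date +%F).txt"  # where the non-HDFS team picks the count up

# The counting step is just a line count over a stream; for example:
SAMPLE_COUNT=$(printf 'a\nb\nc\n' | wc -l | tr -d ' ')  # three lines, counted from a pipe

if command -v hadoop >/dev/null 2>&1; then
    mkdir -p "$(dirname "$LOCAL_COUNT")"
    # Stream the data out of HDFS and count lines -- no local copy of the data is made.
    hadoop fs -cat $HDFS_DATA | wc -l > "$LOCAL_COUNT"
else
    echo "hadoop client not found; nothing to do" >&2
fi
```

A crontab entry such as `0 2 * * * /path/to/daily_count.sh` (again, a hypothetical schedule and path) would cover the "daily, without manual intervention" requirement; if the count file is instead produced on HDFS by the Pig script itself, a `hadoop fs -copyToLocal` of that one small file does the same job, as Shahab suggests.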