On Mon, Feb 28, 2011 at 7:39 PM, Thejas M Nair <te...@yahoo-inc.com> wrote:

> Hi Charles,
> Which load function are you using?
>
I'm using a user-defined load function.

> Is it the default (PigStorage)?
>
No.


> In the Hadoop counters for the job in the JobTracker UI, do you see the
> expected number of input records being read?
>
Is it possible to see the counters in the job history interface on the
JobTracker? I will run the jobs again to compare the counters, but my guess
is probably not!

> -Thejas
>
> On 2/28/11 10:57 AM, "Charles Gonçalves" <charles...@gmail.com> wrote:
>
> I'm not using any filtering in the script.
> I just want to see the total traffic per day across all logs.
>
> If I combine 1000 log files into one and run the script on that file, I
> get the correct answer for those logs.
> But when I run it with all the *43458* log files I get an incorrect output.
> The correct output would be a histogram for each day of 2010-10, but the
> result contains only data from 2010-10-21.
> And if I process all the logs with an awk script I get the correct answer.
>
>
> On Mon, Feb 28, 2011 at 3:29 PM, Daniel Dai <jiany...@yahoo-inc.com>
> wrote:
>
> > Not sure if I get your question. In 0.8, Pig combines small files into one
> > map, so it is possible you get fewer output files.
>
> This is not the problem.
> But thanks anyway!
>
> > If that is your concern, you can try to disable split combination using
> > "-Dpig.splitCombination=false"
> >
> > Daniel
> >
> >
> > Charles Gonçalves wrote:
> >
> >> I tried to process a large number of small files in Pig and I got a
> >> strange problem.
> >>
> >> 2011-02-27 00:00:58,746 [Thread-15] INFO
> >>  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : *43458*
> >> 2011-02-27 00:00:58,755 [Thread-15] INFO
> >>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : *43458*
> >> 2011-02-27 00:01:14,173 [Thread-15] INFO
> >>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : *329*
> >>
> >> When the script finishes, the result covers only a subset of the
> >> input files.
> >> These are logs from a whole month, but the results are only from
> >> day 21.
> >>
> >>
> >> Maybe I'm missing something.
> >> Any ideas?
> >>
> >>
> >>
> >
> >
>
>
> --
> *Charles Ferreira Gonçalves *
> http://homepages.dcc.ufmg.br/~charles/
> UFMG - ICEx - Dcc
> Cel.: 55 31 87741485
> Tel.:  55 31 34741485
> Lab.: 55 31 34095840
>
>
>


