I've got a directory with a bunch of MapReduce data in it. I want to know
how many key/value pairs it contains. I could write a mapper-only
job that takes Writable, Writable pairs as input and updates a
counter, but it seems like this utility should already exist. Does it, or
do I have to write it myself?
What format is the input data in?
At first glance, I would run an identity mapper with NullOutputFormat
so that no output data gets written. The built-in counters already
count the number of key/value pairs read in by the mappers (the "Map
input records" counter).
-Joey
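A minimal sketch of the identity-mapper approach described above, assuming the Hadoop 2.x+ `org.apache.hadoop.mapreduce` API (newer than this thread). `CountPairs` and `countPairs` are hypothetical names, and `SequenceFileInputFormat` is only a placeholder: the thread later establishes these files are not sequence files, so swap in whatever `InputFormat` actually reads them. The base `Mapper` class passes every pair through unchanged, zero reducers makes the job map-only, `NullOutputFormat` discards the output, and the count is read from the built-in `MAP_INPUT_RECORDS` counter.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CountPairs {

    // Runs a map-only identity job over `input` and returns the number of
    // key/value pairs the mappers read.
    public static long countPairs(Configuration conf, Path input) throws Exception {
        Job job = Job.getInstance(conf, "count key/value pairs");
        job.setJarByClass(CountPairs.class);

        // ASSUMPTION: placeholder InputFormat; replace it with one that
        // matches the real file format.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, input);

        // The base Mapper class is an identity mapper.
        job.setMapperClass(Mapper.class);
        // No reducers: the job is map-only, so nothing is shuffled.
        job.setNumReduceTasks(0);
        // Discard all output; no output directory is needed.
        job.setOutputFormatClass(NullOutputFormat.class);

        if (!job.waitForCompletion(true)) {
            throw new IllegalStateException("count job failed");
        }
        // "Map input records": one increment per pair read.
        return job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countPairs(new Configuration(), new Path(args[0])));
    }
}
```

Run it with `hadoop jar` and the data directory as the only argument; the printed number is the pair count. The same value also shows up as "Map input records" in the job's counter summary.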
On Fri, May 20, 2011 at 9:34 AM, W.P. McNeill wrote:
The cheapest way would be to check the counters when you write the data
in the first place and keep a running tally. :)
On 2011-05-20, at 10:35 AM, W.P. McNeill bill...@gmail.com wrote:
The keys are Text and the values are large custom data structures serialized
with Avro.
I also have counters for the job that generates these files that give me
this information, but sometimes... well, it's a long story. Suffice it to say
that it's nice to have a post-hoc method too. :-)
Are you storing the data in sequence files?
-Joey
On Fri, May 20, 2011 at 10:33 AM, W.P. McNeill bill...@gmail.com wrote:
No.