Btw, also check out Avro's MapReduce components. It's a much better
serialization framework, and you'll have fewer issues figuring out
which datatypes to use, plus more performance from good use of codecs.
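As a rough sketch of what that buys you (all names here are illustrative,
not from this thread), the four-part key discussed below could be declared
once as an Avro record schema, so the field types live in the schema rather
than in hand-rolled Writable code:

import org.apache.avro.Schema;

// Illustrative only: an Avro record schema for a
// (src IP, src port, dst IP, dst port) key. Assumes the avro jar
// (and avro-mapred for the MapReduce integration) on the classpath.
public class ConnectionKeySchema {
    public static final Schema KEY_SCHEMA = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"ConnectionKey\", \"fields\": ["
        + "{\"name\": \"srcIp\", \"type\": \"string\"},"
        + "{\"name\": \"srcPort\", \"type\": \"int\"},"
        + "{\"name\": \"dstIp\", \"type\": \"string\"},"
        + "{\"name\": \"dstPort\", \"type\": \"int\"}]}");
}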
On Wed, Jul 20, 2011 at 11:37 AM, Harsh J wrote:
> If your key is a "fixed" one of four attributes, why not simply use an
> ArrayWritable of Text objects, over a MapWritable?
If your key is a "fixed" one of four attributes, why not simply use an
ArrayWritable of Text objects, over a MapWritable?
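A minimal sketch of that idea (the class name is hypothetical; this is the
usual ArrayWritable subclassing idiom):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Fixing the element type to Text lets the framework instantiate
// elements when deserializing.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
    public TextArrayWritable(Text[] values) {
        super(Text.class, values);
    }
}

Note that ArrayWritable itself is not WritableComparable either, so to use
this as a map output key you would still need comparison logic, e.g. along
the lines of the subclass approach described further down.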
On Wed, Jul 20, 2011 at 5:32 AM, Choonho Son wrote:
> I am a newbie.
>
> Most of the examples show that
> job.setOutputKeyClass(Text.class);
>
> Is it possible to use job.setOutputKeyClass(MapWritable.class)?
Hi,
As far as I know, MapWritable doesn't implement Comparable, so I think
it cannot be used as a key. If you want that functionality, then you
have to have a subclass that implements Comparable, and there you will
define your key comparison logic. Or the other option would be to use
SortedMapWritable.
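A minimal sketch of such a key class, assuming the four attributes from
the original question (the class name and field layout are illustrative):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Illustrative composite key: four fixed attributes, serialized and
// compared field by field so it can sort and group as a MapReduce key.
public class ConnectionKey implements WritableComparable<ConnectionKey> {
    private final Text srcIp = new Text();
    private final Text srcPort = new Text();
    private final Text dstIp = new Text();
    private final Text dstPort = new Text();

    public void set(String sIp, String sPort, String dIp, String dPort) {
        srcIp.set(sIp);
        srcPort.set(sPort);
        dstIp.set(dIp);
        dstPort.set(dPort);
    }

    public void write(DataOutput out) throws IOException {
        srcIp.write(out);
        srcPort.write(out);
        dstIp.write(out);
        dstPort.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        srcIp.readFields(in);
        srcPort.readFields(in);
        dstIp.readFields(in);
        dstPort.readFields(in);
    }

    public int compareTo(ConnectionKey o) {
        int c = srcIp.compareTo(o.srcIp);
        if (c == 0) c = srcPort.compareTo(o.srcPort);
        if (c == 0) c = dstIp.compareTo(o.dstIp);
        if (c == 0) c = dstPort.compareTo(o.dstPort);
        return c;
    }

    // Needed so the default HashPartitioner sends equal keys to the
    // same reducer.
    public int hashCode() {
        return 31 * srcIp.hashCode() + dstIp.hashCode();
    }

    public boolean equals(Object o) {
        return o instanceof ConnectionKey && compareTo((ConnectionKey) o) == 0;
    }
}

With something like this, job.setOutputKeyClass(ConnectionKey.class) works
without a registered comparator, since Hadoop falls back to deserializing
both keys and calling compareTo().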
I am a newbie.
Most of the examples show that
job.setOutputKeyClass(Text.class);
Is it possible to use job.setOutputKeyClass(MapWritable.class)?
Because my key is a combination of values (src IP, src port, dst port,
dst IP), I want to use MapWritable as a key.
Example code is like:
MapWritable mkey = new MapWritable();
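Presumably the elided example continued along these lines (a guess; the
attribute names are made up), which shows the intent even though, per the
replies above, a bare MapWritable won't sort as a shuffle key:

mkey.put(new Text("srcIP"), new Text("10.0.0.1"));
mkey.put(new Text("srcPort"), new Text("5000"));
mkey.put(new Text("dstIP"), new Text("10.0.0.2"));
mkey.put(new Text("dstPort"), new Text("80"));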
Is this reproducible? If so, I'd urge you to check your local disks...
Arun
On Jul 19, 2011, at 12:41 PM, Kai Ju Liu wrote:
> Hi Marcos. The issue appears to be the following. A reduce task is unable to
> fetch results from a map task on HDFS. The map task is re-run, but the map
> task is now unable to retrieve information that it needs to run.
Interesting to see the upper bound for Hadoop.
However, I guess this is a rare problem.
I'll try to implement what we discussed so far and train myself.
Regards,
Em
On 19.07.2011 21:40, Steve Lewis wrote:
> If the size of a record is too big to be processed by a node you
> probably need to re-architect using a different record which scales
> better and combines cleanly.
Hi Marcos. The issue appears to be the following. A reduce task is unable to
fetch results from a map task on HDFS. The map task is re-run, but the map
task is now unable to retrieve information that it needs to run. Here is the
error from the second map task:
java.io.FileNotFoundException:
/mnt/h
If the size of a record is too big to be processed by a node you probably
need to re-architect using a different
record which scales better and combines cleanly.
You also need to ask at the start what data you need to retrieve and how you
intend to retrieve it;
at some point a database may start to
Of course it won't scale, or at least not as well as your suggested
model. Chances are good that my idea is not an option for a
production system and not as useful as the less complex variant. So you
are right!
The reason why I asked was to get an idea of what should be done if a
record is too big to be processed by a single node.
I assumed the problem was counting the number of people visiting Moscow
after London without considering any intermediate stops. This leads to a
data structure which is easy to combine. The structure you propose adds
more information and is difficult to combine. I doubt it could handle a
billion people.
Thanks!
So you invert the data and then walk through each inverted result.
Good point!
What do you think about prefixing each city name with its index in the list?
This way you can say:
London: 1_Moscow:2, 1_Paris:2, 2_Moscow:1, 2_Riga:4, 2_Paris:1,
3_Berlin:1...
From this list you can see that
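A quick sketch of that prefixing idea (hypothetical helper, not code from
the thread): for one person's ordered visit list, emit the later city
prefixed with how many steps it lies after the earlier one; a reducer would
then sum identical (city, prefixed city) pairs into the counts shown above:

import java.util.Arrays;
import java.util.List;

public class PrefixedPairs {
    // For each ordered pair (visits[i], visits[j]) with j > i, print
    // "visits[i] <tab> (j-i)_visits[j]", e.g. "London  1_Paris".
    public static void emitPairs(List<String> visits) {
        for (int i = 0; i < visits.size(); i++) {
            for (int j = i + 1; j < visits.size(); j++) {
                System.out.println(
                    visits.get(i) + "\t" + (j - i) + "_" + visits.get(j));
            }
        }
    }

    public static void main(String[] args) {
        // Joe's ordered visits from the example below
        emitPairs(Arrays.asList("Washington", "London", "Paris", "Moscow"));
    }
}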
Assume Joe visits Washington, London, Paris and Moscow
You start with records like
Joe:Washington:20-Jan-2011
Joe:London:14-Feb-2011
Joe:Paris:9-Mar-2011
You want
Joe: Washington, London, Paris and Moscow
For the next step the person is irrelevant
you want
Washington: London:1, Paris:1, Moscow:1
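A hedged sketch of that second step (class names and input format are
illustrative, not from this thread): once the first job has produced each
person's ordered city list, a mapper drops the person and emits
(earlier city, later city) pairs, and a reducer sums them:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Input line (from the first job): "Joe<tab>Washington,London,Paris,Moscow"
public class CityPairsMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text pair = new Text();

    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split("\t");
        String[] cities = parts[1].split(",");
        // The person (parts[0]) is irrelevant from here on.
        for (int i = 0; i < cities.length; i++) {
            for (int j = i + 1; j < cities.length; j++) {
                pair.set(cities[i] + ":" + cities[j]);
                ctx.write(pair, ONE);
            }
        }
    }
}

// Sums the count per (earlier city, later city) pair,
// e.g. "London:Moscow" -> 42.
class CityPairsReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    protected void reduce(Text key, Iterable<IntWritable> vals, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : vals) sum += v.get();
        ctx.write(key, new IntWritable(sum));
    }
}

Emitting the step distance as part of the value instead would give the
prefixed form discussed above.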
Hi Steven,
thanks for your response! For simplicity we can make those
assumptions you made - maybe this makes it much easier to help. Those
little extras are something for after solving the "easy" version of the
task. :)
What do you mean by the following?
> The second job takes Person : l
It is a little unclear what you start with and where you want to end up.
Let us assume that you have a collection of triplets of
person : place : time
We might imagine this information stored on a line of text.
It somewhat simplifies the problem to assume that the number of places
visited by one person
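Under those assumptions, a sketch of what a first job could look like (all
names illustrative; dates assumed ISO-formatted so that string order is
chronological order): map each person : place : time line to
(person, "time place"), then sort one person's visits in the reducer and
emit the ordered city list:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Input line: "Joe:London:2011-02-14"
public class VisitListMapper
        extends Mapper<LongWritable, Text, Text, Text> {
    private final Text person = new Text();
    private final Text visit = new Text();

    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] f = value.toString().split(":");
        person.set(f[0].trim());
        visit.set(f[2].trim() + "\t" + f[1].trim()); // "date<tab>place"
        ctx.write(person, visit);
    }
}

class VisitListReducer extends Reducer<Text, Text, Text, Text> {
    protected void reduce(Text person, Iterable<Text> visits, Context ctx)
            throws IOException, InterruptedException {
        // Assumes one person's visit list fits in memory, as suggested
        // above; copy values since Hadoop reuses the Text instance.
        List<String> sorted = new ArrayList<String>();
        for (Text v : visits) sorted.add(v.toString());
        Collections.sort(sorted); // the date prefix sorts chronologically
        StringBuilder cities = new StringBuilder();
        for (String s : sorted) {
            if (cities.length() > 0) cities.append(",");
            cities.append(s.split("\t")[1]);
        }
        ctx.write(person, new Text(cities.toString()));
    }
}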
Yes,
we can set mapred.tasktracker.map.tasks.maximum for each node.
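For example, in each node's mapred-site.xml (a sketch; the value is
illustrative and should be tuned to that node's cores and memory):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>

The tasktracker reads this at startup, so each node can carry its own
value.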
Thanks & Regards
Rajesh Putta
M Tech CSE
IIIT-H
On Tue, Jul 19, 2011 at 6:36 PM, Mohamed Riadh Trad
wrote:
> Hi,
>
> I am running hadoop on a cluster with nodes having different
> configurations. Is it possible to set specific
> mapred.tasktracker.map.tasks.maximum for each node?
Hi,
I am running hadoop on a cluster with nodes having different configurations. Is
it possible to set specific mapred.tasktracker.map.tasks.maximum for each node?
Bests,
Trad Mohamed Riadh, M.Sc, Ing.
PhD. student
INRIA-TELECOM PARISTECH - ENPC School of International Management
Office: 11-15