Thanks a lot for the reply..
On Mon, Oct 21, 2013 at 10:39 AM, Dieter De Witte wrote:
Anseh,

Let's assume your job is fully scalable; then it should take 100 000
000 / 600 000 times as long as the first job, which is 1000 / 6 ≈ 167 times
longer. That is the ideal case; in practice it will probably be something
like 200 times. Also, try using units in your questions, plus scientific
notation.
OK... thanks a lot for the link... it is so useful... ;)
On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin wrote:
Try profiling the job
(http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
And yeah, the machine specs could be the reason; that's why Hadoop was
invented in the first place ;)
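The profiling hook Amr links to is driven by a handful of job properties (names as documented for Hadoop 1.x; the `0-2` task ranges are the defaults, so only the first three tasks of each kind get profiled). A minimal sketch of what you would set in the job configuration or pass with `-D` on the command line:

```xml
<!-- Enable the built-in HPROF profiling for a few tasks -->
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>0-2</value> <!-- profile only the first three map tasks -->
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-2</value>
</property>
```

The resulting `profile.out` files end up in the task logs, so you can see where the map code spends its time.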
On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh wrote:
I tried it on a small set of data, about 60 records, and it did not take
too long; the execution time was reasonable. But on the set of 1
data it really performs badly. One more thing: I have 2 processors in my
machine, and I think this amount of data is very huge for my processor, and
this way
Try running the job locally on a small set of the data and see if it takes
too long. If so, your map code might have some performance issues.
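One way to do that local test, sketched with the Hadoop 1.x property names: `mapred.job.tracker=local` selects the LocalJobRunner, so map and reduce run inside a single JVM with no cluster involved (the `fs.default.name` setting only matters if your small sample sits on the local filesystem rather than in Cassandra):

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>local</value> <!-- run the whole job in-process, no cluster -->
</property>
<property>
  <name>fs.default.name</name>
  <value>file:///</value> <!-- read/write the local filesystem -->
</property>
```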
On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh wrote:
Hi Anseh,

It doesn't depend on the number of map tasks, and since your reducer hasn't
started yet it doesn't depend on that either. Maybe check the counters of
your job: is the number of map input records going up? If not, you're stuck
somewhere; otherwise you might just have a really big dataset :)
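Besides the JobTracker web UI, the counters can be polled from the command line with the `hadoop job -counter` subcommand. A sketch (the job id is a placeholder; take yours from the JobTracker UI or `hadoop job -list`, and note the counter group name assumes the old `org.apache.hadoop.mapred` API):

```shell
# Poll the map-input-records counter while the job runs;
# if the number keeps growing, the mappers are making progress
hadoop job -counter job_201310200839_0001 \
    'org.apache.hadoop.mapred.Task$Counter' MAP_INPUT_RECORDS
```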
Hi all, I have a question. I have a MapReduce program that gets its input
from Cassandra. My input is a little big, about 1 data. My problem is that
my program takes too long to process it, but I think MapReduce is good and
fast for large volumes of data, so I think maybe I have a problem with the
number of map tasks.