good to know... this puppy does scale :) and hadoop is awesome for what
it does...
Ashish
-----Original Message-----
From: Elia Mazzawi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 11, 2008 11:54 AM
To: core-user@hadoop.apache.org
Subject: Re: hadoop benchmarked, too slow to use
Yes. That does count as huge.
Congratulations!
On Wed, Jun 11, 2008 at 11:53 AM, Elia Mazzawi <[EMAIL PROTECTED]>
wrote:
>
> we concatenated the files to bring them close to (and less than) 64mb,
> and the difference was huge without changing anything else:
> we went from 214 minutes to 3 minutes!
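A minimal sketch of the concatenation step described above, assuming
plain-text inputs under ./input and the default 64MB block size; the
paths and names are illustrative, not from the original thread.

    # Pack many small files into ~64MB bundles so each bundle fills
    # roughly one HDFS block, then upload the bundles instead.
    target=$((64 * 1024 * 1024))   # default dfs.block.size, in bytes
    bundle=0
    size=0
    mkdir -p bundles
    for f in input/*; do
      fsize=$(wc -c < "$f")
      if [ "$size" -gt 0 ] && [ $((size + fsize)) -gt "$target" ]; then
        bundle=$((bundle + 1))
        size=0
      fi
      cat "$f" >> "bundles/part-$bundle"
      size=$((size + fsize))
    done
    hadoop dfs -put bundles /user/elia/bundles   # 2008-era fs shell syntax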
For this one, I don't think you will be able to do better than the unix
command - the data set is too small.

Ashish

-----Original Message-----
From: Elia Mazzawi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2008 5:00 PM
To: core-user@hadoop.apache.org
Subject: Re: hadoop benchmarked, too slow to use

so it would make sense for me to configure hadoop for smaller chunks?

Elia Mazzawi wrote:
> yes chunk size was 64mb, and each file has some data. it used 7
> mappers and 1 reducer.
> 10X the data took 214 minutes vs 26 minutes for the smaller set.
On Jun 10, 2008, at 3:56 PM, Elia Mazzawi wrote:

Hello,

we were considering using hadoop to process some data,
we have it set up on 8 nodes ( 1 master + 7 slaves )
we filled the cluster up with files that contain tab delimited data,
string \tab string etc
then we ran the example grep with a regular expression ...
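The "example grep" here is presumably the stock job that ships with the
Hadoop distribution, invoked roughly as below; the jar version and the
pattern are placeholders.

    # Upload the tab-delimited files, run the bundled grep example with
    # a regular expression, and read back the counted matches.
    bin/hadoop dfs -put local-input input
    bin/hadoop jar hadoop-0.17.0-examples.jar grep input output 'some[a-z]*pattern'
    bin/hadoop dfs -cat 'output/part-*'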
-----Original Message-----
From: Elia Mazzawi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2008 4:26 PM
To: core-user@hadoop.apache.org
Subject: Re: hadoop benchmarked, too slow to use

yes there was only 1 reducer, how many should i try ?

Joydeep Sen Sarma wrote:
> how many reducers? Perhaps you are defaulting to one reducer.
>
> One variable is how fast the java regex evaluation is wrt sed. One
> option ...
yes chunk size was 64mb, and each file has some data.
it used 7 mappers and 1 reducer.
10X the data took 214 minutes vs 26 minutes for the smaller set.
i uploaded the same data 10 times in different directories ( so more
files, same size )

Ashish Thusoo wrote:
> Apart from the setup times, the ...
Why not do a little experiment and see what the timing results are when
using a range of reducers, eg 1, 2, 5, 7, 13 (a sketch of such a loop
follows after the quoted message below)?

Miles

2008/6/11 Elia Mazzawi <[EMAIL PROTECTED]>:
>
> yes there was only 1 reducer, how many should i try ?
>
> Joydeep Sen Sarma wrote:
>
>> how many reducers? Perhaps you are ...
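A sketch of that experiment, assuming a job that honours the generic -D
option (ToolRunner-based jobs do; with streaming, -jobconf
mapred.reduce.tasks=N has the same effect). Jar name, input path, and
pattern are placeholders.

    # Time the same job across Miles' suggested range of reducer counts.
    for r in 1 2 5 7 13; do
      bin/hadoop dfs -rmr "out-$r"   # clear any previous output directory
      echo "reducers=$r"
      time bin/hadoop jar hadoop-0.17.0-examples.jar grep \
          -D mapred.reduce.tasks="$r" input "out-$r" 'some[a-z]*pattern'
    done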
I could rerun the benchmark with a single node server to see what happens.
my concern is, the 8 node setup was 10X slower than the bash command, so
I was starting to suspect that the cluster is not running properly, but
everything looks good in the logs, no timeouts and such.

Miles Osborne wrote:
Apart from the setup times, the fact that you have 3500 files means that
you are going after around 220GB of data, as each file would occupy at
least one chunk (this calculation assumes a chunk size of 64MB and that
each file has at least some data). Mappers would probably need to read ...
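The back-of-envelope arithmetic behind "around 220GB": at least one 64MB
chunk per file, times 3500 files.

    echo "$((3500 * 64 / 1024)) GB"   # -> 218 GB, i.e. roughly 220GB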
how many reducers? Perhaps you are defaulting to one reducer.

One variable is how fast the java regex evaluation is wrt sed. One
option is to use hadoop streaming and use your sed fragment as the mapper.
That will be another way of measuring hadoop overhead that eliminates
some variables.

Hadoop a ...
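A hedged sketch of that streaming run, with the sed fragment standing in
as the mapper; the sed expression, paths, and jar location are
placeholders rather than the ones used in this thread.

    # Run unix sed as the map step so the comparison no longer involves
    # the Java regex engine; cat as the reducer just passes the sorted
    # stream through.
    bin/hadoop jar contrib/streaming/hadoop-0.17.0-streaming.jar \
        -input  /user/elia/input \
        -output /user/elia/output-sed \
        -mapper "sed -e 's/\t.*//'" \
        -reducer /bin/cat \
        -jobconf mapred.reduce.tasks=7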
I compared the 2 results and they were the same.
for the system command, the sed before the sort is working properly: i
did ctrl-V then tab to input a tab character in the terminal, and viewed
the result; it's stripping out the rest of the data okay.

Ashish Venugopal wrote:
> Just a small note (does not answer your question, but deals with your
> testing command) ...
I suspect that many people are using Hadoop with a moderate number of nodes
and expecting to see a win over a sequential, single-node version. The
result (and I've seen this too) is typically that the single-node version
wins hands-down.

Apart from speeding-up the Hadoop job (eg via compression, ...
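One example of the compression knob alluded to, assuming the 0.17-era
property name for compressing intermediate map output; the job shown is
just an identity pass-through for illustration.

    # Compress map output so less intermediate data crosses the
    # network during the shuffle between maps and reduces.
    bin/hadoop jar contrib/streaming/hadoop-0.17.0-streaming.jar \
        -input /user/elia/input -output /user/elia/out-compressed \
        -mapper /bin/cat -reducer /bin/cat \
        -jobconf mapred.compress.map.output=true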
Just a small note (does not answer your question, but deals with your
testing command): when running the system command version below, it's
important to test with

    sort -k 1 -t $TAB

where TAB is something like:

    TAB=`echo "\t"`

to ensure that you sort by the key, rather than the whole line. Sorting by ...
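A runnable restatement of that note. printf is a more portable way to get
a literal tab into TAB than echo "\t" (bash's builtin echo does not expand
\t without -e), and -k 1,1 restricts the sort to the first field only,
since -k 1 alone still compares from field 1 through the end of the line.
The file name is a placeholder.

    TAB=$(printf '\t')
    sort -k 1,1 -t "$TAB" data.tsv > sorted-by-key.tsv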