No real reason to use Java/C++ here, most of the time will be spent over
network IO.
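To illustrate the point: if the work is dominated by waiting on the network, a short script can do the mapping. The sketch below assumes the job runs through Hadoop Streaming (this message does not confirm that) and that the input is one URL per line. The names `run_mapper` and `fetch_length` are hypothetical, chosen only for this example.

```python
import sys
import urllib.request


def fetch_length(url, timeout=10):
    """Fetch a URL and return the size of the response body in bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def run_mapper(lines, fetch=fetch_length):
    """Yield one 'url<TAB>result' record per non-empty input line."""
    for line in lines:
        url = line.strip()
        if not url:
            continue
        try:
            yield f"{url}\t{fetch(url)}"
        except Exception as exc:
            # One failed call should not kill the whole task; record it instead.
            yield f"{url}\tERROR:{type(exc).__name__}"


def main():
    """Entry point for a streaming-style mapper: stdin in, stdout out."""
    for record in run_mapper(sys.stdin):
        print(record)
```

A mapper script would simply call `main()`. The fetch function is passed in as a parameter so the loop can be tried locally with a stub before any real API is called.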
Cheers,
Erez Katz
--- On Sat, 3/6/10, Phil McCarthy wrote:
> From: Phil McCarthy
> Subject: Parallelizing HTTP calls with MapReduce
> To: mapreduce-user@hadoop.apache.org
> Date: Saturday, March 6, 2010, 9:29 AM
> Hi,
>
> I'm new to Hadoop, an
Thanks for the detailed answer, this will be useful stuff to know once
I'm optimizing/tuning.
I'm actually still at the stage of figuring out how to approach
applying the mapreduce pattern to the task, so I'll take your
suggestion of asking again on common-user.
Thanks!
On Sun, Mar 7, 2010 at 8:
On 03/06/2010 09:29 AM, Phil McCarthy wrote:
> Hi,
> I'm new to Hadoop, and I'm trying to figure out the best way to use it
> with EC2 to make a large number of calls to a web API,

Consider using an HTTP client library / connection that is thread-safe,
potentially.

> and then process
> and store the results
Hi,
I'm new to Hadoop, and I'm trying to figure out the best way to use it
with EC2 to make a large number of calls to a web API, and then process
and store the results. I'm completely new to Hadoop, so I'm wondering
what's the best high-level approach, in terms of using MapReduce to
parallelize the