Re: Parallelizing HTTP calls with MapReduce

2010-03-09 Thread philmccarthy
…over network IO. Cheers, Erez Katz --- On Sat, 3/6/10, Phil McCarthy wrote: > From: Phil McCarthy > Subject: Parallelizing HTTP calls with MapReduce > To: mapreduce-user@hadoop.apache.org > Date: Saturday, March 6, 2010, 9:29 AM > Hi, > > I'm new to Hadoop, an…

Re: Parallelizing HTTP calls with MapReduce

2010-03-08 Thread Aaron Kimball
… > > No real reason to use Java/C++ here; most of the time will be spent over > network IO. > > > Cheers, > > Erez Katz > > > --- On Sat, 3/6/10, Phil McCarthy wrote: > > > From: Phil McCarthy > > Subject: Parallelizing HTTP calls with MapReduce…

Re: Parallelizing HTTP calls with MapReduce

2010-03-07 Thread Erez Katz
…to use Java/C++ here; most of the time will be spent over network IO. Cheers, Erez Katz --- On Sat, 3/6/10, Phil McCarthy wrote: > From: Phil McCarthy > Subject: Parallelizing HTTP calls with MapReduce > To: mapreduce-user@hadoop.apache.org > Date: Saturday, March 6, 2010,…
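Erez's point, that the job is bound by network IO rather than CPU, is why a scripting language is adequate here through Hadoop Streaming, which feeds input lines to the mapper on stdin and collects tab-separated key/value records from its stdout. A minimal Python mapper sketch, assuming one API URL per input line; the names `fetch` and `map_lines` and the `--map` flag are hypothetical, and failed calls are recorded as output rather than raised:

```python
#!/usr/bin/env python3
# Hypothetical Hadoop Streaming mapper: reads one API URL per line on stdin
# and writes "url<TAB>status<TAB>detail" records to stdout.
import sys
import urllib.request
from typing import Callable, Iterable, Iterator


def fetch(url: str, timeout: float = 10.0) -> bytes:
    # Almost all wall-clock time is spent waiting here, not in Python code.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def map_lines(lines: Iterable[str],
              fetcher: Callable[[str], bytes] = fetch) -> Iterator[str]:
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank input lines
        try:
            body = fetcher(url)
            yield f"{url}\tOK\t{len(body)}"
        except Exception as exc:
            # Emit the failure instead of crashing the whole map task.
            yield f"{url}\tERROR\t{type(exc).__name__}"


# The "--map" flag is an assumption; it keeps the module importable for testing.
if __name__ == "__main__" and sys.argv[1:] == ["--map"]:
    for record in map_lines(sys.stdin):
        print(record)
```

With Hadoop Streaming this would be passed as something like `-mapper "mapper.py --map" -file mapper.py` alongside `-input` and `-output` (the script also needs execute permission). If the results only need storing, a map-only job with `-numReduceTasks 0` is enough.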

Re: Parallelizing HTTP calls with MapReduce

2010-03-07 Thread Phil McCarthy
Thanks for the detailed answer, this will be useful stuff to know once I'm optimizing/tuning. I'm actually still at the stage of figuring out how to approach applying the mapreduce pattern to the task, so I'll take your suggestion of asking again on common-user. Thanks! On Sun, Mar 7, 2010 at 8:…

Re: Parallelizing HTTP calls with MapReduce

2010-03-07 Thread Kay Kay
On 03/06/2010 09:29 AM, Phil McCarthy wrote: > Hi, I'm new to Hadoop, and I'm trying to figure out the best way to use it with EC2 to make a large number of calls to a web API, Consider potentially using a thread-safe HTTP client library / connection. > and then process and store the resul…
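Kay Kay's suggestion matters because a mapper that issues requests one at a time spends most of its time waiting on round-trip latency; overlapping requests inside each map task raises throughput. A sketch using Python's standard `concurrent.futures.ThreadPoolExecutor`, assuming the per-request function is safe to call from several threads (here each `urllib.request.urlopen` call opens its own connection, so no client object is shared). `fetch_all` and `max_workers=16` are illustrative choices, not tuned values:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> bytes:
    # Each call opens its own connection, so threads share no client state.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls: Iterable[str],
              fetcher: Callable[[str], bytes] = fetch,
              max_workers: int = 16) -> List[Tuple[str, str]]:
    def safe_fetch(url: str) -> Tuple[str, str]:
        # Turn exceptions into result records so one bad call doesn't stop the rest.
        try:
            return url, f"OK\t{len(fetcher(url))}"
        except Exception as exc:
            return url, f"ERROR\t{type(exc).__name__}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Requests overlap in time, but map() yields results in input order.
        return list(pool.map(safe_fetch, urls))
```

Inside a Streaming mapper, `fetch_all([l.strip() for l in sys.stdin if l.strip()])` would replace a one-at-a-time loop; a sensible `max_workers` depends on the API's rate limits.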

Parallelizing HTTP calls with MapReduce

2010-03-06 Thread Phil McCarthy
Hi, I'm new to Hadoop, and I'm trying to figure out the best way to use it with EC2 to make a large number of calls to a web API, and then process and store the results. I'm completely new to Hadoop, so I'm wondering what's the best high-level approach, in terms of using MapReduce to parallelize the…
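One way to frame the high-level approach asked about here: the map phase makes the API calls and emits (key, value) pairs from each response, the framework groups the pairs by key, and the reduce phase processes each group before it is stored. A local, single-process simulation of that flow, assuming a hypothetical JSON response with a `category` field and counting results per category; on a cluster, Hadoop itself performs the grouping step (the shuffle) between the two functions:

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def mapper(request_id: str,
           fetcher: Callable[[str], str]) -> Iterator[Tuple[str, int]]:
    # Map: one API call per input record; the "category" field is hypothetical.
    response = json.loads(fetcher(request_id))
    yield response["category"], 1


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Reduce: combine all values the shuffle grouped under one key.
    return key, sum(values)


def run_locally(request_ids: Iterable[str],
                fetcher: Callable[[str], str]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for rid in request_ids:
        for key, value in mapper(rid, fetcher):
            groups[key].append(value)  # stand-in for Hadoop's shuffle/sort
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))
```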