architecture help

2009-11-15 Thread yz5od2
Hi, a) I have a Mapper ONLY job; the job reads in records, then parses them apart. There is no reduce phase. b) I would like this mapper job to save each record into a shared MySQL database on the network. c) I am running a 4 node cluster, and obviously running out of connections very quickly, th

Re: architecture help

2009-11-15 Thread Jeff Zhang
Each map task will run in a separate JVM, so you should create a connection pool for each task; all the mapper instances in one task then share the same connection pool. Another suggestion is to use JNDI to manage the connection. It can be shared by all the map tasks in your cluster. Je

Re: architecture help

2009-11-15 Thread Amogh Vasekar
>> I would like the connection management to live separately >> from the mapper instances per node. The JVM reuse option in Hadoop might be helpful for you in this case. Amogh On 11/16/09 6:22 AM, "yz5od2" wrote: Hi, a) I have a Mapper ONLY job, the job reads in records, then parses them apart.

Re: architecture help

2009-11-16 Thread yz5od2
Thanks all for the replies, that makes sense. I think I am allocating connection resources per-mapper, instead of per-task. How do I programmatically allocate a "pool" or shared resource for a task, that all Mapper instances can have access to? 1) I have 4 nodes, each node has a map capacity

Re: architecture help

2009-11-16 Thread Jason Venner
What version of Hadoop are you using? It may be that you are creating a new connection in each map call. Create your connection in configure(), and close it in close(), perhaps committing every 1000 calls in the mapper. On Mon, Nov 16, 2009 at 3:33 PM, yz5od2 wrote: > Thanks all for the repl
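
[Editor's sketch] The configure/map/close pattern suggested above could look roughly like the following. This is a minimal illustration that does not depend on Hadoop or JDBC: `FakeConnection`, `BatchingMapper`, and `BatchCommitDemo` are hypothetical names invented here, and `FakeConnection` only counts calls where a real job would talk to MySQL. The interval of 1000 comes from the message above; everything else is an assumption.

```java
// Hypothetical stand-in for a database connection; it only counts calls.
class FakeConnection {
    int inserts = 0;
    int commits = 0;
    boolean closed = false;

    void insert(String record) { inserts++; }
    void commit() { commits++; }
    void close() { closed = true; }
}

// Mirrors the configure()/map()/close() lifecycle of a mapper,
// without depending on Hadoop itself.
class BatchingMapper {
    static final int COMMIT_INTERVAL = 1000;

    private FakeConnection conn;
    private int pending = 0; // records written since the last commit

    // Called once per task: the only place a connection is opened.
    void configure() {
        conn = new FakeConnection();
    }

    // Called once per record: reuses the connection, commits in batches.
    void map(String record) {
        conn.insert(record);
        pending++;
        if (pending == COMMIT_INTERVAL) {
            conn.commit();
            pending = 0;
        }
    }

    // Called once per task: flush any partial batch, then release.
    void close() {
        if (pending > 0) {
            conn.commit();
            pending = 0;
        }
        conn.close();
    }

    FakeConnection connection() { return conn; }
}

class BatchCommitDemo {
    public static void main(String[] args) {
        BatchingMapper mapper = new BatchingMapper();
        mapper.configure();
        for (int i = 0; i < 2500; i++) {
            mapper.map("record-" + i);
        }
        mapper.close();
        FakeConnection c = mapper.connection();
        System.out.println(c.inserts + " inserts, " + c.commits + " commits");
    }
}
```

The point of the sketch is that the connection count stays at one per task no matter how many records pass through map(), and the final commit in close() keeps a partial batch from being lost.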

Re: architecture help

2009-11-16 Thread Jeff Zhang
The easiest way is to make your connection pool a static member of your mapper class. Jeff Zhang On Mon, Nov 16, 2009 at 7:33 AM, yz5od2 wrote: > Thanks all for the replies, that makes sense. I think I am allocating > connection resources per-mapper, instead of per-task. > > How do I

Re: architecture help

2009-11-16 Thread yz5od2
Thanks all, I ended up figuring out what the issue was. I was using a static member, however I was mis-tracking the initialization/setup phase, so I was mistakenly re-initializing the pool on every call to map(). Duh, imagine the problem that caused! After fixing that, things are working f
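
[Editor's sketch] The fix described above, a static pool that is initialized once per JVM rather than on every map() call, might be guarded along these lines. This is an assumption-laden illustration, not the poster's actual code: `PooledConnection`, `SimplePool`, `PoolingMapper`, and `StaticPoolDemo` are hypothetical names, the pool size of 2 is arbitrary, and a real job would more likely wrap an existing pooling library around JDBC connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pooled resource standing in for a MySQL connection.
class PooledConnection {
    final int id;
    PooledConnection(int id) { this.id = id; }
}

// Hypothetical fixed-size pool that counts how often it is constructed.
class SimplePool {
    static int poolsCreated = 0; // instrumentation for the demo only
    private final BlockingQueue<PooledConnection> idle;

    SimplePool(int size) {
        poolsCreated++;
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new PooledConnection(i));
        }
    }

    PooledConnection borrow() {
        PooledConnection c = idle.poll();
        if (c == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return c;
    }

    void giveBack(PooledConnection c) { idle.add(c); }
}

class PoolingMapper {
    // One pool per JVM, shared by every mapper instance in that JVM.
    private static SimplePool pool;

    // The null check is the guard: the pool is built on first use only.
    private static synchronized SimplePool pool() {
        if (pool == null) {
            pool = new SimplePool(2);
        }
        return pool;
    }

    void map(String record) {
        PooledConnection c = pool().borrow();
        try {
            // A real mapper would write the record through c here.
        } finally {
            pool().giveBack(c);
        }
    }
}

class StaticPoolDemo {
    public static void main(String[] args) {
        PoolingMapper a = new PoolingMapper();
        PoolingMapper b = new PoolingMapper();
        for (int i = 0; i < 5000; i++) {
            a.map("x");
            b.map("y");
        }
        System.out.println("pools created: " + SimplePool.poolsCreated);
    }
}
```

The bug described in the message corresponds to constructing `SimplePool` unconditionally inside map(); with the null check in place, the construction count stays at one for the life of the JVM, however many mapper instances or map() calls there are.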