Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Arun C Murthy
Stan, You can ask TT to create a symlink to your jar shipped via DistCache: http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html#DistributedCache That should give you what you want. hth, Arun On Jul 30, 2012, at 3:23 PM, Stan Rosenberg wrote: Hi, I am seeking a way to

Re: Reading fields from a Text line

2012-08-03 Thread Harsh J
That is not really a bug. Only if you use @Override will you really be asserting that you've overridden the right method (since the new API uses inheritance instead of interfaces). Without that kind of check, it's easy to make mistakes and add in methods that won't get considered by the framework (and

Re: Reading fields from a Text line

2012-08-03 Thread Bejoy KS
That is a good pointer, Harsh. Thanks a lot. But if IdentityMapper is being used, shouldn't the job.xml reflect that? The job.xml always shows the mapper as our CustomMapper. Regards Bejoy KS Sent from handheld, please excuse typos. -----Original Message----- From: Harsh J ha...@cloudera.com Date:

Re: Reading fields from a Text line

2012-08-03 Thread Harsh J
Bejoy, In the new API, the default map() function, if not properly overridden, is the identity map function. There is no IdentityMapper class in the new API; the Mapper class itself is identity by default. On Fri, Aug 3, 2012 at 1:07 PM, Bejoy KS bejoy.had...@gmail.com wrote: That is a good
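
The pitfall described in this thread can be reproduced without Hadoop. The following is a minimal sketch, assuming a stand-in base class (BaseMapper, hypothetical, not Hadoop's Mapper) whose default map() is the identity. A subclass that gets the parameter types wrong only adds an overload, so the caller, which holds a base-class reference, silently keeps running the identity version. Putting @Override on the mistaken method would turn that silent failure into a compile error.

```java
import java.util.ArrayList;
import java.util.List;

class OverrideDemo {
    // Hypothetical stand-in for a framework base class; NOT Hadoop's Mapper.
    static class BaseMapper {
        // The "framework" only ever calls this exact signature.
        public void map(Object key, String value, List<String> out) {
            out.add(value); // identity: pass the record through unchanged
        }
    }

    // Mistake: the value parameter is CharSequence instead of String, so this
    // is an overload, not an override. Adding @Override here would not compile.
    static class BrokenMapper extends BaseMapper {
        public void map(Object key, CharSequence value, List<String> out) {
            out.add(value.toString().toUpperCase());
        }
    }

    // Correct: same signature as the base method, checked by @Override.
    static class FixedMapper extends BaseMapper {
        @Override
        public void map(Object key, String value, List<String> out) {
            out.add(value.toUpperCase());
        }
    }

    // Mimics the framework: it sees only a BaseMapper and calls map() on it.
    static List<String> run(BaseMapper mapper, String record) {
        List<String> out = new ArrayList<>();
        mapper.map(null, record, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new BrokenMapper(), "a,b,c")); // prints [a,b,c]
        System.out.println(run(new FixedMapper(), "a,b,c"));  // prints [A,B,C]
    }
}
```

This also suggests why job.xml can still name the custom mapper class: the class is the one configured, but the method the framework invokes on it is the inherited identity map().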

Re: Reading fields from a Text line

2012-08-03 Thread Bejoy KS
OK, got it now. That is a good piece of information. Thank you :) Regards Bejoy KS Sent from handheld, please excuse typos. -----Original Message----- From: Harsh J ha...@cloudera.com Date: Fri, 3 Aug 2012 16:28:27 To: mapreduce-user@hadoop.apache.org; bejoy.had...@gmail.com Cc: Mohammad

Newest version of Hadoop?

2012-08-03 Thread Andrew.Botelho
What is the newest API for Hadoop and MapReduce? What number version is it? Andrew Botelho EMC Corporation 55 Constitution Blvd., Franklin, MA andrew.bote...@emc.com Mobile: 508-813-2026

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
Arun, I don't believe the symlink is of help. The symlink is created in the task's current working directory (cwd), but I don't know what cwd is when I launch with 'hadoop jar ...'. Thanks, stan On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy a...@hortonworks.com wrote: Stan, You can ask TT

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Harsh J
Stan, What Arun says would surely work. For instance, read this command: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0.jar pi -files share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.0.jar#foo.jar -Dmapred.child.java.opts=-javaagent:./foo.jar 1 1 What this would do

Re: Newest version of Hadoop?

2012-08-03 Thread Harsh J
Hi, The latest release package available is 2.0.0 (with 2.0.1 expected to be out with some minor fixes and a security fix soon). Its API docs are here: http://hadoop.apache.org/common/docs/current/api/ On Fri, Aug 3, 2012 at 9:34 PM, andrew.bote...@emc.com wrote: What is the newest API for

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
On Fri, Aug 3, 2012 at 1:31 PM, Harsh J ha...@cloudera.com wrote: What this would do is merely take your passed -files jar (client-common) and symlink it into the JVM's working directory (the task's working directory) _before_ the JVM is begun, as foo.jar. So if I pass additionally, JVM opts

RE: DBOutputWriter timing out writing to database

2012-08-03 Thread Jarus, Nathan
Thanks for the alternatives, but I'd ideally like to do all this inside the MR job itself as I want to be able to programmatically run it regularly, and any additional steps just add complexity. Looking through sample code on Google, I never see anybody using the Progressable passed in to the

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Arun C Murthy
Just do -javaagent:./profiler.jar? On Aug 3, 2012, at 9:32 AM, Stan Rosenberg wrote: Arun, I don't believe the symlink is of help. The symlink is created in the task's current working directory (cwd), but I don't know what cwd is when I launch with 'hadoop jar ...'. Thanks, stan

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Stan Rosenberg
On Fri, Aug 3, 2012 at 4:19 PM, Arun C Murthy a...@hortonworks.com wrote: Just do -javaagent:./profiler.jar? Yep, that should work. Thanks!
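
The reason a relative -javaagent path is enough, per the explanation earlier in this thread, is that the symlink is placed in the task's working directory before the JVM starts, and a relative path is resolved against that directory. A minimal sketch of that resolution in plain Java follows; the task directory used in main is a made-up illustration, not a real TaskTracker layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativeAgentPath {
    // Resolves a relative path the way a process started in 'cwd' would see it.
    static Path resolveAgainstCwd(String cwd, String relative) {
        return Paths.get(cwd).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical task working directory, for illustration only.
        String taskCwd = "/tmp/example-task/work";
        System.out.println(resolveAgainstCwd(taskCwd, "./profiler.jar"));

        // The directory this JVM itself resolves relative paths against.
        System.out.println(Paths.get("").toAbsolutePath());
    }
}
```

So the client launching 'hadoop jar ...' does not need to know the task's cwd in advance; it only needs the symlink name to match the name used in the JVM option.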

Re: Issue with Hadoop Streaming

2012-08-03 Thread Subir S
In streaming, the contents of the file are streamed to the mapper through STDIN, not the file names. Fix the Perl script accordingly. Thanks, Subir On 8/3/12, Devi Kumarappan kpala...@att.net wrote: After specifying NLineInputFormat option, streaming job fails with Error from
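
The point about STDIN applies to a streaming mapper written in any language. As a minimal sketch in Java (the original script is Perl and is not shown in this thread), the mapper below treats every line arriving on standard input as record content and never tries to open it as a file name. The word-count style output ("token<TAB>1") is an assumption made purely for illustration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StdinMapper {
    // Reads record lines from 'in' and emits one "token\t1" pair per
    // whitespace-separated token. Blank lines are skipped.
    static List<String> mapLines(Reader in) {
        List<String> out = new ArrayList<>();
        new BufferedReader(in).lines().forEach(line -> {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                return; // nothing to emit for an empty record
            }
            for (String token : trimmed.split("\\s+")) {
                out.add(token + "\t1");
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // The lines on System.in are the data itself, not paths to open.
        Reader stdin = new InputStreamReader(System.in, StandardCharsets.UTF_8);
        for (String pair : mapLines(stdin)) {
            System.out.println(pair);
        }
    }
}
```

Depending on the input format, each line may also carry a key prefix ahead of the record text, so it is worth printing a few raw input lines while debugging.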