Reading Global Counters in Streaming?

2011-11-14 Thread Ryan Rosario
Hi, I am writing a Hadoop Streaming job in Python. I know that I can increment counters by writing a special format to sys.stderr. Is it possible to *read* the values of counters from my Python program? I am using the global counter as the denominator of a probability, and must have this value
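For context, the "special format" mentioned in the question is the `reporter:counter:<group>,<counter>,<amount>` line that Hadoop Streaming recognizes on stderr. Below is a minimal sketch of the increment side only. The helper names and the group and counter names are hypothetical, and it does not show any way to read a counter back from inside a task.

```python
import sys


def format_counter(group, counter, amount=1):
    """Build the stderr line Hadoop Streaming treats as a counter update."""
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)


def increment_counter(group, counter, amount=1, stream=None):
    """Write a counter update to stderr (or to a supplied stream)."""
    if stream is None:
        stream = sys.stderr
    stream.write(format_counter(group, counter, amount))
    stream.flush()


if __name__ == "__main__":
    # Hypothetical mapper: count input lines while echoing them unchanged.
    for line in sys.stdin:
        increment_counter("MyJob", "LinesSeen")
        sys.stdout.write(line)
```

The amount is formatted as an integer because counter increments are whole numbers; the group and counter names must not contain commas, since the comma is the field separator.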

Streaming Jobs fail/killed: Could not initialize class org.apache.log4j.LogManager

2011-10-30 Thread Ryan Rosario
Hi, I am having problems running Hadoop Streaming on both 0.21.0 and 0.20.203.0. When I run the job using the following command, the job hangs at Map 0% Reduce 0% for about a minute and then fails with the status "Killed: Could not initialize class org.apache.log4j.LogManager". This then seems

Trouble Submitting Job as another User

2010-04-03 Thread Ryan Rosario
Hi, I am trying to set up a Hadoop cluster so that any of our users can access HDFS and submit jobs, and I am having trouble with this. I added an HDFS path for mapred.system.dir in mapred-site.xml as suggested in an FAQ. I start/stop the cluster with system user _hadoop. I would like to be able

Re: Trouble Submitting Job as another User

2010-04-03 Thread Ryan Rosario
</value> </property> Abhishek On Sat, Apr 3, 2010 at 5:36 PM, Ryan Rosario uclamath...@gmail.com wrote: Hi, I am trying to set up a Hadoop cluster so that any of our users can access HDFS and submit jobs and I am having trouble with this. I added a HDFS path for mapred.system.dir in mapred

Outputting a Separate File for each Line of Output

2009-10-28 Thread Ryan Rosario
In Streaming tasks, how can I output a separate file with the key as the filename, for each line of output, instead of collecting it in a big file? Thanks, Ryan

Re: Streaming ignoring stderr output

2009-10-26 Thread Ryan Rosario
26, 2009 at 8:03 AM, Koji Noguchi knogu...@yahoo-inc.com wrote: This doesn't solve your stderr/stdout problem, but you can always set the timeout to be a bigger value if necessary. -Dmapred.task.timeout=__ (in milliseconds) Koji On 10/25/09 12:00 PM, Ryan Rosario uclamath...@gmail.com

Streaming ignoring stderr output

2009-10-25 Thread Ryan Rosario
I am using a Python script as a mapper for a Hadoop Streaming (Hadoop 0.20.0) job, with reducer NONE. My jobs keep getting killed with "task failed to respond after 600 seconds". I tried sending a heartbeat every minute to stderr using sys.stderr.write in my mapper, but nothing is being output to
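As an illustration of the heartbeat idea described above, here is a minimal sketch that writes a `reporter:status:<message>` line to stderr at most once per interval. Hadoop Streaming recognizes that prefix as a status update, whereas arbitrary stderr text is only logged. The `Heartbeat` class, its parameters, and the placeholder mapper loop are hypothetical, and whether this resolves the timeout reported in this thread is not established here.

```python
import sys
import time


def format_status(message):
    """Build the stderr line Hadoop Streaming treats as a status update."""
    return "reporter:status:%s\n" % message


class Heartbeat:
    """Emit a status line at most once every `interval_seconds`."""

    def __init__(self, interval_seconds=60, stream=None, clock=time.time):
        self.interval = interval_seconds
        self.stream = stream if stream is not None else sys.stderr
        self.clock = clock
        self.last = clock()

    def maybe_beat(self, message="still working"):
        """Write a status line if the interval has elapsed; return True if written."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.stream.write(format_status(message))
            self.stream.flush()  # flush so the line is not held in a buffer
            self.last = now
            return True
        return False


if __name__ == "__main__":
    heartbeat = Heartbeat(interval_seconds=60)
    for line in sys.stdin:
        # Placeholder for slow per-record work in a real mapper.
        sys.stdout.write(line)
        heartbeat.maybe_beat()
```

The check runs between records, so a single record that takes longer than the task timeout would still need the heartbeat called from inside its own processing loop.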