Re: Hadoop Streaming job Fails - Permission Denied error

2011-09-13 Thread Jeremy Lewi
them? The env program can't find them and that's >> probably why your scripts with shbang don't run. >> >> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS wrote: >> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" a

Re: Hadoop Streaming job Fails - Permission Denied error

2011-09-12 Thread Jeremy Lewi
nd you a pointer. J On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS wrote: > Thanks Jeremy. I tried with your first suggestion and the mappers ran to > completion. But then the reducers failed with another exception related to > pipes. I believe it may be due to permission issues again. I

Re: Hadoop Streaming job Fails - Permission Denied error

2011-09-12 Thread Jeremy Lewi
I would suggest you try putting your mapper/reducer py files in a directory that is world-readable at every level, e.g. /tmp/test. I had similar problems when I was using streaming, and I believe my workaround was to put the mappers/reducers outside my home directory. The other more involved alternat
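The "world-readable at every level" condition can be checked programmatically. Below is a small helper (hypothetical, not from the thread) that walks from a file up to the filesystem root and reports any level that other users cannot read or traverse:

```python
import os
import stat

def world_traversable(path):
    """Return the list of path components above (and including) `path`
    that are NOT accessible to other users: directories need o+rx,
    plain files need o+r. An empty list means the task-runner user
    should be able to reach the file."""
    path = os.path.abspath(path)
    problems = []
    current = path
    while True:
        mode = os.stat(current).st_mode
        if os.path.isdir(current):
            needed = stat.S_IROTH | stat.S_IXOTH  # read + traverse
        else:
            needed = stat.S_IROTH                 # read only
        if (mode & needed) != needed:
            problems.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the root
            break
        current = parent
    return problems
```

Running this on your mapper script before submitting the streaming job shows exactly which directory level is blocking the TaskTracker user.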

Emit an entire file

2011-06-28 Thread Jeremy Cunningham
contains all the files that meet my criteria. Thanks, Jeremy

Re: mapreduce and python

2011-06-20 Thread Jeremy Lewi
Hassen, I've been very successful using Hadoop Streaming, Dumbo, and TypedBytes as a solution for using Python to implement mappers and reducers. TypedBytes is a Hadoop encoding format that allows binary data (including lists and maps) to be encoded in a format that permits the serialized data to
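To illustrate the basic Streaming contract the reply is describing (plain stdin/stdout, tab-separated key/value lines, reducer input sorted by key), here is a minimal word-count mapper and reducer sketch; the file name and structure are my own, not from the thread:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as Hadoop Streaming expects."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(lines):
    """Sum counts per word; Streaming delivers reducer input sorted by key,
    so consecutive lines with the same key can be grouped."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(v) for _, v in group))

if __name__ == "__main__":
    # Invoked as e.g. `-mapper "python wc.py map" -reducer "python wc.py reduce"`
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    stream = mapper(sys.stdin) if stage == "map" else reducer(sys.stdin)
    for out in stream:
        print(out)
```

Dumbo wraps this same model in a higher-level API, and TypedBytes replaces the plain-text lines with a binary encoding when the data isn't line-friendly.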

Re: Sequence file format in python and serialization

2011-06-02 Thread Jeremy Lewi
JJ, If you want to use complex types in a streaming job, I think you need to encode the values using the typedbytes format within the sequence file; i.e. the key and value in the sequence file are both TypedBytesWritable. This is independent of the language the mapper and reducer are written in becau
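To make the typedbytes wire format concrete: each value is a one-byte type code followed by a big-endian payload. The sketch below encodes two of the simpler types, based on my reading of the typedbytes spec (HADOOP-1722); verify the codes against your Hadoop version before relying on them:

```python
import struct

# Type codes from the Hadoop typedbytes format (HADOOP-1722):
#   3 = 32-bit signed int, 7 = UTF-8 string (length-prefixed).
# Other codes (bytes, long, double, vector, map, ...) are omitted here.

def encode_int(n):
    """Encode a 32-bit int: code byte 3, then 4 bytes big-endian."""
    return struct.pack(">bi", 3, n)

def encode_string(s):
    """Encode a string: code byte 7, 4-byte big-endian length, UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack(">bi", 7, len(data)) + data
```

In practice the `typedbytes` Python package (used by Dumbo) handles this encoding for you; the point is that the sequence file stores these raw byte strings, so any language that speaks the format can read them.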

Re: Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

2011-05-06 Thread jeremy
Thanks Todd. Unfortunately, I'm using Cascading on Hadoop, so I'm not sure if there's an easy mechanism to force the LocalJobs it fires off to use a different configuration. I'll talk to the Cascading folks and find out. J Quoting Todd Lipcon : Hi Jeremy, That'

Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

2011-05-06 Thread jeremy
next time I restart the daemons, the task tracker will fail because it can't rename /var/lib/hadoop-0.20/cache/pseudo/localRunner. Does anybody have suggestions how to fix this? Thanks Jeremy

Re: debug mapreduce basics

2011-04-28 Thread Jeremy Lewi
Mike, Check out this wiki: http://code.google.com/p/hadoop-clusternet/wiki/DebuggingJobsUsingEclipse It shows how, if you're running in standalone mode, you can run a job in debug mode so that you can then start a remote debugging session with Eclipse. You can then step through your code. I've found
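For streaming jobs written in Python, an analogous trick (a common pattern, not from the wiki page above) is to simulate the map-sort-reduce pipeline in-process, so you can step through the mapper and reducer with an ordinary debugger such as pdb. The mapper/reducer below are hypothetical stand-ins:

```python
def simulate_streaming(mapper, reducer, input_lines):
    """Run a streaming-style pipeline locally: map each input line,
    sort the intermediate key/value lines (as the shuffle phase would),
    then hand them to the reducer. Useful for stepping through with pdb."""
    intermediate = []
    for line in input_lines:
        intermediate.extend(mapper(line))
    intermediate.sort()  # stands in for Hadoop's sort/shuffle
    return list(reducer(intermediate))

# Hypothetical mapper/reducer for illustration:
def count_mapper(line):
    return ["%s\t1" % word for word in line.split()]

def count_reducer(sorted_lines):
    counts = {}
    for entry in sorted_lines:
        key, value = entry.split("\t")
        counts[key] = counts.get(key, 0) + int(value)
    return ["%s\t%d" % (k, counts[k]) for k in sorted(counts)]
```

A quicker shell-level check of the same idea is piping sample input through the scripts: `cat sample.txt | ./mapper.py | sort | ./reducer.py`.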

Re: what is the difference between the two packages "org.apache.hadoop.mapreduce" and "org.apache.hadoop.mapred"?

2011-04-26 Thread Jeremy Lewi
org.apache.hadoop.mapreduce is the newer API. To avoid breaking backwards compatibility, the older API, org.apache.hadoop.mapred, was preserved, and the newer API was given a new package name. Check out http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api J On Tue, 2011-04-26 at 20:17

Re: Creating another map reduce task in a map function

2011-04-23 Thread Jeremy Lewi
could have another operator after that which would process each word. Jeremy On Sat, 2011-04-23 at 19:39 +0530, ranjith k wrote: > Thank u..harsh. > I have a map function. The inputFormat is line input format. i need to > run another map reduce task from the map function for each word

Can't set stream.addenvironment in job configuration file

2011-04-07 Thread Jeremy Lewi
configuration file is overwritten by the call jobConf._set("stream.addenvironment",addTaskEnvironment_); in StreamJob.setJobConf()? I'm using CDH3B. Thanks Jeremy

Re: IllegalArgumentException when doing fs.open with an s3n prefix path

2011-02-09 Thread Jeremy Hanna
Wow, sorry, that was just my sad excuse. Thanks again. On Feb 9, 2011, at 6:53 PM, Andrew Hitchcock wrote: > Ah, nice catch. I'll go fix that message now :) > > On Wed, Feb 9, 2011 at 4:50 PM, Jeremy Hanna > wrote: >> Bah - you're right. I don't know

Re: IllegalArgumentException when doing fs.open with an s3n prefix path

2011-02-09 Thread Jeremy Hanna
Bah - you're right. I don't know why I thought the real error was obscured, besides being distracted by "you should of" should be "you should have". Thanks and apologies... Jeremy On Feb 9, 2011, at 6:10 PM, Andrew Hitchcock wrote: > "This file sys

IllegalArgumentException when doing fs.open with an s3n prefix path

2011-02-09 Thread Jeremy Hanna
Anyone know why I would be getting an error doing a filesystem.open on a file with a s3n prefix? for the input path "s3n://backlog.dev/129664890/" - I get the following stacktrace: java.lang.IllegalArgumentException: This file system object (hdfs://ip-10-114-89-36.ec2.internal:9000) does n

Re: elastic mapreduce - custom outputformat?

2011-02-03 Thread Jeremy Hanna
it is file based, it > will write to the given file output path, else to Cassandra, DB, whatever you > specify.. > > Thanks and Regards, > Sonal > Connect Hadoop with databases, Salesforce, FTP servers and others > Nube Technologies

Re: elastic mapreduce - custom outputformat?

2011-02-01 Thread Jeremy Hanna
e to hdfs as long as you give a > path like s3:// and hdfs:// . > > Koji > > > On 2/1/11 11:13 AM, "Jeremy Hanna" wrote: > > I wanted to input from s3 but output to someplace else in aws with > elastic mapreduce. Their docs seem to only suggest that they only > read from/write to s3. Is that correct? >

elastic mapreduce - custom outputformat?

2011-02-01 Thread Jeremy Hanna
I wanted to input from s3 but output to someplace else in aws with elastic mapreduce. Their docs seem to only suggest that they only read from/write to s3. Is that correct?
