Re: ***LIST MANAGER***

2012-08-09 Thread John Armstrong
On 08/09/2012 01:24 AM, Mike Lyon wrote: I'm taking a wild guess, but I think they are legitimately trying to unsubscribe but are clueless and don't RTFM The scariest part is that these are supposedly people with enough technical chops to set up and program to a Hadoop cluster. I suddenly do

Re: Trigger job from Java application causes ClassNotFound

2012-07-27 Thread John Armstrong
On 07/26/2012 09:20 PM, Steve Armstrong wrote: Do you mean I need to deploy the mahout jars to the lib directory of the master node? Or all the data nodes? Or is there a way to simply tell the hadoop job launcher to upload the jars itself? Every node that runs a Task (mapper or reducer) needs

Re: Do I have to sort?

2012-06-18 Thread John Armstrong
On 06/18/2012 10:19 AM, Mark Kerzner wrote: If only reducers could be told to start their work on the first maps that they see, my processing would begin to show results much earlier, before all the mappers are done. The sort/shuffle phase isn't just about ordering the keys, it's about

Re: Do I have to sort?

2012-06-18 Thread John Armstrong
On 06/18/2012 10:40 AM, Mark Kerzner wrote: that sounds very interesting, and I may implement such a workflow, but can I write back to HDFS in the mapper? In the reducer it is a standard context.write(), but it is a different context. Both Mapper.Context and Reducer.Context descend from

Re: Using a combiner

2012-03-15 Thread John Armstrong
Another important note: the combiner runs can stack. Let's say Prashant is right that the default spill number that triggers the combiner is 3, and that we have a mapper that generates 9 spills. These spills will generate 3 combiner runs, which meets the threshold again, and so we get
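
To make the stacking arithmetic concrete, here is a toy sketch in plain Java. It only models the simplified picture described in the message (every full group of `threshold` spills triggers one combiner run producing one output, and outputs are grouped again while enough remain); it is not Hadoop's actual merge logic, and the class and method names are made up for illustration.

```java
// Toy model only: this is NOT Hadoop's real spill/merge implementation.
// CombinerStacking and countCombinerRuns are hypothetical names.
class CombinerStacking {

    /**
     * Counts combiner runs under the simplified model: each full group of
     * `threshold` pending spills is combined into one output, and the
     * outputs are combined again while at least `threshold` remain.
     */
    static int countCombinerRuns(int spills, int threshold) {
        if (threshold < 2) {
            throw new IllegalArgumentException("threshold must be at least 2");
        }
        int runs = 0;
        int pending = spills;
        while (pending >= threshold) {
            int groups = pending / threshold;   // full groups combined this round
            int leftover = pending % threshold; // spills carried over uncombined
            runs += groups;
            pending = groups + leftover;
        }
        return runs;
    }

    public static void main(String[] args) {
        // 9 spills, threshold 3: 3 first-round runs plus 1 stacked run.
        System.out.println(countCombinerRuns(9, 3)); // prints 4
    }
}
```

Under this model the combiner can see the same data more than once, so it has to be safe to apply repeatedly to its own output.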

Re: Setting up Hadoop single node setup on Mac OS X

2012-03-05 Thread John Armstrong
On 02/27/2012 11:53 AM, W.P. McNeill wrote: You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is. Nitpick: OS X is NEXTSTEP based on Mach, which is a different POSIX-compliant system from Linux.

Re: How do I synchronize Hadoop jobs?

2012-02-15 Thread John Armstrong
Actually, I think this is what Oozie is for. It seems to leap out as a great example of a forked workflow. hth On 02/15/2012 02:23 PM, W.P. McNeill wrote: Say I have two Hadoop jobs, A and B, that can be run in parallel. I have another job, C, that takes the output of both A and B as input.

Re: Wrong version of jackson parser while running a hadoop job

2012-01-23 Thread John Armstrong
On Fri, 13 Jan 2012 13:59:12 -0800 (PST), vvkbtnkr vvkbt...@yahoo.com wrote: I am running a hadoop jar and keep getting this error - java.lang.NoSuchMethodError: org.codehaus.jackson.JsonParser.getValueAsLong() Nobody seems to have answered this while I was on vacation, so... Okay, here's

Re: Debugging mapper

2011-09-15 Thread John Armstrong
On Thu, 15 Sep 2011 12:51:48 -0700, Frank Astier fast...@yahoo-inc.com wrote: I’m using IntelliJ and the WordCount example in Hadoop (which uses MiniMRCluster). Is it possible to set an IntelliJ debugger breakpoint straight into the map function of the mapper? - I’ve tried, but so far, the

Jobs failing on submit

2011-08-26 Thread John Armstrong
One of my colleagues has noticed this problem for a while, and now it's biting me. Jobs seem to be failing before ever really starting. It seems to be limited (so far) to running in pseudo-distributed mode, since that's where he saw the problem and where I'm now seeing it; it hasn't come up on

Re: Jobs failing on submit

2011-08-26 Thread John Armstrong
On Fri, 26 Aug 2011 11:46:42 -0700, Ramya Sunil ra...@hortonworks.com wrote: How many tasktrackers do you have? Can you check if your tasktrackers are running and the total available map and reduce capacity in your cluster? In pseudo-distributed there's one tasktracker, which is running, and

Re: Jobs failing on submit

2011-08-26 Thread John Armstrong
On Fri, 26 Aug 2011 12:20:47 -0700, Ramya Sunil ra...@hortonworks.com wrote: Can you also post the configuration of the scheduler you are using? You might also want to check the jobtracker logs. It would help in further debugging. Where would I find the scheduler configuration? I haven't

Re: Making sure I understand HADOOP_CLASSPATH

2011-08-22 Thread John Armstrong
On Mon, 22 Aug 2011 11:01:23 -0700, W.P. McNeill bill...@gmail.com wrote: If it is, what is the proper way to make MyJar.jar available to both the Job Client and the Task Trackers? Do you mean the task trackers, or the tasks themselves? What process do you want to be able to run the code in

Re: one quesiton in the book of hadoop:definitive guide 2 edition

2011-08-05 Thread John Armstrong
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), Daniel,Wu hadoop...@163.com wrote: The book also mentioned the value is mutable, I think the key might also be mutable, meaning that as we loop over each value in Iterable<NullWritable>, the content of the key object is reset. The mutability of the value is one of
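
The object-reuse effect discussed in this thread can be reproduced without Hadoop. The sketch below is a plain-Java analogy, not Hadoop code: `Box` is a hypothetical stand-in for a mutable Writable, and `reusingIterator` imitates an iterator that refills one shared instance instead of allocating a new object per element. Keeping references without copying then shows only the last value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java analogy of object reuse; all names here are hypothetical.
class ReusedObjectDemo {

    /** Mutable holder standing in for a Writable. */
    static final class Box {
        int value;

        Box copy() {
            Box b = new Box();
            b.value = value;
            return b;
        }
    }

    /** Iterator that overwrites one shared Box on every call to next(). */
    static Iterator<Box> reusingIterator(final int[] data) {
        final Box shared = new Box();
        return new Iterator<Box>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Box next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[i++]; // same instance, new contents
                return shared;
            }
        };
    }

    /** Collects the values seen, either by keeping references or by copying. */
    static List<Integer> collect(int[] data, boolean copy) {
        List<Box> kept = new ArrayList<>();
        for (Iterator<Box> it = reusingIterator(data); it.hasNext(); ) {
            Box b = it.next();
            kept.add(copy ? b.copy() : b);
        }
        List<Integer> out = new ArrayList<>();
        for (Box b : kept) {
            out.add(b.value);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] temps = {35, 34, 33};
        System.out.println(collect(temps, false)); // prints [33, 33, 33]
        System.out.println(collect(temps, true));  // prints [35, 34, 33]
    }
}
```

The practical upshot is the usual one: if an object handed out by such an iterator needs to outlive the current iteration step, copy it first.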

Re:Re:Re:Re: one quesiton in the book of hadoop:definitive guide 2 edition

2011-08-04 Thread John Armstrong
On Thu, 4 Aug 2011 14:07:12 +0800 (CST), Daniel,Wu hadoop...@163.com wrote: I am using the new API (the release is from Cloudera). We can see from the output, for each call of the reduce function, 100 records were processed, but as the reduce is defined as reduce(IntPair key, Iterable<NullWritable>

Re:Re:Re: one quesiton in the book of hadoop:definitive guide 2 edition

2011-08-03 Thread John Armstrong
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), Daniel,Wu hadoop...@163.com wrote: So the key of a group is determined by the first coming record in the group, if we have 3 records in a group 1: (1900,35) 2:(1900,34) 3:(1900,33) if (1900,35) comes in as the first row, then the result key will be

Re:Re: one quesiton in the book of hadoop:definitive guide 2 edition

2011-08-02 Thread John Armstrong
On Tue, 2 Aug 2011 21:49:22 +0800 (CST), Daniel,Wu hadoop...@163.com wrote: we usually use something like values.next() to loop over every row in a specific group, but I didn't see any code to loop over the list, at least it needs to get the first row in the list, which is something like values.get().

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 12:11:27 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: but it still isn't clear to me how the -libjars option is parsed, whether or not I need to explicitly add it to the classpath inside my run method, or where it needs to be placed in the command-line? IIRC

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 13:21:27 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: [AA] I am currently invoking my application as follows: hadoop jar /home/test/hadoop/test.option.demo.jar test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar I believe the problem might

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 15:30:49 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: Don't I feel sheepish... Happens to the best, or so they tell me. OK, so I've hacked this sample code below, from the ConfigurationPrinter example in Hadoop: The Definitive Guide. If -libjars had been added

Re: Class loading problem

2011-07-28 Thread John Armstrong
On Thu, 28 Jul 2011 10:05:57 -0400, Kumar, Ranjan ranjan.kum...@morganstanleysmithbarney.com wrote: I have a class to define data I am reading from a MySQL database. According to online tutorials I created a class called MyRecord and extended it from Writable, DBWritable. While running it with

Re: localhost permission denied

2011-07-19 Thread John Armstrong
On Tue, 19 Jul 2011 20:47:31 +0100, Kobina Kwarko kobina.kwa...@gmail.com wrote: Hello, Please any assistance?? I am using Hadoop for a school project and managed to install it on two computers testing with the wordcount example. However, after stopping Hadoop and restarting the computers

Re: Writing out a single file

2011-07-05 Thread John Armstrong
On Tue, 05 Jul 2011 08:09:16 -0700, Mark static.void@gmail.com wrote: Is there any way I can write out the results of my mapreduce job into 1 local file... i.e. the opposite of getmerge? I don't know about writing directly to the local filesystem, but if you specify a single reducer you

Requiring configuration parameters in Tools

2011-06-30 Thread John Armstrong
I've got my map/reduce programs implementing Tool, so I can pass various parameters as configuration properties. I'm currently using an ad-hoc method of verifying that a program has been passed all the configuration properties it requires, but I'm wondering if there's a more Hadoopy way this
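
For readers wondering what an ad-hoc check of this kind might look like, here is one possible shape. It is not the author's actual code, and it deliberately uses `java.util.Properties` as a stand-in for Hadoop's Configuration so that it runs on its own; the property names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: Properties stands in for a Hadoop Configuration,
// and the "example.*" keys are made-up property names.
class RequiredProperties {

    /** Returns the required keys that are absent or blank, in the order given. */
    static List<String> missing(Properties conf, String... required) {
        List<String> absent = new ArrayList<>();
        for (String key : required) {
            String value = conf.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                absent.add(key);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.input.path", "/tmp/in");

        List<String> absent = missing(conf, "example.input.path", "example.output.path");
        if (!absent.isEmpty()) {
            // Fail before any job is submitted, naming everything that is missing.
            System.err.println("Missing required properties: " + absent);
        }
    }
}
```

Collecting all missing keys before reporting, rather than stopping at the first, saves a round trip per forgotten parameter.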

Re: Reading HDFS files via Spring

2011-06-27 Thread John Armstrong
On Sun, 26 Jun 2011 17:34:34 -0700, Mark static.void@gmail.com wrote: Hello all, We have a recommendation system that reads in similarity data via a Spring context.xml as follows: <bean id="similarity" class="org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarity"

Re: DistributedCache

2011-06-07 Thread John Armstrong
On Tue, 7 Jun 2011 09:41:21 -0300, Juan P. gordoslo...@gmail.com wrote: Not 100% clear on what you meant. You are saying I should put the file into my HDFS cluster or should I use DistributedCache? If you suggest the latter, could you address my original question? I mean that you can

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:18:45 -0400, dar...@ontrenet.com wrote: I never understood how hadoop can throttle an inter-rack fiber switch. It's supposed to operate on the principle of move-the-code to the data because of the I/O cost of moving the data, right? But what happens when a reducer on rack

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:26:11 -0400, dar...@ontrenet.com wrote: I'm not a hadoop jedi, but in that case, wouldn't one of the hadoop trackers get bottlenecked to resolve those dependencies? Again, this exposes the oddity of hadoop IMO, it tries to NOT be I/O bound, but seems it's very I/O

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 09:34:56 -0400, dar...@ontrenet.com wrote: Yeah, that's a good point. I wonder though, what the load on the tracker nodes (port et al.) would be if an inter-rack fiber switch at 10's of GBS' is getting maxed. Seems to me that if there is that much traffic being mitigate

Re: DistributedCache

2011-06-06 Thread John Armstrong
On Mon, 06 Jun 2011 16:14:14 -0500, Shi Yu sh...@uchicago.edu wrote: I still don't understand, in a cluster you have a shared directory to all the nodes, right? Just put the configuration file in that directory and load it in all the mappers, isn't that simple? So I still don't understand

Re: change of default port 8020

2011-06-02 Thread John Armstrong
On Thu, 02 Jun 2011 17:23:08 +0300, George Kousiouris gkous...@mail.ntua.gr wrote: Are there anywhere instructions on how to change from the default ports of Hadoop and HDFS? My main interest is in default port 8020. I think this is part of fs.default.name. You would go into core-site.xml and
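
The value of fs.default.name is a URI, and the port is simply its port component. The plain-Java sketch below shows where the port sits in such a value. The hostname and port 9000 are made up, and the fallback to 8020 just mirrors the default mentioned in the question; this is an illustration of the URI shape, not Hadoop's own parsing code.

```java
import java.net.URI;

// Illustration only: DefaultFsPort and namenodePort are hypothetical names,
// and namenode.example.com:9000 is an invented address.
class DefaultFsPort {

    /** Port assumed when the URI does not name one (the default from the thread). */
    static final int ASSUMED_DEFAULT_PORT = 8020;

    /** Extracts the port from an fs.default.name style URI. */
    static int namenodePort(String fsDefaultName) {
        int port = URI.create(fsDefaultName).getPort(); // -1 when no port is given
        return port == -1 ? ASSUMED_DEFAULT_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(namenodePort("hdfs://namenode.example.com:9000")); // prints 9000
        System.out.println(namenodePort("hdfs://namenode.example.com"));      // prints 8020
    }
}
```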