Re: Remote access to cluster using user as hadoop

2009-07-22 Thread Mathias Herberts
On Thu, Jul 23, 2009 at 06:49, Palleti, Pallavi wrote: > Hi all, > > We figured out that anyone who has configured their local hadoop with > remote cluster hadoop details and has the user name hadoop can get > administrative rights on the cluster. For example, if I create a user > as hadoo

Re: Spill failed error

2009-07-22 Thread Vibhooti Verma
This mainly happens when you do not have enough disk space. Please clean up and run again. On Tue, Jul 21, 2009 at 10:33 PM, George Pang wrote: > Hi users, > > Please help with this one - I got an error when running a two-node cluster > on big files. The error is: > > 2365222 [main] ERROR > org.apac
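
A quick way to check the space diagnosis above is to ask the JVM how much room is left under a directory. This is only a sketch: the class `SpaceCheck` and method `usableBytes` are hypothetical names, not part of Hadoop, and which directory to pass (for instance the one your cluster uses for local task data) is an assumption you need to confirm from your own configuration.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop: reports free space under a directory.
class SpaceCheck {

    // Returns the number of bytes this JVM can still write under dir.
    // File.getUsableSpace() returns 0 when the path does not name a partition.
    static long usableBytes(String dir) {
        return new File(dir).getUsableSpace();
    }

    public static void main(String[] args) {
        // Assumption: pass the directory used for local task data as the first
        // argument; the JVM temp directory is only a stand-in default.
        String dir = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        long mb = usableBytes(dir) / (1024 * 1024);
        System.out.println(dir + ": " + mb + " MB usable");
    }
}
```

Running it on each node against the local data directory shows whether the spill had room to land before you rerun the job.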

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread David Hall
For what it's worth, we ended up solving this problem (today) by using EasyMock with ClassExtension. It's an awful lot of magic, but it seems to work just fine for our purposes. It would be great if doing bytecode weaving under the hood weren't necessary just to write test code, though. -- David

RE: Output of a Reducer as a zip file?

2009-07-22 Thread Amogh Vasekar
Does MultipleOutputFormat suffice? Cheers! Amogh -Original Message- From: Mark Kerzner [mailto:markkerz...@gmail.com] Sent: Thursday, July 23, 2009 6:24 AM To: core-u...@hadoop.apache.org Subject: Output of a Reducer as a zip file? Hi, my output consists of a number of binary files, cor

Re: Remote access to cluster using user as hadoop

2009-07-22 Thread Ted Dunning
Do not allow direct access to the hadoop cluster from untrusted machines. Also, until further security measures are implemented, hadoop trusts the origin machine and library to identify the user correctly. Soon there will be a better level of authentication, but for now that is it. This works ou

Remote access to cluster using user as hadoop

2009-07-22 Thread Palleti, Pallavi
Hi all, We figured out that anyone who has configured their local hadoop with remote cluster hadoop details and has the user name hadoop can get administrative rights on the cluster. For example, if I create a user as hadoop locally on my machine and have conf directory details from the cl

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Aaron Kimball
er, +CC mapreduce-dev... - A On Wed, Jul 22, 2009 at 8:17 PM, Aaron Kimball wrote: > +CC mapred-dev > > Hm.. Making this change is actually really difficult. > > After changing Mapper.java, I understand why this was made a > non-static member.  By making Context non-static, it can inherit from > M

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Aaron Kimball
+CC mapred-dev Hm.. Making this change is actually really difficult. After changing Mapper.java, I understand why this was made a non-static member. By making Context non-static, it can inherit from MapContext and bind to the type qualifiers already specified in the class definition. So you can'

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Aaron Kimball
Both of those are good points. I'll submit a patch. - Aaron On Wed, Jul 22, 2009 at 6:24 PM, Ted Dunning wrote: > To amplify David's point, why is the argument a Mapper.Context rather than > MapContext? > > Also, why is the Mapper.Context not static? > > On Wed, Jul 22, 2009 at 5:29 PM, David Hall

Re: Restarting a killed job from where it left

2009-07-22 Thread Mithila Nagendra
Is this a property introduced by Hadoop version 0.19.0? Where can I find out more about this? Thanks! Mithila On Tue, Jul 14, 2009 at 8:16 PM, akhil1988 wrote: > > Thanks, Tom this was what I was looking for. > Just to confirm its usage - it means that upon jobtracker restart it will > automoti

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Ted Dunning
To amplify David's point, why is the argument a Mapper.Context rather than MapContext? Also, why is the Mapper.Context not static? On Wed, Jul 22, 2009 at 5:29 PM, David Hall wrote: > This is nice, but doesn't it suffer from the same problem? MRUnit uses > the mapred API, which is deprecated, a

Output of a Reducer as a zip file?

2009-07-22 Thread Mark Kerzner
Hi, my output consists of a number of binary files, corresponding text files, and one descriptor file. Is there a way for my reducer to produce a zip of all binary files, another zip of all text ones, and a separate text descriptor? If not, how close to this can I get? For example, I could code
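
A minimal sketch of just the zip-writing part of the question above, using only `java.util.zip` from the JDK. The class `ZipSketch` and method `writeZip` are hypothetical names, not Hadoop APIs. How the bytes reach the method, and which stream it writes to inside a reducer or output format, is left open here and would depend on your job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Hadoop: packs named byte arrays into one zip stream.
class ZipSketch {

    // Writes each (name -> bytes) pair as one zip entry, then closes the stream.
    static void writeZip(Map<String, byte[]> files, OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for the binary outputs.
        Map<String, byte[]> binaries = new LinkedHashMap<>();
        binaries.put("doc1.bin", new byte[] {1, 2, 3});
        binaries.put("doc2.bin", new byte[] {4, 5});

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeZip(binaries, buffer);
        System.out.println("zip size in bytes: " + buffer.size());
    }
}
```

Calling `writeZip` once for the binary files and once for the text files, each with its own output stream, would give the two archives asked for; the descriptor can then be written as ordinary text output.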

Re: Amazon Elastic MapReduce and S3

2009-07-22 Thread Hitchcock, Andrew
I second Todd's recommendation. Elastic MapReduce currently doesn't have a mechanism for users to change mapred.tasktracker.map.tasks.maximum. However, by default we run more mappers per core than is generally recommended, because we've found it results in better performance in the EC2/S3 enviro

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread David Hall
This is nice, but doesn't it suffer from the same problem? MRUnit uses the mapred API, which is deprecated, and the new API doesn't use OutputCollector, but a non-static inner class. -- David On Wed, Jul 22, 2009 at 4:52 PM, Aaron Kimball wrote: > Hi David, > > I wrote a contrib module called MRU

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Jakob Homan
It looks like there's quite a bit more documentation about MRUnit on the Cloudera site that's not included in the regular documentation. Looks like about twice as much. It would be great if this could be added to the content that's in mrunit/doc Thanks, Jakob Aaron Kimball wrote: Hi David

Re: Eclipse plugin for Hadoop Pipes?

2009-07-22 Thread Aaron Kimball
If there is one, it's not contributed back to the public project. My guess is probably not :( - Aaron On Wed, Jul 22, 2009 at 10:00 AM, Alberto Luengo Cabanillas wrote: > Hi everyone! Does anybody know if there's an Eclipse plugin for developing > programs in C/C++ and submit them as Jobs to a had

Re: Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread Aaron Kimball
Hi David, I wrote a contrib module called MRUnit (http://issues.apache.org/jira/browse/hadoop-5518) designed to allow unit tests for mappers/reducers more easily. It's slated for inclusion in 0.21, not 0.20 unfortunately, but you can download the patch above as well as MAPREDUCE-680 and build it a

Re: Amazon Elastic MapReduce and S3

2009-07-22 Thread Todd Lipcon
On Wed, Jul 22, 2009 at 4:20 PM, Hitchcock, Andrew wrote: > We don't have hard numbers on S3 transfer rates. The cluster-wide transfer > rate depends on a number of factors such as instance type, cluster size, and > general network congestion. > Your mileage of course will vary based on the fact

Re: Amazon Elastic MapReduce and S3

2009-07-22 Thread Hitchcock, Andrew
We don't have hard numbers on S3 transfer rates. The cluster-wide transfer rate depends on a number of factors such as instance type, cluster size, and general network congestion. I'm curious why you think S3 won't work for your use case. Would you like to elaborate? As I described in the previ

Testing Mappers in Hadoop 0.20.0

2009-07-22 Thread David Hall
Hi, I'm a student working with Apache Mahout for the Google Summer of Code. We recently moved to 0.20.0, and I was porting my code to the new API. Unfortunately, I (and the whole project team) seem to have run into a problem when it comes to testing them. Historically, we would create a Mapper in

Re: How to make data available in 10 minutes.

2009-07-22 Thread Ariel Rabkin
They're designed to take a few minutes and seem to in operations here and at Yahoo. Details, of course, will vary depending on data volumes and hardware. More benchmarks welcome. :) --Ari On Mon, Jul 20, 2009 at 3:04 AM, zsongbo wrote: > Hi Ari, > > Thanks. > In Chukwa, how about the performance

JobTracker not started [ipc.RemoteException] when starting Hadoop 0.20.0 cluster on Amazon EC2 using contrib/ec2 scripts

2009-07-22 Thread Jeyendran Balakrishnan
Hello, I downloaded Hadoop 0.20.0 and used the src/contrib/ec2/bin scripts to launch a Hadoop cluster on Amazon EC2. To do so, I modified the bundled scripts above for my EC2 account, and then created my own Hadoop 0.20.0 AMI. The steps I followed for creating AMIs and launching EC2 Hadoop cluster

Re: generate task timeline figures like "Hadoop Sorts a Petabyte..." blog

2009-07-22 Thread Owen O'Malley
On Jul 22, 2009, at 8:22 AM, Rares Vernica wrote: Hello, I wonder how the Yahoo! developers generated the Task Timeline figures in their "Hadoop Sorts a Petabyte..." blog post: The script is at: http://people.apache.org/~omalley/tera-2009/job_history_summary.py The input data is the job

Re: generate task timeline figures like "Hadoop Sorts a Petabyte..." blog

2009-07-22 Thread Miles Osborne
nope, if i recall the data is randomly generated (the task itself requires fixed-length binary strings to be sorted) Miles 2009/7/22 Harish Mallipeddi > On Wed, Jul 22, 2009 at 8:52 PM, Rares Vernica wrote: > > > Hello, > > > > I wonder how the Yahoo! developers generated the Task Timeline

Re: generate task timeline figures like "Hadoop Sorts a Petabyte..." blog

2009-07-22 Thread Harish Mallipeddi
On Wed, Jul 22, 2009 at 8:52 PM, Rares Vernica wrote: > Hello, > > I wonder how the Yahoo! developers generated the Task Timeline > figures in their "Hadoop Sorts a Petabyte..." blog post: > > > http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html > > I am intere

generate task timeline figures like "Hadoop Sorts a Petabyte..." blog

2009-07-22 Thread Rares Vernica
Hello, I wonder how the Yahoo! developers generated the Task Timeline figures in their "Hadoop Sorts a Petabyte..." blog post: http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html I am interested to know the following two aspects: 1. How did they collect the da

Re: Benchmarks

2009-07-22 Thread Steve Loughran
JQ Hadoop wrote: I'm wondering where one can get the pagerank implementation for a try. Thanks, http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/ works over the CiteSeer citation dataset

Re: best way to set memory

2009-07-22 Thread Fernando Padilla
But right now the script forcefully adds an extra -Xmx1000m even if you don't want it. I guess I'll be submitting a patch for hadoop-daemon.sh later. :) :) thank you all On 7/22/09 2:25 AM, Amogh Vasekar wrote: I haven't played a lot with it, but you may want to check if setting HADOOP_NA

RE: Issue with HDFS Client when datanode is temporarily unavailable

2009-07-22 Thread Palleti, Pallavi
Hi all, In simple terms: if an output stream failed to close while the datanodes were unavailable, why does closing it again still fail once the datanodes are available? Could someone kindly help me to tackle this situation? Thanks Pallavi -Original Message- From: Palleti, P

Re: Benchmarks

2009-07-22 Thread JQ Hadoop
I'm wondering where one can get the pagerank implementation for a try. Thanks, -JQ On Wed, Jul 22, 2009 at 6:14 PM, Steve Loughran wrote: > Owen O'Malley wrote: > >> >> On Jul 21, 2009, at 8:28 AM, Ted Dunning wrote: >> >> There are already several such efforts. >>> >>> Pig has PigMix >>> >>>

Benchmarks

2009-07-22 Thread Steve Loughran
Owen O'Malley wrote: On Jul 21, 2009, at 8:28 AM, Ted Dunning wrote: There are already several such efforts. Pig has PigMix Hadoop has terasort and likely some others as well. Hadoop has the terasort, and grid mix. There is even a new version of the grid mix coming out. Look at: https:/

RE: best way to set memory

2009-07-22 Thread Amogh Vasekar
I haven't played a lot with it, but you may want to check if setting HADOOP_NAMENODE_OPTS, HADOOP_TASKTRACKER_OPTS helps. Let me know if you find a way to do this :) Cheers! Amogh -Original Message- From: Fernando Padilla [mailto:f...@alum.mit.edu] Sent: Wednesday, July 22, 2009 9:47 AM

Re: Question on setting a new user variable with JobConf

2009-07-22 Thread Xine Jar
Great, guys, thanks a lot; it is working now. On Wed, Jul 22, 2009 at 3:55 AM, Aaron Kimball wrote: > And regarding your desire to set things on the command line: If your > program implements Tool and is launched via ToolRunner, you can > specify "-D myparam=myvalue" on the command line and it

Re: JobTracker crashing

2009-07-22 Thread Mathias De Maré
Hi, 2009/7/22 Mathias De Maré > I went over the steps, and it looks like I did the same (only I didn't > create a dedicated user and I didn't disable IPv6, since I can use it here). > Oh, and I noticed one more thing: when I start Hadoop by running > bin/start-dfs.sh, wait about 20 seconds for e

Re: JobTracker crashing

2009-07-22 Thread Mathias De Maré
Hi, On Fri, Jul 17, 2009 at 5:37 PM, Bogdan M. Maryniuk < bogdan.maryn...@gmail.com> wrote: > 2009/7/17 Mathias De Maré : > > I'm using Hadoop 0.20.0 (semidistributed mode, or whatever it's called -- > I > > can't look up the name, since the documentation on the site seems to be > > down), and I