Re: Problem running Hadoop 0.23.0

2011-11-28 Thread Tom White
Hi Nitin, It looks like you may be using the wrong port number - try 8088 for the resource manager UI. Cheers, Tom On Mon, Nov 28, 2011 at 4:02 AM, Nitin Khandelwal wrote: > Hi, > > I was trying to setup Hadoop 0.23.0 with help of > http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoo

Re: cannot use distcp in some s3 buckets

2011-10-13 Thread Tom White
On Thu, Oct 13, 2011 at 2:06 PM, Raimon Bosch wrote: > By the way, > > The url I'm trying has a '_' in the bucket name. Could be this the problem? Yes, underscores are not permitted in hostnames. Cheers, Tom > > 2011/10/13 Raimon Bosch > >> Hi, >> >> I've been having some problems with one of
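
The thread does not say exactly where the underscore breaks things. One plausible mechanism, stated here as an assumption, is that the S3 filesystem takes the bucket name from java.net.URI.getHost(). The JDK's URI parser only accepts a server-based host made of letters, digits, hyphens and dots. For an authority such as my_bucket it therefore returns a null host, while getAuthority() still holds the raw string. The bucket names below are made up.

```java
import java.net.URI;

// Sketch, assuming the S3 filesystem derives the bucket from URI.getHost().
class BucketHostCheck {

    // Host component, or null when the authority is not a legal hostname
    // (an underscore makes URI fall back to a registry-based authority).
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    // Raw authority; populated whether or not it parsed as a hostname.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }
}
```

For example, hostOf("s3n://my_bucket/logs") returns null, while hostOf("s3n://my-bucket/logs") returns "my-bucket".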

Re: updated example

2011-10-11 Thread Tom White
JobConf and the old API are no longer deprecated in the forthcoming 0.20.205 release, so you can continue to use them without issue. The equivalent in the new API is setInputFormatClass() on org.apache.hadoop.mapreduce.Job. Cheers, Tom On Tue, Oct 11, 2011 at 9:18 AM, Keith Thompson wrote: > I se

Re: Distributed cluster filesystem on EC2

2011-08-31 Thread Tom White
You might consider Apache Whirr (http://whirr.apache.org/) for bringing up Hadoop clusters on EC2. Cheers, Tom On Wed, Aug 31, 2011 at 8:22 AM, Robert Evans wrote: > Dmitry, > > It sounds like an interesting idea, but I have not really heard of anyone > doing it before.  It would make for a goo

Re: 0.21.0 - Java Class Error

2011-04-08 Thread Tom White
Hi Witold, Is this on Windows? The scripts were restructured after Hadoop 0.20, and looking at them now I notice that the Cygwin path translation for the classpath seems to be missing. You could try adding the following line to the "if $cygwin" clause in bin/hadoop-config.sh: CLASSPATH=`cygpat

Re: hadoop installation problem(single-node)

2011-03-02 Thread Tom White
The instructions at http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html should be what you need. Cheers, Tom On Wed, Mar 2, 2011 at 12:59 AM, Manish Yadav wrote: > Dear Sir/Madam >  I'm very new to hadoop. I'm trying to install hadoop on my computer. I > followed a weblink and try to in

Re: Missing files in the trunk ??

2011-02-28 Thread Tom White
These are generated files. If you run "ant avro-generate eclipse" then Eclipse should find these files. Cheers, Tom On Mon, Feb 28, 2011 at 2:43 AM, bharath vissapragada wrote: > Hi all, > > I checked out the "map-reduce" trunk a few days back  and following > files are missing.. > > impor

Re: 0.21 found interface but class was expected

2010-11-15 Thread Tom White
Hi Steve, Sorry to hear about the problems you had. The issue you hit was a result of MAPREDUCE-954, and there was some discussion on that JIRA about compatibility. I believe the thinking was that the context classes are framework classes, so users don't extend/implement them in the normal course

Re: How to stop a mapper within a map-reduce job when you detect bad input

2010-10-21 Thread Tom White
On Thu, Oct 21, 2010 at 8:23 AM, ed wrote: > Hello, > > The MapRunner classes looks promising.  I noticed it is in the deprecated > mapred package but I didn't see an equivalent class in the mapreduce > package.  Is this going to ported to mapreduce or is it no longer being > supported?  Thanks!

Re: is there no streaming.jar file in hadoop-0.21.0??

2010-10-04 Thread Tom White
Hi Ed, The directory structure moved around as a result of the project splitting into three subprojects (Common, HDFS, MapReduce). The streaming jar is in mapred/contrib/streaming in the distribution. Cheers, Tom On Mon, Oct 4, 2010 at 8:03 PM, edward choi wrote: > Hi, > I've recently downloade

Re: problem viewing task logs on ec2 via socks proxy

2010-09-20 Thread Tom White
Hi John, This question really belongs on the Cloudera list (http://getsatisfaction.com/cloudera) or the Whirr user list, but I wonder if you're seeing this because you're not using the SOCKS proxy for DNS lookups? See the bottom of https://docs.cloudera.com/display/DOC/Launching+a+Cluster. Cheers, Tom

Re: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FsShell

2010-09-15 Thread Tom White
Hi Mike, What do you get if you type "./hadoop classpath"? Does it contain the Hadoop Common JAR? To avoid the deprecation warning you should use "hadoop fs", not "hadoop dfs". Tom On Wed, Sep 15, 2010 at 12:53 PM, Mike Franon wrote: > Hi, > > I just setup 3 node hadoop cluster using the lates

Re: Hadoop 0.21.0 release Maven repo

2010-09-10 Thread Tom White
Hi Sonal, The 0.21.0 jars are not available in Maven yet, since the process for publishing them has changed since the project split. See HDFS-1292 and MAPREDUCE-1929. Cheers, Tom On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal wrote: > Hi, > > Can someone please point me to the Maven repo for 0.21 release? Tha

Re: Ivy

2010-09-03 Thread Tom White
The 0.21.0 jars are not in the Apache Maven repos yet, since the process for publishing them has changed since the project split. HDFS-1292 and MAPREDUCE-1929 are the tickets to fix this. Cheers, Tom On Sat, Aug 28, 2010 at 9:10 PM, Mark wrote: >  On 8/27/10 9:25 AM, Owen O'Malley wrote: >> >> On Aug 27, 201

[ANNOUNCE] Apache Hadoop 0.21.0 released

2010-08-24 Thread Tom White
Hi everyone, I am pleased to announce that Apache Hadoop 0.21.0 is available for download from http://hadoop.apache.org/common/releases.html. Over 1300 issues have been addressed since 0.20.2; you can find details at http://hadoop.apache.org/common/docs/r0.21.0/releasenotes.html http://hadoop.ap

Re: Implementing S3FileSystem#append

2010-08-12 Thread Tom White
Hi Oleg, I don't know of any plans to implement this. However, since this is a block-based storage system which uses S3, I wonder whether an implementation could use some of the logic in HDFS for block storage and append in general. Cheers, Tom On Thu, Aug 12, 2010 at 8:34 AM, Aleshko, Oleg wro

Re: Next Release of Hadoop version number and Kerberos

2010-07-07 Thread Tom White
Hi Ananth, The next release of Hadoop will be 0.21.0, but it won't have Kerberos authentication in it (since it's not all in trunk yet). The 0.22.0 release later this year will have a working version of security in it. Cheers, Tom On Wed, Jul 7, 2010 at 8:09 AM, Ananth Sarathy wrote: > > is the

Re: Hadoop 0.21 :: job.getCounters() returns null?

2010-07-07 Thread Tom White
Hi Felix, Aaron Kimball hit the same problem - it's being discussed at https://issues.apache.org/jira/browse/MAPREDUCE-1920. Thanks for reporting this. Cheers, Tom On Tue, Jul 6, 2010 at 11:26 AM, Felix Halim wrote: > I tried hadoop 0.21 release candidate. > > job.waitForCompletion(true); > Co

Re: Cloudera EC2 scripts

2010-05-28 Thread Tom White
Hi Mark, You can find the latest version of the scripts at http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228.tar.gz. Documentation is at http://archive.cloudera.com/docs/ec2.html. The source code is currently in src/contrib/cloud in Hadoop Common, but is in the process of moving to a new Incuba

Re: problem w/ data load

2010-05-03 Thread Tom White
Hi Susanne, Hadoop uses the file extension to detect that a file is compressed. I believe Hive does too. Did you store the compressed file in HDFS with a .gz extension? Cheers, Tom BTW, it's best to send Hive questions like these to the hive-user@ list. On Sun, May 2, 2010 at 11:22 AM, Susanne L
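
To illustrate why the .gz suffix matters: Hadoop chooses a codec from the file name (CompressionCodecFactory.getCodec(Path) in the Hadoop API), so a gzip file stored without the suffix is read as plain bytes. The sketch below reproduces only that suffix lookup in plain Java. The suffix table is an assumption modelled on the default gzip, bzip2 and deflate codecs, and the returned labels are not Hadoop class names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for suffix-based codec lookup (not Hadoop's code).
class CodecBySuffix {
    private static final Map<String, String> SUFFIXES = new LinkedHashMap<>();
    static {
        SUFFIXES.put(".gz", "gzip");
        SUFFIXES.put(".bz2", "bzip2");
        SUFFIXES.put(".deflate", "deflate");
    }

    // Returns a codec label for the path, or null if the file will be
    // treated as uncompressed because no known suffix matches.
    static String codecFor(String path) {
        for (Map.Entry<String, String> e : SUFFIXES.entrySet()) {
            if (path.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }
}
```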

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2010-04-29 Thread Tom White
ame in each mapper)? > > Yuanyuan > > Tom White ---04/29/2010 09:42:44 AM---Hi Yuanyuan, I think you've found a bug > - could you file a JIRA issue for this please? > > > From: > Tom White > To: > common-user@hadoop.apache.org > Date: > 04/29/2010 09:42 AM

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2010-04-29 Thread Tom White
Hi Yuanyuan, I think you've found a bug - could you file a JIRA issue for this please? Thanks, Tom On Wed, Apr 28, 2010 at 11:04 PM, Yuanyuan Tian wrote: > > > I have a problem in getting the input file name in the mapper  when uisng > MultipleInputs. I need to use MultipleInputs to support dif

Re: File permissions on S3FileSystem

2010-04-22 Thread Tom White
Hi Danny, S3FileSystem has no concept of permissions, which is why this check fails. The permissions check was introduced in https://issues.apache.org/jira/browse/MAPREDUCE-181. Could you file a bug for this please? Cheers, Tom On Thu, Apr 22, 2010 at 4:16 AM, Danny Le

Re: HashMap type output from mapper

2010-04-15 Thread Tom White
Have a look at org.apache.hadoop.io.MapWritable, which is a Map for storing Writable keys and values. Cheers, Tom On Thu, Apr 15, 2010 at 3:17 PM, Eric Sammer wrote: > You need to implement a custom Writable (the serialization interface > supported by Hadoop). If you want to use your own custom
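
MapWritable saves you from writing the custom Writable that the quoted reply describes. For comparison, here is roughly what a hand-rolled one involves, assuming String keys and long values. Writable is declared locally with the same two methods as org.apache.hadoop.io.Writable so the sketch compiles without Hadoop, and StringLongMapWritable is a hypothetical name.

```java
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical map-valued Writable: String -> long.
class StringLongMapWritable implements Writable {
    final Map<String, Long> map = new LinkedHashMap<>();

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(map.size());                 // entry count first
        for (Map.Entry<String, Long> e : map.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        map.clear();                              // Hadoop reuses instances, so reset state
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            map.put(in.readUTF(), in.readLong()); // same order as write()
        }
    }

    // Serializes and deserializes through a byte array (checked exceptions wrapped).
    static StringLongMapWritable roundTrip(StringLongMapWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            StringLongMapWritable copy = new StringLongMapWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```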

Re: JobConf.setJobEndNotificationURI

2010-03-23 Thread Tom White
I think you can set the URI on the configuration object with the key JobContext.END_NOTIFICATION_URL. Cheers, Tom On Tue, Feb 23, 2010 at 12:02 PM, Ted Yu wrote: > Hi, > I am looking for counterpart to JobConf.setJobEndNotificationURI() in > org.apache.hadoop.mapreduce > > Please advise. > > Tha

Re: Cloudera AMIs

2010-03-15 Thread Tom White
Hi Sonal, You should use the one with the later date. The Cloudera AMIs don't actually have Hadoop installed on them, just Java and some other base packages. Hadoop is installed at start up time; you can find more information at http://archive.cloudera.com/docs/ec2.html. Cheers, Tom P.S. For Clo

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

2010-01-17 Thread Tom White
Hi Prasen, 2) is now in the Hadoop Common repository, in src/contrib/cloud. This is where the development effort is focused, and the older bash scripts (1) will be deprecated over time (HADOOP-6403). The new cloud scripts are designed to support multiple cloud providers, as well as advanced featur

Re: Is it possible to share a key across maps?

2010-01-14 Thread Tom White
Please submit a patch for the documentation change - perhaps at https://issues.apache.org/jira/browse/HADOOP-5973. Cheers, Tom On Wed, Jan 13, 2010 at 12:09 AM, Amogh Vasekar wrote: > +1 for the documentation change in mapred-tutorial. Can we do that and > publish using a normal apache account?

Re: Implementing VectorWritable

2009-12-29 Thread Tom White
Have a look at org.apache.hadoop.io.ArrayWritable. You may be able to use this class in your application, or at least use it as a basis for writing VectorWritable. Cheers, Tom On Tue, Dec 29, 2009 at 1:37 AM, bharath v wrote: > Can you please tell me , what is the functionality of those 2 method
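
The two methods in question: write(DataOutput) serializes the object's fields, and readFields(DataInput) reads them back, in the same order, into an existing instance. A minimal VectorWritable along those lines might look like the sketch below, assuming a vector of doubles. The class is hypothetical, and Writable is declared locally with the same signatures as org.apache.hadoop.io.Writable so it runs without Hadoop. With ArrayWritable you would instead typically subclass it and pass the element Writable class to its constructor.

```java
import java.io.*;

// Local stand-in with the same methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical VectorWritable: a length-prefixed array of doubles.
class VectorWritable implements Writable {
    private double[] values = new double[0];

    VectorWritable() {}  // no-arg constructor: Hadoop creates Writables reflectively
    VectorWritable(double[] values) { this.values = values.clone(); }

    double[] get() { return values.clone(); }

    // write(): serialize this object's fields to the stream.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (double v : values) out.writeDouble(v);
    }

    // readFields(): overwrite this object's fields from the stream.
    @Override
    public void readFields(DataInput in) throws IOException {
        values = new double[in.readInt()];
        for (int i = 0; i < values.length; i++) values[i] = in.readDouble();
    }

    // Serializes and deserializes through a byte array (checked exceptions wrapped).
    static VectorWritable roundTrip(VectorWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            VectorWritable copy = new VectorWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```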

Re: Configuration for Hadoop running on Amazon S3

2009-12-17 Thread Tom White
If you are using S3 as your file store then you don't need to run HDFS (and indeed HDFS will not start up if you try). Cheers, Tom 2009/12/17 Rekha Joshi : > Not sure what the whole error is, but you can always alternatively try this - > >  fs.default.name >  s3://BUCKET > > > >  fs.s3.awsAcce

Re: EC2 Hadoop 0.19 image is only 3 Gigs harddrive - too small

2009-11-25 Thread Tom White
Hi Mark, The root partition is small, but there is plenty of storage on the /mnt partition. See http://aws.amazon.com/ec2/instance-types/. Cheers, Tom On Wed, Nov 25, 2009 at 12:30 PM, Mark Kerzner wrote: > Hi, > > I have started the Apache distribution of hadoop-0.19, and I noticed that > this

Re: How do I reference S3 from an EC2 Hadoop cluster?

2009-11-25 Thread Tom White
ger instance than that of the slaves? No, this is not supported, but I can see it would be useful, particularly for larger clusters. Please consider opening a JIRA for it. Cheers, Tom > > Thank you, > Mark > > On Tue, Nov 24, 2009 at 11:20 PM, Tom White wrote: > >> Mark, >

Re: Master and slaves on hadoop/ec2

2009-11-25 Thread Tom White
Correct. The master runs the namenode and jobtracker, but not a datanode or tasktracker. Tom On Tue, Nov 24, 2009 at 4:57 PM, Mark Kerzner wrote: > Hi, > > do I understand it correctly that, when I launch a Hadoop cluster on EC2, > the master will not be doing any work, and it is just for organi

Re: How do I reference S3 from an EC2 Hadoop cluster?

2009-11-24 Thread Tom White
Mark, If the data was transferred to S3 outside of Hadoop then you should use the s3n filesystem scheme (see the explanation on http://wiki.apache.org/hadoop/AmazonS3 for the differences between the Hadoop S3 filesystems). Also, some people have had problems embedding the secret key in the URI, s

Re: Apache Hadoop and Fedora, or Cloudera Hadoop and Ubuntu?

2009-11-15 Thread Tom White
.@apache.org). > > Thank you, > Mark > > On Sun, Nov 15, 2009 at 10:29 PM, Tom White wrote: > >> Hi Mark, >> >> HADOOP-6108 will add Cloudera's EC2 scripts to the Apache >> distribution, with the difference that they will run Apache Hadoop. >> T

Re: Apache Hadoop and Fedora, or Cloudera Hadoop and Ubuntu?

2009-11-15 Thread Tom White
Hi Mark, HADOOP-6108 will add Cloudera's EC2 scripts to the Apache distribution, with the difference that they will run Apache Hadoop. The same scripts will also support Cloudera's Distribution for Hadoop, simply by using a different boot script on the instances. So I would suggest you use these s

Re: Multiple Input Paths

2009-11-08 Thread Tom White
MultipleInputs is available from Hadoop 0.19 onwards (in org.apache.hadoop.mapred.lib, or org.apache.hadoop.mapreduce.lib.input for the new API in later versions). Tom On Wed, Nov 4, 2009 at 8:07 AM, Mark Vigeant wrote: > Amogh, > > That sounds so awesome! Yeah I wish I had that class now. Do yo

Re: Confused by new API & MultipleOutputFormats using Hadoop 0.20.1

2009-11-08 Thread Tom White
MultipleOutputs has been ported to the new API in 0.21. See https://issues.apache.org/jira/browse/MAPREDUCE-370. Cheers, Tom On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策) wrote: > I just fall back to old mapred.* APIs, seems MultipleOutputs only works for > the old API. > > wishes, > Xiance >

Re: Terminate Instances Terminating ALL EC2 Instances

2009-10-19 Thread Tom White
diate workaround you can avoid calling the Hadoop cluster "default", and make sure that you don't create non-Hadoop EC2 instances in the cluster group. Thanks, Tom > > Does this help at all?  Thanks. > > -Mark > > On Mon, Oct 19, 2009 at 11:52 AM, Tom White

Re: Terminate Instances Terminating ALL EC2 Instances

2009-10-19 Thread Tom White
Hi Mark, Sorry to hear that all your EC2 instances were terminated. Needless to say, this should certainly not happen. The scripts are a Python rewrite (see HADOOP-6108) of the bash ones, so HADOOP-1504 is not applicable, but the behaviour should be the same: the terminate-cluster command lists th

Re: Cascading jobs in hadoop

2009-10-02 Thread Tom White
Have a look at the JobControl class - this allows you to set up chains of job dependencies. Tom On Fri, Oct 2, 2009 at 11:29 AM, bharath v wrote: > Hi all, > > I have a set of map red jobs which need to be cascaded ,i.e, output of MR > job1 is the input of MR job2. etc.. > > Can anyone point me
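
JobControl does the dependency bookkeeping for you. To illustrate the ordering constraint it enforces, independent of the Hadoop API, the sketch below orders hypothetical job names so that every job comes after the jobs it depends on, using Kahn's topological sort. This is not JobControl's code, and JobControl can also run independent jobs concurrently, which a single linear order ignores.

```java
import java.util.*;

// Illustrative only: orders hypothetical jobs so each runs after its dependencies.
class JobChain {
    // deps maps each job to the jobs whose output it needs.
    static List<String> runOrder(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unmet dependency counts
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                pending.putIfAbsent(d, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());       // no dependencies
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);                                      // "run" the job
            for (String next : dependents.getOrDefault(job, Collections.emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("dependency cycle");
        }
        return order;
    }
}
```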

Re: Map/Reduce and sequence file metadata...

2009-10-02 Thread Tom White
On Thu, Oct 1, 2009 at 5:10 PM, Andy Sautins wrote: > >   Hi all. I'm struggling a bit to figure this out and wondering if anyone had > any  pointers. > >   I'm using SequenceFiles as output from a MapReduce job ( using > SequenceFileOutputFormat ) and then in a followup MapReduce job reading in

Re: JobTracker startup failure when starting hadoop-0.20.0 cluster on Amazon EC2 with contrib/ec2 scripts

2009-09-07 Thread Tom White
Hi Jeyendran, Were there any errors reported in the datanode logs? There could be a problem with datanodes contacting the namenode, caused by firewall configuration problems (EC2 security groups). Cheers, Tom On Fri, Sep 4, 2009 at 12:17 AM, Jeyendran Balakrishnan wrote: > I downloaded Hadoop 0.

Re: Can't find TestDFSIO

2009-08-24 Thread Tom White
Hi Cam, Looks like it's in hadoop-hdfs-hdfswithmr-test-0.21.0-dev.jar, which should be built with "ant jar-test". Cheers, Tom On Mon, Aug 24, 2009 at 8:22 PM, Cam Macdonell wrote: > > Thanks Danny, > > It currently does not show up hadoop-common-test, hadoop-hdfs-test or > hadoop-mapred-test wit

Re: File Chunk to Map Thread Association

2009-08-20 Thread Tom White
Hi Roman, Have a look at CombineFileInputFormat - it might be related to what you are trying to do. Cheers, Tom On Thu, Aug 20, 2009 at 10:59 AM, roman kolcun wrote: > On Thu, Aug 20, 2009 at 10:30 AM, Harish Mallipeddi < > harish.mallipe...@gmail.com> wrote: > >> On Thu, Aug 20, 2009 at 2:39 PM
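
CombineFileInputFormat packs many files (or blocks) into fewer splits, so one map task handles several small inputs. The sketch below shows only that packing idea. It greedily groups file sizes up to a hypothetical maxSplitSize. The real class also groups by node and rack locality and splits large files into blocks, none of which is modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration: greedily groups file sizes (bytes) into splits whose
// total stays within maxSplitSize; a file larger than the limit gets its own split.
class SplitPacker {
    static List<List<Long>> pack(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (!current.isEmpty() && currentBytes + size > maxSplitSize) {
                splits.add(current);          // close the split before it overflows
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) splits.add(current);
        return splits;
    }
}
```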

Re: Status of 0.19.2

2009-08-03 Thread Tom White
I've now updated the news section and the documentation on the website to reflect the 0.19.2 release. There were several reports of it being more stable than 0.19.1 in the voting thread: http://www.mail-archive.com/common-...@hadoop.apache.org/msg00051.html Cheers, Tom On Tue, Jul 28, 2009 at

Re: MapFile performance

2009-08-03 Thread Tom White
On Mon, Aug 3, 2009 at 3:09 AM, Billy Pearson wrote: > > > not sure if its still there but there was a parm in the hadoop-site conf > file that would allow you to skip x number if index when reading it in to > memory. This is io.map.index.skip (default 0), which will skip this number of keys for e
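
The sketch below illustrates the memory/time trade-off. It assumes the semantics in the property's description: the setting is the number of index entries to skip between each entry kept in memory. With a skip of n, only every (n+1)th index entry is retained. A lookup then seeks to the last retained entry at or before the key and scans forward from there, so a larger n means less memory but longer scans. Long keys and made-up file offsets stand in for real MapFile keys and positions.

```java
import java.util.Arrays;

// Sketch of a sparse in-memory index built with a skip factor.
class SparseIndex {
    final long[] keys;       // retained index keys, sorted
    final long[] positions;  // file offsets for the retained keys

    SparseIndex(long[] indexKeys, long[] indexPositions, int skip) {
        int n = (indexKeys.length + skip) / (skip + 1);   // ceil(len / (skip + 1))
        keys = new long[n];
        positions = new long[n];
        for (int i = 0, j = 0; i < indexKeys.length; i += skip + 1, j++) {
            keys[j] = indexKeys[i];
            positions[j] = indexPositions[i];
        }
    }

    // Offset to start scanning from: the last retained entry <= key,
    // or -1 if the key sorts before every entry.
    long seekPosition(long key) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0) i = -i - 2;   // insertion point - 1
        return i < 0 ? -1 : positions[i];
    }
}
```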

Re: Reading GZIP input files.

2009-07-31 Thread Tom White
That's for the case where you want to do the decompression yourself, explicitly, perhaps when you are reading the data out of HDFS (and not using MapReduce). When using compressed data as input to a MapReduce job, Hadoop will automatically decompress it for you. Tom On Fri, Jul 31, 2009 at 5:3
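
Doing the decompression yourself means wrapping the raw stream in a decompressor before reading it. With Hadoop that would typically be a CompressionCodec's createInputStream(InputStream) around the HDFS stream. To stay runnable without a cluster, the sketch below uses java.util.zip.GZIPInputStream over an in-memory byte array instead. The class name is hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLines {
    // Produces gzip bytes for the example (stands in for a .gz file).
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();   // complete only after gz has been closed
    }

    // Explicitly decompresses a gzip stream and returns its lines.
    static List<String> readLines(InputStream compressed) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```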

Re: Recovery following disk full

2009-07-20 Thread Tom White
Is this an area where the Offline Image Viewer might be able to help in the future? It's not available for 0.18.3, but it seems like it would be possible to extend it as a tool to help with c) in Todd's description. Tom On Mon, Jul 20, 2009 at 8:30 PM, Todd Lipcon wrote: > Hi Arv, > > It sounds like

Re: Using JobControl in hadoop

2009-07-20 Thread Tom White
:45 PM, Rakhi Khatwani wrote: > Hi Tom, > >           in that case, can i kill the job by givin some command from the > API?? or i ll have 2 do it frm the command line? > > On Mon, Jul 20, 2009 at 8:55 PM, Tom White wrote: > >> Hi Raakhi, >> >> You can't su

Re: Using JobControl in hadoop

2009-07-20 Thread Tom White
e any way you can suspend the job in the java program??? > > > Regards, > Raakhi > > On Fri, Jul 17, 2009 at 2:48 PM, Tom White wrote: > >> Hi Raakhi, >> >> JobControl is designed to be run from a new thread: >> >> Thread t = new Thread(jobCon

Re: Using JobControl in hadoop

2009-07-17 Thread Tom White
Hi Raakhi, JobControl is designed to be run from a new thread: Thread t = new Thread(jobControl); t.start(); Then you can run a loop to poll for job completion and print out status: String oldStatus = null; while (!jobControl.allFinished()) { String status = getStatusString(jobCon

Re: access Configuration object in Partitioner??

2009-07-14 Thread Tom White
It seems that > the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in > the futture. > Do you have some suggestions on this? > > Thanks, > Jianmin > > > > > ____ > From: Tom White > To: common-user@hadoop

Re: more than one reducer in standalone mode

2009-07-14 Thread Tom White
There's a Jira to fix this here: https://issues.apache.org/jira/browse/MAPREDUCE-434 Tom On Mon, Jul 13, 2009 at 12:34 AM, jason hadoop wrote: > If the jobtracker is set to local, there is no way to have more than 1 > reducer. > > On Sun, Jul 12, 2009 at 12:21 PM, Rares Vernica wrote: > >> Hello

Re: access Configuration object in Partitioner??

2009-07-14 Thread Tom White
Hi Jianmin, Partitioner extends JobConfigurable, so you can implement the configure() method to access the JobConf. Hope that helps. Cheers, Tom On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote: > Hi, > > I am considering to implement a Partitioner that needs to access the > parameters in C
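
The pattern is to read job parameters in configure() and use them in getPartition(). The sketch below mirrors those two method shapes, with a Map<String, String> standing in for JobConf so it runs without Hadoop. The class and the property name example.partition.prefix.length are hypothetical. The hashing line uses the same non-negative hash-mod form as Hadoop's HashPartitioner.

```java
import java.util.Map;

// Hypothetical partitioner that routes keys by their first N characters.
// In the old API this would implement Partitioner<K, V> and receive a JobConf
// in configure(); a plain Map stands in for the JobConf here.
class PrefixPartitioner {
    static final String PREFIX_LENGTH_KEY = "example.partition.prefix.length"; // hypothetical property
    private int prefixLength = 1;

    // Read job parameters here, before any getPartition() call.
    void configure(Map<String, String> conf) {
        String value = conf.get(PREFIX_LENGTH_KEY);
        if (value != null) {
            prefixLength = Integer.parseInt(value);
        }
    }

    int getPartition(String key, String value, int numPartitions) {
        String prefix = key.length() > prefixLength ? key.substring(0, prefixLength) : key;
        // Mask the sign bit so the result is always in [0, numPartitions).
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```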

Re: Restarting a killed job from where it left

2009-07-13 Thread Tom White
Hi Akhil, Have a look at the mapred.jobtracker.restart.recover property. Cheers, Tom On Sun, Jul 12, 2009 at 12:06 AM, akhil1988 wrote: > > HI All, > > I am looking for ways to restart my hadoop job from where it left when the > entire cluster goes down or the job gets stopped due to some reason

Re: Can Namenode and JobTracker run in different server?

2009-06-30 Thread Tom White
The config looks fine, but you need to start the daemons on the relevant servers. You will need the same config on both server1 and server2. On server1: bin/start-dfs.sh On server2: bin/start-mapred.sh Hope this helps. Tom On Tue, Jun 30, 2009 at 7:53 AM, Eason.Lee wrote: > Just want to run