Re: How do I create a sequence file on my local harddrive?

2011-04-25 Thread David Rosenstrauch
On 04/22/2011 09:09 PM, W.P. McNeill wrote: I want to create a sequence file on my local hard drive. I want to write something like this: LocalFileSystem fs = new LocalFileSystem(); Configuration configuration = new Configuration(); Try doing this instead: Configuration
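
The reply above is truncated; the approach it points toward is to get the local filesystem from the Configuration rather than constructing LocalFileSystem directly. A minimal sketch, with an illustrative path and key/value types:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class LocalSequenceFileWriter {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);    // local filesystem, properly initialized
        Path path = new Path("/tmp/example.seq");     // illustrative local path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
        try {
          writer.append(new Text("some-key"), new IntWritable(42));
        } finally {
          writer.close();
        }
      }
    }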

How to change logging level for an individual job

2011-04-13 Thread David Rosenstrauch
Is it possible to change the logging level for an individual job? (As opposed to the cluster as a whole.) E.g., is there some key that I can set on the job's configuration object that would allow me to bump up the logging from info to debug just for that particular job? Thanks, DR
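
No reply appears in this digest. One possible per-job approach, sketched under the assumption that your release supports the per-task log-level properties (mapred.map.child.log.level / mapred.reduce.child.log.level in the 1.x line; newer releases use mapreduce.map.log.level / mapreduce.reduce.log.level -- check mapred-default.xml for your version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class DebugLevelJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Property names are version-dependent assumptions; verify against mapred-default.xml.
        conf.set("mapred.map.child.log.level", "DEBUG");
        conf.set("mapred.reduce.child.log.level", "DEBUG");
        Job job = new Job(conf, "debug-logging-job");
        // ... set mapper/reducer/paths as usual, then job.waitForCompletion(true) ...
      }
    }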

Re: How to abort a job in a map task

2011-04-07 Thread David Rosenstrauch
On 04/06/2011 08:40 PM, Haruyasu Ueda wrote: Hi all, I'm writing an M/R Java program. I want to abort the job itself from a map task when the map task finds irregular data. I have two ideas for doing so: 1. execute bin/hadoop -kill jobID in the map task, from the slave machine. 2. raise an IOException to
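
The message is truncated above; as a rough sketch of option 2, throwing an exception from the mapper is usually enough, since once a task attempt has failed the configured number of times (mapred.map.max.attempts) the framework fails the whole job. The class and validity check below are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StrictMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        if (value.getLength() == 0) {              // illustrative validity check
          // Fail this attempt; after mapred.map.max.attempts failures the job fails.
          throw new IOException("Irregular record at offset " + key);
        }
        context.write(new Text(value.toString()), new IntWritable(1));
      }
    }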

Re: Developing, Testing, Distributing

2011-04-07 Thread David Rosenstrauch
On 04/07/2011 03:39 AM, Guy Doulberg wrote: Hey, I have been developing Map/Red jars for a while now, and I am still not comfortable with the development environment I have put together for myself (and the team). I am curious how other Hadoop developers out there are developing their jobs... What IDE

Re: What does Too many fetch-failures mean? How do I debug it?

2011-03-31 Thread David Rosenstrauch
On 03/31/2011 05:13 PM, W.P. McNeill wrote: I'm running a big job on my cluster and a handful of attempts are failing with a Too many fetch-failures error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient

Re: CDH and Hadoop

2011-03-24 Thread David Rosenstrauch
They do, but IIRC, they recently announced that they're going to be discontinuing it. DR On Thu, March 24, 2011 8:10 pm, Rita wrote: Thanks everyone for your replies. I knew Cloudera had their release but never knew Y! had one too... On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins

Re: Writable Class with an Array

2011-03-17 Thread David Rosenstrauch
I would try implementing this using an ArrayWritable, which contains an array of IntWritables. HTH, DR On 03/17/2011 05:04 PM, maha wrote: Hello, I've been stuck on this for two days now... I found a previous post discussing this, but not with arrays. I know how to write a Writable class with
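
A minimal sketch of that suggestion; the subclass mainly exists to supply the value class in a no-arg constructor so the framework can instantiate and deserialize it:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.IntWritable;

    public class IntArrayWritable extends ArrayWritable {
      public IntArrayWritable() {
        super(IntWritable.class);               // no-arg constructor needed for deserialization
      }
      public IntArrayWritable(IntWritable[] values) {
        super(IntWritable.class, values);
      }
    }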

Re: hadoop fs -rmr /*?

2011-03-16 Thread David Rosenstrauch
On 03/16/2011 01:35 PM, W.P. McNeill wrote: On HDFS, anyone can run hadoop fs -rmr /* and delete everything. Not sure how you have your installation set up, but on ours (we installed Cloudera CDH), only user hadoop has full read/write access to HDFS. Since we rarely either log in as user hadoop,

Re: Custom Input format...

2011-02-11 Thread David Rosenstrauch
On 02/11/2011 05:43 AM, Nitin Khandelwal wrote: Hi, I want to give a folder as the input path to Map Red. Each task should read one file out of that folder at a time. I was doing this before, in 0.19, using MultiFileSplit format and my own InputFormat extending it. Can you please tell me how to do the same in
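
The reply is truncated; in the 0.20 API, one common way to get one map task per file is to make the files non-splittable, sketched here by extending TextInputFormat (substitute whatever record reader your files actually need):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class OneFilePerTaskInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;   // one split -- and therefore one map task -- per input file
      }
    }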

Re: why is it invalid to have non-alphabet characters as a result of MultipleOutputs?

2011-02-08 Thread David Rosenstrauch
On 02/08/2011 05:01 AM, Jun Young Kim wrote: Hi, MultipleOutputs supports named outputs as the result of a Hadoop job, but it has an inconvenient restriction: only alphanumeric characters are valid in a named output name. A-Z, a-z, and 0-9 are the only characters we can use. I believe if I
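
For context, a sketch of the old-API MultipleOutputs registration that the restriction applies to; the name passed to addNamedOutput must contain only letters and digits (the name below is illustrative):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class NamedOutputSetup {
      public static void register(JobConf conf) {
        // "errors2011" is accepted; a name like "errors-2011" is rejected at registration.
        MultipleOutputs.addNamedOutput(conf, "errors2011",
            TextOutputFormat.class, Text.class, IntWritable.class);
      }
    }

In the mapper or reducer, the MultipleOutputs instance is then created in configure(), written to via mos.getCollector("errors2011", reporter).collect(key, value), and closed in close().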

Re: Streaming data locality

2011-02-03 Thread David Rosenstrauch
On 02/03/2011 12:16 PM, Keith Wiley wrote: I've seen this asked before, but haven't seen a response yet. If the input to a streaming job is not actual data splits but simple HDFS file names which are then read by the mappers, then how can data locality be achieved? Likewise, is there any

Re: MultipleOutputs Performance?

2011-01-13 Thread David Rosenstrauch
On 12/10/2010 02:16 PM, Harsh J wrote: Hi, On Thu, Dec 2, 2010 at 10:40 PM, Matt Tanquary matt.tanqu...@gmail.com wrote: I am using MultipleOutputs to split a mapper input into about 20 different files. Adding this split has had an extremely adverse effect on performance. Is MultipleOutputs

Re: monitor the hadoop cluster

2010-11-11 Thread David Rosenstrauch
On 11/11/2010 02:52 PM, Da Zheng wrote: Hello, I wrote a MapReduce program and ran it on a 3-node hadoop cluster, but its running time varies a lot, from 2 minutes to 3 minutes. I want to understand how time is used by the map phase and the reduce phase, and hope to find the place to improve

Re: jobtracker: Cannot assign requested address

2010-09-21 Thread David Rosenstrauch
On 09/21/2010 03:17 AM, Jing Tie wrote: I am still suffering from the problem... Did anyone encounter it before? Or any suggestions? Many thanks in advance! Jing On Fri, Sep 17, 2010 at 5:19 PM, Jing Tie tiej...@gmail.com wrote: Dear all, I am having this exception when starting the jobtracker,

Re: migrating from 0.18 to 0.20

2010-09-16 Thread David Rosenstrauch
It certainly is! I wasted a few hours on that a couple of weeks back. DR On 09/16/2010 02:58 AM, Lance Norskog wrote: After this, if you add anything to the conf object, it does not get added to the job. This is a source of confusion. Mark Kerzner wrote: Thanks! Mark On Wed, Sep 15, 2010
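
The gotcha described above in a short sketch: Job copies the Configuration when it is constructed, so later changes to the original conf object are never seen by the job. The property keys are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ConfCopyGotcha {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("my.custom.key", "seen");       // set BEFORE constructing the Job
        Job job = new Job(conf, "example");      // the Job takes a copy of conf here

        conf.set("my.other.key", "ignored");     // too late: the job's copy is unchanged
        job.getConfiguration().set("my.other.key", "seen");  // modify the job's own copy instead
      }
    }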

Re: Multiple DataNodes on a single machine

2010-09-15 Thread David Rosenstrauch
On 09/15/2010 11:50 AM, Arv Mistry wrote: Hi, Is it possible to run multiple data nodes on a single machine? I currently have a machine with multiple disks and enough disk capacity for replication across them. I don't need redundancy at the machine level but would like to be able to handle a

Re: Writable questions

2010-08-31 Thread David Rosenstrauch
On 08/31/2010 12:58 PM, Mark wrote: I have a question regarding outputting Writable objects. I thought all Writables know how to serialize themselves to output. For example I have an ArrayWritable of strings (or Texts) but when I output it to a file it shows up as
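
The symptom described is usually TextOutputFormat writing the value's toString(); ArrayWritable does not override toString(), so the output is the default object representation. One possible fix, sketched with an illustrative class name and separator:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    public class TextArrayWritable extends ArrayWritable {
      public TextArrayWritable() {
        super(Text.class);
      }
      @Override
      public String toString() {
        // TextOutputFormat writes value.toString(), so join the elements
        // instead of falling back to the class@hashcode form.
        StringBuilder sb = new StringBuilder();
        for (String s : toStrings()) {
          if (sb.length() > 0) {
            sb.append('\t');
          }
          sb.append(s);
        }
        return sb.toString();
      }
    }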

Re: Writable questions

2010-08-31 Thread David Rosenstrauch
On 08/31/2010 02:09 PM, Mark wrote: On 8/31/10 10:07 AM, David Rosenstrauch wrote: On 08/31/2010 12:58 PM, Mark wrote: I have a question regarding outputting Writable objects. I thought all Writables know how to serialize themselves to output. For example I have an ArrayWritable of strings

Re: Custom partitioner for hadoop

2010-08-25 Thread David Rosenstrauch
On 08/25/2010 12:40 PM, Mithila Nagendra wrote: In order to avoid this I was thinking of passing the range boundaries to the partitioner. How would I do that? Is there an alternative? Any suggestion would prove useful. We use a custom partitioner, for which we pass in configuration data that

Re: Custom partitioner for hadoop

2010-08-25 Thread David Rosenstrauch
If you define a Hadoop object as implementing Configurable, then its setConf() method will be called once, right after it gets instantiated. So each partitioner that gets instantiated will have its setConf() method called right afterwards. I'm taking advantage of that fact by calling my own
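
A sketch of that pattern; the property name and range logic are illustrative, but the mechanism is as described above: the framework calls setConf() on any Configurable object right after instantiating it, so values serialized into the job configuration by the driver are available to the partitioner:

    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class RangePartitioner extends Partitioner<IntWritable, Text>
        implements Configurable {

      private Configuration conf;
      private int[] boundaries = new int[0];

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        // Called once, right after instantiation: pull the range boundaries
        // that the driver serialized into the job configuration.
        String spec = conf.get("example.partition.boundaries", "");
        if (!spec.isEmpty()) {
          String[] parts = spec.split(",");
          boundaries = new int[parts.length];
          for (int i = 0; i < parts.length; i++) {
            boundaries[i] = Integer.parseInt(parts[i].trim());
          }
        }
      }

      @Override
      public Configuration getConf() {
        return conf;
      }

      @Override
      public int getPartition(IntWritable key, Text value, int numPartitions) {
        for (int i = 0; i < boundaries.length && i < numPartitions - 1; i++) {
          if (key.get() <= boundaries[i]) {
            return i;
          }
        }
        return numPartitions - 1;
      }
    }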

Viewing counters in history job

2010-08-23 Thread David Rosenstrauch
I had a job that I ran a few days ago that rolled over to the job tracker history. Now when I go to view it in the history viewer, although I can see basic stats such as total # records in/out, I can no longer see all the counter values (most notably my own custom counter values). Is there

Re: DiskChecker$DiskErrorException: Could not find any valid local directory -- lots of free disk space

2010-08-12 Thread David Rosenstrauch
On 08/12/2010 01:42 PM, Rares Vernica wrote: I forgot to mention that in my cluster the HDFS replication is set to 1. I know this is not recommended but I only have 5 nodes in the cluster, there are no failures There will be! :-) DR

Fwd: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
Someone sent this email to the commons-user list a while back, but it seems like it slipped through the cracks. We're starting to dig into some hard-core Hadoop development and just came upon this same issue, though. Anyone know if there's any particular reason why the new Partitioner class

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
On 08/04/2010 12:30 PM, Owen O'Malley wrote: On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote: Anyone know if there's any particular reason why the new Partitioner class doesn't implement JobConfigurable? (And, if not, whether there's any plans to fix this omission?) We're working

Re: Partitioner in Hadoop 0.20

2010-08-04 Thread David Rosenstrauch
On 08/04/2010 01:55 PM, Wilkes, Chris wrote: On Aug 4, 2010, at 10:50 AM, David Rosenstrauch wrote: On 08/04/2010 12:30 PM, Owen O'Malley wrote: On Aug 4, 2010, at 8:38 AM, David Rosenstrauch wrote: Anyone know if there's any particular reason why the new Partitioner class doesn't

Re: Hadoop in fully-distributed mode

2010-07-14 Thread David Rosenstrauch
On 07/14/2010 06:58 AM, abc xyz wrote: Hi everyone, when Hadoop is running in fully-distributed mode and I am not the cluster administrator, but can just execute my programs on the cluster, how can I get access to the log files of the programs that I run on the cluster? I want to see the

Re: Text files vs. SequenceFiles

2010-07-06 Thread David Rosenstrauch
Thanks much for the helpful responses, everyone. This very much helped clarify our thinking on the code design. Sounds like, all other things being equal, sequence files are the way to go. Thanks again for the advice, all. DR On 07/05/2010 03:47 AM, Aaron Kimball wrote: David, I

Text files vs. SequenceFiles

2010-07-02 Thread David Rosenstrauch
Our team is still new to Hadoop, and a colleague and I are trying to make a decision on file formats. The arguments are: * We should use a SequenceFile (binary) format as it's faster for the machine to read than parsing text, and the files are smaller. * We should use a text file format as
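
For what it's worth, part of the "smaller and faster to read" argument comes down to compression; a rough sketch of pointing a job at block-compressed SequenceFile output (codec choice is illustrative):

    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class SeqFileOutputConfig {
      public static void configureOutput(Job job) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
        // BLOCK compression compresses runs of records; sequence files remain splittable.
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
      }
    }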

Re: Does hadoop need to have ZooKeeper to work?

2010-06-28 Thread David Rosenstrauch
On 06/28/2010 10:09 AM, legolas wrote: Hi, I am wondering whether Hadoop has some dependencies on ZooKeeper or not. I mean, when I download http://apache.thelorne.com/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz does it have ZooKeeper with it, or should I download ZooKeeper separately.

Re: which node processed my job

2010-05-06 Thread David Rosenstrauch
On 05/06/2010 11:09 AM, Alan Miller wrote: Not sure if this is the right list for this question, but: is it possible to determine which host actually processed my MR job? Regards, Alan I'm curious: why would you need to know? DR

Host name problem in Hadoop GUI

2010-04-23 Thread David Rosenstrauch
Having an issue with host names on my new Hadoop cluster. The cluster is currently 1 name node and 2 data nodes, running in a cloud vendor data center. All is well with general operations of the cluster - i.e., name node and data nodes can talk just fine, I can read/write to/from the HDFS,