Re: mini node in a cluster

2012-06-04 Thread Tom Melendez
Hi Pat, Sounds like you would just turn off the datanode and the tasktracker. Your config will still point to the Namenode and JT, so you can still launch jobs and read/write from HDFS. You'll probably want to replicate the data off first of course. Thanks, Tom On Mon, Jun 4, 2012 at 2:06 PM,

Re: mini node in a cluster

2012-06-04 Thread Tom Melendez
Hi Pat, Sounds like the trick. This node is a slave so its datanode and tasktracker are started from the master. - how do I start the cluster without starting the datanode and the tasktracker on the mini-node slave? Remove it from slaves? There's no main cluster software, just don't start

Re: how to unit test my RawComparator

2012-03-31 Thread Tom Melendez
Hi Chris and all, hope you don't mind if I inject a question in here. It's highly related IMO (famous last words). On Sat, Mar 31, 2012 at 2:18 PM, Chris White chriswhite...@gmail.com wrote: You can serialize your Writables to a ByteArrayOutputStream and then get its underlying byte array:
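For reference, a rough Java sketch of the pattern Chris describes: serialize two Writables, then hand the raw byte arrays to the comparator and check that the result agrees with the object comparison. Text and its registered WritableComparator are used only to keep the sketch self-contained; substitute the key type and RawComparator actually under test.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparator;

    public class RawComparatorSanityCheck {

      // Serialize a Writable and grab the underlying byte array.
      static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(baos);
        w.write(out);
        out.flush();
        return baos.toByteArray();
      }

      public static void main(String[] args) throws IOException {
        Text a = new Text("apple");
        Text b = new Text("banana");
        byte[] rawA = serialize(a);
        byte[] rawB = serialize(b);

        // Swap in the RawComparator under test here.
        WritableComparator cmp = WritableComparator.get(Text.class);

        int rawResult = cmp.compare(rawA, 0, rawA.length, rawB, 0, rawB.length);
        int objResult = a.compareTo(b);
        System.out.println("raw=" + rawResult + " obj=" + objResult);
        assert Integer.signum(rawResult) == Integer.signum(objResult);
      }
    }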

Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec

2012-01-18 Thread Tom Melendez
Sounds like mapred.task.timeout? The default is 10 minutes. http://hadoop.apache.org/common/docs/current/mapred-default.html Thanks, Tom On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis lordjoe2...@gmail.com wrote: The map tasks fail timing out after 600 sec. I am processing one 9 GB file with
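Two common remedies, sketched below with the old (mapred) API: report progress from long-running map tasks so the framework knows they are still alive, or raise mapred.task.timeout (milliseconds) in the driver. The mapper body is just an illustration.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SlowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // ... long-running work per record ...
        // Tell the framework the task is still alive so it isn't killed
        // at mapred.task.timeout (default 600000 ms = 10 minutes).
        reporter.progress();
        output.collect(value, value);
      }

      // Alternatively, in the driver:
      //   JobConf conf = new JobConf(SlowMapper.class);
      //   conf.setLong("mapred.task.timeout", 1800000L);  // 30 minutes, 0 disables
    }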

Re: Question about accessing another HDFS

2011-12-08 Thread Tom Melendez
I'm hoping there is a better answer, but I'm thinking you could load another configuration file (with B.company in it) using Configuration, grab a FileSystem obj with that and then go forward. Seems like some unnecessary overhead though. Thanks, Tom On Thu, Dec 8, 2011 at 2:42 PM, Frank Astier
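A minimal sketch of that idea in Java: rather than loading a whole second config file, you can hand FileSystem.get() the other cluster's namenode URI directly. The B.company hostname and path below are placeholders.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CrossClusterRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Default filesystem (A.company) from the local config:
        FileSystem fsA = FileSystem.get(conf);
        System.out.println("default FS: " + fsA.getUri());

        // Second cluster, addressed explicitly by its namenode URI:
        FileSystem fsB = FileSystem.get(
            URI.create("hdfs://namenode.b.company:8020"), conf);

        Path src = new Path("/data/input/part-00000");
        System.out.println("exists on B: " + fsB.exists(src));
      }
    }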

Re: Hadoop Streaming

2011-12-03 Thread Tom Melendez
So that code 126 should be kicked out by your program - do you know what that means? Your code can read from stdin? Thanks, Tom On Sat, Dec 3, 2011 at 7:09 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: I have the following error in running hadoop streaming,

Re: Hadoop Streaming

2011-12-03 Thread Tom Melendez
Hi Daniel, I see from your other thread that your HADOOP script has a line like: #!/bin/shrm -f temp.txt I'm not sure what that is, exactly. I suspect the -f is reading from some file and the while loop you had listed read from stdin it seems. What does your input look like? I think what's

Re: Hadoop Streaming

2011-12-03 Thread Tom Melendez
Oh, I see the line wrapped. My bad. Either way, I think the NLineInputFormat is what you need. I'm assuming you want one line of input to execute on one mapper. Thanks, Tom On Sat, Dec 3, 2011 at 7:57 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: TOM, What the HADOOP script do is
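For the Java API, the same idea looks roughly like the driver below (old mapred API); streaming users would pass the equivalent flags on the command line, as noted in the comment. The paths are placeholders.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class OneLinePerMapper {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OneLinePerMapper.class);
        // Each map task receives exactly one line of input.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Streaming equivalent (command-line flags, not Java):
        //   -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat
        //   -D mapred.line.input.format.linespermap=1
        JobClient.runJob(conf);
      }
    }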

Re: How do I programmatically get total job execution time?

2011-12-02 Thread Tom Melendez
On Fri, Dec 2, 2011 at 9:57 AM, W.P. McNeill bill...@gmail.com wrote: After my Hadoop job has successfully completed I'd like to log the total amount of time it took. This is the Finished in statistic in the web UI. How do I get this number programmatically? Is there some way I can query the
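One simple client-side option, sketched below, is to time the call to waitForCompletion(); that measures wall-clock time as seen from the submitting client, so it will not match the web UI's "Finished in" figure exactly, but it is easy to log.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TimedJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "timed-job");
        // ... set mapper, reducer, input and output paths here ...

        long start = System.currentTimeMillis();
        boolean ok = job.waitForCompletion(true);
        long elapsed = System.currentTimeMillis() - start;

        // Includes submission and setup/teardown overhead, not just task time.
        System.out.printf("Job %s in %.1f s%n",
            ok ? "finished" : "failed", elapsed / 1000.0);
      }
    }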

questions regarding data storage and inputformat

2011-07-27 Thread Tom Melendez
Hi Folks, I have a bunch of binary files which I've stored in a sequencefile. The name of the file is the key, the data is the value and I've stored them sorted by key. (I'm not tied to using a sequencefile for this). The current test data is only 50MB, but the real data will be 500MB - 1GB. My
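A rough sketch of building that layout (filename as the key, raw bytes as the value) with SequenceFile.Writer; the output path and input file list are placeholders, and writing the files in sorted filename order keeps the sequence file itself sorted by key.

    import java.io.File;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackBinaryFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[0]);  // target .seq file on HDFS

        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, out, Text.class, BytesWritable.class);
        try {
          // Remaining arguments are the local binary files to pack,
          // assumed to already be in sorted filename order.
          for (int i = 1; i < args.length; i++) {
            File f = new File(args[i]);
            byte[] bytes = Files.readAllBytes(f.toPath());
            writer.append(new Text(f.getName()), new BytesWritable(bytes));
          }
        } finally {
          writer.close();
        }
      }
    }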

Re: questions regarding data storage and inputformat

2011-07-27 Thread Tom Melendez
3. Another idea might be to create separate seq files for chunks of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, see any pros/cons with that approach? Separate sequence files would require the least amount of custom code.
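Making each file non-splittable usually comes down to overriding isSplitable() in the input format, so every sequence file becomes exactly one split and therefore one mapper. A sketch with the old mapred API (the key/value types here are placeholders):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;

    // Each input file becomes a single split, hence a single mapper.
    public class NonSplittableSequenceFileInputFormat
        extends SequenceFileInputFormat<Text, BytesWritable> {

      @Override
      protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
      }
    }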

Re: Custom FileOutputFormat / RecordWriter

2011-07-26 Thread Tom Melendez
/lib/MultipleOutputs.html (Also available for the new API, depending on which version/distribution of Hadoop you are on) On Tue, Jul 26, 2011 at 3:36 AM, Tom Melendez t...@supertom.com wrote: Hi Harsh, Thanks for the response.  Unfortunately, I'm not following your response.   :-) Could you
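The link above points at MultipleOutputs. A minimal sketch of how it is typically wired up in a mapper with the new (mapreduce) API, with the caveat already noted that availability depends on your Hadoop version/distribution; the class and types here are made up:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class FanOutMapper extends Mapper<Text, Text, Text, Text> {
      private MultipleOutputs<Text, Text> mos;

      @Override
      protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
      }

      @Override
      protected void map(Text key, Text value, Context context)
          throws IOException, InterruptedException {
        // The third argument is a base output path, so related records can
        // land in differently named files under the job's output directory.
        mos.write(key, value, key.toString());
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        mos.close();
      }
    }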

Re: Custom FileOutputFormat / RecordWriter

2011-07-25 Thread Tom Melendez
that you will never write to the same file from two different mappers or processes.  HDFS currently does not support writing to a single file from multiple processes. --Bobby On 7/25/11 3:25 PM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map

Re: Custom FileOutputFormat / RecordWriter

2011-07-25 Thread Tom Melendez
. --Bobby On 7/25/11 3:25 PM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces a filename for a key and data as a value.  I want to write the value (data) into the key (filename) in the path specified when I run the job
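One way this is often done (outside of MultipleOutputs) is a small custom FileOutputFormat whose RecordWriter opens a file named after the key and writes the value's bytes into it. The sketch below uses the new mapreduce API, assumes no two tasks ever emit the same filename (per Bobby's point about HDFS allowing only one writer per file), and ignores the output committer / temporary-path handling a production version would want:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyAsFilenameOutputFormat
        extends FileOutputFormat<Text, BytesWritable> {

      @Override
      public RecordWriter<Text, BytesWritable> getRecordWriter(TaskAttemptContext job)
          throws IOException {
        final Path outDir = getOutputPath(job);
        final FileSystem fs = outDir.getFileSystem(job.getConfiguration());
        return new RecordWriter<Text, BytesWritable>() {
          @Override
          public void write(Text key, BytesWritable value) throws IOException {
            // Key is the filename, value is the data to put in it.
            FSDataOutputStream out = fs.create(new Path(outDir, key.toString()));
            try {
              out.write(value.getBytes(), 0, value.getLength());
            } finally {
              out.close();
            }
          }
          @Override
          public void close(TaskAttemptContext context) { /* nothing held open */ }
        };
      }
    }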

Re: Custom FileOutputFormat / RecordWriter

2011-07-25 Thread Tom Melendez
are not comfortable writing your own code and maintaining it, I s'pose. Your approach is correct as well, if the question was specifically that. On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez t...@supertom.com wrote: Hi Folks, Just doing a sanity check here. I have a map-only job, which produces

Re: tips and tools to optimize cluster

2011-05-24 Thread Tom Melendez
jvm on the node. I haven't looked into it in detail yet but it looks like Ganglia only reports the last jvm record in each batch. Anyone else seen this? Chris On 24 May 2011 01:48, Tom Melendez t...@supertom.com wrote: Hi Folks, I'm looking for tips, tricks and tools to get at node

tips and tools to optimize cluster

2011-05-23 Thread Tom Melendez
Hi Folks, I'm looking for tips, tricks and tools to get at node utilization to optimize our cluster. I want to answer questions like: - what nodes ran a particular job? - how long did it take for those nodes to run the tasks for that job? - how/why did Hadoop pick those nodes to begin with? More
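One programmatic angle on "which nodes ran the job and how long did the tasks take" is the old JobClient API's task-completion events, roughly as below; the job ID passed in is a placeholder, and the tracker HTTP address identifies the node each attempt ran on. (The hadoop job -history command gives a similar per-task breakdown from the shell.)

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.TaskCompletionEvent;

    public class WhereDidMyJobRun {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));  // a job_... ID

        int from = 0;
        TaskCompletionEvent[] events;
        while ((events = job.getTaskCompletionEvents(from)).length > 0) {
          for (TaskCompletionEvent e : events) {
            // Tracker URL shows the node; run time is in milliseconds.
            System.out.printf("%s  %s  %s  %d ms%n",
                e.getTaskAttemptId(), e.getTaskStatus(),
                e.getTaskTrackerHttp(), e.getTaskRunTime());
          }
          from += events.length;
        }
      }
    }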

Re: Linker errors with Hadoop pipes

2011-05-19 Thread Tom Melendez
I'm on Ubuntu and use pipes. These are my ssl packages, notice libssl and libssl-dev in particular: supertom@hadoop-2:~/h-v8$ dpkg -l |grep -i ssl ii libopenssl-ruby 4.2 OpenSSL interface for Ruby ii libopenssl-ruby1.8 1.8.7.249-2 OpenSSL interface for Ruby 1.8 ii

passing classpath through to datanodes?

2011-05-06 Thread Tom Melendez
Hi Folks, I'm having trouble getting a custom classpath through to the datanodes in my cluster. I'm using libhdfs and pipes, and the hdfsConnect call in libhdfs requires that the classpath is set. My code executes fine on a standalone machine, but when I take it to the cluster, I can see that the