Hi Pat,
Sounds like you would just turn off the datanode and the tasktracker.
Your config will still point to the Namenode and JT, so you can still
launch jobs and read/write from HDFS.
You'll probably want to replicate the data off it first, of course.
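(On a stock 1.x-style install, stopping them is presumably just
hadoop-daemon.sh stop tasktracker and hadoop-daemon.sh stop datanode on the
node itself; the dfsadmin decommission route via an excludes file is gentler
if you want HDFS to re-replicate the blocks first.)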
Thanks,
Tom
On Mon, Jun 4, 2012 at 2:06 PM,
Hi Pat,
Sounds like the trick. This node is a slave, so its datanode and tasktracker
are started from the master.
- how do I start the cluster without starting the datanode and the
tasktracker on the mini-node slave? Remove it from slaves?
There's no main cluster software; just don't start
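(If you're using the stock start scripts, conf/slaves is exactly the list
that start-all.sh walks when launching datanodes and tasktrackers, so
removing the host from that file should do it.)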
Hi Chris and all, hope you don't mind if I inject a question in here.
It's highly related IMO (famous last words).
On Sat, Mar 31, 2012 at 2:18 PM, Chris White chriswhite...@gmail.com wrote:
You can serialize your Writables to a ByteArrayOutputStream and then
get its underlying byte array:
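Presumably something along these lines (a sketch using the standard java.io
and Writable APIs; the Text instance is just for illustration):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class WritableBytes {
        // Serialize any Writable into a byte array.
        public static byte[] toBytes(Writable w) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            w.write(out);
            out.flush();
            return baos.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            System.out.println(toBytes(new Text("hello")).length);
        }
    }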
Sounds like mapred.task.timeout? The default is 10 minutes.
http://hadoop.apache.org/common/docs/current/mapred-default.html
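(If you need more than that, the property can be raised per job - the value
is in milliseconds, and 0 disables the timeout entirely. A sketch with the
old JobConf API:)

    import org.apache.hadoop.mapred.JobConf;

    public class TimeoutExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // mapred.task.timeout is in milliseconds; 0 disables it.
            conf.setLong("mapred.task.timeout", 30 * 60 * 1000L); // 30 minutes
            System.out.println(conf.get("mapred.task.timeout"));
        }
    }

The same thing works from the command line as -D mapred.task.timeout=1800000
for ToolRunner-based jobs.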
Thanks,
Tom
On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis lordjoe2...@gmail.com wrote:
The map tasks fail, timing out after 600 sec.
I am processing one 9 GB file with
I'm hoping there is a better answer, but I'm thinking you could load
another configuration file (with B.company in it) using Configuration,
grab a FileSystem obj with that and then go forward. Seems like some
unnecessary overhead though.
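(A sketch of that idea; the config file name and namenode URI below are
placeholders, not B.company's real ones:)

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteFs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical resource holding B.company's settings.
            conf.addResource(new Path("b-company-site.xml"));
            // Or skip the file and name the other namenode directly:
            FileSystem remote = FileSystem.get(
                    URI.create("hdfs://namenode.b.company:8020/"), conf);
            for (FileStatus s : remote.listStatus(new Path("/"))) {
                System.out.println(s.getPath());
            }
        }
    }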
Thanks,
Tom
On Thu, Dec 8, 2011 at 2:42 PM, Frank Astier
So that code 126 should be coming from your program - do you know
what it means?
Can your code read from stdin?
Thanks,
Tom
On Sat, Dec 3, 2011 at 7:09 PM, Daniel Yehdego
dtyehd...@miners.utep.edu wrote:
I have the following error when running Hadoop streaming,
Hi Daniel,
I see from your other thread that your HADOOP script has a line like:
#!/bin/shrm -f temp.txt
I'm not sure what that is, exactly. I suspect the -f is reading from
some file, and the while loop you listed seems to read from stdin.
What does your input look like? I think what's
Oh, I see the line wrapped. My bad.
Either way, I think the NLineInputFormat is what you need. I'm
assuming you want one line of input to execute on one mapper.
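With streaming that's just -inputformat
org.apache.hadoop.mapred.lib.NLineInputFormat on the command line; for a
Java job, a sketch with the old API would be:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    public class OneLinePerMap {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(OneLinePerMap.class);
            conf.setInputFormat(NLineInputFormat.class);
            // N = 1: each input line becomes its own split, hence its own mapper.
            conf.setInt("mapred.line.input.format.linespermap", 1);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }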
Thanks,
Tom
On Sat, Dec 3, 2011 at 7:57 PM, Daniel Yehdego
dtyehd...@miners.utep.edu wrote:
TOM,
What the HADOOP script does is
On Fri, Dec 2, 2011 at 9:57 AM, W.P. McNeill bill...@gmail.com wrote:
After my Hadoop job has successfully completed, I'd like to log the total
amount of time it took. This is the "Finished in" statistic in the web UI.
How do I get this number programmatically? Is there some way I can query
the
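(One rough way, for what it's worth: time the driver around
waitForCompletion. A sketch - this measures client-side wall clock, which
tracks but isn't literally the web UI's number:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TimedDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "timed-job");
            // ... mapper/reducer/input/output setup elided ...
            long start = System.currentTimeMillis();
            boolean ok = job.waitForCompletion(true);
            long secs = (System.currentTimeMillis() - start) / 1000;
            System.out.println("Finished in " + secs + "s (success=" + ok + ")");
        }
    }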
Hi Folks,
I have a bunch of binary files which I've stored in a SequenceFile.
The name of the file is the key, the data is the value, and I've stored
them sorted by key. (I'm not tied to using a SequenceFile for this.)
The current test data is only 50MB, but the real data will be 500MB -
1GB.
My
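(For reference, a sketch of how such a file might be written, assuming Text
keys and BytesWritable values - the thread doesn't say which types were
actually used:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class PackFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, new Path("packed.seq"), Text.class, BytesWritable.class);
            try {
                byte[] data = "file contents".getBytes(); // stand-in for real binary data
                writer.append(new Text("filename.bin"), new BytesWritable(data));
            } finally {
                writer.close();
            }
        }
    }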
3. Another idea might be to create separate seq files for chunks of
records and make them non-splittable, ensuring that each goes to a
single mapper. Assuming I can get away with this, do you see any
pros/cons with that approach?
Separate sequence files would require the least amount of custom code.
/lib/MultipleOutputs.html
(Also available for the new API, depending on which
version/distribution of Hadoop you are on)
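(For idea 3, making the files non-splittable only takes a tiny InputFormat
subclass; a sketch against the new API:)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    // Each sequence file becomes exactly one split, hence one mapper.
    public class WholeFileSeqInputFormat<K, V> extends SequenceFileInputFormat<K, V> {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }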
On Tue, Jul 26, 2011 at 3:36 AM, Tom Melendez t...@supertom.com wrote:
Hi Harsh,
Thanks for the response. Unfortunately, I'm not following it. :-)
Could you
that you will never write to the same file from two different
mappers or processes. HDFS currently does not support writing to a single
file from multiple processes.
--Bobby
On 7/25/11 3:25 PM, Tom Melendez t...@supertom.com wrote:
Hi Folks,
Just doing a sanity check here.
I have a map-only job, which produces a filename for a key and data as
a value. I want to write the value (data) into the key (filename) in
the path specified when I run the job
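(A sketch of how a mapper might do this with the newer-API MultipleOutputs,
assuming a version that has the three-argument write; the task suffix,
e.g. -m-00000, gets appended to each name:)

    import java.io.IOException;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Map-only job: write each value to a file named after its key.
    public class FilePerKeyMapper
            extends Mapper<Text, BytesWritable, NullWritable, BytesWritable> {
        private MultipleOutputs<NullWritable, BytesWritable> out;

        @Override
        protected void setup(Context context) {
            out = new MultipleOutputs<NullWritable, BytesWritable>(context);
        }

        @Override
        protected void map(Text key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            // Third argument is a base output path relative to the job's output dir.
            out.write(NullWritable.get(), value, key.toString());
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            out.close();
        }
    }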
are not comfortable writing your own code and maintaining
it, I s'pose. Your approach is correct as well, if the question was
specifically that.
On Tue, Jul 26, 2011 at 1:55 AM, Tom Melendez t...@supertom.com wrote:
Hi Folks,
Just doing a sanity check here.
I have a map-only job, which produces
jvm on the node. I haven't looked
into it in detail yet, but it looks like Ganglia only reports the last
jvm record in each batch. Anyone else seen this?
Chris
On 24 May 2011 01:48, Tom Melendez t...@supertom.com wrote:
Hi Folks,
I'm looking for tips, tricks and tools to get at node
Hi Folks,
I'm looking for tips, tricks and tools to get at node utilization to
optimize our cluster. I want to answer questions like:
- what nodes ran a particular job?
- how long did it take for those nodes to run the tasks for that job?
- how/why did Hadoop pick those nodes to begin with?
More
I'm on Ubuntu and use pipes. These are my ssl packages; note libssl
and libssl-dev in particular:
supertom@hadoop-2:~/h-v8$ dpkg -l | grep -i ssl
ii  libopenssl-ruby     4.2          OpenSSL interface for Ruby
ii  libopenssl-ruby1.8  1.8.7.249-2  OpenSSL interface for Ruby 1.8
ii
Hi Folks,
I'm having trouble getting a custom classpath through to the datanodes
in my cluster.
I'm using libhdfs and pipes, and the hdfsConnect call in libhdfs
requires that the classpath is set. My code executes fine on a
standalone machine, but when I take it to the cluster, I can see that the