Hi Humayun,
Let's assume you have JT, TT1, TT2, and TT3 (one JobTracker and three TaskTrackers).
Now you should configure /etc/hosts on every machine like the example below:
10.18.xx.1 JT
10.18.xx.2 TT1
10.18.xx.3 TT2
10.18.xx.4 TT3
Configure the same entries on all the machines, so that the JobTracker and all
TaskTrackers can talk to each other by hostname.
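Then point the Hadoop config files at those hostnames. A minimal sketch, assuming
0.20-era property names with JT acting as both NameNode and JobTracker (the ports
are just common defaults, not requirements):
In core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://JT:9000</value>
</property>
In mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>JT:9001</value>
</property>
Then list TT1, TT2 and TT3 one per line in conf/slaves on JT.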
Finally I've found why we can't use Rumen in Hadoop 0.21. I've just looked at
the logs of Hadoop 0.20 and compared them with those of 0.21, and found there
is a BIG difference between them in the job history format; that's why we
can't use Rumen on 0.21.
BTW, the log format in 0.23 has also changed a little bit.
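For anyone hitting the same wall, this is the kind of invocation that chokes on
the 0.21 format — a sketch of the TraceBuilder usage from later Hadoop docs, with
hypothetical paths:
hadoop org.apache.hadoop.tools.rumen.TraceBuilder \
    file:///tmp/job-trace.json \
    file:///tmp/topology.json \
    hdfs:///mapred/history/done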
Hello people,
So I am trying to install Hadoop 0.20.205 on 2 machines.
Individually I am able to run Hadoop on each machine.
Now when I configure one machine as the slave and the other as the master
and try to start Hadoop, it's not able to even execute the hadoop
commands on the slave machine.
I am getting
When installing Hadoop on slave machines, do we have to install Hadoop
in the same location on each machine?
Can we have the Hadoop installation in different locations on different
machines in the same cluster?
If yes, what things do we have to take care of in that case?
Thanks,
Praveenesh
Sure,
You could do that, but in doing so, you will make your life a living hell.
Literally.
Think about it... you will have to manually manage each node's config files,
so if something goes wrong you will have a hard time diagnosing the issue.
Why make life harder?
Why not just do the simple thing?
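(The simple thing: keep one canonical conf/ directory and push it to every node.
A sketch, assuming the hostnames from earlier in the thread and passwordless ssh:)
for h in TT1 TT2 TT3; do
  rsync -a $HADOOP_HOME/conf/ $h:$HADOOP_HOME/conf/
done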
Class project due?
Sorry, second set of questions on setting up a 2 node cluster...
Sent from my iPhone
On Dec 22, 2011, at 3:25 AM, "Humayun kabir" wrote:
> someone please help me to configure hadoop such as core-site.xml,
> hdfs-site.xml, mapred-site.xml etc.
> please provide some example. it
What I mean to say is: does Hadoop internally assume that all
installations on all nodes need to be in the same location?
I had Hadoop installed in different locations on 2 different nodes.
I configured the Hadoop config files to be part of the same cluster.
But when I started Hadoop on the master, the slave failed to start.
Hi folks,
I've just done a fresh install of Hadoop. The NameNode and DataNode are up,
and the Task/Job Trackers are also up, but when I run the MapReduce wordcount
example I get this error on the TaskTracker:
2011-12-23 15:11:52,679 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201112231511_0001_m_-165367
Hi,
take a look at the logs for the failed attempt on your TaskTracker.
Also check the system logs with dmesg or /var/log/kern*. It could be a
system kill (segfault).
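A sketch of where to look, assuming a default tarball install (adjust the paths
to your layout):
less $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log
ls $HADOOP_HOME/logs/userlogs/          # per-attempt stdout/stderr/syslog
dmesg | tail -50
grep -i kill /var/log/kern*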
- Alex
On Fri, Dec 23, 2011 at 3:32 PM, anthony garnier wrote:
>
> Hi folks,
>
> I've just done a fresh install of Hadoop, Namenode a
Hello All,
I have a situation where I need to dump Cassandra data to a Hadoop cluster
for further analytics. A lot of other relevant data which is not present in
Cassandra is already available in HDFS for analysis. Both are independent
clusters right now.
Is there a suggested way to get the data across periodically or continuously?
Hey Ravi:
Hadoop newbie here, so pardon me if I am pointing out the obvious - have
you taken a look at this link -
http://wiki.apache.org/cassandra/HadoopSupport
Looks like Cassandra 0.6 onwards supports output to mapreduce.
Regards
Sanjeev
On Fri, 2011-12-23 at 07:13 -0800, ravikumar visweswar
Thank you for the reference. I have looked at Brisk. In our situation both
are disconnected clusters for various reasons, and they use different
distributions (i.e. Cloudera). Is there any other/similar way to inject data
into HDFS?
R
On Fri, Dec 23, 2011 at 7:34 AM, Sanjeev Verma wrote:
> Hey Ravi:
>
For a Hadoop cluster that starts medium size (50 nodes) but could grow to
hundreds of nodes, what is the recommended network in the rack? 1gig or 10gig?
We have machines with 8 cores, 4 x 1TB drives (could grow to 8 x 1TB drives),
and 48 GB RAM per node.
We expect "balanced" usage of the cluster (both storage and compute).
Ok,
Here's the thing...
1) When building the cluster, you want to be consistent.
2) The location of $HADOOP_HOME is configurable, so you can place it anywhere.
Putting the software in two different locations isn't a good idea because you
now have to maintain a unique configuration per node.
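If you do pick one common location, this is all each node's environment needs
to carry — the install path here is hypothetical:
export HADOOP_HOME=/opt/hadoop-0.20.205.0
export PATH=$PATH:$HADOOP_HOME/bin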
Hi,
Recommended or optimum?
10gig is best for optimal rack awareness. If you plan to grow
seriously, start with the best you can afford. It depends on your
available investment, I think.
- Alex
On Fri, Dec 23, 2011 at 6:23 PM, Koert Kuipers wrote:
> For a hadoop cluster that starts medium size
On Fri, Dec 23, 2011 at 12:23:59PM -0500, Koert Kuipers wrote:
> For a Hadoop cluster that starts medium size (50 nodes) but could grow to
> hundreds of nodes, what is the recommended network in the rack? 1gig or 10gig?
> We have machines with 8 cores, 4 x 1TB drives (could grow to 8 x 1TB drives),
> 48
One or two 1gig NICs on a 10gig backbone sound reasonable with "only" 4 x 1TB
drives. 12 x 2TB disks per node are getting more common and do not all have
10gig network cards, even on 600+ node clusters.
Cheers,
Joep
Sent from my iPhone
On Dec 23, 2011, at 11:15 AM, Mads Toftum wrote:
> On Fri, Dec
Agreed that different locations are not a good idea.
However, the question was: can it be done? Yes, with some hacking, I suppose.
Do I recommend hacking? No.
But if you cannot help yourself, then to have DataNode storage in different
locations per slave, create an hdfs-site.xml per node (enjoy).
For the
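A sketch of such a per-node override, using the 0.20-era property name (the
storage paths are hypothetical):
In each slave's hdfs-site.xml:
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs/data,/data2/hdfs/data</value>
</property>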
Hi,
Installed 0.22.0 on CentOS 5.7. I can start dfs and mapred and see their
processes (3 dfs and 3 mapred).
Ran the first grep example: bin/hadoop jar hadoop-*-examples.jar grep input
output 'dfs[a-z.]+'. It seems the correct jar name is
hadoop-mapred-examples-0.22.0.jar - there are no other hadoop*examples*.jar
files.
Seems like you do not have "/user/MyId/input/conf" on HDFS.
Try this:
cd $HADOOP_HOME (this should be your Hadoop root dir)
hadoop fs -put conf input/conf
And then run the MR job again.
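To confirm the upload landed where the job expects it:
hadoop fs -ls input/conf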
-Prashant Kommireddi
On Fri, Dec 23, 2011 at 3:40 PM, Pat Flaherty wrote:
> Hi,
>
> Installed 0.22.0 o
Hey everyone:
I am going through the "Hadoop in Action" book, and I guess the version of
Hadoop that book refers to is already old :-). The installation I have is
0.20.203.0, and in this version, a few key base classes have been
deprecated, like:
Interface InputSplit is deprecated in favor of InputSplit in the new
org.apache.hadoop.mapreduce API.
These items have been un-deprecated in 0.20.205+, and are also supported in
0.22/0.23+. The deprecated APIs are now the stable ones again, and you
needn't carry any further confusion while using them.
On 24-Dec-2011, at 6:53 AM, Sanjeev Verma wrote:
> Hey everyone:
>
> I am going through the "hadoo
Pat,
Perhaps for some reason your program isn't picking up the right filesystem
as it starts. What does "hadoop classpath" print?
As a workaround, you can also pass explicit FS URIs to your command:
input -> hdfs://host:port/user/path/to/input
output -> hdfs://host:port/user/path/to/output
And then run the job again.
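Put together with the grep example from earlier in the thread, that would look
something like this (host:port is your NameNode):
bin/hadoop jar hadoop-mapred-examples-0.22.0.jar grep \
    hdfs://host:port/user/path/to/input \
    hdfs://host:port/user/path/to/output 'dfs[a-z.]+'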
Bourne,
Do you have 14 million files each taking up a single block, or are these
files multi-block? What does the block count come up as in the live
nodes list of the NN web UI?
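You can also pull the totals from the command line; a quick sketch:
hadoop fsck / | grep -E 'Total (files|blocks)'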
2011/12/23 bourne1900 :
> Sorry, a detailed description:
> I wanna know how many files a datanode can hold, so there is
I am probably stating the obvious again - have you looked at the
DBInputFormat class? Or, another option might be to programmatically move
data to HDFS using the FileSystem API.
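The non-programmatic equivalent, if a periodic shell job is enough (the paths
here are hypothetical):
hadoop fs -mkdir /data/cassandra
hadoop fs -put /local/exports/dump.json /data/cassandra/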
On Dec 23, 2011 10:50 AM, "ravikumar visweswara"
wrote:
Thanks Harsh!
On Dec 23, 2011 11:21 PM, "Harsh J" wrote:
> These items have been un-deprecated in 0.20.205+, and are also supported in
> 0.22/0.23+. The deprecated APIs are now the stable ones again, and you
> needn't carry any further confusion while using them.
>
> On 24-Dec-2011, at 6:53 AM, Sanjee