"If there was a
malicious process though, then I imagine it could talk to a datanode
directly and request a specific block."
I didn't understand the use of "malicious" here,
but any process using the HDFS API should first ask the NameNode where the
file's replicas are.
Then - I assume - the NameNode returns the datanode locations for those blocks.
@Amandeep
Hi,
I'm new to Hadoop and am trying to run a simple database connectivity
program on it. Could you please tell me how you went about it? My mail ID is
"sandys_cr...@yahoo.com". A copy of your code that successfully connected
to MySQL would also be helpful.
Thanks,
Sandhiya
Enis Soztutar-
Hmmm - I checked all the /etc/hosts files, and they're all fine. Then I
switched conf/hadoop-site.xml to specify IP addresses instead of host names,
and oddly enough it started working...
Now the funny thing is this: it's fine ssh-ing to the correct machines to
start up datanodes, but when
Thanks, this did it. I changed my /etc/hosts file on each node from
127.0.0.1 localhost localhost.localdomain
127.0.0.1
to just switch the order with
127.0.0.1
127.0.0.1 localhost localhost.localdomain
This did the trick! I vaguely recall from somewhere that I
In general, yeah, the scripts can access any resource they want (within the
permissions of the user that the task runs as). It's also possible to access
HDFS from scripts because HDFS provides a FUSE interface that can make it
look like a regular file system on the machine. (The FUSE module in turn
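As a quick illustration of that idea: once HDFS is mounted through FUSE, plain
file I/O works against it, no HDFS API needed. This is only a sketch - the
/mnt/hdfs mount point and the file path are assumptions, not anything from this
thread:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadViaFuse {
        public static void main(String[] args) throws IOException {
            // /mnt/hdfs is a hypothetical fuse-dfs mount point.
            BufferedReader in = new BufferedReader(
                new FileReader("/mnt/hdfs/user/hadoop/part-00000"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // ordinary local-file semantics
            }
            in.close();
        }
    }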
I'm having difficulty capturing the output of any of the dfs commands
(either in Ruby or on the command line). Supposedly the output is sent to
stdout, yet running any of the commands on the command line does not display
the output, nor does redirecting to a file (e.g., hadoop dfs
-copyToLocal
I don't know much about Hadoop streaming and have a quick question here.
The snippets of code/programs that you attach to the map reduce job might
want to access outside resources (like you mentioned). Now, these might not
need to go to the namenode, right? For example, a Python script. How would it
Hey David --
In case no one has pointed you to this, you can submit it through
JIRA.
Brian
On Feb 14, 2009, at 12:07 AM, David Alves wrote:
Hi
I ran into a use case where I need to keep two contexts for
metrics. One being ganglia and the other being a file context (to do
offline
I would capture the output of the dfs -copyToLocal command, because I still
think that is the most likely cause of the data not making it. I don't know
how to capture this output in Ruby but I'm sure it's possible. You want to
capture both standard out and standard error.
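I don't know the Ruby idiom offhand, but the idea is to merge stderr into
stdout and check the exit status. A rough sketch of the same thing in Java
(the paths are placeholders, and it assumes the hadoop script is on the PATH):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class CaptureDfsCopy {
        public static void main(String[] args) throws Exception {
            ProcessBuilder pb = new ProcessBuilder(
                "hadoop", "dfs", "-copyToLocal", "/user/hadoop/in", "/tmp/local");
            pb.redirectErrorStream(true);  // fold stderr into stdout
            Process p = pb.start();
            BufferedReader out = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line); // everything the command printed
            }
            int rc = p.waitFor();
            System.out.println("exit status: " + rc); // non-zero means failure
        }
    }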
One other slim possibility
On Feb 15, 2009, at 3:21 AM, Deepak wrote:
Thanks Brian and Chen!
I finally sorted out why the cluster was stopping after running out
of space: it was because of a master failure due to disk space.
Regarding the automatic balancer, I guess in our case the rate of copying is
faster than the balancer's rate,
Nope, typically the JobTracker just starts the process, and the tasktracker
talks directly to the namenode to get a pointer to the datanode, and then
directly to the datanode.
On Sun, Feb 15, 2009 at 8:07 PM, Amandeep Khurana wrote:
> Alright.. Got it.
>
> Now, do the task trackers talk to the n
Alright.. Got it.
Now, do the task trackers talk to the namenode and the datanode directly, or
do they go through the job tracker? So, if my code is such that I
need to access more files from HDFS, would the job tracker get involved
or not?
Amandeep Khurana
Computer Science Graduate
Normally, HDFS files are accessed through the namenode. If there was a
malicious process though, then I imagine it could talk to a datanode
directly and request a specific block.
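To make the normal path concrete, here is a minimal sketch of asking the
namenode for block locations through the public FileSystem API (the path is
hypothetical, and the Configuration is assumed to point at your cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration()); // namenode client
            FileStatus stat = fs.getFileStatus(new Path("/user/hadoop/data.txt"));
            // The namenode answers with block -> datanode mappings:
            BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
            for (BlockLocation b : blocks) {
                System.out.println("offset " + b.getOffset() + " on "
                    + java.util.Arrays.toString(b.getHosts()));
            }
        }
    }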
On Sun, Feb 15, 2009 at 7:15 PM, Amandeep Khurana wrote:
> Ok. Got it.
>
> Now, when my job needs to access another f
okay,
i will heed the tip on the 127 address set. here is the result of ssh
192.168.0.2...
a...@node0:~$ ssh 192.168.0.2
ssh: connect to host 192.168.0.2 port 22: Connection timed out
a...@node0:~$
the boxes are just connected with a cat5 cable.
i have not done this with the hadoop account but
Ok. Got it.
Now, when my job needs to access another file, does it go to the Namenode to
get the block ids? How does the java process know where the files are and
how to access them?
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Sun, Feb 15, 2009 at
I mentioned this case because even jobs written in Java can use the HDFS API
to talk to the NameNode and access the filesystem. People often do this
because their job needs to read a config file, some small data table, etc.,
and use this information in its map or reduce functions. In this case, you
o
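As a sketch with the old mapred API of this era (the side-file path and table
format are made up for the example), a map task might load such a file in its
configure() method, before any map() calls:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LookupMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private Map<String, String> table = new HashMap<String, String>();

        @Override
        public void configure(JobConf job) {
            try {
                // Read a small side table from HDFS; this open() goes
                // through the namenode like any other HDFS access.
                FileSystem fs = FileSystem.get(job);
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(new Path("/user/hadoop/lookup.tsv"))));
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split("\t", 2);
                    if (parts.length == 2) table.put(parts[0], parts[1]);
                }
                in.close();
            } catch (IOException e) {
                throw new RuntimeException("could not load side table", e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            String looked = table.get(value.toString());
            if (looked != null) out.collect(value, new Text(looked));
        }
    }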
Fwiw, the extra references to 127.0.1.1 in each hosts file aren't necessary.
From node0, does 'ssh 192.168.0.2' work? If not, then the issue isn't name
resolution -- take a look at the network configs (e.g., /etc/network/interfaces)
on each machine.
Norbert
On Sun, Feb 15, 2009 at 7:31 PM, zander10
Another question that I have here - When the jobs run arbitrary code and
access data from the HDFS, do they go to the namenode to get the block
information?
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Sun, Feb 15, 2009 at 6:00 PM, Amandeep Khurana
Assuming that the job is purely in Java and not involving streaming or
pipes, wouldn't the resources (files) required by the job as inputs be known
beforehand? So, if the map task is accessing a second file, how is that any
different, except that there are multiple files? The JobTracker would kno
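For what it's worth, declaring several inputs up front looks like this with
the old mapred API (both paths are hypothetical); anything a task opens on its
own through FileSystem afterwards is separate and never involves the JobTracker:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class DeclaredInputs {
        public static void main(String[] args) {
            JobConf conf = new JobConf(DeclaredInputs.class);
            // Both inputs are known to the JobTracker at submission time.
            FileInputFormat.addInputPath(conf, new Path("/user/hadoop/input-a"));
            FileInputFormat.addInputPath(conf, new Path("/user/hadoop/input-b"));
        }
    }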
This is good information! Thanks a ton. I'll take all this into account.
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Sun, Feb 15, 2009 at 4:47 PM, Matei Zaharia wrote:
> Typically the data flow is like this: 1) Client submits a job descr
Typically the data flow is like this:
1) Client submits a job description to the JobTracker.
2) JobTracker figures out block locations for the input file(s) by talking
to the HDFS NameNode.
3) JobTracker creates a job description file in HDFS which will be read by
the nodes to copy over the job's code e
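A sketch of step 1 from the client side, using the old mapred API (the class
name and paths are placeholders); steps 2 and 3 happen inside the framework
once runJob() is called:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitJob.class);
            conf.setJobName("example"); // hypothetical job
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/in"));
            FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/out"));
            // Hand the description to the JobTracker; block lookups and the
            // job files in HDFS are handled by the framework from here.
            JobClient.runJob(conf);
        }
    }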
okay,
i have changed /etc/hosts to look like this for node0...
127.0.0.1 localhost
127.0.1.1 node0
# /etc/hosts (for hadoop master and slave)
192.168.0.1 node0
192.168.0.2 node1
#end hadoop section
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
>
> i have commented out the 192. addresses and changed 127.0.1.1 for node0 and
> 127.0.1.2 for node1 (in /etc/hosts). with this done i can ssh from one
> machine to itself and to the other but the prompt does not change when i
> ssh
> to the other machine. i don't know if there is a firewall preve
A quick question here. How does a typical hadoop job work at the system
level? What are the various interactions and how does the data flow?
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Sun, Feb 15, 2009 at 3:20 PM, Amandeep Khurana wrote
Thanks Matei. If the basic architecture is similar to the Google stuff, I
can safely just work on the project using the information from the papers.
I am aware of the HADOOP-4487 JIRA and the current status of the permissions
mechanism. I had a look at them earlier.
Cheers
Amandeep
Amandeep Khurana
Co
hi,
sshd is running on both machines. i am using the default ubuntu 8.10
workstation install with openssh-server installed via "apt-get install". i
have tried with the machines connected through both a switch and just
plugging the ethernet cable from one into the other. right now i have just
one
I was not able to determine the command shell return value for
hadoop dfs -copyToLocal #{s3dir} #{localdir}
but I did print out several variables after the call and determined that the
call apparently did not go through successfully. In particular, prior to my
processData(localdir) command I
Forgot to add, this JIRA details the latest security features that are being
worked on in Hadoop trunk: https://issues.apache.org/jira/browse/HADOOP-4487.
This document describes the current status and limitations of the
permissions mechanism:
http://hadoop.apache.org/core/docs/current/hdfs_permiss
I think it's safe to assume that Hadoop works like MapReduce/GFS at the
level described in those papers. In particular, in HDFS, there is a master
node containing metadata and a number of slave nodes (datanodes) containing
blocks, as in GFS. Clients start by talking to the master to list
directorie
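For instance, a directory listing is a pure metadata call answered by the
master; no datanode is contacted. A minimal sketch (the path is assumed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListDir {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Served entirely from namenode metadata.
            for (FileStatus s : fs.listStatus(new Path("/user/hadoop"))) {
                System.out.println((s.isDir() ? "d " : "- ") + s.getPath());
            }
        }
    }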
just some more information:
hadoop fsck produces:
Status: HEALTHY
Total size: 0 B
Total dirs: 9
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default
Hi Zander -
Two simple explanations come to mind:
* Is sshd running on your boxes?
* If so, do you have a firewall preventing ssh access?
cheers,
-jw
On Sat, Feb 14, 2009 at 7:50 PM, zander1013 wrote:
>
> hi,
>
> am going through the tutorial on multinode cluster setup by m. noll...
>
> htt
Thanks Matei
I had gone through the architecture document online. I am currently working
on a project on security in Hadoop. I do know how the data moves around
in GFS but wasn't sure how much of that HDFS follows and how
different it is from GFS. Can you throw some light on that?
Sec
Thanks for your responses.
I checked in the namenode and jobtracker logs and both say:
INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call
delete(/Users/hadoop/hadoop-0.18.2/hadoop-hadoop/mapred/system, true) from
127.0.0.1:61086: error: org.apache.hadoop.dfs.SafeModeException:
1. They are related in that one can use EC2 to serve the computation part
for Hadoop.
Refer: http://wiki.apache.org/hadoop/AmazonEC2
2. Yes.
Refer:
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
3. Because you can use EC2 to serve the computation part for Hadoop.
--nites
Hi Amandeep,
Hadoop is definitely inspired by MapReduce/GFS and aims to provide those
capabilities as an open-source project. HDFS is similar to GFS (large
blocks, replication, etc); some notable things missing are read-write
support in the middle of a file (unlikely to be provided because few Hado
Hi
Is the HDFS architecture completely based on the Google File System? If it
isn't, what are the differences between the two?
Secondly, is the coupling between Hadoop and HDFS the same as that between
Google's version of MapReduce and GFS?
Amandeep
Amandeep Khurana
Computer Science Gradua
I followed these instructions
http://wiki.apache.org/hadoop/MountableHDFS
and was able to get things working with 0.19.0 on Fedora. The only problem I
ran into was the AMD64 issue on one of my boxes (see the note on the above
link); I edited the Makefile and set OSARCH as suggested but couldn't g
Try commenting out the localhost definition in your /etc/hosts file.
2009/2/14 S D
> I'm reviewing the task trackers on the web interface (
> http://jobtracker-hostname:50030/) for my cluster of 3 machines. The names
> of the task trackers do not list real domain names; e.g., one of the task
> track
Hi,
I have multiple questions:
Does Hadoop use some parallel technique for CopyFromLocal and CopyToLocal
(like DistCp), or is it simple single-stream writing?
For Amazon S3 to local system communication, does Hadoop use a REST service
interface or SOAP?
Are there some new storage systems currently
Thanks Brian and Chen!
I finally sorted out why the cluster was stopping after running out
of space: it was because of a master failure due to disk space.
Regarding the automatic balancer, I guess in our case the rate of copying is
faster than the balancer's rate; we found the balancer does start but couldn't
perform
hi:
What is the relationship between Hadoop and Amazon EC2?
Can Hadoop run on a common PC (not a server) directly?
Why do some people say Hadoop runs on Amazon EC2?
thanks!