Uhhh... Alexey, did you really mean that you are running 100 megabit per second network links?

That is going to make Hadoop run *really* slowly. Also, putting RAID under any DFS, be it Hadoop or MapR, is not a good recipe for performance. Not that it matters if you only have about 10 megabytes per second available from the network.
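A rough back-of-envelope on that link speed (approximate numbers, not measurements from this cluster):

  100 Mbit/s divided by 8 bits per byte   ~ 12.5 MB/s of raw payload
  minus TCP/IP and HDFS framing overhead  ~ 10-11 MB/s per link in practice

and with a replication factor of 2 the same links also carry the datanode-to-datanode pipeline copies, so a single writer will usually see noticeably less than that end to end.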
On Mon, Oct 15, 2012 at 6:56 PM, Andy Isaacson <a...@cloudera.com> wrote:
> Also, note that JVM startup overhead, etc., means your -ls time is not
> completely unreasonable. Using OpenJDK on a cluster of VMs, my
> "hdfs dfs -ls" takes 1.88 seconds according to time (and 1.59 seconds
> of user CPU time).
>
> I'd be much more concerned about your slow transfer times. On the
> same cluster, I can easily push 4 MB/sec even with only a 100 MB file
> using "hdfs dfs -put - foo.txt". And of course, using distcp or
> multiple -put workloads, HDFS can saturate multiple GigE links.
>
> -andy
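A rough way to reproduce that kind of transfer measurement on your own cluster (the 100 MB size and the file names /tmp/putTest and foo.txt are arbitrary examples):

  # create a 100 MB local test file, then time the upload
  dd if=/dev/urandom of=/tmp/putTest bs=1M count=100
  time hdfs dfs -put /tmp/putTest putTest

  # or stream straight from stdin, as in the message above; bash's time
  # covers the whole pipeline (delete foo.txt first if it already exists)
  time dd if=/dev/zero bs=1M count=100 | hdfs dfs -put - foo.txt

Dividing 100 MB by the elapsed time gives the effective throughput; on a healthy 100 Mbit link that should come out near 10 MB/s, not hundreds of kilobytes per second.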
> On Mon, Oct 15, 2012 at 5:22 PM, Vinod Kumar Vavilapalli <vino...@hortonworks.com> wrote:
> > Try picking up a single operation, say "hadoop dfs -ls", and start profiling.
> > - Time how long the client JVM takes to start. Enable debug logging on the
> >   client side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE
> > - Time between the client starting and the namenode audit logs showing the
> >   read request. Enable debug logging on the daemons too.
> > - Also, you can wget the namenode web pages and see how fast they return.
> >
> > To repeat what is already obvious, it is most likely related to your network
> > setup and/or configuration.
> >
> > Thanks,
> > +Vinod
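Concretely, the first and last of those checks could look something like this (the address and port come from the dfs.http.address setting quoted below; dfshealth.jsp is the usual namenode front page on this generation of Hadoop, and the lowercase "console" appender name is an assumption about the stock log4j setup):

  # verbose client-side logging, then time a single metadata operation
  export HADOOP_ROOT_LOGGER=DEBUG,console
  time hadoop dfs -ls /

  # time how quickly the namenode web UI answers
  time wget -q -O /dev/null http://5.6.7.11:50070/dfshealth.jsp

If the -ls spends most of its ~3 seconds before the first request ever reaches the namenode, the problem is on the client side; if the namenode itself is slow to answer, the web UI timing should show it too.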
> > On Oct 10, 2012, at 12:20 AM, Alexey wrote:
> >
> > ok, here you go:
> > I have 3 servers:
> > datanode on server 1, 2, 3
> > namenode on server 1
> > secondarynamenode on server 2
> >
> > all servers are at the Hetzner datacenter and connected through 100 Mbit
> > links, pings between them are about 0.1 ms
> >
> > each server has 24 GB RAM and an Intel Core i7 3 GHz CPU
> > disk is 700 GB RAID
> >
> > the binding-related configuration is the following:
> >
> > server 1:
> > core-site.xml
> > --------------------------------------
> > <name>fs.default.name</name>
> > <value>hdfs://5.6.7.11:8020</value>
> > --------------------------------------
> >
> > hdfs-site.xml
> > --------------------------------------
> > <name>dfs.datanode.address</name>
> > <value>0.0.0.0:50010</value>
> >
> > <name>dfs.datanode.http.address</name>
> > <value>0.0.0.0:50075</value>
> >
> > <name>dfs.http.address</name>
> > <value>5.6.7.11:50070</value>
> >
> > <name>dfs.secondary.https.port</name>
> > <value>50490</value>
> >
> > <name>dfs.https.port</name>
> > <value>50470</value>
> >
> > <name>dfs.https.address</name>
> > <value>5.6.7.11:50470</value>
> >
> > <name>dfs.secondary.http.address</name>
> > <value>5.6.7.12:50090</value>
> > --------------------------------------
> >
> > server 2:
> > core-site.xml
> > --------------------------------------
> > <name>fs.default.name</name>
> > <value>hdfs://5.6.7.11:8020</value>
> > --------------------------------------
> >
> > hdfs-site.xml
> > --------------------------------------
> > <name>dfs.datanode.address</name>
> > <value>0.0.0.0:50010</value>
> >
> > <name>dfs.datanode.http.address</name>
> > <value>0.0.0.0:50075</value>
> >
> > <name>dfs.http.address</name>
> > <value>5.6.7.11:50070</value>
> >
> > <name>dfs.secondary.https.port</name>
> > <value>50490</value>
> >
> > <name>dfs.https.port</name>
> > <value>50470</value>
> >
> > <name>dfs.https.address</name>
> > <value>5.6.7.11:50470</value>
> >
> > <name>dfs.secondary.http.address</name>
> > <value>5.6.7.12:50090</value>
> > --------------------------------------
> >
> > server 3:
> > core-site.xml
> > --------------------------------------
> > <name>fs.default.name</name>
> > <value>hdfs://5.6.7.11:8020</value>
> > --------------------------------------
> >
> > hdfs-site.xml
> > --------------------------------------
> > <name>dfs.datanode.address</name>
> > <value>0.0.0.0:50010</value>
> >
> > <name>dfs.datanode.http.address</name>
> > <value>0.0.0.0:50075</value>
> >
> > <name>dfs.http.address</name>
> > <value>127.0.0.1:50070</value>
> >
> > <name>dfs.secondary.https.port</name>
> > <value>50490</value>
> >
> > <name>dfs.https.port</name>
> > <value>50470</value>
> >
> > <name>dfs.https.address</name>
> > <value>127.0.0.1:50470</value>
> >
> > <name>dfs.secondary.http.address</name>
> > <value>5.6.7.12:50090</value>
> > --------------------------------------
> >
> > netstat output:
> >
> > server 1
> > tcp  0  0  5.6.7.11:8020   0.0.0.0:*  LISTEN  10870/java
> > tcp  0  0  5.6.7.11:50070  0.0.0.0:*  LISTEN  10870/java
> > tcp  0  0  0.0.0.0:50010   0.0.0.0:*  LISTEN  10997/java
> > tcp  0  0  0.0.0.0:50075   0.0.0.0:*  LISTEN  10997/java
> > tcp  0  0  0.0.0.0:50020   0.0.0.0:*  LISTEN  10997/java
> >
> > server 2
> > tcp  0  0  0.0.0.0:50010   0.0.0.0:*  LISTEN  23683/java
> > tcp  0  0  0.0.0.0:50075   0.0.0.0:*  LISTEN  23683/java
> > tcp  0  0  0.0.0.0:50020   0.0.0.0:*  LISTEN  23683/java
> > tcp  0  0  5.6.7.12:50090  0.0.0.0:*  LISTEN  23778/java
> >
> > server 3
> > tcp  0  0  0.0.0.0:50010   0.0.0.0:*  LISTEN  894/java
> > tcp  0  0  0.0.0.0:50075   0.0.0.0:*  LISTEN  894/java
> > tcp  0  0  0.0.0.0:50020   0.0.0.0:*  LISTEN  894/java
> >
> > if I'm transferring big files between servers I'm getting about 9 Mb/s,
> > and even 10 Mb/s with rsync
> >
> > On 10/09/12 11:56 PM, Harsh J wrote:
> > > Hi,
> > >
> > > OK, can you detail the network infrastructure used here, and also
> > > make sure your daemons are binding to the right interfaces
> > > (use netstat to check, perhaps)? What rate of transfer do you get for
> > > simple file transfers (ftp, scp, etc.)?
> > >
> > > On Wed, Oct 10, 2012 at 12:24 PM, Alexey <alexx...@gmail.com> wrote:
> > > > Hello Harsh,
> > > >
> > > > I noticed these issues from the start.
> > > > Yes, I mean the dfs.balance.bandwidthPerSec property; I set it to
> > > > 5000000.
> > > >
> > > > On 10/09/12 11:50 PM, Harsh J wrote:
> > > > Hey Alexey,
> > > >
> > > > Have you noticed this right from the start itself? Also, what exactly
> > > > do you mean by "Limited replication bandwidth between datanodes - 5Mb."?
> > > > Are you talking about the dfs.balance.bandwidthPerSec property?
> > > >
> > > > On Wed, Oct 10, 2012 at 10:53 AM, Alexey <alexx...@gmail.com> wrote:
> > > > Additional info: I also tried using OpenJDK instead of Sun's JDK - the
> > > > issue still persists.
> > > >
> > > > On 10/09/12 03:12 AM, Alexey wrote:
> > > > Hi,
> > > >
> > > > I have an issue with Hadoop DFS. I have 3 servers (24 GB RAM each).
> > > > The servers are not overloaded, they just have Hadoop installed. One
> > > > has a datanode and the namenode, the second a datanode only, the third
> > > > a datanode and the secondarynamenode.
> > > >
> > > > The Hadoop datanodes have a max memory limit of 8 GB. Default
> > > > replication factor - 2. Limited replication bandwidth between
> > > > datanodes - 5Mb.
> > > >
> > > > I've set up Hadoop to communicate between nodes by IP address.
> > > > Everything works - I can read/write files on each datanode, etc. But
> > > > the issue is that hadoop dfs commands execute very slowly; even
> > > > "hadoop dfs -ls /" takes about 3 seconds, although it has only one
> > > > folder, /user, in it. Files also upload to HDFS very slowly - hundreds
> > > > of kilobytes per second.
> > > >
> > > > I'm using the Debian stable x86-64 distribution and Hadoop runs on
> > > > sun-java6-jdk 6.26-0squeeze1.
> > > >
> > > > Please give me any suggestions on what I need to adjust/check to
> > > > resolve this issue.
> > > >
> > > > As I said before - the overall HDFS configuration is correct, because
> > > > everything works except performance.
> > > >
> > > > --
> > > > Best regards
> > > > Alexey
> > > >
> > > > --
> > > > Best regards
> > > > Alexey
> > > >
> > > > --
> > > > Best regards
> > > > Alexey
> >
> > --
> > Best regards
> > Alexey
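Since most of the replies above come down to "measure the raw network first", here is one quick way to do that between two of the nodes (iperf is assumed to be installed; the address is server 1's from the configuration above, and the test file is the example one created earlier):

  # on server 1
  iperf -s

  # on server 2: should report close to the nominal 100 Mbit/s
  iperf -c 5.6.7.11

  # or, without iperf, time a plain scp of a large file;
  # ~10-11 MB/s is the ceiling a 100 Mbit link can deliver
  time scp /tmp/putTest 5.6.7.11:/tmp/

For what it's worth, dfs.balance.bandwidthPerSec is specified in bytes per second, so the 5000000 mentioned above works out to roughly 5 MB/s, and as far as I remember it only throttles balancer traffic, not ordinary client reads and writes.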