Inline...

On Wed, Dec 7, 2011 at 3:14 PM, Andrei Savu <[email protected]> wrote:
> You are more than welcome! Thanks for adding WHIRR-445.
>
> I think it's best if you start by contributing fixes around your pain
> points (e.g. 445, Hive as a service etc.) It makes a lot of sense to work
> on issues that directly affect your research.

Will work on it :-)

> Can you elaborate on how you are planning to use Whirr and for what kind
> of applications?

In the past, I have been involved in setting up Hadoop clusters on raw
machines, locally. Setting up clusters on EC2 is new to me. I am planning
to use Whirr primarily to create Hadoop clusters, and I plan to use Hive,
Flume, and Sqoop along with it. The application is about analytics on
subscriber/ISP data. I will be using Mahout / R sooner or later.

> I am available to assist you as much as possible via the email list or
> IRC on #whirr

Thanks. Is there a "jumpstart" guide that explains:

- How/where to get the latest SVN code base
- The recommended way to build (Ant/Maven etc.)
- Basically, how to set up a local environment to run/test etc. I have
  never done this before. I will also google around and try to find out.
- After making a patch, what is the procedure to submit it?

Thanks,
Srini.

> Cheers,
>
> -- Andrei Savu
>
> On Thu, Dec 8, 2011 at 1:02 AM, Periya.Data <[email protected]> wrote:
>
>> Dear Andrei,
>> Greetings. As you suggested, I created a Jira bug report on the
>> JAVA_HOME stuff: https://issues.apache.org/jira/browse/WHIRR-445
>>
>> I would like to contribute to Whirr (even though I am facing some
>> initial problems). Maybe I can start with some documentation and fixing
>> minor bugs. I may need your assistance.
>>
>> Please let me know your thoughts.
>>
>> -Srini. (aka PD).
>>
>> On Wed, Dec 7, 2011 at 7:15 AM, Andrei Savu <[email protected]> wrote:
>>
>>> See inline.
>>>
>>> On Wed, Dec 7, 2011 at 7:14 AM, Periya.Data <[email protected]> wrote:
>>>
>>>> Thanks!
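For the SVN/build/patch questions above, a rough sketch of the usual Apache SVN + Maven workflow follows. The checkout URL, Maven goal, and patch-naming convention here are assumptions based on standard Apache practice of the time, not taken from this thread; verify them against the Whirr "How To Contribute" page before relying on them.

```shell
# Sketch only -- confirm the repository URL on the Whirr website.
# 1. Check out the latest trunk.
svn checkout https://svn.apache.org/repos/asf/whirr/trunk whirr
cd whirr

# 2. Build and run the unit tests (Whirr builds with Maven, not Ant).
mvn clean install

# 3. After making changes, generate a patch named after the JIRA issue
#    (WHIRR-445 is the issue from this thread).
svn diff > WHIRR-445.patch

# 4. Attach the .patch file to the JIRA issue and mark it
#    "Patch Available" so it enters the review queue.
```

The issue-number-based patch name matters: it is how reviewers match a patch on JIRA to the change it belongs to.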
>>>> A few observations:
>>>>
>>>> - After I do export conf dir and execute "hadoop fs -ls /", I see a
>>>> different dir structure from what I see when I ssh into the machine
>>>> and execute it as root. See outputs below.
>>>>
>>>> sri@PeriyaData:~$ export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
>>>> sri@PeriyaData:~$ hadoop fs -ls /
>>>> Found 25 items
>>>> -rw-------   1 root root   4767328 2011-11-02 12:55 /vmlinuz
>>>> drwxr-xr-x   - root root     12288 2011-12-03 10:49 /etc
>>>> dr-xr-xr-x   - root root         0 2011-12-02 03:28 /proc
>>>> drwxrwxrwt   - root root      4096 2011-12-05 18:07 /tmp
>>>> drwxr-xr-x   - root root      4096 2011-04-25 15:50 /srv
>>>> -rw-r--r--   1 root root  13631900 2011-11-01 22:46 /initrd.img.old
>>>> drwx------   - root root      4096 2011-11-23 22:27 /root
>>>> drwxr-xr-x   - root root      4096 2011-04-21 09:50 /mnt
>>>> drwxr-xr-x   - root root      4096 2011-12-02 09:01 /var
>>>> drwxr-xr-x   - root root      4096 2011-10-01 19:14 /cdrom
>>>> -rw-------   1 root root   4766528 2011-10-07 14:03 /vmlinuz.old
>>>> drwxr-xr-x   - root root       780 2011-12-02 16:28 /run
>>>> drwxr-xr-x   - root root      4096 2011-10-23 18:27 /usr
>>>> drwx------   - root root     16384 2011-10-01 19:05 /lost+found
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:26 /bin
>>>> drwxr-xr-x   - root root      4096 2011-04-25 15:50 /opt
>>>> drwxr-xr-x   - root root      4096 2011-10-01 19:21 /home
>>>> drwxr-xr-x   - root root      4320 2011-12-02 11:29 /dev
>>>> drwxr-xr-x   - root root      4096 2011-03-21 01:26 /selinux
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:31 /boot
>>>> drwxr-xr-x   - root root         0 2011-12-02 03:28 /sys
>>>> -rw-r--r--   1 root root  13645361 2011-11-22 22:31 /initrd.img
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:28 /lib
>>>> drwxr-xr-x   - root root      4096 2011-12-03 10:49 /media
>>>> drwxr-xr-x   - root root     12288 2011-11-22 22:29 /sbin
>>>> sri@PeriyaData:~$
>>>
>>> This is no different from the output you get when running "ls -l /",
>>> and this is happening because Hadoop is not able to find the config
>>> file. Try:
>>>
>>> $ export HADOOP_CONF_DIR=~/.whirr/HadoopCluster/
>>>
>>> When running "hadoop fs -ls /" you should get the same output as below.
>>>
>>> Note: make sure the SOCKS proxy is running:
>>>
>>> % . ~/.whirr/HadoopCluster/hadoop-proxy.sh
>>>
>>>> *After SSH-ing into the master node:*
>>>>
>>>> sri@ip-10-90-131-240:~$ sudo su
>>>> root@ip-10-90-131-240:/home/users/sri#
>>>>
>>>> root@ip-10-90-131-240:/home/users/jtv# jps
>>>> 2860 Jps
>>>> 2667 JobTracker
>>>> 2088 NameNode
>>>> root@ip-10-90-131-240:/home/users/jtv# hadoop fs -ls /
>>>> Error: JAVA_HOME is not set.
>>>> root@ip-10-90-131-240:/home/users/jtv#
>>>>
>>>> *After editing the .bashrc file (setting java home) and sourcing it,
>>>> I get the expected dir structure:*
>>>>
>>>> root@ip-10-90-131-240:/home/users/sri# hadoop fs -ls /
>>>> Found 3 items
>>>> drwxr-xr-x   - hadoop supergroup          0 2011-12-05 23:09 /hadoop
>>>> drwxrwxrwx   - hadoop supergroup          0 2011-12-05 23:08 /tmp
>>>> drwxrwxrwx   - hadoop supergroup          0 2011-12-06 01:16 /user
>>>> root@ip-10-90-131-240:/home/users/sri#
>>>>
>>>> Is the above normal behavior?
>>>
>>> It looks normal to me. I think you should be able to load data & run
>>> MR jobs as expected. Can you open an issue so that we can make sure
>>> that JAVA_HOME is exported as expected by the install script?
>>>
>>>> Thanks,
>>>> PD/
>>>>
>>>>>> *Questions:*
>>>>>>
>>>>>> 1. Assuming everything is fine, where does Hadoop get installed
>>>>>> on the EC2 instance? What is the path?
>>>>>
>>>>> Run jps as root and you should see the daemons running.
>>>>>
>>>>>> 2. Even if Hadoop is successfully installed on the EC2 instance,
>>>>>> are the env variables properly changed on that instance? Like, the
>>>>>> path must be updated either in its .bashrc or .bash_profile, right?
>>>>>
>>>>> Try to run "hadoop fs -ls /" as root.
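The root cause is visible in the transcript itself: the backslash in `export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/` stops the shell from expanding `$HOME`, so the variable holds a literal, nonexistent path, Hadoop finds no config, and `hadoop fs -ls /` falls back to listing the local filesystem. A minimal sketch of the difference:

```shell
# Escaped form: the shell stores the literal string "/$HOME/..." -- no
# such directory exists, so Hadoop cannot locate its config files and
# silently defaults to the local filesystem (hence the "ls -l /" output).
export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
echo "$HADOOP_CONF_DIR"    # prints: /$HOME/.whirr/HadoopCluster/

# Correct form: let the shell expand the home directory itself.
export HADOOP_CONF_DIR="$HOME/.whirr/HadoopCluster/"
echo "$HADOOP_CONF_DIR"    # prints an expanded path, e.g. /home/sri/...
```

The `~/.whirr/HadoopCluster/` form Andrei suggests works for the same reason: tilde expansion happens in the assignment, so the stored value is a real path.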
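Until WHIRR-445 is fixed, a common stopgap for the "JAVA_HOME is not set" error on the node is to export it where the Hadoop scripts will see it, rather than only in one shell's .bashrc. The JDK path and hadoop-env.sh location below are assumptions (typical for Ubuntu images of that era); confirm both on the actual node.

```shell
# Stopgap sketch for WHIRR-445. First confirm where the JDK really is:
#   readlink -f "$(which java)"
# The path below assumes the Ubuntu OpenJDK 6 package location.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

# Persisting it in hadoop-env.sh covers the Hadoop daemons and scripts,
# unlike ~/.bashrc which only affects interactive shells. HADOOP_ENV is
# an assumed location -- adjust to wherever Whirr installed Hadoop.
HADOOP_ENV=/usr/local/hadoop/conf/hadoop-env.sh
echo "export JAVA_HOME=$JAVA_HOME" >> "$HADOOP_ENV"
```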
