Inline...

On Wed, Dec 7, 2011 at 3:14 PM, Andrei Savu <[email protected]> wrote:
> You are more than welcome! Thanks for adding WHIRR-445.
>
> I think it's best if you start by contributing fixes around your pain
> points (e.g. 445, Hive as a service etc.) It makes a lot of sense to work
> on issues that directly affect your research.

Will work on it :-)

> Can you elaborate on how you are planning to use Whirr and for what kind
> of applications?

In the past, I have been involved in setting up Hadoop clusters on raw
machines, locally. Setting up clusters on EC2 is new to me. I am planning
to use Whirr primarily to create Hadoop clusters, and I plan to use Hive,
Flume, and Sqoop along with it. The application is about analytics on
subscriber/ISP data. I will be using Mahout / R sooner or later.

> I am available to assist you as much as possible via the email list or
> IRC on #whirr

Thanks. Is there a "jumpstart" guide that explains:

- How/where to get the latest SVN code base
- The recommended way to build (Ant/Maven etc.)
- Basically, how to set up a local environment to run/test etc. I have
  never done this before. I will also google around and try to find out.
- After making a patch, what is the procedure to submit it?

Thanks,
Srini.

> Cheers,
>
> -- Andrei Savu
>
> On Thu, Dec 8, 2011 at 1:02 AM, Periya.Data <[email protected]> wrote:
>
>> Dear Andrei,
>> Greetings. As you suggested, I created a Jira bug report on the
>> JAVA_HOME stuff: https://issues.apache.org/jira/browse/WHIRR-445
>>
>> I would like to contribute to Whirr (even though I am facing some
>> initial problems). Maybe I can start with some documentation and fixing
>> minor bugs. I may need your assistance.
>>
>> Please let me know your thoughts.
>>
>> -Srini. (aka PD).
>>
>> On Wed, Dec 7, 2011 at 7:15 AM, Andrei Savu <[email protected]> wrote:
>>
>>> See inline.
>>>
>>> On Wed, Dec 7, 2011 at 7:14 AM, Periya.Data <[email protected]> wrote:
>>>
>>>> Thanks!
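For the SVN/build/patch questions above, a rough sketch of the usual Apache SVN + Maven workflow follows. The checkout URL, Maven goal, and patch-naming convention here are assumptions based on standard Apache practice of the time, not taken from this thread; verify them against the Whirr "How To Contribute" page before relying on them.

```shell
# Sketch only -- confirm the repository URL on the Whirr website.
# 1. Check out the latest trunk.
svn checkout https://svn.apache.org/repos/asf/whirr/trunk whirr
cd whirr

# 2. Build and run the unit tests (Whirr builds with Maven, not Ant).
mvn clean install

# 3. After making changes, generate a patch named after the JIRA issue
#    (WHIRR-445 is the issue from this thread).
svn diff > WHIRR-445.patch

# 4. Attach the .patch file to the JIRA issue and mark it
#    "Patch Available" so it enters the review queue.
```

The issue-number-based patch name matters: it is how reviewers match a patch on JIRA to the change it belongs to.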
>>>> A few observations:
>>>>
>>>> - After I do export conf dir and execute "hadoop fs -ls /", I see a
>>>> different dir structure from what I see when I ssh into the machine
>>>> and execute it as root. See outputs below.
>>>>
>>>> sri@PeriyaData:~$ export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
>>>> sri@PeriyaData:~$ hadoop fs -ls /
>>>> Found 25 items
>>>> -rw-------   1 root root   4767328 2011-11-02 12:55 /vmlinuz
>>>> drwxr-xr-x   - root root     12288 2011-12-03 10:49 /etc
>>>> dr-xr-xr-x   - root root         0 2011-12-02 03:28 /proc
>>>> drwxrwxrwt   - root root      4096 2011-12-05 18:07 /tmp
>>>> drwxr-xr-x   - root root      4096 2011-04-25 15:50 /srv
>>>> -rw-r--r--   1 root root  13631900 2011-11-01 22:46 /initrd.img.old
>>>> drwx------   - root root      4096 2011-11-23 22:27 /root
>>>> drwxr-xr-x   - root root      4096 2011-04-21 09:50 /mnt
>>>> drwxr-xr-x   - root root      4096 2011-12-02 09:01 /var
>>>> drwxr-xr-x   - root root      4096 2011-10-01 19:14 /cdrom
>>>> -rw-------   1 root root   4766528 2011-10-07 14:03 /vmlinuz.old
>>>> drwxr-xr-x   - root root       780 2011-12-02 16:28 /run
>>>> drwxr-xr-x   - root root      4096 2011-10-23 18:27 /usr
>>>> drwx------   - root root     16384 2011-10-01 19:05 /lost+found
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:26 /bin
>>>> drwxr-xr-x   - root root      4096 2011-04-25 15:50 /opt
>>>> drwxr-xr-x   - root root      4096 2011-10-01 19:21 /home
>>>> drwxr-xr-x   - root root      4320 2011-12-02 11:29 /dev
>>>> drwxr-xr-x   - root root      4096 2011-03-21 01:26 /selinux
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:31 /boot
>>>> drwxr-xr-x   - root root         0 2011-12-02 03:28 /sys
>>>> -rw-r--r--   1 root root  13645361 2011-11-22 22:31 /initrd.img
>>>> drwxr-xr-x   - root root      4096 2011-11-22 22:28 /lib
>>>> drwxr-xr-x   - root root      4096 2011-12-03 10:49 /media
>>>> drwxr-xr-x   - root root     12288 2011-11-22 22:29 /sbin
>>>> sri@PeriyaData:~$
>>>
>>> This is no different from the output you get when running "ls -l /",
>>> and this is happening because Hadoop is not able to find the config
>>> file. Try:
>>>
>>> $ export HADOOP_CONF_DIR=~/.whirr/HadoopCluster/
>>>
>>> When running "hadoop fs -ls /" you should get the same output as below.
>>>
>>> Note: make sure the SOCKS proxy is running:
>>>
>>> % . ~/.whirr/HadoopCluster/hadoop-proxy.sh
>>>
>>>> *After SSH-ing into the master node:*
>>>>
>>>> sri@ip-10-90-131-240:~$ sudo su
>>>> root@ip-10-90-131-240:/home/users/sri#
>>>>
>>>> root@ip-10-90-131-240:/home/users/jtv# jps
>>>> 2860 Jps
>>>> 2667 JobTracker
>>>> 2088 NameNode
>>>> root@ip-10-90-131-240:/home/users/jtv# hadoop fs -ls /
>>>> Error: JAVA_HOME is not set.
>>>> root@ip-10-90-131-240:/home/users/jtv#
>>>>
>>>> *After editing the .bashrc file (setting java home) and sourcing it,
>>>> I get the expected dir structure:*
>>>>
>>>> root@ip-10-90-131-240:/home/users/sri# hadoop fs -ls /
>>>> Found 3 items
>>>> drwxr-xr-x   - hadoop supergroup          0 2011-12-05 23:09 /hadoop
>>>> drwxrwxrwx   - hadoop supergroup          0 2011-12-05 23:08 /tmp
>>>> drwxrwxrwx   - hadoop supergroup          0 2011-12-06 01:16 /user
>>>> root@ip-10-90-131-240:/home/users/sri#
>>>>
>>>> Is the above normal behavior?
>>>
>>> It looks normal to me. I think you should be able to load data & run
>>> MR jobs as expected. Can you open an issue so that we can make sure
>>> that JAVA_HOME is exported as expected by the install script?
>>>
>>>> Thanks,
>>>> PD/
>>>>
>>>>>> *Questions:*
>>>>>>
>>>>>> 1. Assuming everything is fine, where does Hadoop get installed
>>>>>> on the EC2 instance? What is the path?
>>>>>
>>>>> Run jps as root and you should see the daemons running.
>>>>>
>>>>>> 2. Even if Hadoop is successfully installed on the EC2 instance,
>>>>>> are the env variables properly changed on that instance? Like, the
>>>>>> path must be updated either in its .bashrc or .bash_profile, right?
>>>>>
>>>>> Try to run "hadoop fs -ls /" as root.
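The root cause is visible in the transcript itself: the backslash in `export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/` stops the shell from expanding `$HOME`, so the variable holds a literal, nonexistent path, Hadoop finds no config, and `hadoop fs -ls /` falls back to listing the local filesystem. A minimal sketch of the difference:

```shell
# Escaped form: the shell stores the literal string "/$HOME/..." -- no
# such directory exists, so Hadoop cannot locate its config files and
# silently defaults to the local filesystem (hence the "ls -l /" output).
export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
echo "$HADOOP_CONF_DIR"    # prints: /$HOME/.whirr/HadoopCluster/

# Correct form: let the shell expand the home directory itself.
export HADOOP_CONF_DIR="$HOME/.whirr/HadoopCluster/"
echo "$HADOOP_CONF_DIR"    # prints an expanded path, e.g. /home/sri/...
```

The `~/.whirr/HadoopCluster/` form Andrei suggests works for the same reason: tilde expansion happens in the assignment, so the stored value is a real path.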
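Until WHIRR-445 is fixed, a common stopgap for the "JAVA_HOME is not set" error on the node is to export it where the Hadoop scripts will see it, rather than only in one shell's .bashrc. The JDK path and hadoop-env.sh location below are assumptions (typical for Ubuntu images of that era); confirm both on the actual node.

```shell
# Stopgap sketch for WHIRR-445. First confirm where the JDK really is:
#   readlink -f "$(which java)"
# The path below assumes the Ubuntu OpenJDK 6 package location.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

# Persisting it in hadoop-env.sh covers the Hadoop daemons and scripts,
# unlike ~/.bashrc which only affects interactive shells. HADOOP_ENV is
# an assumed location -- adjust to wherever Whirr installed Hadoop.
HADOOP_ENV=/usr/local/hadoop/conf/hadoop-env.sh
echo "export JAVA_HOME=$JAVA_HOME" >> "$HADOOP_ENV"
```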
