Re: Nutch hadoop integration

2012-06-08 Thread Nitin Pawar
Maybe this will help, if you have not already checked it: http://wiki.apache.org/nutch/NutchHadoopTutorial

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari <abhishektiwari.u...@gmail.com> wrote:
> how can i integrate hadoop and nutch ..anyone please brief me .

-- Nitin Pawar

Re: Nutch hadoop integration

2012-06-08 Thread Biju Balakrishnan
> how can i integrate hadoop and nutch ..anyone please brief me .

Just configure a Hadoop cluster, then configure the Nutch path to store the Nutch crawl index and crawl list on HDFS. That's it.

-- *Biju*

Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi

I have done MapReduce programming using Eclipse before, but now I need to learn the Hadoop code internals for one of my projects. I have forked Hadoop from GitHub (https://github.com/apache/hadoop-common) and need to configure it to work with Eclipse. All the links I could find list steps

Re: Nutch hadoop integration

2012-06-08 Thread shashwat shriparv
Check out these links:

http://wiki.apache.org/nutch/NutchHadoopTutorial
http://wiki.apache.org/nutch/NutchTutorial
http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster

Regards
∞ Shashwat Shripar

Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out this link:
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

Regards
∞ Shashwat Shriparv

On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh wrote:
> Hi
>
> I have done MapReduce programming using Eclipse before but now I need to
> learn the

Hadoop command not found:hdfs and yarn

2012-06-08 Thread Prajakta Kalmegh
Hi

I am trying to execute the following commands for setting up Hadoop:

# Format the namenode
hdfs namenode -format
# Start the namenode
hdfs namenode
# Start a datanode
hdfs datanode

yarn resourcemanager
yarn nodemanager

It gives me a "Hadoop command not found." error for all the commands. When

Re: Nutch hadoop integration

2012-06-08 Thread abhishek tiwari
http://wiki.apache.org/nutch/NutchHadoopTutorial

The above tutorial is not working for me. I am using Nutch 1.4. Can you give me the steps? What property do I have to set in nutch-site.xml?

On Fri, Jun 8, 2012 at 1:34 PM, shashwat shriparv wrote:
> Check out these links :
>
> http://wiki.apache.org/nut
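Since the question asks about nutch-site.xml: Nutch 1.4 refuses to run a crawl until `http.agent.name` is set there. A minimal fragment, with an example agent-name value (pick your own crawler name), would look roughly like:

```xml
<!-- conf/nutch-site.xml: the one property Nutch 1.4 requires before crawling.
     The value below is only an example. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MyCrawler</value>
  </property>
</configuration>
```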

Re: Hadoop command not found:hdfs and yarn

2012-06-08 Thread Jagat Singh
Hello,

Can you quickly review your Hadoop install against the page below? It may give you some hints.

http://jugnu-life.blogspot.in/2012/05/hadoop-20-install-tutorial-023x.html

The 'deprecated' warning is correct, as the hadoop commands have now been divided.

Regards,
Jagat Singh

On Fri, Jun 8, 2012
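A common cause of "command not found" here is that Hadoop's `bin` and `sbin` directories are not on the PATH. The sketch below demonstrates the mechanism with a stand-in launcher script in a temporary directory; on a real install you would instead point `HADOOP_HOME` at your unpacked distribution (e.g. `/opt/hadoop`, a hypothetical path):

```shell
# Demonstration of the PATH fix with a stand-in launcher.
# Real installs: export HADOOP_HOME=/path/to/your/hadoop instead.
HADOOP_HOME=$(mktemp -d)
mkdir -p "$HADOOP_HOME/bin"
printf '#!/bin/sh\necho hdfs-ok\n' > "$HADOOP_HOME/bin/hdfs"
chmod +x "$HADOOP_HOME/bin/hdfs"

# Once bin/ and sbin/ are on PATH, the launchers resolve.
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
command -v hdfs
```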

InvalidJobConfException

2012-06-08 Thread huanchen.zhang
Hi,

I'm developing a MapReduce web crawler that reads URL lists and writes HTML to MongoDB. Each map task reads one URL list file, fetches the HTML, and inserts it into MongoDB. There is no reduce step and no output from the map. So, how do I set the output directory in this case? If I do not set the output direct

Re: InvalidJobConfException

2012-06-08 Thread Harsh J
Hi Huanchen, Just set your output format class to NullOutputFormat http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.html if you don't need any direct outputs to HDFS/etc. from your M/R classes. On Fri, Jun 8, 2012 at 4:16 PM, huanchen.zhang

RE: InvalidJobConfException

2012-06-08 Thread Devaraj k
By default it uses TextOutputFormat (a subclass of FileOutputFormat), which checks for an output path. You can use NullOutputFormat, or a custom output format that doesn't do anything, for your job.

Thanks
Devaraj

From: huanchen.zhang [huanchen.zh...@i
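The two replies above amount to a one-line change in the job driver. A minimal sketch, assuming the Hadoop 1.x `org.apache.hadoop.mapreduce` API is on the classpath; `CrawlMapper` is a hypothetical mapper that writes to MongoDB itself:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CrawlerDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "url-crawler");
        job.setJarByClass(CrawlerDriver.class);
        job.setMapperClass(CrawlMapper.class); // hypothetical: fetches HTML, inserts into MongoDB
        job.setNumReduceTasks(0);              // map-only job, no reduce phase

        // NullOutputFormat discards task output, so FileOutputFormat's
        // mandatory output-directory check (the InvalidJobConfException) never runs.
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```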

Re: Hadoop-Git-Eclipse

2012-06-08 Thread Deniz Demir
I did not find that screencast useful. This one worked for me: http://wiki.apache.org/hadoop/EclipseEnvironment Best, Deniz On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote: > Check out this link: > http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

Re: Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi

Yes, I did configure using the wiki link at http://wiki.apache.org/hadoop/EclipseEnvironment. I am facing a new problem while setting up Hadoop in pseudo-distributed mode on my laptop. I am trying to execute the following commands for setting up Hadoop:

hdfs namenode -format
hdfs namenode
hdfs

Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out these threads:

http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22976
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201012.mbox/%3c4cff292d.3090...@corp.mail.ru%3E

On Fri, Jun 8, 2012 at 6:24 PM, Prajakta Kalmegh wrote:
> Hi
>
> Yes I did configure using

decommissioning datanodes

2012-06-08 Thread Chris Grier
Hello,

I'm trying to figure out how to decommission datanodes. Here's what I do.

In hdfs-site.xml I have:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>

Add to the exclude file:
host1
host2

Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two nodes n

Re: decommissioning datanodes

2012-06-08 Thread Serge Blazhiyevskyy
Your nodes need to be in the include and the exclude file at the same time. Do you use both files?

On 6/8/12 11:46 AM, "Chris Grier" wrote:
>Hello,
>
>I'm in the trying to figure out how to decommission data nodes. Here's
>what I do:
>
>In hdfs-site.xml I have:
>
>dfs.hosts.exclude
>/opt/had

Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Do you mean the file specified by the 'dfs.hosts' parameter? That is not currently set in my configuration (the hosts are only specified in the slaves file). -Chris On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy < serge.blazhiyevs...@nice.com> wrote: > Your nodes need to be in include and

Re: decommissioning datanodes

2012-06-08 Thread Serge Blazhiyevskyy
Your config should be something like this:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>

<property>
  <name>dfs.hosts.include</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
</property>

Add to the exclude file:
host1
host2

Add to the include file:
host1
host2
Plus the rest of the nodes

O

Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Thanks, this seems to work now. Note that the parameter is 'dfs.hosts' rather than 'dfs.hosts.include'. (Also, the usual caveats apply, e.g. hostnames are case sensitive.)

-Chris

On Fri, Jun 8, 2012 at 12:19 PM, Serge Blazhiyevskyy <serge.blazhiyevs...@nice.com> wrote:
> Your config should be somethi
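Putting the thread together, the working hdfs-site.xml pair would look roughly like this (note the include-list parameter is `dfs.hosts`, not `dfs.hosts.include`; the paths are the ones used in the thread):

```xml
<!-- hdfs-site.xml: decommissioning setup from this thread.
     The include-list property is named dfs.hosts, not dfs.hosts.include. -->
<property>
  <name>dfs.hosts</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>
<!-- After editing the include/exclude files, run: hadoop dfsadmin -refreshNodes -->
```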

hbase client security (cluster is secure)

2012-06-08 Thread Tony Dean
Hi all,

I have created a Hadoop/HBase/ZooKeeper cluster that is secured and verified. Now a simple test is to connect an HBase client (e.g. the shell) to see its behavior. Well, I get the following message on the HBase master: AccessControlException: authentication is required. Looking at the co
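For what it's worth, a secure HBase client generally needs the cluster's security settings in its own hbase-site.xml before it will attempt Kerberos authentication. A client-side fragment of the kind usually needed on a secure 0.92-era cluster (stated as an assumption about this setup, not a diagnosis):

```xml
<!-- Client hbase-site.xml: match the cluster's security configuration. -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
```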

Sync and Data Replication

2012-06-08 Thread Mohit Anchlia
I am wondering about the role of sync in the replication of data to other nodes. Say a client writes a line to a file in Hadoop; at this point the file handle is open and sync has not been called. In this scenario, is the data also replicated to other nodes as defined by the replication factor? I am wondering i

memory usage tasks

2012-06-08 Thread Koert Kuipers
Silly question, but I have our Hadoop slave boxes configured with 7 mappers each, yet I see 14 Java processes for user mapred on each box. Each process takes up about 2 GB, which equals my memory allocation (mapred.child.java.opts=-Xmx2048m), so it is using twice as much memory as I expected
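To check what the mapred user's JVMs are actually consuming, one can sum their resident set sizes. A small sketch, assuming a procps-style `ps`; the "mapred" default is the user name from the post:

```shell
# Sum resident memory (in MB) of all processes owned by a given user.
# Pass a user name as $1, or it defaults to "mapred" as in the post.
user="${1:-mapred}"
ps -u "$user" -o rss= 2>/dev/null | awk '{ s += $1 } END { printf "%.0f MB\n", s/1024 }'
```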

Compile Hadoop 1.0.3 native library failed on mac 10.7.4

2012-06-08 Thread Yongwei Xing
Hello

I am trying to compile the Hadoop native library on Mac OS. My Mac OS X version is 10.7.4 and my Hadoop is 1.0.3. I have installed zlib 1.2.7 and lzo 2.0.6 like below:

./configure -shared --prefix=/usr/local/[zlib/lzo]
make
make install

I checked the /usr/local/zlib-1.2.7 and /usr/local/lzo-2.0