Which versions of Pig, Nutch and Hadoop are required to run together

2009-08-15 Thread venkata ramanaiah anneboina
Hi, I am using Pig 2.0 and Nutch 1.0, but they don't share a common Hadoop version. What is a common Hadoop version for both Pig and Nutch? Please give the Pig version, Nutch version and Hadoop version. Can anyone help with this? Thanks, ramanaiah

Re: Nutch and Hadoop not working proper

2009-06-25 Thread MilleBii
Did another test and got this error: 2009-06-25 21:19:44,663 ERROR mapred.EagerTaskInitializationListener - Job initialization failed: java.lang.IllegalArgumentException: Pathname /d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245956549829_job_200906252102_0001_pc-xxx%xxx_inject+urls fro

Re: Nutch and Hadoop not working proper

2009-06-25 Thread MilleBii
2009/6/24 Andrzej Bialecki
> MilleBii wrote:
>> What I have also discovered:
>> + hadoop (script) works with Unix-like paths and works fine on Windows
>> + nutch (script) works with Windows paths
> bin/nutch works with Windows paths? I think this could happen only by accident - both scr

Re: Nutch and Hadoop not working proper

2009-06-24 Thread Andrzej Bialecki
MilleBii wrote:
> What I have also discovered:
> + hadoop (script) works with Unix-like paths and works fine on Windows
> + nutch (script) works with Windows paths
bin/nutch works with Windows paths? I think this could happen only by accident - both scripts work with Cygwin paths. On the other hand
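A note on the path formats discussed in this thread: under Cygwin, a Windows path such as d:/Bii/nutch is normally addressed as /cygdrive/d/Bii/nutch, and on a real Cygwin install the cygpath utility (cygpath -u) performs that conversion. The sketch below only illustrates the mapping in plain shell. The function name win_to_cygwin_path is hypothetical, and the /cygdrive prefix is the Cygwin default, assumed here rather than taken from the thread.

```shell
# Hypothetical helper (not part of Nutch or Hadoop): map a Windows-style
# path to the Cygwin form, assuming the default /cygdrive prefix.
win_to_cygwin_path() {
  # Normalise backslashes to forward slashes first.
  p=$(printf '%s' "$1" | tr '\\' '/')
  case "$p" in
    [A-Za-z]:/*)
      # Lower-case the drive letter and drop the "X:" prefix.
      drive=$(printf '%s' "$p" | cut -c1 | tr 'A-Z' 'a-z')
      rest=$(printf '%s' "$p" | cut -c3-)
      printf '/cygdrive/%s%s\n' "$drive" "$rest"
      ;;
    *)
      # No drive letter: leave the path unchanged.
      printf '%s\n' "$p"
      ;;
  esac
}

# Usage: win_to_cygwin_path 'd:/Bii/nutch'
```

Paths without a drive letter pass through untouched, so the helper can be applied to relative paths as well.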

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
What I have also discovered: + hadoop (script) works with Unix-like paths and works fine on Windows + nutch (script) works with Windows paths. Could it be that there is some incompatibility because one works with Unix-like paths and not the other? 2009/6/24 MilleBii > Actually tried and it fail

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
Actually tried and it fails, but this is what I found: bin/hadoop-config.sh does the conversion from relative to absolute path
    this="$0"
    while [ -h "$this" ]; do
      ls=`ls -ld "$this"`
      link=`expr "$ls" : '.*-> \(.*\)$'`
      if expr "$link" : '.*/.*' > /dev/null; then
        this="$link"
      else th

Re: Nutch and Hadoop not working proper

2009-06-24 Thread MilleBii
Yes, I'm using both relative paths and Cygwin under Windows, so /d: is not introduced by me, but by either Nutch or Hadoop. Regarding the Cygwin path you are right... that is actually where I lost quite some time. OK, will try absolute paths and let you know. -MilleBii- 2009/6/24 Andrzej Bialecki > MilleBii

Re: Nutch and Hadoop not working proper

2009-06-24 Thread Andrzej Bialecki
MilleBii wrote: HELP!!! I have been stuck for 3 days, unable to start any Nutch job. HDFS works fine, i.e. I can put and look at files. When I start a Nutch crawl, I get the following error: Job initialization failed: java.lang.IllegalArgumentException: Pathname /d:/Bii/nutch/logs/history/user/_log

Re: Nutch and Hadoop not working proper

2009-06-23 Thread MilleBii
HELP!!! I have been stuck for 3 days, unable to start any Nutch job. HDFS works fine, i.e. I can put and look at files. When I start a Nutch crawl, I get the following error: Job initialization failed: java.lang.IllegalArgumentException: Pathname /d:/Bii/nutch/logs/history/user/_logs/history/localhos

Re: Nutch and Hadoop not working proper

2009-06-21 Thread MilleBii
Looks like I just needed to transfer from the local filesystem to hdfs: Is it safe to transfer a crawl directory (and subs) from the local file system to hdfs and start crawling again ? 1. hadoop fs -put crawl crawl 2. nutch generate crawl/crawldb crawl/segments -topN 500 (where now it should use

Nutch and Hadoop not working proper

2009-06-21 Thread MilleBii
I have newly installed Hadoop in a distributed single-node configuration. When I run Nutch commands, they look for files in my user home directory and not in the Nutch directory. How can I change this? -- -MilleBii-

Job not finished on nutch and hadoop

2009-05-14 Thread Bartosz Gadzimski
Hello, The problem is partially solved, but I am still writing about it :) Using the bin/nutch commands (inject, generate, fetch etc.) works. Only bin/nutch crawl does not -- I have successfully set up a Hadoop cluster on 6 nodes (1 namenode

Re: CRAWLING USING LATEST NUTCH AND HADOOP

2008-07-17 Thread brainstorm
https://issues.apache.org/jira/browse/NUTCH-637
2008/7/17 宫照 <[EMAIL PROTECTED]>:
> Hi,
> I have the same problems. Because there are some bugs in hadoop-0.12.2, I want to change to hadoop-0.17.0, but the API changed, so we can't use it directly. If you find a way to solve this problem, let

Re: CRAWLING USING LATEST NUTCH AND HADOOP

2008-07-16 Thread 宫照
Hi, I have the same problems. Because there are some bugs in hadoop-0.12.2, I want to change to hadoop-0.17.0, but the API changed, so we can't use it directly. If you find a way to solve this problem, let me know. Regards, gong zhao 2008/7/15 kranthi reddy <[EMAIL PROTECTED]>: > Hi, > > I am

CRAWLING USING LATEST NUTCH AND HADOOP

2008-07-14 Thread kranthi reddy
Hi, I am using nutch-0.9, which ships with hadoop-0.12.2 by default. Now that hadoop-0.17.0 is the latest version, I want my Nutch to run using this Hadoop. So I have replaced the hadoop-0.12.2.jar file in lib with the hadoop-0.17.0.jar file. And as usual errors creep up with some functions/methods be

Re: Nutch and Hadoop

2008-02-12 Thread payo
message in context: http://www.nabble.com/Nutch-and-Hadoop-tp15136744p15436273.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Nutch and Hadoop

2008-02-11 Thread payo
this [EMAIL PROTECTED] search]$ ./bin/hadoop dfs -put urls urls put: Connection refused thanks

Re: Nutch and Hadoop

2008-02-08 Thread payo
./bin/hadoop dfs -put urls urls put: Connection refused what is the problem? thanks

Re: Nutch and Hadoop

2008-02-07 Thread Barry Haddow
Hi You need to add your public key to the .ssh/authorized_keys on the master as well as the slave. Also, make sure that this file is not writable by anyone else but you. regards Barry On Thursday 07 February 2008, payo wrote: > i created my ssh keys and i can login over ssh without being pro
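The advice above (add the public key to .ssh/authorized_keys on the master as well as the slave, and keep the file writable only by its owner) can be sketched in shell as follows. This is a minimal sketch, not a command taken from the thread: the function name install_pubkey and its arguments are hypothetical, and it assumes the key pair has already been generated.

```shell
# Hypothetical helper: append a public key to authorized_keys and tighten
# permissions so that only the owner can write to the file.
# Arguments: <public-key-file> [ssh-dir, defaults to ~/.ssh]
install_pubkey() {
  pubkey_file=$1
  ssh_dir=${2:-"$HOME/.ssh"}
  auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"

  # Append the key only if the exact line is not already present.
  key=$(cat "$pubkey_file")
  if ! grep -qxF -e "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi

  # sshd typically ignores the file if group or others can write to it.
  chmod 600 "$auth_file"
}

# Usage (run on the master and on each slave):
#   install_pubkey ~/.ssh/id_rsa.pub
```

Running the helper twice with the same key leaves a single entry, so it is safe to repeat on every node.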

Re: Nutch and Hadoop

2008-02-07 Thread payo
-emcvaalkm01.estafeta.com.out [EMAIL PROTECTED]'s password: what is the problem

Re: Nutch and Hadoop

2008-02-05 Thread payo

Re: Nutch and Hadoop

2008-02-05 Thread payo
i resolved the problem!! i change in conf/context.xsl by this is correct? i read this http://www.openrdf.org/doc/sesame/users/ch09.html#d0e3707

Re: Nutch and Hadoop

2008-02-05 Thread payo

Re: Nutch and Hadoop

2008-02-01 Thread payo
I am trying to configure Nutch and Hadoop on two PCs, but I have questions: 1. Do I have to install Nutch on both PCs or only on the master node? 2. Will Hadoop help me reduce my crawl times? 3. Do I only need to create keys for communication between my PCs? Thanks

Re: Nutch and Hadoop

2008-01-28 Thread John Mendenhall
> I am working with nutch-0.8.1 and I am trying to configure Hadoop, but my questions are:
> - in the bin directory there are the files: hadoop, hadoop-daemon, hadoop-daemons, nutch, rcc, slaves, start-all, start-dfs, start-mapred, stop-all, stop-dfs, stop-mapred
> these files are necessary f

Nutch and Hadoop

2008-01-28 Thread payo
with hadoop? my base is http://wiki.apache.org/nutch/NutchHadoopTutorial thanks or i have download hadoop and make install?

Support Hardware and OS for nutch and hadoop

2008-01-04 Thread Developer Developer
Hello Friends, I am gathering information on supported hardware and OS for Nutch and Hadoop. I did not find any conclusive information by going through the Nutch wiki. If I want to build a cluster of nodes using Nutch/Hadoop for crawling, then what are my options for hardware and OS?

Re: Possible public applications with nutch and hadoop

2007-10-18 Thread xu xiong
> I can only conclude that the way to succeed as a search startup is to > CRAWL DIFFERENTLY. Focus on websites in specific regions, specific > topics, specific data types. Crawl into the corners of websites that > contain interesting nuggets of data (listings, calendars, etc) that > won't ever have

Re: Possible public applications with nutch and hadoop

2007-10-16 Thread Matt Kangas
Andrzej, before I dive into your specific questions... I want to step back to the original topic: what applications are possible with Nutch? The specialization that I focused on was a _listings_ crawler. There are any number of listings types that one could potentially crawl: - events (what

Re: Possible public applications with nutch and hadoop

2007-10-16 Thread Andrzej Bialecki
Matt Kangas wrote: Hi Andrzej (and everyone else), A few weeks ago, I intended to chime in on your "Scoring API issues" thread, but this new thread is perhaps an even better place to speak up. Time to stop lurking and contribute. :) Thanks a lot for sharing your thoughts. Your post touches a

Re: Possible public applications with nutch and hadoop

2007-10-15 Thread Matt Kangas
Hi Andrzej (and everyone else), A few weeks ago, I intended to chime in on your "Scoring API issues" thread, but this new thread is perhaps an even better place to speak up. Time to stop lurking and contribute. :) First, I want to echo Stefan Groschupf's comment several months ago that the N

Re: Possible public applications with nutch and hadoop

2007-10-15 Thread Andrzej Bialecki
Berlin Brown wrote: Yea, you are right. You have to have a constrained set of domains to search and to be honest, that works pretty well. The only thing, I still get a lot of junk links. I would say that 30% are valid or interesting links while the other is kind of worthless. I guess it is a

Re: Possible public applications with nutch and hadoop

2007-10-14 Thread Berlin Brown
Yea, you are right. You have to have a constrained set of domains to search and to be honest, that works pretty well. The only thing, I still get a lot of junk links. I would say that 30% are valid or interesting links while the other is kind of worthless. I guess it is a matter of studying spa

Re: Possible public applications with nutch and hadoop

2007-10-13 Thread Pike
Hi
> My question: have you built a general site to crawl the internet, and how did you find links that people would be interested in as opposed to capturing a lot of the junk out there?
Interesting question. Are you planning to build a new Google? If you are planning to crawl without any limi

Possible public applications with nutch and hadoop

2007-10-13 Thread Berlin Brown
I really like the concept of Nutch and Hadoop, but I haven't been able to build an application with them. Most of the apps I like building are targeted at the public, anyone on the internet. I built a crawler of top sites like the NYTimes and Slate, but I couldn't filter out the sites

Re: Separating nutch and hadoop configurations.

2007-07-11 Thread Briggs
asn't hard at all. Though, I needed to replace hadoop-12.whatever.jar with the latest within the Nutch build. It seems to be working. yay. Thanks. On 7/11/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: Briggs wrote: > I am currently trying to figure out how to deploy Nutch and Hado

Re: Separating nutch and hadoop configurations.

2007-07-11 Thread Andrzej Bialecki
Briggs wrote: I am currently trying to figure out how to deploy Nutch and Hadoop separately. I want to configure Hadoop outside of Nutch and have Nutch use that service, rather than configuring hadoop within nutch. I would think all that Nutch should need to know is the urls to connect to

Separating nutch and hadoop configurations.

2007-07-11 Thread Briggs
I am currently trying to figure out how to deploy Nutch and Hadoop separately. I want to configure Hadoop outside of Nutch and have Nutch use that service, rather than configuring hadoop within nutch. I would think all that Nutch should need to know is the urls to connect to Hadoop, but can't

Re: nutch and hadoop: can't launch properly the name node

2007-05-03 Thread cybercouf
02 17:27:58,922 INFO conf.Configuration - parsing file:/home/nutch/search/conf/hadoop-site.xml
>> 2007-05-02 17:27:58,989 WARN mapred.JobTracker - Starting tracker
>> java.net.ConnectException: Connection refused
>> at java.net.PlainSocketImpl.socketConnect(Native Method)

Re: nutch and hadoop: can't launch properly the name node

2007-05-02 Thread Dennis Kubes
Is your hadoop jar in the lib directory named "hadoop-0.4.0-patched.jar!" with the exclamation point? If it is, that may be causing the error. Also let me know if you can ping the namenode from any of the data nodes. Dennis Kubes cybercouf wrote: I tried both with "localhost" or "myhostnam

Re: nutch and hadoop: can't launch properly the name node

2007-05-02 Thread cybercouf
What errors are you seeing in your hadoop-namenode and datanode logs? Dennis Kubes

Re: nutch and hadoop: can't launch properly the name node

2007-05-02 Thread Dennis Kubes
What errors are you seeing in your hadoop-namenode and datanode logs? Dennis Kubes
cybercouf wrote: Yes it is. Here are more details:
$ cat /etc/hosts
127.0.0.1 localhost
84.x.x.x myhostname.mydomain.com myhostname
# ping myhostname
PING myhostname.mydomain.com (84.x.x.x) 56(84) bytes o

Re: nutch and hadoop: can't launch properly the name node

2007-05-02 Thread cybercouf
looks like the namenode is running (when I stop it I have the message "stopping namenode"), but why can't I access it? (is this IP from the log correct? 0.0.0.0:50070)
>> all is on the same machine, and my conf file looks ok:
>> fs.default.

Re: nutch and hadoop: can't launch properly the name node

2007-05-02 Thread Dennis Kubes
Make sure your hosts file on your namenode is set up correctly:
127.0.0.1 localhost.localdomain localhost
10.x.x.x myhostname.mydomain.com myhostname
As opposed to:
127.0.0.1 localhost.localdomain localhost myhostname.mydomain.com myhostname
The prob
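The two hosts-file layouts contrasted above differ in whether the machine's hostname sits on the 127.0.0.1 line. A small check for that condition might look like the sketch below. It assumes, since the message is cut off before its explanation, that the hostname resolving to the loopback address is the problem being described. The function name hostname_on_loopback is hypothetical, and only IPv4 loopback entries (127.x.x.x) are considered.

```shell
# Hypothetical helper: succeed (exit status 0) if the given hostname is
# listed on a 127.x.x.x line of the hosts file, fail otherwise.
# Arguments: <hostname> [hosts-file, defaults to /etc/hosts]
hostname_on_loopback() {
  name=$1
  hosts_file=${2:-/etc/hosts}
  awk -v name="$name" '
    $1 ~ /^127\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break          # ignore trailing comments
        if ($i == name) found = 1     # hostname is bound to loopback
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$hosts_file"
}

# Usage:
#   if hostname_on_loopback myhostname; then
#     echo "myhostname resolves to loopback; move it to the line with the real IP"
#   fi
```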

nutch and hadoop: can't launch properly the name node

2007-05-02 Thread cybercouf
mapred.local.dir /home/nutch/filesystem/mapreduce/local dfs.replication 1

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
It is under NutchHadoopTutorial under Nutch Administration. Dennis -Original Message- From: Chris Mattmann [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 5:21 PM To: nutch-user@lucene.apache.org Subject: Re: Nutch and Hadoop Tutorial Finished Hi Dennis, Thanks for your hard

Re: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Chris Mattmann
inal Message- > From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED] > Sent: Monday, March 20, 2006 12:49 PM > To: nutch-user@lucene.apache.org > Subject: RE: Nutch and Hadoop Tutorial Finished > > Sorry. Go to http://wiki.apache.org/nutch/ and click on the > "login"

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
The NutchHadoop tutorial is now up on the wiki. Dennis -Original Message- From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 12:49 PM To: nutch-user@lucene.apache.org Subject: RE: Nutch and Hadoop Tutorial Finished Sorry. Go to http

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
I will add in your changes and then put it up on the wiki. Dennis -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 2:41 PM To: nutch-user@lucene.apache.org Subject: Re: Nutch and Hadoop Tutorial Finished Dennis Kubes wrote: > Here it is

Re: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Doug Cutting
Dennis Kubes wrote: Here it is for the list, I will try to put it on the wiki as well. Thanks for writing this! I've added a few comments below. Some things are assumed for this tutorial. First, you will need root level access to all of the boxes you are deploying to. Root access should n

Re: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Michael Plax
Dennis, Thank you very, very much for the tutorial. Michael - Original Message - From: "Dennis Kubes" <[EMAIL PROTECTED]> To: Sent: Monday, March 20, 2006 10:46 AM Subject: RE: Nutch and Hadoop Tutorial Finished Here it is for the list, I will try to put it on the wiki

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
Here it is for the list, I will try to put it on the wiki as well. Dennis How to Setup Nutch and Hadoop After searching the web and mailing lists, it seems that there is very little information on how to setup

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Vanderdray, Jacob
If you have any trouble, just shout. Jake. -Original Message- From: Dennis Kubes [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 1:37 PM To: nutch-user@lucene.apache.org Subject: RE: Nutch and Hadoop Tutorial Finished Not to act dumb, but how do I add it to the wiki? Denni

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
Not to act dumb, but how do I add it to the wiki? Dennis -Original Message- From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 12:20 PM To: nutch-user@lucene.apache.org Subject: RE: Nutch and Hadoop Tutorial Finished Dennis, How 'bout the

Re: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread NamNH
May you send me one? Thank you. On 3/21/06, Dennis Kubes <[EMAIL PROTECTED]> wrote: > > All, > > I have finished a lengthy tutorial on how to setup a distributed > implementation of nutch and hadoop. Should I post it on this list or is > there a better place for it?

RE: Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Vanderdray, Jacob
Dennis, How 'bout the wiki. Jake. -Original Message- From: Dennis Kubes [mailto:[EMAIL PROTECTED] Sent: Monday, March 20, 2006 1:01 PM To: nutch-user@lucene.apache.org Subject: Nutch and Hadoop Tutorial Finished All, I have finished a lengthy tutorial on how to se

Nutch and Hadoop Tutorial Finished

2006-03-20 Thread Dennis Kubes
All, I have finished a lengthy tutorial on how to setup a distributed implementation of nutch and hadoop. Should I post it on this list or is there a better place for it? Dennis